SP1 Multiscroll: Performance Baseline

We can’t optimize what we can’t measure, so my first step towards multiscrolling is to write some proof-of-concept routines and set up a code harness that does some accounting in the background, so that I can run any routine I want while measuring its performance. The harness also serves as a testbed for debugging the scrolling routines during their development.

As in my previous series on SP1 vertical scrolling examples, my performance measure is Frames Per Second (FPS).
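As a rough illustration of this kind of background accounting, here is a minimal sketch of an FPS counter. It is not the harness code: the names `fps_counter`, `fps_frame_done` and `fps_tick` are hypothetical, and it assumes a periodic interrupt firing 50 times per second.

```c
#include <stdint.h>

#define TICKS_PER_SECOND 50  /* assumption: 50 Hz periodic interrupt */

typedef struct {
    uint8_t ticks;     /* interrupts seen in the current second */
    uint8_t frames;    /* frames completed in the current second */
    uint8_t last_fps;  /* frames counted during the last full second */
} fps_counter;

/* Call once after each completed scroll + screen update. */
void fps_frame_done(fps_counter *c) {
    c->frames++;
}

/* Call from the periodic interrupt handler. */
void fps_tick(fps_counter *c) {
    if (++c->ticks == TICKS_PER_SECOND) {
        c->last_fps = c->frames;  /* latch the result for display */
        c->frames = 0;
        c->ticks = 0;
    }
}
```

The main loop would call `fps_frame_done` after every frame it draws, while the interrupt handler calls `fps_tick`; `last_fps` can then be printed at any time.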

Measuring real performance also requires some way of running a “walk” over a given map by scrolling, so I have also developed a scroll map system that decides what to draw in the hidden borders to get the effect of a viewport scrolling over a bigger map. This mapping system will be described in detail in a future post.

Back to multidirectional scroll: I need to write specific routines for all 4 directions (and not just one scroll-down routine, as in the vertical scroller series). This is a challenge because, although vertical scrolling is relatively simple, horizontal scrolling needs to be done bit by bit, and even though there are specialized Z80 instructions for that, I found some obstacles in my way.

The other important thing is the memory layout of the virtual framebuffer. Here we are completely constrained by the SP1 tile layout: for SP1 to update the screen correctly, it expects the 8 bytes associated with a screen cell position to be located contiguously in memory. This restriction heavily favours a vertical memory layout by columns, like the one we used in the vertical scrolling series.
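A minimal sketch of such a column-by-column layout follows. The names (`VIEW_COLS`, `fb_offset`, `fb_cell`) are hypothetical, and the hidden borders are left out for simplicity, so this is an illustration of the addressing scheme rather than the actual framebuffer definition.

```c
#include <stddef.h>
#include <stdint.h>

#define VIEW_COLS   16               /* width in cells (assumed) */
#define VIEW_ROWS   16               /* height in cells (assumed) */
#define VIEW_LINES  (VIEW_ROWS * 8)  /* height in pixel lines */

/* Column-major buffer: all bytes of column 0, then column 1, ... */
uint8_t framebuffer[VIEW_COLS * VIEW_LINES];

/* Byte offset of pixel line `line` within cell column `col`. */
size_t fb_offset(unsigned col, unsigned line) {
    return (size_t)col * VIEW_LINES + line;
}

/* Pointer to the 8 contiguous bytes of the cell at (row, col),
   which is what a tile pointer for that cell would refer to. */
uint8_t *fb_cell(unsigned row, unsigned col) {
    return &framebuffer[fb_offset(col, row * 8u)];
}
```

With this layout, moving one pixel line down within a column is a step of 1 byte, moving one cell down is a step of 8 bytes, and moving one column to the right is a step of `VIEW_LINES` bytes.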

One important problem that we’ll see is the wildly different performance of vertical and horizontal scrolling. We will need to sort this out and get both roughly on par, so that the screen moves at a similar speed in all directions (i.e. acceptable gameplay).

The speed difference stems from the different way the scroll is done in each direction.

In both cases, the cost of one scroll run over the whole framebuffer is roughly proportional to the dimensions of the scrolling area, WIDTH x LINES (WIDTH measured in cells and LINES measured in pixels). But while the inner loop for vertical scroll takes 16 T-states (LDI/LDD), for horizontal scroll it takes 15+11+8+10=44 T-states (nearly 3 times as many!). This is the main reason for the very different performance in each direction.
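To put numbers on this, here is a small sketch of the estimate, using the inner-loop costs quoted above. The function name `scroll_pass_tstates` is hypothetical, and loop setup and outer-loop overhead are ignored, so the results are rough lower bounds only.

```c
#include <stdint.h>

/* Inner-loop costs in T-states, as quoted in the text. */
#define TSTATES_VERTICAL    16u  /* one LDI/LDD per byte */
#define TSTATES_HORIZONTAL  44u  /* 15+11+8+10 per byte */

/* Estimated T-states for one scroll pass over the whole buffer.
   width_cells: buffer width in cells; lines: height in pixel lines.
   Assumption: only the inner loop is counted, one iteration per byte. */
uint32_t scroll_pass_tstates(uint32_t width_cells, uint32_t lines,
                             uint32_t tstates_per_byte) {
    return width_cells * lines * tstates_per_byte;
}
```

For a 16x16-cell area (16 cells by 128 pixel lines, i.e. 2048 bytes), this gives 32768 T-states for a vertical pass and 90112 T-states for a horizontal one, a ratio of 2.75.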

In the case of 4-pixel scrolling we also have a specialized instruction: RRD (HL), which rotates 4 bits through the Accumulator. This instruction does not need the Carry flag, so we can do away with the PUSH/POP in the inner loop, and so RRD (HL) + ADD HL,DE only takes 18+8=24 T-states, which is quite near the 16 T-states for vertical scroll.
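The nibble rotation can be modelled in C to see why no Carry flag is involved. This is a sketch of the idea, not the actual Z80 routine: `rrd` mimics what the instruction does to the Accumulator and the memory byte, and `scroll_line_right4` is a hypothetical helper that walks one pixel line across the columns, with `stride` playing the role of DE in ADD HL,DE.

```c
#include <stddef.h>
#include <stdint.h>

/* Model of the Z80 RRD instruction: the low nibble of A moves into the
   high nibble of the memory byte, the high nibble of the byte moves into
   its low nibble, and the old low nibble of the byte moves into the low
   nibble of A. Returns the new value of A. */
uint8_t rrd(uint8_t a, uint8_t *mem) {
    uint8_t old = *mem;
    *mem = (uint8_t)(((a & 0x0F) << 4) | (old >> 4));
    return (uint8_t)((a & 0xF0) | (old & 0x0F));
}

/* Scroll one pixel line 4 pixels to the right.
   line_start: byte of this line in the leftmost column
   cols:       number of cell columns
   stride:     bytes between adjacent columns (LINES in a column layout)
   fill:       nibble shifted in at the left edge
   Returns the nibble shifted out at the right edge. */
uint8_t scroll_line_right4(uint8_t *line_start, size_t cols,
                           size_t stride, uint8_t fill) {
    uint8_t a = fill & 0x0F;
    for (size_t i = 0; i < cols; i++) {
        /* A carries the 4 pixels spilled out of the previous column */
        a = rrd(a, &line_start[i * stride]);
    }
    return a & 0x0F;
}
```

The 4 pixels that leave one byte travel to the next column in the Accumulator itself, which is why the flags do not need to be preserved between iterations.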

In the case of 8-pixel scroll, both horizontal and vertical inner loops can be expressed as LDI/LDD, and so the performance for both directions is identical. However, 8-pixel scrolling might not be adequate for all games.
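A sketch of why both directions reduce to plain block copies, assuming a column-by-column buffer of `cols` columns with `lines` bytes each. The function names are hypothetical, and `memmove` stands in for the LDI/LDD-based copy loops.

```c
#include <stdint.h>
#include <string.h>

/* Scroll left by 8 pixels: every column takes the contents of the column
   to its right. Columns are contiguous, so this is one block move.
   The rightmost column keeps its old contents and must be redrawn. */
void scroll_left8(uint8_t *fb, size_t cols, size_t lines) {
    memmove(fb, fb + lines, (cols - 1) * lines);
}

/* Scroll up by 8 pixels: inside each column, every byte moves 8 positions
   towards the start. The bottom 8 lines of each column keep their old
   contents and must be redrawn. */
void scroll_up8(uint8_t *fb, size_t cols, size_t lines) {
    for (size_t c = 0; c < cols; c++) {
        uint8_t *col = fb + c * lines;
        memmove(col, col + 8, lines - 8);
    }
}
```

In both cases whole bytes are moved without any bit manipulation, so the per-byte cost is the same.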

This currently makes 4-pixel scroll our best bet for multiscroll games… but we’ll see if we can further optimize the other routines. 4-pixel scroll can be OK for some games, but smooth scrolling is… well, The Right Thing, so let’s see.

These are the current FPS measurements in all scrolling directions, for a 16x16-cell scrolling viewport and for different pixel increments:

Pixels \ Direction    U   D   L   R  UL  UR  DL  DR
1-pixel              16  16  12  12  10  10  10  10
2-pixel              16  16   7   7   6   6   6   6
4-pixel              16  16  12  12  11  11  11  11
8-pixel              15  15  15  15  11  11  11  11

The additional drop in performance for diagonal movements is due to scrolling in both directions on the same move, e.g. scroll in UP-LEFT direction = scroll UP + scroll LEFT in sequence.

The code used for these measurements and performance baselines is in the directory src/sp1-multi-map.

The animated demos and TAP files are the ones provided in my previous post.