After getting sprites working in Lisp on the NES for our “What Remains” project, the next thing to figure out properly is the background tiles. With the sprites you simply have a block of memory you edit at any time, then copy the whole lot to the PPU each frame in one go – the tiles involve a bit more head scratching.
The PPU graphics chip on the NES was designed at a time when all TVs were cathode ray tubes, using an electron gun to build a picture up on a phosphor screen. As this scans back and forth across the screen the PPU is busy altering its signal to draw pixel colours. If you try to alter its memory while it's doing this you get glitches. However, it's not drawing all the time – the electron gun needs to reset to the top of the screen each frame, so you get a window of time (2273 cycles) to make changes to the PPU memory before it starts drawing the next frame.
(Trying out thematic images and some overlapping text via the display list)
The problem is that 2273 cycles is not very much – not nearly enough to run your game in, and only enough to update approx 192 background tiles per frame as DMA is a slow operation. It took me a while to figure out this situation – as I was trying to transfer an entire screenful in one go, which sort of works but leaves the PPU in an odd state.
The solution is one familiar from modern graphics hardware – a display list. This is a buffer you can add instructions to at any time in your game, which are then acted on only in the PPU access window. It separates the game code from the graphics DMA, and is very flexible. We might want to do different things here, so we can have a set of ‘primitives’ that run different operations. Given the per-frame restriction, the buffer can also limit the bandwidth: the game can add a whole bunch of primitives in one go, which are then gradually dispatched over several frames – you can see this in a lot of NES games, as it takes a few frames to do things like clear the screen.
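The idea of queueing work in game code and draining it under a per-frame budget can be sketched in a few lines of Python. This is only a toy model, not the engine's co2 code: the class and method names are made up for illustration, the budget is the approximate 192 tiles mentioned above, and a plain first-in-first-out queue is used here purely to keep the sketch short.

```python
from collections import deque

TILES_PER_FRAME = 192  # approximate per-frame budget quoted in the text


class DisplayListModel:
    """Toy model: game code queues tile writes, vblank drains a limited number."""

    def __init__(self, budget=TILES_PER_FRAME):
        self.budget = budget
        self.pending = deque()  # queued (address, tile) writes
        self.vram = {}          # stand-in for PPU background tile memory

    def add(self, address, tiles):
        # Called from game code at any time; "PPU memory" is not touched yet.
        for offset, tile in enumerate(tiles):
            self.pending.append((address + offset, tile))

    def vblank(self):
        # Called once per frame; performs at most `budget` writes.
        written = 0
        while self.pending and written < self.budget:
            address, tile = self.pending.popleft()
            self.vram[address] = tile
            written += 1
        return written


dl = DisplayListModel()
dl.add(0, [0] * 960)  # clear a full 32x30 screen of tiles in one request

frames = 0
while dl.pending:
    dl.vblank()
    frames += 1

print(frames)  # 5 frames at 192 tiles per frame
```

The game asks for the whole screen to be cleared in a single call, but the writes are spread over several frames, which is the same effect you see in NES games that take a moment to wipe the screen.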
There are two kinds of primitives in the “What Remains” prototype game engine so far. The first sets the tile data directly:
(display-list-add-byte 1)
(display-list-add-byte 2)
(display-list-add-byte 3)
(display-list-end-packet prim-tile-data 0 0 3)
This overwrites the first 3 tiles at the top left of the screen with patterns 1, 2 and 3. First you add bytes to a ‘packet’, which can have different meanings depending on the primitive used, then you end the packet with the primitive type constant, the high and low bytes of a 16 bit address offset for the PPU destination, and a size. The reason the header comes after the data is that this is a stack, read from the ‘top’, which is a lot faster – we can use a position index that is incremented when writing and decremented when reading.
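To make the stack behaviour concrete, here is a small Python model of writing a packet and reading it back from the top. It is a sketch under assumptions rather than the engine's actual code: the value of `PRIM_TILE_DATA`, the order in which the four header bytes are pushed, and the way the reader copes with the data bytes coming off in reverse are all guesses made for illustration.

```python
PRIM_TILE_DATA = 1  # hypothetical value for the prim-tile-data constant


class DisplayListStack:
    """Toy model of a display list stored as a byte stack."""

    def __init__(self, capacity=256):
        self.buf = [0] * capacity
        self.pos = 0  # incremented when writing, decremented when reading

    def add_byte(self, value):
        self.buf[self.pos] = value & 0xFF
        self.pos += 1

    def end_packet(self, prim, addr_hi, addr_lo, size):
        # The header is pushed last so the reader meets it first.
        # Assumed order: the primitive type ends up on top of the stack.
        for byte in (size, addr_lo, addr_hi, prim):
            self.add_byte(byte)

    def _pop(self):
        self.pos -= 1
        return self.buf[self.pos]

    def dispatch(self, vram):
        # Read packets from the top of the stack until it is empty.
        while self.pos > 0:
            prim = self._pop()
            addr = (self._pop() << 8) | self._pop()
            size = self._pop()
            if prim == PRIM_TILE_DATA:
                # Data bytes come off in reverse, so fill from the end back.
                for offset in range(size - 1, -1, -1):
                    vram[addr + offset] = self._pop()


dl = DisplayListStack()
dl.add_byte(1)
dl.add_byte(2)
dl.add_byte(3)
dl.end_packet(PRIM_TILE_DATA, 0, 0, 3)

vram = {}
dl.dispatch(vram)
print([vram[i] for i in range(3)])  # [1, 2, 3]
```

Both writing and reading only ever touch `pos`, which is the property that makes the stack layout cheap: no separate read and write pointers to maintain.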
We could clear a portion of the screen this way with a loop (a built-in language feature in co2 Lisp) to add a load of zeros to the stack:
(loop n 0 255 (display-list-add-byte 0))
(display-list-end-packet prim-tile-data 0 0 256)
But this is very wasteful, as it fills up a lot of space in the display list (all of it, as it happens). To get around this, I added another primitive called ‘value’ which does a kind of run-length encoding (RLE):
(display-list-add-byte 128) ;; length
(display-list-add-byte 0) ;; value
(display-list-end-packet prim-tile-value 0 0 2)
With just 2 bytes we can clear 128 tiles – about the maximum we can do in one frame.
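As a rough illustration of what the ‘value’ primitive does when it is dispatched, the Python sketch below expands a (length, value) pair into a run of identical tiles. The function name and the list used as a stand-in for PPU memory are hypothetical; only the two-byte payload layout (length first, then value) comes from the example above.

```python
def apply_tile_value(vram, addr, payload):
    """Expand a (length, value) payload into a run of identical tiles."""
    length, value = payload
    for offset in range(length):
        vram[addr + offset] = value
    return length


# Stand-in for one 32x30 screen of tiles, filled with a non-zero pattern.
vram = [7] * 960

written = apply_tile_value(vram, 0, [128, 0])

print(written)         # 128 tiles cleared
print(vram[127:130])   # [0, 7, 7]: the run stops after 128 tiles
```

The payload costs 2 bytes in the display list instead of the 128 bytes the same clear would need as raw tile data, at the price of only being able to write one repeated value.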