More PPU coding on the NES/Famicom

After getting sprites working in Lisp on the NES for our “What Remains” project, the next thing to figure out properly is the background tiles. With the sprites you simply have a block of memory you edit at any time, then copy the whole lot to the PPU each frame in one go – the tiles involve a bit more head scratching.

The PPU graphics chip on the NES was designed in a time where all TVs were cathode ray tubes, using an electron gun to build a picture up on a phosphor screen. As this scans back and forth across the screen the PPU is busy altering its signal to draw pixel colours. If you try and alter its memory while its doing this you get glitches. However, its not drawing all the time – the electron gun needs to reset to the top of the screen each frame, so you get a window of time (2273 cycles) to make changes to the PPU memory before it starts drawing the next frame.

(Trying out thematic images and some overlapping text via the display list)

The problem is that 2273 cycles is not very much – not nearly enough to run your game in, and only enough to update approx 192 background tiles per frame as DMA is a slow operation. It took me a while to figure out this situation – as I was trying to transfer an entire screenful in one go, which sort of works but leaves the PPU in an odd state.

The solution is a familiar one to modern graphics hardware – a display list. This is a buffer you can add instructions to at any time in your game, which are then acted on only in the PPU access window. It separates the game code from the graphics DMA, and is very flexible. We might want to do different things here, so we can have a set of ‘primitives’ that run different operations. Given the per-frame restriction the buffer can also limit the bandwidth so the game can add a whole bunch of primitives in one go, which are then gradually dispatched – you can see this in a lot of NES games as it takes a few frames to do things like clear the screen.

There are two kinds of primitives in the what remains prototype game engine so far, the first sets the tile data directly:

(display-list-add-byte 1)
(display-list-add-byte 2)
(display-list-add-byte 3)
(display-list-end-packet prim-tile-data 0 0 3)

This overwrites the first 3 tiles at the top left of the screen to patterns 1,2 and 3. First you add bytes to a ‘packet’, which can have different meanings depending on the primitive used, then you end the packet with the primitive type constant, a high and low 16 bit address offset for the PPU destination, and a size. The reason this is done in reverse is that this is a stack, read from the ‘top’ which is a lot faster – we can use a position index that is incremented when writing and decremented when reading.

We could clear a portion of the screen this way with a loop (a built in language feature in co2 Lisp) to add a load of zeros to the stack:

(loop n 0 255 (display-list-add-byte 0))
(display-list-end-packet prim-tile-data 0 0 256)

But this is very wasteful, as it fills up a lot of space in the display list (all of it as it happens). To get around this, I added another primitive called ‘value’ which does a kind of run length encoding (RLE):

(display-list-add-byte 128) ;; length
(display-list-add-byte 0) ;; value
(display-list-end-packet prim-tile-value 0 0 2)

With just 2 bytes we can clear 128 tiles – about the maximum we can do in one frame.

Ouya development experiments

The Ouya is a tiny game console which is designed for promoting indy games rather than traditional high budget productions. It’s cheap compared to standard games hardware, and all the games are free to play at least in demo form. It’s very easy to start making games with as it’s based on Android – you can just plug in a USB cable and treat it just the same as any other Android device. You also don’t need to sign anything to start building stuff – it’s just a case of adding one line to your AndroidManifest.xml to tell the Ouya that the program is a game:

 <category android:name="tv.ouya.intent.category.GAME"/>

and adding a 732×412 icon in “res/drawable-xhdpi/ouya_icon.png” so it shows up on the menu.


There is a lot to like about the Ouya’s philosophy, so I tried out some graphics experiments with it to get better acquainted:

This program was made using Jellyfish, part of my increasingly odd rendering stack which started out as a port of Fluxus to PS2. It’s a type of microcode written inside TinyScheme and running in a separate virtual machine that makes it possible to process a lot of geometry without garbage collection overheads. At some point I might write a compiler for this, but writing the code longhand at the moment means I can tweak the instruction set and get a better understanding of how to use it. Here is the microcode for the ribbons above, they run at 20,000 cycles per frame each (so about 1.2MHz):

;; register section
   8 20000 0 ;; control (pc, cycles, stack)
   mdl-size prim-tristrip 1 ;; graphics (size, primtype, renderhints)
   0 0 0 ;; pos
   0 0 0 ;; sensor addr

   ;; program data section
   mdl-start 0 0     ;; 4 address of current vertex
   mdl-start 0 0     ;; 5 address of accum vertex (loop)
   0 0 0             ;; 6 influence
   0 0 0             ;; temp

   ;; code section 
   ;; add up differences with every other vertex
   ldi  4  0         ;; load current vertex
   ldi  5  0         ;; load accum vertex
   sub  0  0         ;; get the difference
   lda  6  0         ;; load the accumulation
   add  0  0         ;; accumulate
   nrm  0  0         ;; normalise
   sta  6  0         ;; store accumulation
   add.x 5 1         ;; inc accum address

   ;; accumulation iteration 
   lda  5  0         ;; load accum address
   ldl  mdl-end 0    ;; load end address
   jlt  2  0         ;; exit if greater than model end (relative address)
   jmp  8  0         ;; return to accum-loop

   ;; end accum loop
   ;; push away from other verts
   lda  6  0         ;; load accum
   ldl -0.1 0        ;; reverse & make smaller
   mul 0 0

   ;; attract to next vertex
   ldi 4 0           ;; load current
   ldi 4 1           ;; load next
   sub 0 0           ;; get the difference
   ldl 0.4 0
   mul 0 0           ;; make smaller
   add 0 0           ;; add to accum result

   ;; do the animation
   ldi 4 0           ;; load current vertex
   add 0 0           ;; add the accumulation
   sti 4 0           ;; write to model data

   ;; reset the outer loop
   ldl 0 0           ;; load zero
   sta 6 0           ;; reset accum
   ldl mdl-start 0   
   sta 5 0           ;; reset accum address

   add.x 4 1         ;; inc vertex address
   lda 4  0          ;; load vertex address
   ldl mdl-end 0     ;; load end address
   jlt 2  0          ;; if greater than model (relative address)
   jmp 8  0          ;; do next vertex

   ;; end: reset current vertex back to start
   ldl mdl-start 0
   sta 4 0

   ;; waggle around the last point a bit
   lda mdl-end 0     ;; load vertex pos
   rnd 0 0           ;; load random vert
   ldl 0.5 0         
   mul 0 0           ;; make it smaller
   add 0 0           ;; add to vertex pos
   sta mdl-end 0     ;; write to model

   jmp 8 0

Genetic programming egg patterns in HTML5 canvas

Part of the ‘Project Nightjar’ camouflage work I’m doing for the Sensory Ecology group at Exeter University is to design citizen science games we can build to do some research. One plan is to create lots of patterns in the browser that we can run perceptual models on for different predator animals, and use an online game to compare them with human perception.

The problem is that using per-pixel processing in order to generate the variety of patterns we need is hard to do quickly in Javascript, so I’m looking at using genetic programming to build image processing trees using the HTML5 canvas image composition modes. You can see it in action here, and the source is here. Here are some example eggs built from very simple test images blended together, these are just random programs, but a couple of them are pretty good:


A nice aspect of this is that it’s easy to artistically control the patterns by changing the starting images, for example much more naturalistic patterns would result from noisier, less geometric base images and more earthy colours. This way we can have different ‘themes’ for levels in a game, for example.

I’m using the scheme compiler I wrote for planet fluxus to do this, and building trees that look like this:

    ("terminal" (124 57 0 1) "stripe-6")
    ("terminal" (42 23 0 1) "dots-6"))
    ("terminal" (36 47 0 1) "stripe-7")
    ("terminal" (8 90 1.5705 1) "red")))
  ("terminal" (108 69 0 1) "green"))

Ops are the blend mode operations, and terminals are images, which include translation and scale (not currently used). The egg trees get drawn with the function below, which shows the curious hybrid mix of HTML5 canvas and Scheme I’m using these days (and some people may find offensive 🙂 Next up is to actually do the genetic programming part, so mutating and doing crossover on the trees.

(define (draw-egg ctx x y program)
  (if (eq? (program-type program) "terminal")
        (set! ctx.fillStyle
               (find-image (terminal-image program) 
                           image-lib) "repeat"))

        ;; centre the rotation
        (ctx.translate 64 64)
            (transform-rotate (terminal-transform program)))
        (ctx.translate -64 -64)

        ;; make the pattern translate by moving, 
        ;; drawing then moving back
            (transform-x (terminal-transform program))
             (transform-y (terminal-transform program)))

            (- 0 (transform-x (terminal-transform program)))
            (- 0 (transform-y (terminal-transform program)))
            (* 127 2) (* 127 2))

            (- 0 (transform-x (terminal-transform program)))
            (- 0 (transform-y (terminal-transform program)))))
        ;; slightly overzealous context saving/restoring
        ;; set the composite operation
        (set! ctx.globalCompositeOperation (operator-type program))
        (draw-egg ctx x y (operator-operand-a program))
        (draw-egg ctx x y (operator-operand-b program))

Fast HTML5 sprite rendering

After quite a lot of experimentation with HTML5 canvas, I’ve figured out a way to use it with the kind of big isometric game worlds used for Germination X which are built from hundreds of overlapping sprites. There are lots of good resources out there on low level optimisations, but I needed to rethink my general approach in order to get some of these working.

It was quite obvious from the start that the simple clear screen & redraw everything way was going to be far too slow. Luckily HTML5 canvas gives us quite a lot of options for building less naive strategies.

A debug view of the game with 10 frames of changes shown with two plant spirits and one butterfly moving around.

The secret is only drawing the changes for each frame (called partial re-rendering in the link above). To do this we can calculate sprites which have changed and the ones they overlap with. The complication is maintaining the draw order and using clipping to keep the depth correct without needing to redraw everything else too.

In the game update we need to tag all the sprites which change position, rotation, scale, bitmap image, transparency etc.

Then in the render loop we build a list of all sprites that need redrawing, along with a list of bounding boxes for each overlapping sprite of the changed sprites that touch them. There may be more than one bounding box as a single sprite may need to be redrawn for multiple other changed sprites.

For each changed sprite:
    Get the bounding box for the changed sprite
    For each sprite which overlaps with this bounding box: 
        If overlapping sprite has already been stored:
            Add the bounding box to overlapping sprite's list 
            Store overlapping sprite and current bounding box.
    Add the changed sprite to the list.

Assuming the sprites have been sorted into depth order, we now draw them using the list we have made (we would only need to loop over the redraw list if we built it in depth sorted order).

For each sprite:
    If it's in the redraw list:
        If it's not one of the originally changed sprites:
            Set a clipping rect for each bounding box.
        Draw the sprite.
        Turn off the clipping, if it was used.

With complex scenes and multiple moving objects, this algorithm means we only need to redraw a small percentage of the total sprites visible – and we start to approach Flash in terms of performance for games (I suspect that flash is doing something similar to this under the hood). The code is here, currently written in HaXE, but will probably end up being ported to Javascript.