NES/Famicom game programming discoveries

Working on a NES game you are treading in the footsteps of programmers from the 80's, and going back to modern development feels strangely bloated and inefficient in comparison. This is a log of some of the things I've encountered writing game code for the What Remains project.

Firstly, although it seemed like a lunatic idea at the start, I'm very happy we decided to build a compiler first so I could use a high level language (Lisp) and compile it to 6502 assembler that the NES/Famicom needs to run. This has made it so much faster to get stuff done, and for example to be able to optimise the compiler for what is needed as we go along, without having to change any game code. I'm thinking about how to bring this idea to less projects on less esoteric hardware.

These are the odds and ends that I've built into the game, and some of the reasons for decisions we've made.



As well as sprite drawing hardware, later games machines (such as the Amiga) had circuitry to automatically calculate collisions between sprites and also background tiles. There is a feature on the NES that does this for only the first sprite – but it turns out this is more for other esoteric needs, for normal collisions you need to do your own checks. For background collisions to prevent walking through walls, we have a list of bounding boxes (originally obtained by drawing regions over screenshots in gimp) for each of the two map screens we're building for the demo. We're checking a series of 'hotspots' on the character against these bounding boxes depending on which way you are moving in, shown in yellow below.


You can also check for collisions between two sprites – all the sprites we're using are 2×2 'metasprites' as shown above as these are really the right size for players and characters, as used in most games. These collisions at the moment just trigger animations or a 'talk mode' with one of the characters.

Entity system

With the addition of scrolling and also thinking about how to do game mechanics, it became apparent that we needed a system for dealing with characters, including the player – normally this is called an entity system. The first problem is that with multiple screens and scrolling, the character position in the world is different to that of the screen position which you need to give to the sprites. The NES screens are 256 pixels wide, and we are using two screens side by side, so the 'world position' of a character in the x axis can be from 0 to 512. The NES is an 8 bit console, and this range requires a 16 bit value to store. We also need to be able to turn off sprites when they are not visible due to scrolling, otherwise they pop back on the other side of the screen. The way we do this is to store a list of characters, or entities, each of which contain five bytes:

  1. Sprite ID (the first of the 4 sprites representing this entity)
  2. X world position (low byte)
  3. X world position (high byte) – ends up 0 or 1, equivalent to which screen the entity is on
  4. Y world position (only one byte needed as we are not scrolling vertically)
  5. Entity state (this is game dependant and can be used for 8 values such as "have I spoken to the player yet" or "am I holding a key")

We are already buffering the sprite data in "shadow memory" which we upload to the PPU at the start of every frame, the entity list provides a second level of indirection. Each time the game updates, it loops through every entity converting the world position to screen position by checking the current scroll value. It can also see if the sprite is too far away and needs clipping – the simplest way to do that seems to be simply setting the Y position off the bottom of the screen. In future we can use the state byte to store animation frames or game data – or quite easily add more of these as needed.



Obviously I was itching to write a custom sound driver to do all sorts of crazy stuff, but time is very limited and it's better to have something which is compatible with existing tracker software. So we gave in and used an 'off the shelf' open source sound driver. First I tried ggsound but this was a bit too demanding on memory and I couldn't get it playing back music without glitching. The second one I tried was famitone which worked quite quickly, the only problem I had was with the python export program! It only requires 3 bytes of zero page memory, which is quite simple to get the Lisp compiler to reserve – and a bit of RAM which you can easily configure.

New display primitives

I wrote before about how you need to use a display list to jam all the graphics data transfer into the first part of each frame, the precious time while the PPU is inactive and can be updated. As the project went on there were a couple more primitive drawing calls I needed to add to our graphics driver.

To display our RPG game world tiles, the easiest way is putting big lists describing an entire screen worth of tiles into PRG-ROM. These take up quite a bit of space, but we can use the display list to transfer them chunk by chunk to the PPU without needing any RAM, just the ROM address in the display list. I'm expecting this will be useful for the PETSCII section too.

I also realised that any PPU transfer ideally needs to be scheduled by the display list to avoid conflicts, so I also added a palette switch primitive too. We are switching palettes quite a lot, between each game mode (we currently have 3, intro, RPG and the PETSCII demo) but we're also using an entirely black palette to hide the screen refresh between these modes. So the last thing we do in the game mode switch process (which takes several frames) is set the correct palette to make everything appear at once.

Memory mapping

By default you get 8K of data to store your graphics – this means 256 sprites and 256 tiles only. In practice this is not enough to do much more than very simple games, and we needed more than this even for our demo. The fact that games are shipped as hardware cartridges meant that this could be expanded fairly simply – so most games use a memory mapper to switch between banks of graphics data – and also code can be switched too.

There are hundreds of variants of this technique employing different hardware, but based on Aymeric's cartridge reverse engineering we settled on MMC1 – one of the most common mappers.

In use this turns out to be quite simple – you can still only use 256 tiles/sprites at a time, but you can switch between lots of different sets, something else that happens between game modes. With MMC1 you just write repeatedly to an address outside of the normal range to talk to the mapper via serial communication – different address ranges control different things. 


More PPU coding on the NES/Famicom

After getting sprites working in Lisp on the NES for our “What Remains” project, the next thing to figure out properly is the background tiles. With the sprites you simply have a block of memory you edit at any time, then copy the whole lot to the PPU each frame in one go – the tiles involve a bit more head scratching.

The PPU graphics chip on the NES was designed in a time where all TVs were cathode ray tubes, using an electron gun to build a picture up on a phosphor screen. As this scans back and forth across the screen the PPU is busy altering its signal to draw pixel colours. If you try and alter its memory while its doing this you get glitches. However, its not drawing all the time – the electron gun needs to reset to the top of the screen each frame, so you get a window of time (2273 cycles) to make changes to the PPU memory before it starts drawing the next frame.

(Trying out thematic images and some overlapping text via the display list)

The problem is that 2273 cycles is not very much – not nearly enough to run your game in, and only enough to update approx 192 background tiles per frame as DMA is a slow operation. It took me a while to figure out this situation – as I was trying to transfer an entire screenful in one go, which sort of works but leaves the PPU in an odd state.

The solution is a familiar one to modern graphics hardware – a display list. This is a buffer you can add instructions to at any time in your game, which are then acted on only in the PPU access window. It separates the game code from the graphics DMA, and is very flexible. We might want to do different things here, so we can have a set of ‘primitives’ that run different operations. Given the per-frame restriction the buffer can also limit the bandwidth so the game can add a whole bunch of primitives in one go, which are then gradually dispatched – you can see this in a lot of NES games as it takes a few frames to do things like clear the screen.

There are two kinds of primitives in the what remains prototype game engine so far, the first sets the tile data directly:

(display-list-add-byte 1)
(display-list-add-byte 2)
(display-list-add-byte 3)
(display-list-end-packet prim-tile-data 0 0 3)

This overwrites the first 3 tiles at the top left of the screen to patterns 1,2 and 3. First you add bytes to a ‘packet’, which can have different meanings depending on the primitive used, then you end the packet with the primitive type constant, a high and low 16 bit address offset for the PPU destination, and a size. The reason this is done in reverse is that this is a stack, read from the ‘top’ which is a lot faster – we can use a position index that is incremented when writing and decremented when reading.

We could clear a portion of the screen this way with a loop (a built in language feature in co2 Lisp) to add a load of zeros to the stack:

(loop n 0 255 (display-list-add-byte 0))
(display-list-end-packet prim-tile-data 0 0 256)

But this is very wasteful, as it fills up a lot of space in the display list (all of it as it happens). To get around this, I added another primitive called ‘value’ which does a kind of run length encoding (RLE):

(display-list-add-byte 128) ;; length
(display-list-add-byte 0) ;; value
(display-list-end-packet prim-tile-value 0 0 2)

With just 2 bytes we can clear 128 tiles – about the maximum we can do in one frame.

Pixel Quipu

The graphviz visualisations we’ve been using for quipu have quite a few limitations, as they tend to make very large images, and there is limited control over how they are drawn. It would be better to be able to have more of an overview of the data, also rendering the knots in the right positions with the pendants being the right length.

Meet the pixelquipu!


These are drawn using a python script which reads the Harvard Quipu Database and renders quipu structure using the correct colours. The knots are shown as a single pixel attached to the pendant, with a colour code of red as single knot, green for a long knot and blue as a figure of eight knot (yellow is unknown or missing). The value of the knot sets the brightness of the pixel. The colour variations for the pendants are working, but no difference between twisted and alternating colours, also no twist direction is visualised yet.


Another advantage of this form of rendering is that we can draw data entropy within the quipu in order to provide a different view of how the data is structured, as a attempt to uncover hidden complexity. This is done hierarchically so a pendant’s entropy is that of its data plus all the sub-pendants, which seemed most appropriate given the non-linear form that the data takes.



We can now look at some quipus in more detail – what was the purpose of the red and grey striped pendants in the quipu below? They contain no knots, are they markers of some kind? This also seems to be a quipu where the knots do not follow the decimal coding pattern that we understand, they are mostly long knots of various values.


There also seems to be data stored in different kinds of structure in the same quipu – the collection of sub-pendants below in the left side presumably group data in a more hierarchical manner than the right side, which seems much more linear – and also a colour change emphasises this.


Read left to right, this long quipu below seems very much like you’d expect binary data to look – some kind of header information or preamble, followed by a repeating structure with local variation. The twelve groups of eight grey pendants seem redundant – were these meant to be filled in later? Did they represent something important without containing any knots? We will probably never know.


The original thinking of the pixelquipu was to attempt to fit all the quipus on a single page for viewing, as it represents them with the absolute minimum pixels required. Here are both pendant colour and entropy shown for all 247 quipu we have the data for:



Procedural weave rendering

We’ve been working on new approaches to 3D rendering ancient weaves, using Alex’s new behavioural language (which describes a weave from the perspective of a single thread) as the description for our modelling. This new approach allows us to build a fabric out of a single geometric shape, where warp and weft are part of the same thread.


This is mix of tabby and 2:2 twill, created by this code:

warp 12 24 ++ [TurnIn] ++ threadWeftBy'' Odd (rot 3) ([Over,Under]) 12 12 ++ threadWeftBy'' Odd (rot 3) ([Over,Over,Under,Under]) 12 12

I’m still learning this language, but more on that soon. This line produces an large list of instructions the weave renderer uses to build it’s model, turning the thread and shifting it up and down as it crosses itself.

In the video in his last post Alex describes using this to mix two separate weaving techniques together, which is one of our main reasons for developing this language – existing weave simulations cannot replicate the weaving technology of the ancient Greeks who for example, combined tablet and warp weighted weaving in the same fabric.

The second problem with weave simulations is shown by the following screenshot from a popular existing system:


Fabrics modelled in this way are considered to infinitely repeating sections with chopped off threads. There is no consideration for the selvedge at the edge of the fabric – which as we’ve shown in our past research is almost like a completely separate weave system of it’s own, and rarely considered by notation systems or modelling (and often left to the weaver to ‘livecode’). Here is a different view of the same fabric:


We can also now introduce other changes to the yarn structure, for example modifying the width using a sine wave.


I still have a few glitches to fix as you can see above, but here is a video of the development process from the first script, getting the polygons lined up, fixing the turning, adding over/under, reading Alex’s code and finally lining everything up.

3D warp weighted loom simulation

One of the main objectives of the weavecoding project is to provide a simulation of the warp weighted loom to use for demonstrations and exploration of ancient weaving techniques. Beyond the 4 shaft loom dyadic calculator we need to show the actual process of weaving to explain how the structures and patterns emerge. Weaving is very much a 3D process and these visualisations fail to show that well. It also needs to be able to be driven by the flotsam tangible livecoding hardware so running on a Raspberry Pi is another requirement.

Sketch and rendering

I’ve decided to make use of the Jellyfish procedural renderer to build something fast and flexible enough, while remaining cross platform. Jellyfish is a lisp-like language which compiles to a vector processing virtual machine written in C++, and approaches speeds of native code (with no garbage collection) while remaining very creative to work with, similar to fluxus livecoding. Previously I’ve only used it for small experiments rather than production like this, so I’ve needed to tighten up the compiler quite a bit. One of the areas which needed work (along with function arguments which were coming out backwards!) were the conditional statements, which I removed and replaced with a single if. Here is the compiler code at the lowest level which emits all the instructions required:

;; compiler code to output a list of instructions for (if pred true-expr false-expr)
(define (emit-if x)
  (let ((tblock (emit-expr (caddr x))) ;; compile true expression to a block
        (fblock (emit-expr (cadddr x)))) ;; compile false expression to block
     (emit-expr (cadr x)) ;; predicate - returns true or false
     (emit (vector jmz (+ (length tblock) 2) 0)) ;; if false skip true block
     (emit (vector jmr (+ (length fblock) 1) 0)) ;; skip false block

Then I can implement cond (which is a list of different options to check rather than one) as a purely syntactic form with a pre-processor function to create a series of nested ifs before compiling them:

;; preprocessor to take a cond list and convert to nested ifs 
(define (preprocess-cond-to-if x)
  (define (_ l)
      ((null? l) 0)          ;; a cond without an else returns 0 
      ((eq? (caar l) 'else)  ;; check for else clause to do
          (cons 'do (pre-process (cdr (car l)))))
      (else (list 'if (pre-process (caar l)) ;; build an if
          (cons 'do (pre-process (cdr (car l))))
                  (_ (cdr l)))))) ;; keep going
  (_ (cdr x))) ;; ignores the 'cond'

Here’s an example of the if in use in the loom simulation at the ‘top’ level – it gets the current weaving draft value for the weft and warp thread position and uses it to move the weft polygons forward or back (in the z) a tiny amount to show up on the correct side of the warp.

(define calc-weft-z
    (lambda ()
        (set! weft-count (+ weft-count 1))
        (set! weft-z
              (if (> (read-draft) 0.5)
                  (vector 0 0 0.01)
                  (vector 0 0 -0.01)))))

One of the reasons I’m writing about all these levels of representation is that they feel close to the multiple representations present in weaving from draft to heddle layout, lift plan, fabric structure and resulting pattern.

Evolving butterflies game released!


The Heliconius Butterfly Wing Pattern Evolver game is finished and ready for it’s debut as part of the Butterfly Evolution Exhibit at the Royal Society Summer Exhibition 2014. Read more about the scientific context on the researcher’s website, and click the image above to play the game.

The source code is here, it’s the first time I’ve used WebGL for a game, and it’s using the browser version of fluxus. It worked out pretty well, even to the extent that the researchers could edit the code themselves to add new explanation screens for the genetics. Like any production code it has niggles, here’s the function to render a butterfly:

(define (render-butterfly s)
   ;; set tex based on index
   (texture (list-ref test-tex (butterfly-texture s)))  
   ;; move to location
   (translate (butterfly-pos s))                        
   ;; point towards direction
   (maim (vnormalise (butterfly-dir s)) (vector 0 0 1)) 
   (rotate (vector 0 90 90))      ;; angle correctly
   (scale (vector 0.5 0.5 0.5))   ;; make smaller
   (draw-obj 4)                   ;; draw the body
   (with-state          ;; draw the wings in a new state
    (rotate (vector 180 0 0))                         
    (translate (vector 0 0 -0.5))  ;; position and angle right
    ;; calculate the wing angle based on speed
    (let ((a (- 90 (* (butterfly-flap-amount s)         
                      (+ 1 (sin (* (butterfly-speed s)  
                                   (+ (butterfly-fuzz s) 
       (rotate (vector 0 0 a))
       (draw-obj 3))              ;; draw left wing
       (scale (vector 1 -1 1))    ;; flip
       (rotate (vector 0 0 a))
       (draw-obj 3))))))          ;; draw right wing

There is only immediate mode rendering at the moment, so the transforms are not optimised and little things like draw-obj takes an id of a preloaded chunk of geometry, rather than specifying it by name need to be fixed. However it works well and the thing that was most successful was welding together the Nightjar Game Engine (HTML5 canvas) with fluxus (WebGL) and using them together. This works by having two canvas elements drawn over each other – all the 2D (text, effects and graphs) are drawn using canvas, and the butterflies are drawn in 3D with WebGL. The render loops are run simultaneously with some extra commands to get the canvas pixel coordinates of objects drawn in 3D space.

News from egglab

9,000 players, 20,000 games played and 400,000 tested egg patterns later we have over 30 generations complete on most of our artificial egg populations. The overall average egg difficulty has risen from about 0.4 seconds at the start to 2.5 seconds.

Thank you to everyone who contributed their time to playing the game! We spawned 4 brand new populations last week, and we’ll continue running the game for a while yet.

In the meantime, I’ve started working on ways to visualise the 500Mb of pattern generating code that we’ve evolved so far – here are all the eggs for one of the 20 populations, each row is a generation of 127 eggs starting at the top and ordered in fitness score from left to right:


This tree is perhaps more useful. The ancestor egg at the top is the first generation and you can see how mutations happen and successful variants get selected.


Egglab – pattern generation obsession

I’m putting the final pieces together for the release of the all new Project Nightjar game (due in the run up to Easter, of course!) and the automatic pattern generation has been a focus right up to this stage. The challenge I like most about citizen science is that along with all the ‘normal’ game design creative restrictions (is it fun? will it work on the browser?) you also have to satisfy the fairly whopping constraints of the science itself, determining which decisions impact on the observations you are making – and being sure that they will be robust to peer review in the context of publication – I never had to worry about that with PlayStation games :)



With this game, similar to the last two, we want to analyse people’s ability to recognise types of pattern in a background image. Crucially, this is a completely different perception process from recognition of a learned pattern (a ‘search image’), so we don’t want to be generating the same exact egg each time from the same description – we don’t want people to ‘learn’ them. This also makes sense in the natural context of course, in that an individual bird’s eggs will not be identical, due to there being many many additional non-deterministic processes happening that create the pattern.

The base images we are using are wrapped Perlin noise at different scales, and with different thresholds applied. These are then rotated and combined with each other and plain colours with the browser’s built in composite operations. Ideally we would generate the noise each time we need it with a different random seed to make them all unique, but this is way too slow for HTML5 Canvas to do (pixel processing in Javascript is still painful at this scale). To get around this we pre-render a set of variations of noise images, the genetic program picks one of four scales, and one of two thresholds (and one without threshold) and we randomly pick a new variation of this each time we render the egg. The image at the top shows the variation that happens across 6 example programs. Below are some of the noise images we’re using: