A 6502 lisp compiler, sprite animation and the NES/Famicom

For our new project “what remains”, we’re regrouping the Naked on Pluto team to build a game about climate change. In the spirit of the medium being the message, we’re interested in long term thinking as well as recycling e-waste – so in keeping with a lot of our work, we are unraveling the threads of technology. The game will run on the NES/Famicom console, first released by Nintendo in 1983. This hardware is extremely resilient: the solid state game cartridges still work surprisingly well today, compared with fragile CDROMs or the world of online updates. Partly because of this, a flourishing scene of new players is now discovering these machines. I’m also interested in the fact that the older the machine you write software for, the more people have access to it via emulators (there are NES emulators for every mobile device, browser and operating system).

[Image] Our NES with Everdrive flashcart and comparatively tiny SD card for storing ROMs.

These ideas combine a couple of previous projects for me – Betablocker DS also uses Nintendo hardware, and although much more recent, the Nintendo DS has a similar philosophy and architecture to the NES. As with most machines of this era, the majority of NES games were written in pure assembly – I had a go at this for the Speccy a while back and, while fun in a mildly perverse way, it requires so much forward planning that it doesn’t really encourage creative tweaking – or working collaboratively. In the meantime, for the weavingcodes project I’ve been dabbling with making odd lisp compilers and found it very productive – so it makes sense to try one for a real processor this time: the 6502.

The NES console was one of the first to bring specialised processors from arcade machines into people’s homes. On older/cheaper 8 bit machines like the Speccy, you had to do everything on the single CPU, which meant most of the time was spent drawing pixels or dealing with sound. On the NES there is a “Picture Processing Unit” or PPU (a forerunner to the modern GPU), and an “Audio Processing Unit” or APU. As in modern consoles and PCs, these free the CPU up to orchestrate the game as a whole, only needing to update the co-processors sporadically when required.

You can’t write code that runs on the PPU or APU, but you can access their memory indirectly via registers and DMA. One of the nice things about writing our own compiler is that we can build optimised calls that do specific jobs. One area I’ve been thinking about a lot is sprites – the 64 8×8 pixel tiles that the PPU draws over the background tiles to provide you with animated characters.
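
As a rough illustration of the register route (a sketch rather than code from the project – $2006 and $2007 are the NES PPU address and data ports), writing a single byte into PPU memory looks like this in the emit notation used by the compiler below:

(emit "lda" "#$21")
(emit "sta" "$2006")  ;; PPUADDR - high byte of the target PPU address
(emit "lda" "#$c8")
(emit "sta" "$2006")  ;; PPUADDR - low byte, so we are pointing at $21c8 in the background nametable
(emit "lda" "#$42")   ;; an arbitrary example byte
(emit "sta" "$2007")  ;; PPUDATA - the byte ends up at PPU address $21c8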

[Image] Our sprite testing playpen, using graphics plundered from Ys II: Ancient Ys Vanished.

The sprites are controlled by 256 bytes of memory that you copy (DMA) from the CPU to the PPU each frame. There are 4 bytes per sprite – 2 for the x/y position, 1 for the pattern id and another for colour and flipping control attributes. Most games made use of multiple sprites stuck together to get bigger characters; in the example above there are 4 sprites for each 16×16 pixel character – so it’s handy to be able to group them together.
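
That copy goes through the NES’s sprite DMA register at $4014 – as a minimal sketch (again in the emit notation used below, and assuming the sprite table sits in the $200 page as it does here), the whole per-frame transfer is just:

(emit "lda" "#$02")   ;; high byte of the page holding the sprite data ($200)
(emit "sta" "$4014")  ;; writing here copies all 256 bytes to the PPU's sprite memory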

Here’s an example of the compiler’s code generation, producing the 6502 assembly needed to animate 4 sprites with one command by setting all their pattern IDs in one go – this manipulates memory which is later sent to the PPU.

(define (emit-animate-sprites-2x2! x)
  (append
   (emit-expr (list-ref x 2)) ;; compiles the pattern offset expression (leaves value in register a)
   (emit "pha")               ;; push the resulting pattern offset onto the stack
   (emit-expr (list-ref x 1)) ;; compile the sprite id expression (leaves value in a again)
   (emit "asl")               ;; *=2 (shift left)      
   (emit "asl")               ;; *=4 (shift left) - sprites are 4 bytes long, so = address
   (emit "tay")               ;; store offset calculation in y
   (emit "iny")               ;; +1 to get us to the pattern id byte position of the first sprite
   (emit "pla")               ;; pop the pattern memory offset back from the stack
   (emit "sta" "$200,y")      ;; sprite data is stored in $200, so add y to it for the first sprite
   (emit "adc" "#$01")        ;; add 1 to a to point to the next pattern location
   (emit "sta" "$204,y")      ;; write this to the next sprite (+ 4 bytes)
   (emit "adc" "#$0f")        ;; add 15 ($0f) so the pattern points 16 on from the first - the next row of the tile sheet
   (emit "sta" "$208,y")      ;; write to the third sprite (+ 8 bytes)
   (emit "adc" "#$01")        ;; add 1 to a to point to the final pattern location
   (emit "sta" "$20c,y")))    ;; write to the fourth sprite (+ 12 bytes)

The job of this function is to return a list of assembler instructions which are later converted into machine code for the NES. It compiles sub-expressions recursively where needed and (most importantly) maintains register state, so the interleaved bits of code don't interfere with each other and crash. (I learned about this stuff from Abdulaziz Ghuloum's amazing paper on compilers). The stack is important here, as the pha and pla push and pop information so we can do something completely different and come back to where we left off and continue.

The actual command is of the form:

(animate-sprites-2x2 sprite-id pattern-offset)

Where either argument can be a sub-expression of its own, e.g.:

(animate-sprites-2x2 sprite-id (+ anim-frame base-pattern))

This code relies on a couple of assumptions for optimisation: firstly that the sprite information is stored starting at address $200 (quite common on the NES, as this is the start of user memory and maps to a specific DMA address for sending to the PPU), and secondly that the pattern information in memory is laid out in a particular way. The 16 byte offset for the 3rd sprite is simply there to make the data easy to see in memory when using a paint package, as it means the sprites sit next to each other (along with their frames for animation) when editing the graphics:

[Image] Sprite pattern offsets in memory.
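
So for one 16×16 character the four pattern ids end up as a 2×2 block of the tile sheet – roughly like this, assuming a 16 tile wide sheet and a top-left, top-right, bottom-left, bottom-right sprite order (my sketch, not a diagram from the project):

;; pattern table viewed as a 16 tile wide image
;; base+0   base+1    <- top left and top right sprites of the character
;; base+16  base+17   <- bottom left and bottom right, one row down (+16 tiles)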

You can find the code and documentation for this programming language on gitlab.

A cryptoweaving experiment

Archaeologists can read a woven artifact created thousands of years ago, and from its structure determine the actions performed in the right order by the weaver who created it. They can then recreate the weaving, following in their ancestor’s ‘footsteps’ exactly.

This is possible because a woven artifact encodes time digitally, weft by weft. In most other forms of human endeavour, reverse engineering is still possible (e.g. in a car or a cake) but the instructions are not encoded in the object’s fundamental structure – they need to be inferred by experiment or indirect means. Similarly, a text represents not its writing process in a time encoded manner, but only the end result. Interestingly, one possible self describing artifact could be a musical performance.

Looked at this way, any woven pattern can be seen as a digital record of movement performed by the weaver. We can create the pattern with a notation that describes this series of actions (a handweaver following a lift plan), or move in the other direction like the archaeologist, recovering the notation from an existing weave.

[Image] A weaving and its executable code equivalent.

One of the potentials of weaving I’m most interested in is being able to demonstrate fundamentals of software in threads – partly to make the physical nature of computation self evident, but also as a way of designing new ways of learning and understanding what computers are.

If we take the code required to make the pattern in the weaving above:

(twist 3 4 5 14 15 16)
(weave-forward 3)
(twist 4 15)
(weave-forward 1)
(twist 4 8 11 15)

(repeat 2
 (weave-back 4)
 (twist 8 11)
 (weave-forward 2)
 (twist 9 10)
 (weave-forward 2)
 (twist 9 10)
 (weave-back 2)
 (twist 9 10)
 (weave-back 2)
 (twist 8 11)
 (weave-forward 4))

We can “compile” it into a binary form which describes each instruction – the exact process for this is irrelevant, but here it is anyway – an 8 bit encoding, packing instructions and data together:

8 bit instruction encoding:

Field:  Action   Direction   Count/Tablet ID (5 bit number)
Bit:    0 1      2           3 4 5 6 7

Action types
weave:    01 (1)
rotate:   10 (2)
twist:    11 (3)

Direction
forward: 0
backward: 1
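
As a sketch of how that packing works (my own illustration rather than the project’s actual encoder, and assuming bit 0 is the least significant bit):

;; pack one instruction into a byte: action in bits 0-1, direction in
;; bit 2 and the 5 bit count/tablet id in bits 3-7
(define (pack-instruction action direction count)
  (+ action                    ;; weave=1 rotate=2 twist=3
     (* direction 4)           ;; forward=0 backward=1, shifted to bit 2
     (* (modulo count 32) 8))) ;; count/tablet id, shifted to bits 3-7

;; e.g. (pack-instruction 1 0 3) encodes (weave-forward 3)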

If we compile the code notation above with this binary system, we can then read the binary as a series of tablet weaving card flip rotations (I’m using 20 tablets, so we can fit in two instructions per weft):

0 1 6 7 10 11 15
0 1 5 7 10 11 14 15 16
0 1 4 5 6 7 10 11 13
1 6 7 10 11 15
0 1 5 7 11 17
0 1 5 10 11 14
0 1 4 6 7 10 11 14 15 16 17
0 1 2 3 4 5 6 7 11 12 15
0 1 4 10 11 14 16
1 6 10 11 14 17
0 1 4 6 11 16
0 1 4 7 10 11 14 16
1 2 6 10 11 14 17
0 1 4 6 11 12 16
0 1 4 7 10 11 14 16
1 5

If we actually try weaving this (by advancing two turns forward/backward at a time) we get this mess:

[Image] A close up of the resulting weave.

The point is that (assuming we’ve made no mistakes) this weave represents *exactly* the same information as the pattern does – you could extract the program from the messy encoded weave, follow it and recreate the original pattern exactly.

The messy pattern is both an executable and a compressed form of the weave – taking up less space than the original pattern, but looking a lot worse. Possibly this is a clue too, as it contains a higher density of information – higher entropy, and therefore closer to randomness than the pattern.

3D warp weighted loom simulation

One of the main objectives of the weavecoding project is to provide a simulation of the warp weighted loom to use for demonstrations and exploration of ancient weaving techniques. Beyond the 4 shaft loom dyadic calculator, we need to show the actual process of weaving to explain how the structures and patterns emerge – weaving is very much a 3D process, and flat visualisations fail to show that well. The simulation also needs to be driven by the flotsam tangible livecoding hardware, so running on a Raspberry Pi is another requirement.

Sketch and rendering

I’ve decided to make use of the Jellyfish procedural renderer to build something fast and flexible enough, while remaining cross platform. Jellyfish is a lisp-like language which compiles to a vector processing virtual machine written in C++, and approaches the speed of native code (with no garbage collection) while remaining very creative to work with, similar to fluxus livecoding. Previously I’ve only used it for small experiments rather than production work like this, so I’ve needed to tighten up the compiler quite a bit. One of the areas which needed work (along with function arguments, which were coming out backwards!) was the conditional statements, which I removed and replaced with a single if. Here is the compiler code at the lowest level, which emits all the instructions required:

;; compiler code to output a list of instructions for (if pred true-expr false-expr)
(define (emit-if x)
  (let ((tblock (emit-expr (caddr x))) ;; compile true expression to a block
        (fblock (emit-expr (cadddr x)))) ;; compile false expression to block
    (append
     (emit-expr (cadr x)) ;; predicate - returns true or false
     (emit (vector jmz (+ (length tblock) 2) 0)) ;; if false skip true block
     tblock
     (emit (vector jmr (+ (length fblock) 1) 0)) ;; skip false block
     fblock)))
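
So for a single (if pred true-expr false-expr) the emitted instructions end up laid out like this (my reading of the offsets above, assuming jmz and jmr are jumps relative to the current instruction):

;; <predicate code>                ;; leaves the true/false result for jmz to test
;; (jmz (+ (length tblock) 2) 0)   ;; false: jump over the true block and its jmr
;; <true block>
;; (jmr (+ (length fblock) 1) 0)   ;; true path done: jump over the false block
;; <false block>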

Then I can implement cond (which checks a list of different options rather than just one) as a purely syntactic form, using a pre-processor function to create a series of nested ifs before compiling them:

;; preprocessor to take a cond list and convert to nested ifs 
(define (preprocess-cond-to-if x)
  (define (_ l)
    (cond
      ((null? l) 0)          ;; a cond without an else returns 0 
      ((eq? (caar l) 'else)  ;; check for else clause to do
          (cons 'do (pre-process (cdr (car l)))))
      (else (list 'if (pre-process (caar l)) ;; build an if
          (cons 'do (pre-process (cdr (car l))))
                  (_ (cdr l)))))) ;; keep going
  (_ (cdr x))) ;; ignores the 'cond'
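
As an illustration of what this produces (a hypothetical expansion with placeholder expressions, assuming pre-process leaves the sub-expressions untouched):

(cond
  ((> x 10) (do-a))
  ((> x 5) (do-b))
  (else (do-c)))

becomes:

(if (> x 10) (do (do-a))
    (if (> x 5) (do (do-b))
        (do (do-c))))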

Here’s an example of the if in use in the loom simulation at the ‘top’ level – it gets the current weaving draft value for the weft and warp thread position and uses it to move the weft polygons forward or back (in the z) a tiny amount to show up on the correct side of the warp.

(define calc-weft-z
    (lambda ()
        (set! weft-count (+ weft-count 1))
        (set! weft-z
              (if (> (read-draft) 0.5)
                  (vector 0 0 0.01)
                  (vector 0 0 -0.01)))))

One of the reasons I’m writing about all these levels of representation is that they feel close to the multiple representations present in weaving from draft to heddle layout, lift plan, fabric structure and resulting pattern.