Betablocker DS: Table feedback after Claudius Maximus

After getting some dates for gigs with Betablocker DS, I am spending some time looking into audio algorithms and implementing them on the Nintendo DS. Last Thursday I spent some time at the TAI studio/bunker with Till Bovermann investigating these PD patches by Claudius Maximus:

The algorithm uses feedback to create sounds that take some time to play through a wide range of frequencies. It works by writing into the same buffer it’s playing, but at a different rate/position – the resulting ugliness suits the style of BBDS very much. As an aid to our understanding, Till converted the algorithm to SuperCollider, then over the next couple of days I managed to get it running with an inner loop of 9 instructions on the DS (it could probably be optimised further, but I’m still a beginner):

@ ----------------------------------------------------
@ qrz_maximus: *dst, length, readpos, writepos [freq, *tab]
@ ----------------------------------------------------
        .global qrz_maximus
        .type   qrz_maximus, %function
qrz_maximus:
        push    {r4,r5,r6,r7}       @ push the registers we need
        ldr     r4, [r13,#20]       @ load freq from stack into r4
        ldr     r5, [r13,#24]       @ load *tab from stack into r5
        ldr     r6, .tablength      @ load the tablen into r6
.maximus_loop:
        ldrh    r7, [r5,r2]         @ load the sample into r7
        strh    r7, [r0], #2        @ write output: *dst=r7 dst++
        strh    r7, [r5,r3]         @ feedback into tab[writepos]=r7
        add     r2, r2, r4          @ readpos+=freq
        and     r2, r2, r6          @ mask readpos into range
        add     r3, r3, #2          @ writepos++
        and     r3, r3, r6          @ mask writepos into range
        subs    r1, r1, #1          @ length--
        bne     .maximus_loop       @ repeat until length zero
        mov     r0, r2              @ return readpos
        pop     {r4,r5,r6,r7}
        bx      lr                  @ exit
.tablength:
        .word   0x00003FF
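
For anyone who finds C easier to read than ARM assembler, the loop above can be sketched roughly as follows. This is only an illustration, not the code that runs on the DS: the function name table_feedback and the TAB_MASK constant are made up here, and it assumes the table holds 512 16-bit samples (1024 bytes, matching the 0x3FF mask) with readpos, writepos and freq all counted in bytes, as in the assembler.

```c
#include <stdint.h>
#include <stddef.h>

/* Byte-offset mask taken from the assembler listing: 1024 bytes = 512 samples. */
#define TAB_MASK 0x3FFu

/* Hypothetical C rendering of the table feedback loop.
 * readpos, writepos and freq are byte offsets, as in the assembler version.
 * Returns the final readpos so the caller can carry on from there next time. */
uint32_t table_feedback(int16_t *dst, uint32_t length, uint32_t readpos,
                        uint32_t writepos, uint32_t freq, int16_t *tab)
{
    while (length--) {
        /* >> 1 turns a byte offset into a sample index (and drops an odd low bit) */
        int16_t s = tab[(readpos & TAB_MASK) >> 1];
        *dst++ = s;                            /* write to the output buffer      */
        tab[(writepos & TAB_MASK) >> 1] = s;   /* feed back into the same table   */
        readpos  = (readpos + freq) & TAB_MASK;
        writepos = (writepos + 2) & TAB_MASK;  /* one sample forward              */
    }
    return readpos;
}
```

Because the read position moves at freq while the write position always moves one sample at a time, the table gradually gets overwritten with a resampled copy of itself, which is where the slowly evolving sound comes from.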

And here it is running on autopilot with a test program in Betablocker:

PS2 vu1 rendering

Some more work on upcycling second hand PS2s into cheap fluxus machines. The next step is to embrace the vector units – strange little processors for doing things with points, lines and colours extremely fast.

This is quite a daunting task for various reasons. Not only can you run floating point and integer calculations in parallel, but the VUs also have instruction pipelining, where a calculation can take a number of instructions to finish. The advantage of this is that you can interleave other work while you wait for results to complete, and so make extremely fast programs – but it takes some time to get your head around.

Luckily it’s made a whole lot easier by a free software tool called OpenVCL. This lets you write fairly straightforward assembler, and it makes sure the code at least runs correctly by spacing out the instructions for you – future versions may also start optimising the code automatically.

This is my first very basic renderer, which simply applies the world->screen transform to each vertex in an object. Its job is to load data put in its memory by the CPU, process it and send it to the “graphics synthesizer” to draw the gouraud shaded triangles. It’s not only much faster than doing the same job on the CPU, but it also leaves the CPU free for other processing (such as running a Scheme interpreter).

        .syntax new
        .name vu1_unlit        
        ; load the matrix row by row
        lq      world_screen_row0, 0(vi00)
        lq      world_screen_row1, 1(vi00)
        lq      world_screen_row2, 2(vi00)
        lq      world_screen_row3, 3(vi00)
        ; load the params and set the addresses for
        ; the giftag and vertex data
        lq      params, 4(vi00)
        iaddiu  giftag_addr, vi00, 5
        iaddiu  vertex_data, vi00, 6
        ; move the vertex count to an integer 
        ; register so we can loop over it
        mtir    vertex_index, params[x]
vertex_loop:
        ; load the colour (just increments vertex_data)
        lqi     colour, (vertex_data++)
        ; load vertex position
        lq      vertex, 0(vertex_data)
        ; apply the transformation
        mul     acc, world_screen_row0, vertex[x]
        madd    acc, world_screen_row1, vertex[y]
        madd    acc, world_screen_row2, vertex[z]
        madd    vertex, world_screen_row3, vertex[w]
        ; perspective divide: q = 1/w, then scale the vertex by q
        div     q, vf00[w], vertex[w]
        mul     vertex, vertex, q
        ; convert to fixed point
        ftoi4   vertex, vertex
        ; overwrite the old vertex with the transformed one
        sqi     vertex, (vertex_data++)
        ; decrement and loop
        iaddi   vertex_index, vertex_index, -1
        ibne    vertex_index, vi00, vertex_loop
        ; send to gs
        xgkick  giftag_addr
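
To make the maths in the loop a bit more concrete, here is a rough C sketch of what happens to a single vertex. It is not what runs on the VU and the names (vec4, transform_vertex) are invented for this example. It assumes the four matrix quadwords are combined exactly as the mul/madd chain reads, that the whole vertex is scaled by q, and it models ftoi4 as a multiply by 16 followed by a cast to integer (4 fractional bits).

```c
#include <stdint.h>

/* One 128-bit quadword of four floats, as held in a VU float register. */
typedef struct { float x, y, z, w; } vec4;

/* Hypothetical C version of the per-vertex work in the VU loop.
 * rows[0..3] are the four quadwords loaded by the lq instructions. */
void transform_vertex(const vec4 rows[4], const vec4 *in, int32_t out[4])
{
    vec4 v;

    /* mul/madd chain: acc = row0*x + row1*y + row2*z + row3*w */
    v.x = rows[0].x * in->x + rows[1].x * in->y + rows[2].x * in->z + rows[3].x * in->w;
    v.y = rows[0].y * in->x + rows[1].y * in->y + rows[2].y * in->z + rows[3].y * in->w;
    v.z = rows[0].z * in->x + rows[1].z * in->y + rows[2].z * in->z + rows[3].z * in->w;
    v.w = rows[0].w * in->x + rows[1].w * in->y + rows[2].w * in->z + rows[3].w * in->w;

    /* div q, vf00[w], vertex[w]: vf00[w] is the constant 1.0, so q = 1/w */
    float q = 1.0f / v.w;

    /* scale by q (perspective divide), then convert to fixed point
     * with 4 fractional bits, which is what ftoi4 is modelled as here */
    out[0] = (int32_t)(v.x * q * 16.0f);
    out[1] = (int32_t)(v.y * q * 16.0f);
    out[2] = (int32_t)(v.z * q * 16.0f);
    out[3] = (int32_t)(v.w * q * 16.0f);
}
```

With an identity matrix and a vertex of (2, 4, 6, 2), q is 0.5, so the vertex becomes (1, 2, 3, 1) and the fixed point result is (16, 32, 48, 16).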

Rainy sunday

A rainy Sunday, with only the dog for company, so in between walks I thought I’d try and learn some assembler. I’ve been unhappy with triggering samples with Betablocker DS (I prefer synthesis) and I’ve heard good things about ARM asm – so it seemed like a good opportunity to attempt a small, fast and dirty synth.

I found some really nice tutorials here and here. I’ve done a tiny bit of this sort of thing before with microcontrollers, but this is more of a respectable flavour of assembler, on a decent RISC processor (which derives from the Acorn Archimedes and is now used on iPhones, Androids and Gameboys). Here is a white noise generator:

@ white_noise(r0=*dst, r1=clock, r2=length, r3=freq)
white_noise:
        push    {r4,r5,r6}          @ need to restore registers we use
        mov     r4, r1              @ r4 is the rand state (start with clock)
        ldr     r5, .rnd_data       @ r5 is the multiplier value
        ldr     r6, .rnd_data+4     @ r6 is the addition value (the next word)
.noise_loop:
        mla     r4, r5, r4, r6      @ the maths bit: r4 = (r6 + (r5 * r4))
        strh    r4, [r0], #2        @ *dst++ = low 16 bits of the rand state
        subs    r2, r2, #1          @ length--;
        bne     .noise_loop         @ branch if length not zero
        pop     {r4,r5,r6}
        bx      lr                  @ return
.rnd_data:
        .word   0x000343FD          @ nicked from ansi c rand()
        .word   0x00269EC3          @ need to keep large numbers (>8bit) as data

This code is based on a common implementation of the ANSI C rand() function, which basically looks like this:

randnum = randnum * 214013 + 2531011;

Which we can do in a single instruction – mla (multiply with accumulate). Of course, gcc would presumably generate much better code from C++ than mine, but there is something more satisfying about doing it this way. I certainly prefer the sound – and over half the CPU remains unused with 5 voices and the interface running. The rest of the code is here.
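
For comparison, this is roughly what the same generator looks like written out in C. It is a sketch rather than the code used in Betablocker DS: the function name noise_fill is made up, the unused freq argument is left out, and the samples are stored as unsigned 16-bit values to keep the truncation well defined.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical C equivalent of the white noise loop.
 * Fills dst with `length` samples and returns the final generator state. */
uint32_t noise_fill(uint16_t *dst, uint32_t seed, size_t length)
{
    uint32_t state = seed;                      /* r4: starts from the clock value */
    for (size_t i = 0; i < length; i++) {
        state = state * 214013u + 2531011u;     /* the single mla instruction      */
        dst[i] = (uint16_t)(state & 0xFFFFu);   /* strh keeps the low 16 bits      */
    }
    return state;
}
```

The 32-bit arithmetic simply wraps around on overflow, which is exactly what the generator relies on.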

PS2 homebrew #5

Getting stuff to work on PS2 wasn’t quite as easy as I probably made it sound in the last homebrew post. The problem with loading code from a USB stick is that there is no way to debug anything: no remote debugging, no stdout – not even any way to render text unless you write your own program to do that.

The trick is to use the fact that we are rendering a CRT TV signal and that you can control what gets rendered in the overscan area (think 8-bit loading screens). There is a register which directly sets the background colour of the scanline – this macro is all you need:

#define gs_p_bgcolor        0x120000e0    // Set CRTC background color

#define GS_SET_BGCOLOR(r,g,b) \
            *(volatile unsigned long *)gs_p_bgcolor =    \
        (unsigned long)((r) & 0x000000FF) <<  0 | \
        (unsigned long)((g) & 0x000000FF) <<  8 | \
        (unsigned long)((b) & 0x000000FF) << 16

Which you can use to set the background to green, for example:

GS_SET_BGCOLOR(0, 255, 0);


It’s a good idea to change this at different points in your program. When you get a crash, the border colour it’s frozen with will tell you what area it was last in, allowing you to track down errors.
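
As a sketch of how this might be organised, the snippet below gives each stage of the frame its own colour. Everything in it is hypothetical: the stage names, the mark_stage function and the colours are invented, and the register is replaced by an ordinary variable (fake_bgcolor_reg) so the example can be compiled and tried on a desktop machine. On the PS2 the write would go through the GS_SET_BGCOLOR macro instead.

```c
#include <stdint.h>

/* Pack 8-bit r, g, b into one word, the same way the macro does. */
static uint64_t pack_bgcolor(unsigned r, unsigned g, unsigned b)
{
    return ((uint64_t)(r & 0xFF) << 0) |
           ((uint64_t)(g & 0xFF) << 8) |
           ((uint64_t)(b & 0xFF) << 16);
}

/* Stand-in for the hardware register (assumption: desktop test only). */
static volatile uint64_t fake_bgcolor_reg;

/* Hypothetical stages of a frame, one border colour each. */
enum stage { STAGE_SCRIPT, STAGE_TRANSFORM, STAGE_RENDER };

/* Call at the start of each stage; after a crash the border colour
 * left on screen shows which stage was running. */
void mark_stage(enum stage s)
{
    switch (s) {
    case STAGE_SCRIPT:    fake_bgcolor_reg = pack_bgcolor(255, 0, 0); break; /* red   */
    case STAGE_TRANSFORM: fake_bgcolor_reg = pack_bgcolor(0, 255, 0); break; /* green */
    case STAGE_RENDER:    fake_bgcolor_reg = pack_bgcolor(0, 0, 255); break; /* blue  */
    }
}
```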

There is also a nice side effect: this provides a visual profile of your code at the same time. Rendering is synced to the vertical blank – when the CRT’s electron beam shoots back to the top of the screen a new frame is started and you have a 50th of a second (PAL) to get everything done. In the screenshot below you can see how the frame time breaks down rendering 9 animated primitives – and why it might be a good idea to use some of these other processors:

PS2 homebrew #4

Getting things to render on the PS2 is a little more complicated than using OpenGL, and it’s also a very different system to a PC. On the right you can see a block diagram of the Emotion Engine – it consists of the EE core (the CPU) on the left and the GS (Graphics Synthesiser) on the right. In between are 2 other processors called Vector Units – very fast processors designed to do things to vectors – points, colours etc.

All the Graphics Synthesiser can do is rasterise 2D shapes with points given in screen space and do your texturing & gouraud shading for you. It can draw points, lines, triangles or quads in various configurations (similar to OpenGL). However all the 3D transformations and lighting calculations have to happen elsewhere – in one, or both of the Vector Units or the CPU.

So how do we get the GS to render something? Well you send it chunks of data, called GS Packets, that look a little like this:

GIF Tag 1
Primitive data
GIF Tag 2
Primitive data

The GIF Tags contain information on what sort of primitive it should draw and how the primitive data is laid out. The primitive data is the same as the primitive data in fluxus – vertex positions, colours, texture coordinates, texture data etc.
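
To give a feel for what a GIF Tag holds, here is a small C sketch that packs one into two 64-bit words. The field positions follow the layout as it is commonly described in GS documentation (NLOOP, EOP, PRE, PRIM, FLG and NREG in the lower word, register descriptors in the upper word) and are worth checking against a reference before relying on them. The giftag type and make_giftag function are invented for this example.

```c
#include <stdint.h>

/* A GIF Tag is one 128-bit quadword, held here as two 64-bit halves. */
typedef struct { uint64_t lo, hi; } giftag;

/* Hypothetical helper that packs the tag fields.
 * nloop: how many times the register list repeats (e.g. vertex count)
 * eop:   1 if this is the last tag in the packet
 * pre:   1 if the prim field should be used
 * prim:  primitive type and drawing attributes
 * flg:   data format (0 is assumed to mean packed mode)
 * nreg:  number of register descriptors in regs
 * regs:  4 bits per descriptor, describing the data that follows */
giftag make_giftag(unsigned nloop, unsigned eop, unsigned pre,
                   unsigned prim, unsigned flg, unsigned nreg, uint64_t regs)
{
    giftag t;
    t.lo = ((uint64_t)(nloop & 0x7FFF) << 0)  |
           ((uint64_t)(eop   & 0x1)    << 15) |
           ((uint64_t)(pre   & 0x1)    << 46) |
           ((uint64_t)(prim  & 0x7FF)  << 47) |
           ((uint64_t)(flg   & 0x3)    << 58) |
           ((uint64_t)(nreg  & 0xF)    << 60);
    t.hi = regs;
    return t;
}
```

A tag for 3 vertices, each made of a colour followed by a position, would then be something like make_giftag(3, 1, 0, 0, 0, 2, 0x51), assuming descriptor 0x1 means a colour and 0x5 a position.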

Once I had tinyscheme and the basic scenegraph used by the minimal fluxus build working, I wrote a very simple renderer running on the EE core to apply the transformation matrices to the primitives (with similar push and pop to OpenGL). It doesn’t calculate lighting at the moment, so it’s just setting the vertex colours to the normal’s values for debugging. This is a literal photographic screen shot of my PS2 running exactly the same test fluxus script as the android was running:

PS2 homebrew #3

The next thing I wanted to do was see if I could compile the minimal android version of fluxus for the PS2. All the PS2SDK examples are written in C, and when I tried the C++ compiler at link time I got a bunch of these odd errors:

ps2-main.cpp: undefined reference to `__gxx_personality_v0'

It turns out the C++ compiler does not support exceptions so you need to add this line to the makefile:

EE_CXXFLAGS += -fno-exceptions

The other thing to get used to is a side effect of a machine with so many processors: you need to send lots of data around between them using DMA (direct memory access) transfers. DMA works on chunks of memory at a time, so your data needs to be aligned on particular byte boundaries. This sounds a lot more complicated than it is in practice (although it does lead to really obscure bugs if you get it wrong).

For instance, when declaring an array inside a struct you can do this:

struct my_struct
{
    int an_int;
    float my_array[8] __attribute__((__aligned__(16)));
    float a_float;
};

Which tells gcc to sort it out for you by forcing the pointer to my_array to fall on a 16 byte boundary.

When allocating from the heap the EE kernel provides you with a memalign version of malloc:

float *some_floats=(float*)memalign(128, sizeof(float) * 100);

The pointer some_floats will be aligned to a 128 byte boundary. This works as normal with free().
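
If you want to convince yourself that the alignment is really there, a check along these lines can help. This is a desktop sketch rather than PS2 code: it uses C11 aligned_alloc as a stand-in for memalign (an assumption made so it builds with an ordinary gcc), and the helper names is_aligned and alloc_dma_floats are invented for the example.

```c
#include <stdint.h>
#include <stddef.h>
#include <stdlib.h>

struct my_struct
{
    int an_int;
    float my_array[8] __attribute__((__aligned__(16)));
    float a_float;
};

/* Returns non-zero if p sits on an `alignment`-byte boundary. */
int is_aligned(const void *p, size_t alignment)
{
    return ((uintptr_t)p % alignment) == 0;
}

/* Allocate `count` floats on a 128 byte boundary.
 * aligned_alloc wants the size to be a multiple of the alignment,
 * so the byte count is rounded up first. Release with free(). */
float *alloc_dma_floats(size_t count)
{
    size_t bytes = count * sizeof(float);
    size_t rounded = (bytes + 127) & ~(size_t)127;
    return (float *)aligned_alloc(128, rounded);
}
```

Checking is_aligned on a buffer just before handing it to a DMA transfer is a cheap way to catch the obscure bugs mentioned above.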

At this point, other than a few changes to tinyscheme for string functions that don’t exist in the PS2 libraries, most of the fluxus code was building. The only problem was the OpenGL ES code: although the PS2 has some attempts at libraries that work a bit like OpenGL, the real point of playing with this machine is to write your own realtime renderer. A bit more on that next…

PS2 homebrew #2

Once you have a system for running unauthorised code, you can test it with some well used homebrew – I used uLaunchElf for this purpose. With swapmagic you need to put the executable on your USB stick as SWAPMAGIC/SWAPMAGIC.ELF and it should boot automatically. Not all USB sticks work; I had to try a few before I found a suitable one.

The next task is installing the toolchain of compilers and related utilities for making your own executables. At this point I found that the ps2dev site was offline along with its svn repo, but I managed to find a backup here, which I am now mirroring here. When you have the svn repo recreated, follow the instructions in ps2dev/trunk/ps2toolchain/readme.txt to build and install: you just need to set up some paths to tell it where to install (I use ~/opt) and run a script. The script tries to update from the unresponsive official svn, so I also had to change ps2toolchain/scripts/ to get it to complete.

You get a load of sample code in ps2dev/ps2sdk/samples; all you need to do is run make to build them. Just as with the Nintendo DS and Android devices, the compiler is gcc – or rather “ee-gcc”. “ee” stands for the Emotion Engine, Sony’s name for the main MIPS processor.

The result of the make command in the exciting screenshot above is a file called “cube.elf”. ELF stands for Executable and Linkable Format, a unix standard (debugging variants used to be called DWARFs “Debugging With Attributed Record Formats”, I make no comment).

Copy this to your USB stick and you should see something like this:

PS2 homebrew #1

Reuse is the better form of recycling, and the machines that we all increasingly use are fabulous concoctions of various plastics, metals and ceramic compounds. It’s simply shortsighted to treat them as throwaway items. With this in mind, and lacking a computer I can use at gigs with a GPU for a while now, I thought I’d postpone the inevitable cost by doing a little obsolete tech archeology.

I think homebrew and its associated arts – as a creative subversion (rather than a route to piracy) – are increasingly relevant, so I’m going to try and log my notes here. The information out there can be a bit challenging to understand, but I’d also like to show that it’s not that much of a dark art (compared to official development of, for example, Android, which I’ve been exploring lately). I must admit an advantage with PS2, as some time ago I was doing this “the proper way” for things like this, so perhaps some useful information can leak out of long dead NDAs too.

The first problem to overcome with homebrew is how to get the machine to run code it’s not supposed to run. You need special hardware to burn PS2 bootable disks, and the hardcore approach to getting around that is known as the swap trick. You block the various sensors that detect if a disk is present, make a copy of a game on a CDR, replacing the code with your own. Boot using the original, then find a place in the game where the software isn’t paying attention and carefully switch it with the CDR. Once you can run “unofficial code”, you can make use of one of the various discovered exploits to (for example) make a bootable memory card to launch your own software from then on. A simpler approach is to give your memory card to someone who has already done this and get them to do it for you.

I didn’t know anyone who had done this, and I am also not that hardcore, so I spent a little bit of cash on a bootable CD designed for this purpose (which is legal to produce, because of homebrew). Along with the CD you get a variety of small shaped pieces of plastic for circumventing the drive opening detectors. These are only really needed for piracy though AFAICT, as you can boot into homebrew on a normal USB stick, so I haven’t needed to bother with them.