Viruscraft: Genetic model connected to a tree visualisation

The genetic model we were working on previously has now been ported into a browser compatible form and connected to a new tree visualisation that displays the species that emerge as the host population adapts to a virus infection. It’s still a prototype with rough edges, but have a play with it here, some example pics:

tree7

This is one of the earlier attempts, which I like the look of, but later cleaned up versions are a bit clearer to read what is going on.

tree17

tree18

Firstly the genetic part is working in that the population evolves to get gradually better at coping with the virus infection (the fitness score increases) – it takes a bit too long at the moment, but it’s great to be able to see this working in realtime as it happens with new species branching off older ones.

One early observation is that this has the potential to show why diversity is beneficial. If you modify the virus (fitness function) at a point when there are lots of different species present, the chances are that a few of them will be resilient enough to the infection to expand into the new environmental niche, and things eventually continue as before. If you alter the virus when there is only a single really well evolved species that is a bit too good at coping with the existing virus – the chances are that you will cause the population to go extinct as it won’t be able to adapt. This is analogous to the situation with bananas: “To carry on growing the same genetic banana is stupid”.

A big chunk of the work here was actually spent optimising the code. It’s pretty amazing how developed browsers are for development – I’ve been using the profiler in the chromium browser to locate slowness and keep the frame rate as near 60 frames per second as possible. I just noticed taking the screenshot for this post the slowest single part seems to be the debug text rendering, strangely enough.

prof

Viruscraft: building a ‘reasonably accurate’ genetic game world simulation

The concept for the viruscraft game is to have a realtime genetic model or simulation of the host evolution which is adapting to the properties of a virus you are building (either on screen or via a tangible interface as part of an exhibit). This model needs to be realistic, but only up to a point – it can be more of a caricature of biology than a research model would need to be, as our intentions are educational rather than biological research.

Using our previous species prototype as a starting point, we have a network of connected locations that can be inhabited by organisms. These organisms can jump to neighbouring locations and be infected by others in the same place at the same time. Now we need to figure out how different species of these organisms could emerge over time that evolve immunity to a virus – so we can build up a family tree (phylogeny) similar to the ones we created for the egglab game but that is responding to the viruses that you create in realtime as you play. The evolution itself also has to happen fast enough that you can see effects of your actions ‘quickly enough’, but we’ll worry more about that later.

For a job like this we need to move back from fancy visualisations and graphics and try to get some fundamental aspects working, using standard tools like graphviz to understand what is going on to save time.

The first thing to do is to add a fixed length genetic string to each individual organism, this is currently 40 elements long and is made from biologically based A,T,C and G nucleotide symbols. We chose these so we can use biological analysis tools to test the system as we go along just like any other genetic process (more on that below). The organisms can also reproduce by spawning copies of themselves. When they do this they introduce random errors in the genetic code of their offspring which represents mutation.

Previously we were using a ‘SIRS’ model for virus infection (susceptible -> infected -> recovered -> susceptible), based on 4 global parameters that determined the probability of jumping from one state to the next. Using the genetics, the probability of infection is now different for every individual based on:

1. Is a virus infected individual in the current location?

2. If so, use our genetic code to determine the probability of catching it. Currently we use the ratio of A’s to T’s in the genetic string as a totally arbitrary place-holder ‘fitness function’, the lower the number the better. AAAAAAT is bad (fitness: 6) while TTTTTTA is good (fitness: 0.1666) – so we would expect the A’s to disappear over time and the T’s increase in the genetic strings. This number also determines the probability of dying from the disease and (inversely) the probability of gaining immunity to it.

3. A very small ‘background infection’ probability which overrides this, so the virus is always present at a low level and can’t die out.

The next thing we need is a life cycle for the organism – this needs to include the possibility of death and the disease model is now a ‘SIR’ one, as once recovered, individuals cannot go back to being susceptible again.

state

All the other non virus related probabilities in the simulation (spawning offspring, moving location, natural death) are currently globally set – to make sure we are seeing evolution based only on disease related behaviour for now.

This model as it is could form the foundation of a world level visualisation – seeing organisms running around from place to place catching and spreading your virus and evolving resistance to it. However this is only half the story we want to tell in the game, as it doesn’t include our time based ‘phylogenetic’ family tree view. For this, we still need to figure out how to group individuals into species so we can fully visualise the effects of your virus on the evolution of all the populations as a whole.

First we need to decide exactly what a species is – which turns out to be quite an arbitrary concept. The rather course approach that seems to work here is to say that two organisms represent two distinct species if more than a quarter of their genes are different between them.

We can now check each organism as it’s born – and compare its genome against a ‘blueprint’ one that represents the species that it’s parent belongs to. If it’s similar enough we add it to its parents species, if it’s too different we create a new species for it. This new species will have a copy of its genome as the ‘blueprint’ to compare all its descendants with. This should mean we can build up a set of related species over time.

If we run the simulation for 5000 time steps we can generate a phylogenetic family tree at the end, using the branch points between species to connect them. We are hiding species with only 1 member to make it simpler, and the population is started off with 12 unique individuals. Only one of which (species 10) is successful – all the later species are descendants of that one:

test

The numbers here are the ID, fitness and size of population for each species. The colours are an indication of population size. The fitness seems to increase towards the right (as the number drops) – which is what we’d expect if new species are emerging that cope better with the virus. You can imagine changing the virus will cause all this to shift dramatically. The “game mechanic” for viruscraft will all be about tinkering with the virus in different ways that changes the underlying fitness function of the host, and thus the evolution of the populations.

As we used standard biological symbols for our genetic code, we can also convert each species into an entry in a FASTA format text file. These are used by researchers to determine population structure from limited information contained in genetic samples:


> 1 0.75 6
TGCTCTTGCGTACTAGACTGTTGACATCTCCACCGGATAA
> 3 0.46153846153846156 5
TGGTTTTCTGCTGTGGGGATAACCTGCCACTCAGTGGTGA
> 5 0.6153846153846154 171
CACTATCGCTCATTGCACTGTCGTGGTTTTAGTAACGAGC
...

In the FASTA file in the example above, the numbers after the ‘>’ are just used as identifiers and are the same as the tree above. The second line is the blueprint genome for the species (its first individual). We can now visualise these with one of many online tools for biological analysis:

phylo_tree

This analysis is attempting to rebuild the first tree in a way, but it doesn’t have as much information to go on as it’s only looking at similarity of the genetic code. Also 40 bases is not really enough to do this accurately with such a high mutation rate – but I think it’s a good practice to keep information in such a way that it can be analysed like this.

Red King – listening to coevolution

Scientific models are used by researchers in order to understand interactions that are going on around us all the time. They are like microscopes – but rather than observing objects and structures, they focus on specific processes. Models are built from the ground up from mathematical rules that we infer from studying ecosystems, and they allow us to run and re-run experiments to gain understanding, in a way which is not possible using other methods.

I’ve managed to reproduce many of the patterns of co-evolution between the hosts and parasites in the red king model by tweaking the parameters, but the points at which certain patterns emerge is very difficult to pin down. I thought a good way to start building an understanding of this would be to pick random parameter settings (within viable limits) and ‘sweep’ paths between them – looking for any sudden points of change, for example:

random_path-21-0-big

This is a row of simulations which are each run for 600 timesteps, with time running downwards. The parasite is red and the host is blue, and both organism types are overlayed so you can see them reacting to each other through time. Each run has a slightly different parameter setting, gradually changing between two settings as endpoints. Halfway through there is a sudden state change – from being unstable it suddenly locks into a stable situation on the right hand side.

I’ve actually mainly been exploring this through sound so far – I’ve built a setup where the trait values are fed into additive synthesis (adding sine waves together). It seems appropriate to keep the audio technique as direct as possible at this stage so any underlying signals are not lost. Here is another parameter sweep image (100 simulations) and the sonified version, which comprises 2500 simulations, overlapped to increase sound density.

random_path-23-0-big

You can hear quite a few shifts and discontinuities between different branching patterns that emerge at different points – writing this I realise an animated version might be a good idea to try too.

Stereo is done by slightly changing one parameter (the host tradeoff curve) across the left and right channels – so it gives the changes a sense of direction, and you are actually hearing 5000 simulations being run in total, in both ears. All the code so far (very experimental at present) is here. The next thing to do is to take a step back and think about the best way to invite people in to experience this strange world.

Here are some more tests:

random_path-21-0-big

random_path-21-0-big

random_path-38-0-big

Red King: Host/Parasite co-evolution citizen science

A new project begins, on the subject of ecology and evolution of infectious disease. This one is a little different from a lot of Foam Kernow’s citizen science projects in that the subject is theoretical research – and involves mathematical simulations of populations of co-evolving organisms, rather than the direct study of real ones in field sites etc.

The simulation, or model, we are working with is concerned with the co-evolution of parasites and their hosts. Just as in more commonly known simulations of predators and prey, there are complex relationships between hosts and parasites – for example if parasites become too successful and aggressive the hosts start to die out, in turn reducing the parasite populations. Hosts can evolve to resist infection, but this has an overhead that starts to become a disadvantage when most of a population is free of parasites again.

graph
Example evolution processes with different host/parasite trade-offs.

Over time these relationships shift and change, and this happens in different patterns depending on the starting conditions. Little is known about the categorisation of these patterns, or even the range of relationships possible. The models used to simulate them are still a research topic in their own right, so in this project we are hoping to explore different ways people can both control a simulation (perhaps with an element of visual live programming), and also experience the results in a number of ways – via a sonifications, or game world. The eventual, ambitious aim – is to provide a way for people to feedback their discoveries into the research.

sketch

Robot nightjar eggshibition at the Poly, Falmouth

As part of this year’s Fascinate festival we took over the bar at Falmouth’s Poly with visualisations of the camouflage pattern evolution process from the egglab game.

IMG_20140828_103921

tree-125-fs-part

This was a chance to do some detective work on the massive amount of genetic programming data we’ve amassed over the last few months, figure out ways to visualise it and create large prints of the egg pattern generation process. I selected family trees of eggs where mutations caused new features that made them difficult for people to spot, and thus resulted in large numbers of descendants. Then I printed examples of the eggs at different stages to see how they progressed through the generations.

IMG_20140828_104022

We also ran the egglab game in the gallery on a touch screen which accidentally coincided with some great coverage in the Guardian and Popular Science, but the game kept running (most of the time) despite this.

IMG_20140829_151430

IMG_20140829_152451

IMG_20140830_112644

The Poly (or Royal Cornwall Polytechnic Society) was really the perfect place for this exhibition, with its 175 year history of promoting scientists, engineers and artists and encouraging innovation by getting them together in different ways. Today this seems very modern (and would be given one of our grand titles like ‘cross-displinary’) but it’s quite something to see that in a lot of ways the separation between these areas is currently bigger than it ever has been, and all the more urgent because of this. The Poly has some good claims to fame, being the first place Alfred Nobel demonstrated nitro‐glycerine in 1865! Here are some pages from the 1914 report, a feel for what was going on a century ago amongst other radical world changes:

IMG_20140830_105017

IMG_20140830_104857

Butterfly wing pattern evolution

I’ve been working lately with the Heliconius research group at the University of Cambridge on a game to explain the evolution of mimicry in butterfly wing patterns. It’s for use at the Summer Science Exhibition at the Royal Society in London, where it’ll be run on a large touch screen for school children and visiting academics to play.

game

This is my first game to merge the WebGL fluxus port (for rendering the butterflies) and the nightjar HTML5 canvas game engine – which takes care of the 2D elements. Both are making use of my ad-hoc Scheme to Javascript compiler and are rendered as two canvas elements on top of each other, which seems to work really well.

The game models biological processes for education purposes (as opposed to the genetic programming as used on the camouflage egg game), and the process of testing this, and deciding what simplifications are required has become a fascinating part of the design process.

end

In biosciences, genetics are modelled as frequencies of specific alleles in a given population. An allele is a choice (a bit like a switch) encoded by a gene, so a population can be represented as a list of genes where each gene is a list of frequencies of each allele. In this case the genetics consists of choices of wing patterns. The game is designed to demonstrate the evolution of an edible species mimicking a toxic one – we’ll be publishing the game after the event. A disclaimer, my terminology is probably misaligned in the following code, still working on that.

;; an allele is just a string id and a probability value
(define (allele id probability)
  (list id probability))

;; a gene is simply a list of alleles

;; return the id of an allele chosen based on probability
(define (gene-express gene)
  (let ((v (rndf)))
    (car
     (cadr
      (foldl
       (lambda (allele r)
         (let ((segment (+ (car r) (allele-probability allele))))
           (if (and (not (cadr r))
                    (< v segment))                (list segment allele)                (list segment (cadr r)))))        (list 0 #f)        gene)))))  ;; a chromosome is simple list of genes ;; returns a list of allele ids from the chromosome based on probability (define (chromosome-express chromo)   (map gene-express chromo)) 

When an individual is removed from the population, we need to adjust the probabilities by subtracting based on the genetics of the eaten individual, and the adding to the other alleles to keep the probabilities summing to one:

;; prevents the probability from 'fixing' at 0% or 100%
;; min(p,(1-p))*0.1
(define (calc-decrease p)
  (* (min p (- 1 p)) allele-decrease))

;; remove this genome from the population
(define (gene-remove-expression gene genome)
  (let ((dec (calc-decrease (allele-probability (car gene)))))
    (let ((inc (allele-increase dec (length gene))))
      (map
       (lambda (allele)
         (if (eq? (allele-id allele) genome)
             (allele-modify-probability 
                allele (- (allele-probability allele) dec))
             (allele-modify-probability 
                allele (+ (allele-probability allele) inc))))
       gene))))

News from egglab

9,000 players, 20,000 games played and 400,000 tested egg patterns later we have over 30 generations complete on most of our artificial egg populations. The overall average egg difficulty has risen from about 0.4 seconds at the start to 2.5 seconds.

Thank you to everyone who contributed their time to playing the game! We spawned 4 brand new populations last week, and we’ll continue running the game for a while yet.

In the meantime, I’ve started working on ways to visualise the 500Mb of pattern generating code that we’ve evolved so far – here are all the eggs for one of the 20 populations, each row is a generation of 127 eggs starting at the top and ordered in fitness score from left to right:

viz-2-cf-0-small

This tree is perhaps more useful. The ancestor egg at the top is the first generation and you can see how mutations happen and successful variants get selected.

viz-2-cf-1-6-tree-small

Codeclub and Bioinformatics

I’m swatting up on my scratch skills for the first codeclub at Troon Primary School in Cambourne tomorrow afternoon! It’s exciting to finally head to the frontlines of algorithmic literacy in education.

codeclub

Also on Wednesday I present at talk about FoAM and cross-disciplinary working at Exeter University’s Biomedical Informatics Hub, I’ll be talking about Borrowed Scenery, The Lobster DorisMap, Hapstar and Lirec, specifically concentrating on things that happen when people work together across many disciplines.

Hapstar graphs in the wild

Some examples of graphs that scientists have created and published using Hapstar, all these images were taken from the papers that cite the hapstar publication, with links to them below. I think the range of representations of this genetic information indicate some exciting new directions we can take the software in. There are also some possibilities regarding the minimum spanning tree, finding ways to visualise and explore the range of possible MST’s for a given graph.

IVENS, ABF, et al. “Reproduction and dispersal in an ant‐associated root aphid community.” Molecular Ecology (2012).

Wielstra, Ben, and Jan Arntzen. “Postglacial species displacement in Triturus newts deduced from asymmetrically introgressed mitochondrial DNA and ecological niche models.” BMC Evolutionary Biology 12.1 (2012): 161.

Kesäniemi, J. E., Rawson, P. D., Lindsay, S. M. and Knott, K. E. (2012), Phylogenetic analysis of cryptic speciation in the polychaete Pygospio elegans. Ecology and Evolution, 2: 994–1007. doi: 10.1002/ece3.226

Vos M, Quince C, Pijl AS, de Hollander M, Kowalchuk GA (2012) A Comparison of rpoB and 16S rRNA as Markers in Pyrosequencing Studies of Bacterial Diversity. PLoS ONE 7(2): e30600. doi:10.1371/journal.pone.0030600