## Sunday, 20 January 2013

### RBNs with NumPy, sorted

I've been using Python for a little while now, and love the ease of programming in it.  I also use Matlab, which is wonderful for programming scientific things, particularly with arrays.  But Matlab is expensive, so I only have access to it at work.  Python is free.

I heard that NumPy, the numerical package for Python, had Matlab-like array operations, so thought I'd give it a try.  This weekend I finally had some time (that is, I needed a displacement activity from marking), so I gave it a go.  I decided to do a comparison of something I'd already implemented in Matlab: a Random Boolean Network (RBN) tool.

RBNs were invented by Stuart Kauffman as a simplified model of gene regulatory networks. They have fascintating properties for so simple a construction.  An RBN has the following components:
• $N$ binary-state nodes: $n_1 .. n_N$
• each node can be off or on (in state 0 or 1), $s_1 .. s_N$
• each node has $K$ input connections from $K$ randomly chosen different nodes in the network, $c_{11} .. c_{1K} .. c_{N1} .. c_{NK}$
• each node has a randomly chosen boolean function of $K$ variables, $b_1 .. b_N$
An example of an $N=5, K=2$ RBN is

Here the node colours represent the different boolean functions $b_i$, and the numbers label the nodes from $0 .. N-1$.

You start the network in some initial state of the binary nodes.  Each timestep each node receives the state of its $K$ connected neighbours, combines them with its boolean function, and sets its next state to that value:
$$s_i(t+1) = b_i(s_{c_{i1}}, s_{c_{i2}}, ... s_{c_{iK}})$$
The marvellous thing about $K=2$ RBNs is that, despite being set up to be as random as possible, and having a total of $2^N$ possible states they could be in, they rapidly settle down into an attractor cycle of length $O(\sqrt N)$.

This establishment of order from seeming randomness is fascinating, but really needs to be demonstrated to be appreciated.  Hence NumPy.

Here's the code:
import matplotlib.pyplot as plt
from numpy import *

K = 2       # number of connections
N = 500     # number of nodes, indexed 0 .. N-1
T = 200     # timesteps

Pow = 2**arange(K) # [ 1 2 4 ... ], for converting inputs to numerical value
Con = apply_along_axis(random.permutation, 1, tile(range(N), (N,1) ))[:, 0:K]
Bool = random.randint(0, 2, (N, 2**K))

State = zeros((T+1,N),dtype=int)
State[0] = random.randint(0, 2, N)
for t in range(T):  # 0 .. T-1
State[t+1] = Bool[:, sum(Pow * State[t,Con],1)].diagonal()

plt.imshow(State, cmap='Greys', interpolation='None')
plt.show()

There's essentially only three lines doing much substantive, which is the joy of working directly with arrays: no fiddly, wordy iterations.  Con holds the random connections, Bool holds the random functions, and the loop over t calculates the next State each timestep.

I dare say there's more elegant ways to do this, but I am still learning NumPy's capabilities. But what exactly is going on here?

For the Con array we need to choose K random inputs for each node.  These need to be distinct inputs, so we can't just choose them at random, because there might be collisions.  We could keep choosing, and keep throwing away collisions, but there's another way to do it:
• range(N) gives a list [0, 1, .., N-1].  Let's take N=5 here
• tile(...) makes an array of 5 stacked copies of this:
• [ [ 0 1 2 3 4 5 ]  [ 0 1 2 3 4 5 ]  [ 0 1 2 3 4 5 ]  [ 0 1 2 3 4 5 ]  [ 0 1 2 3 4 5 ] ]
• apply_along_axis(random.permutation, ...) applies a random permutation to each row individually
• [ [ 2 0 1 3 5 4 ]  [ 0 2 3 4 5 1 ]  [ 1 3 2 5 0 4 ]  [ 3 1 4 2 0 5 ]  [ 4 2 0 5 1 3 ] ]
• ...[:, 0:K] takes the first K items from each row.  Here K = 2
• [ [ 2 0 ]  [ 0 2 ]  [ 1 3 ]  [ 3 1 ]  [ 4 2 ] ]
This gives the array of node connections: node 0 has inputs from node 2 and itself; node 1 has inputs from node 0 and node 2, and so on.  See the figure earlier.

Bool has the $N$ random boolean functions.  Here each function is stored as a lookup table: a list of $2^K$ ones and zeros.

For the State update
• State[t,Con] gets the inputs from the connections
• sum(...) converts this array of ones and zeros into an index $0..2^K-1$
• Bool[: ...].diagonal() looks up the next state value from this index
And that's it!  We can use this to plot the time evolution of an RBN.
The string of nodes is drawn as a horizontal line at each timestep, and  time increases down the page. You can see it has some random structure for the first few timesteps, then rapidly settles down into regular behaviour.

Well, it's not that easy to see the regular behaviour.  It's a bit of a jumble really.  We can do better.

The problem is, since an RBN is random, there's appears to be no obvious order to write down the nodes.  The picture above uses the order as first given, which is ... random.  However, an RBN does have structure; it has a "frozen core" of nodes that settle down into a "frozen" state of always on, or always off.  If we sort the nodes by their overall activity, it highlights the structure better.

So, there's a little bit of extra code, sitting just before the loop updating the state.
SRun = 5     # sorting runs
ST = 200     # sorting timesteps
State = zeros((ST+1,N),dtype=int)
Totals = State[0]

for r in range(SRun):
for t in range(ST):
State[t+1] = Bool[:, sum(Pow * State[t,Con],1)].diagonal()
Totals = Totals + State[t+1]
State[0] = random.randint(0, 2, N) # new initial random state

Index = argsort(Totals)    # permutation indexes for sorted order
Bool = Bool[Index]         # permute the boolean functions
Con = Con[Index]           # permute the connections

InvIndex = argsort(Index)  # inverse permutation
Con = InvIndex[Con]        # relabel the connections

This extra code runs the RBN several times (from different initial conditions, each potentially leading to different attractor cycles involving different patterns of node activity), totalling up the number of times each node is active.  Sorting this array puts more inactive nodes towards the start, and more active nodes towards the end.  argsort() doesn't return the sorted array, however; it returns a permutation of the indexes corresponding to this sort.  This Index array can then be used to sort the Con and Bool arrays into the same order.  This results in something like:

Having done this, we need to relabel the nodes and the connection indexes.  This requires using the inverse sort permutation, InvIndex.  So we get
Running the modified code gives a much clearer picture of the RBN's dynamic behaviour:

So, after all this, what do I think of NumPy?

It's excellent.  Everything I needed (random permutations, sorting, applying functions across arrays, indexing arrays with other arrays, whatever), it's all there.  The code produced is very compact. (So compact that I've commented it quite liberally in the source file.)

The online documentation is mostly adequate, and whenever I puzzled over how to do something, a quick Google usually got me to a forum where my question had already been answered.

NumPy also has a big advantage over Matlab (in addition to the price!).  With Matlab, many functions and operations (such as array indexing) can be applied only to array literals, not to array expressions.  This makes it hard to build up compound operations without having to have a lot of intermediate variables.  With NumPy, you can just build up the expression in one go.  That makes for a more natural style of programming (although I suspect it could also make for some spectacular write-only code).

Anyhow, I'm please with my experiment, and will be delving further into NumPy in the future.