I heard that NumPy, the numerical package for Python, had Matlab-like array operations, so thought I'd give it a try. This weekend I finally had some time (that is, I needed a displacement activity from marking), so I gave it a go. I decided to do a comparison of something I'd already implemented in Matlab: a Random Boolean Network (RBN) tool.
RBNs were invented by Stuart Kauffman as a simplified model of gene regulatory networks. They have fascintating properties for so simple a construction. An RBN has the following components:
- +++N+++ binary-state nodes: +++n_1 .. n_N+++
- each node can be off or on (in state 0 or 1), +++s_1 .. s_N+++
- each node has +++K+++ input connections from +++K+++ randomly chosen different nodes in the network, +++c_{11} .. c_{1K} .. c_{N1} .. c_{NK}+++
- each node has a randomly chosen boolean function of +++K+++ variables, +++b_1 .. b_N+++
An example of an +++N=5, K=2+++ RBN is
Here the node colours represent the different boolean functions +++b_i+++, and the numbers label the nodes from +++0 .. N-1+++.
You start the network in some initial state of the binary nodes. Each timestep each node receives the state of its +++K+++ connected neighbours, combines them with its boolean function, and sets its next state to that value:
$$s_i(t+1) = b_i(s_{c_{i1}}, s_{c_{i2}}, ... s_{c_{iK}})$$
The marvellous thing about +++K=2+++ RBNs is that, despite being set up to be as random as possible, and having a total of +++2^N+++ possible states they could be in, they rapidly settle down into an attractor cycle of length +++O(\sqrt N)+++.
This establishment of order from seeming randomness is fascinating, but really needs to be demonstrated to be appreciated. Hence NumPy.
Here's the code:
There's essentially only three lines doing much substantive, which is the joy of working directly with arrays: no fiddly, wordy iterations.
I dare say there's more elegant ways to do this, but I am still learning NumPy's capabilities. But what exactly is going on here?
For the
For the
The problem is, since an RBN is random, there's appears to be no obvious order to write down the nodes. The picture above uses the order as first given, which is ... random. However, an RBN does have structure; it has a "frozen core" of nodes that settle down into a "frozen" state of always on, or always off. If we sort the nodes by their overall activity, it highlights the structure better.
So, there's a little bit of extra code, sitting just before the loop updating the state.
This extra code runs the RBN several times (from different initial conditions, each potentially leading to different attractor cycles involving different patterns of node activity), totalling up the number of times each node is active. Sorting this array puts more inactive nodes towards the start, and more active nodes towards the end.
Having done this, we need to relabel the nodes and the connection indexes. This requires using the inverse sort permutation, InvIndex. So we get
Running the modified code gives a much clearer picture of the RBN's dynamic behaviour:
So, after all this, what do I think of NumPy?
It's excellent. Everything I needed (random permutations, sorting, applying functions across arrays, indexing arrays with other arrays, whatever), it's all there. The code produced is very compact. (So compact that I've commented it quite liberally in the source file.)
The online documentation is mostly adequate, and whenever I puzzled over how to do something, a quick Google usually got me to a forum where my question had already been answered.
NumPy also has a big advantage over Matlab (in addition to the price!). With Matlab, many functions and operations (such as array indexing) can be applied only to array literals, not to array expressions. This makes it hard to build up compound operations without having to have a lot of intermediate variables. With NumPy, you can just build up the expression in one go. That makes for a more natural style of programming (although I suspect it could also make for some spectacular write-only code).
Anyhow, I'm please with my experiment, and will be delving further into NumPy in the future.
Here the node colours represent the different boolean functions +++b_i+++, and the numbers label the nodes from +++0 .. N-1+++.
You start the network in some initial state of the binary nodes. Each timestep each node receives the state of its +++K+++ connected neighbours, combines them with its boolean function, and sets its next state to that value:
$$s_i(t+1) = b_i(s_{c_{i1}}, s_{c_{i2}}, ... s_{c_{iK}})$$
The marvellous thing about +++K=2+++ RBNs is that, despite being set up to be as random as possible, and having a total of +++2^N+++ possible states they could be in, they rapidly settle down into an attractor cycle of length +++O(\sqrt N)+++.
This establishment of order from seeming randomness is fascinating, but really needs to be demonstrated to be appreciated. Hence NumPy.
Here's the code:
import matplotlib.pyplot as plt from numpy import * K = 2 # number of connections N = 500 # number of nodes, indexed 0 .. N-1 T = 200 # timesteps Pow = 2**arange(K) # [ 1 2 4 ... ], for converting inputs to numerical value Con = apply_along_axis(random.permutation, 1, tile(range(N), (N,1) ))[:, 0:K] Bool = random.randint(0, 2, (N, 2**K)) State = zeros((T+1,N),dtype=int) State[0] = random.randint(0, 2, N) for t in range(T): # 0 .. T-1 State[t+1] = Bool[:, sum(Pow * State[t,Con],1)].diagonal() plt.imshow(State, cmap='Greys', interpolation='None') plt.show()
Con
holds the random connections, Bool
holds the random functions, and the loop over t
calculates the next State
each timestep.I dare say there's more elegant ways to do this, but I am still learning NumPy's capabilities. But what exactly is going on here?
For the
Con
array we need to choose K
random inputs for each node. These need to be distinct inputs, so we can't just choose them at random, because there might be collisions. We could keep choosing, and keep throwing away collisions, but there's another way to do it:range(N)
gives a list[0, 1, .., N-1]
. Let's takeN=5
heretile(...)
makes an array of 5 stacked copies of this:[ [ 0 1 2 3 4 5 ]
[ 0 1 2 3 4 5 ]
[ 0 1 2 3 4 5 ]
[ 0 1 2 3 4 5 ]
[ 0 1 2 3 4 5 ] ]apply_along_axis(random.permutation, ...)
applies a random permutation to each row individually[ [ 2 0 1 3 5 4 ]
[ 0 2 3 4 5 1 ]
[ 1 3 2 5 0 4 ]
[ 3 1 4 2 0 5 ]
[ 4 2 0 5 1 3 ] ]...[:, 0:K]
takes the firstK
items from each row. HereK = 2
[ [ 2 0 ]
[ 0 2 ]
[ 1 3 ]
[ 3 1 ]
[ 4 2 ] ]
This gives the array of node connections: node 0 has inputs from node 2 and itself; node 1 has inputs from node 0 and node 2, and so on. See the figure earlier.
Bool
has the +++N+++ random boolean functions. Here each function is stored as a lookup table: a list of +++2^K+++ ones and zeros.For the
State
updateState[t,Con]
gets the inputs from the connectionssum(...)
converts this array of ones and zeros into an index +++0..2^K-1+++Bool[: ...].diagonal()
looks up the next state value from this index
And that's it! We can use this to plot the time evolution of an RBN.
The string of nodes is drawn as a horizontal line at each timestep, and time increases down the page. You can see it has some random structure for the first few timesteps, then rapidly settles down into regular behaviour.
Well, it's not that easy to see the regular behaviour. It's a bit of a jumble really. We can do better.
The problem is, since an RBN is random, there's appears to be no obvious order to write down the nodes. The picture above uses the order as first given, which is ... random. However, an RBN does have structure; it has a "frozen core" of nodes that settle down into a "frozen" state of always on, or always off. If we sort the nodes by their overall activity, it highlights the structure better.
So, there's a little bit of extra code, sitting just before the loop updating the state.
SRun = 5 # sorting runs ST = 200 # sorting timesteps State = zeros((ST+1,N),dtype=int) Totals = State[0] for r in range(SRun): for t in range(ST): State[t+1] = Bool[:, sum(Pow * State[t,Con],1)].diagonal() Totals = Totals + State[t+1] State[0] = random.randint(0, 2, N) # new initial random state Index = argsort(Totals) # permutation indexes for sorted order Bool = Bool[Index] # permute the boolean functions Con = Con[Index] # permute the connections InvIndex = argsort(Index) # inverse permutation Con = InvIndex[Con] # relabel the connections
argsort()
doesn't return the sorted array, however; it returns a permutation of the indexes corresponding to this sort. This Index
array can then be used to sort the Con
and Bool
arrays into the same order. This results in something like:Having done this, we need to relabel the nodes and the connection indexes. This requires using the inverse sort permutation, InvIndex. So we get
Running the modified code gives a much clearer picture of the RBN's dynamic behaviour:
So, after all this, what do I think of NumPy?
It's excellent. Everything I needed (random permutations, sorting, applying functions across arrays, indexing arrays with other arrays, whatever), it's all there. The code produced is very compact. (So compact that I've commented it quite liberally in the source file.)
The online documentation is mostly adequate, and whenever I puzzled over how to do something, a quick Google usually got me to a forum where my question had already been answered.
NumPy also has a big advantage over Matlab (in addition to the price!). With Matlab, many functions and operations (such as array indexing) can be applied only to array literals, not to array expressions. This makes it hard to build up compound operations without having to have a lot of intermediate variables. With NumPy, you can just build up the expression in one go. That makes for a more natural style of programming (although I suspect it could also make for some spectacular write-only code).
Anyhow, I'm please with my experiment, and will be delving further into NumPy in the future.
No comments:
Post a Comment