Sunday 27 January 2013

it's dead, Jim

I arrived home on Friday evening, after a four hour train journey back from a meeting in Manchester. [Small rant: as we were coming into Nottingham, there was an announcement: "for those of you travelling beyond Nottingham, this train splits here.  Please make sure you are in the rear mumble coaches if you want to travel beyond Nottingham." It wouldn't have helped me even if I could have made out the mumble.  I had no idea which coach I was in (a very long train arrived at Manchester; I got on it somewhere; the train changed direction at Sheffield, confusing my already weak grasp on reality.)  I have a suggestion for these sorts of trains: make the announcement only in the "wrong" coaches, and tell people there that they have to move.]

Anyhow, I got home. I had tea.  I went to my study to switch on my computer.  Nothing.  Did all the obvious things.  Still nothing.  Dead as a nit.

Well, it is nearly 8 years old, so I'm probably due an upgrade.  So I spent a pleasant time speccing up a dream replacement, then down-speccing it to a more reasonable price.

But ... I'd arrived home (near Cambridge) by train.  The car is still at work, in York.  I can't get to the computer shop until next weekend.  I'm having to make do with this little laptop that has none of my standard software on it.  Grump.

Oh, what first world problems we suffer!

Still, being without a computer this weekend isn't the disaster it might have been.  I have a pile of exam scripts to mark, and no time to surf (er, do important computery stuff), anyway.

Thursday 24 January 2013

faceless

I was just uploading my photo to my Google profile.  The drag-and-drop interface is nice (although it's very irritating to be told of the minimum 250x250 pixel size only after uploading a smaller pic).  However, I was a little disconcerted to see:



No, the problem isn't that I have acquired a large red halo!  That's just there to highlight the surprising warning box:

Are you sure people will recognize you in this photo? It doesn't seem to have a face in it.

I'm not sure whether to be concerned at my lack of face, or to be pleased that I'm immune to Google's face recognition software...

Monday 21 January 2013

more snow

We had a couple of inches of snow last night.  The view along the path now shows the bamboo bowed down under its weight.


Sunday 20 January 2013

RBNs with NumPy, sorted

I've been using Python for a little while now, and love the ease of programming in it.  I also use Matlab, which is wonderful for programming scientific things, particularly with arrays.  But Matlab is expensive, so I only have access to it at work.  Python is free.

I heard that NumPy, the numerical package for Python, had Matlab-like array operations, so thought I'd give it a try.  This weekend I finally had some time (that is, I needed a displacement activity from marking), so I gave it a go.  I decided to do a comparison of something I'd already implemented in Matlab: a Random Boolean Network (RBN) tool.

RBNs were invented by Stuart Kauffman as a simplified model of gene regulatory networks. They have fascinating properties for so simple a construction.  An RBN has the following components:
  • +++N+++ binary-state nodes: +++n_1 .. n_N+++
  • each node can be off or on (in state 0 or 1), +++s_1 .. s_N+++
  • each node has +++K+++ input connections from +++K+++ randomly chosen different nodes in the network, +++c_{11} .. c_{1K} .. c_{N1} .. c_{NK}+++
  • each node has a randomly chosen boolean function of +++K+++ variables, +++b_1 .. b_N+++
An example of an +++N=5, K=2+++ RBN is


Here the node colours represent the different boolean functions +++b_i+++, and the numbers label the nodes from +++0 .. N-1+++.

You start the network in some initial state of the binary nodes.  Each timestep each node receives the state of its +++K+++ connected neighbours, combines them with its boolean function, and sets its next state to that value:
$$s_i(t+1) = b_i(s_{c_{i1}}, s_{c_{i2}}, ... s_{c_{iK}})$$
The marvellous thing about +++K=2+++ RBNs is that, despite being set up to be as random as possible, and having a total of +++2^N+++ possible states they could be in, they rapidly settle down into an attractor cycle of length +++O(\sqrt N)+++.
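
Before reaching for arrays, the update rule and the idea of an attractor can be made concrete in plain Python. This is only a rough sketch: the helper names (make_rbn, step, find_attractor), the small network size and the seed are all made up for illustration. It runs the network until a state repeats, and reports how long the transient and the cycle were.

```python
import random

def make_rbn(n, k, rng):
    """Build a random network: k distinct inputs and a random truth table per node."""
    con = [rng.sample(range(n), k) for _ in range(n)]
    tables = [[rng.randint(0, 1) for _ in range(2 ** k)] for _ in range(n)]
    return con, tables

def step(state, con, tables):
    """Apply s_i(t+1) = b_i(s_c_i1, ..., s_c_iK) to every node at once."""
    nxt = []
    for inputs, table in zip(con, tables):
        # Read the K input bits as a binary number to index the truth table.
        index = sum(state[c] << j for j, c in enumerate(inputs))
        nxt.append(table[index])
    return tuple(nxt)

def find_attractor(state, con, tables):
    """Iterate until a state repeats; return (transient length, cycle length)."""
    seen = {}
    t = 0
    while state not in seen:
        seen[state] = t
        state = step(state, con, tables)
        t += 1
    return seen[state], t - seen[state]

rng = random.Random(1)          # fixed seed, chosen arbitrarily
con, tables = make_rbn(20, 2, rng)
start = tuple(rng.randint(0, 1) for _ in range(20))
transient, cycle = find_attractor(start, con, tables)
print("transient:", transient, "cycle length:", cycle)
```

Because the state space is finite and the update is deterministic, a repeat is guaranteed, so the loop always terminates. Storing every visited state is fine at this size, but would need rethinking for large networks with long transients.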

This establishment of order from seeming randomness is fascinating, but really needs to be demonstrated to be appreciated.  Hence NumPy.

Here's the code:
import matplotlib.pyplot as plt
from numpy import *

K = 2       # number of connections
N = 500     # number of nodes, indexed 0 .. N-1
T = 200     # timesteps

Pow = 2**arange(K) # [ 1 2 4 ... ], for converting inputs to numerical value
Con = apply_along_axis(random.permutation, 1, tile(range(N), (N,1) ))[:, 0:K]
Bool = random.randint(0, 2, (N, 2**K))

State = zeros((T+1,N),dtype=int)
State[0] = random.randint(0, 2, N)
for t in range(T):  # 0 .. T-1
    State[t+1] = Bool[:, sum(Pow * State[t,Con],1)].diagonal()

plt.imshow(State, cmap='Greys', interpolation='None')
plt.show()
There are essentially only three lines doing anything substantive, which is the joy of working directly with arrays: no fiddly, wordy iterations.  Con holds the random connections, Bool holds the random functions, and the loop over t calculates the next State each timestep.

I dare say there are more elegant ways to do this, but I am still learning NumPy's capabilities. But what exactly is going on here?

For the Con array we need to choose K random inputs for each node.  These need to be distinct inputs, so we can't just choose them at random, because there might be collisions.  We could keep choosing, and keep throwing away collisions, but there's another way to do it:
  • range(N) gives a list [0, 1, .., N-1].  Let's take N=5 here
  • tile(...) makes an array of 5 stacked copies of this:
    • [ [ 0 1 2 3 4 ]
        [ 0 1 2 3 4 ]
        [ 0 1 2 3 4 ]
        [ 0 1 2 3 4 ]
        [ 0 1 2 3 4 ] ]
  • apply_along_axis(random.permutation, ...) applies a random permutation to each row individually
    • [ [ 2 0 1 3 4 ]
        [ 0 2 3 4 1 ]
        [ 1 3 2 0 4 ]
        [ 3 1 4 2 0 ]
        [ 4 2 0 1 3 ] ]
  • ...[:, 0:K] takes the first K items from each row.  Here K = 2
    • [ [ 2 0 ]
        [ 0 2 ]
        [ 1 3 ]
        [ 3 1 ]
        [ 4 2 ] ]
This gives the array of node connections: node 0 has inputs from node 2 and itself; node 1 has inputs from node 0 and node 2, and so on.  See the figure earlier.
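
For comparison, here is one alternative way of getting K distinct inputs per node, not the method used above. It relies on the fact that sorting a row of independent random numbers gives a random permutation of the positions. The names rng, perms and con are made up for this sketch, and the seed is arbitrary.

```python
import numpy as np

N, K = 5, 2
rng = np.random.default_rng(0)   # seed is arbitrary

# Sorting a row of independent random numbers yields a random permutation
# of 0 .. N-1, so argsort along axis 1 permutes every row in one call.
perms = np.argsort(rng.random((N, N)), axis=1)
con = perms[:, :K]               # first K entries of each row: K distinct inputs

print(con)
```

Like the tile-and-permute version, this still builds a full N by N array just to keep K columns of it, so it is a matter of taste rather than an improvement in memory use.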

Bool has the +++N+++ random boolean functions.  Here each function is stored as a lookup table: a list of +++2^K+++ ones and zeros.
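
To make the lookup-table idea concrete, here is a small sketch with three familiar K=2 functions written out by hand. The tables AND, OR, XOR and the helper lookup are hypothetical names for illustration; in the RBN itself the tables are random rows of Bool. For K=2 there are 2^(2^K) = 16 possible tables in total.

```python
import numpy as np

K = 2
Pow = 2 ** np.arange(K)            # [1 2], the same weights as in the post

# Hypothetical hand-written tables; entry i is the output for input index i.
AND = np.array([0, 0, 0, 1])
OR = np.array([0, 1, 1, 1])
XOR = np.array([0, 1, 1, 0])

def lookup(table, inputs):
    """Turn the K input bits into an index (first input is the low bit), then look it up."""
    index = int(np.sum(Pow * np.array(inputs)))
    return int(table[index])

for inputs in [(0, 0), (1, 0), (0, 1), (1, 1)]:
    print(inputs, "AND", lookup(AND, inputs), "OR", lookup(OR, inputs), "XOR", lookup(XOR, inputs))
```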

For the State update
  • State[t,Con] gets the inputs from the connections
  • sum(...) converts this array of ones and zeros into an index +++0..2^K-1+++
  • Bool[:, ...].diagonal() looks up the next state value from this index
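
The three steps above can be followed on a small case. In this sketch (small N, arbitrary seed, names idx, via_diagonal and via_pairs made up here), Bool[:, idx] is an N by N array whose entry [i, j] is Bool[i, idx[j]], so its diagonal holds Bool[i, idx[i]]. Indexing with a pair of arrays picks out the same entries without the N by N intermediate, which for N = 500 is 250,000 values per timestep.

```python
import numpy as np

N, K = 6, 2
rng = np.random.default_rng(0)                       # arbitrary seed
Pow = 2 ** np.arange(K)
Con = np.argsort(rng.random((N, N)), axis=1)[:, :K]  # K distinct inputs per node
Bool = rng.integers(0, 2, (N, 2 ** K))               # one lookup table per node
state = rng.integers(0, 2, N)

idx = np.sum(Pow * state[Con], axis=1)   # shape (N,): table index for each node

via_diagonal = Bool[:, idx].diagonal()   # N x N intermediate, then its diagonal
via_pairs = Bool[np.arange(N), idx]      # picks Bool[i, idx[i]] directly

print(state[Con])
print(idx)
print(via_diagonal, via_pairs)
```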
And that's it!  We can use this to plot the time evolution of an RBN.
The string of nodes is drawn as a horizontal line at each timestep, and  time increases down the page. You can see it has some random structure for the first few timesteps, then rapidly settles down into regular behaviour.

Well, it's not that easy to see the regular behaviour.  It's a bit of a jumble really.  We can do better.

The problem is, since an RBN is random, there appears to be no obvious order in which to write down the nodes.  The picture above uses the order as first given, which is ... random.  However, an RBN does have structure; it has a "frozen core" of nodes that settle down into a "frozen" state of always on, or always off.  If we sort the nodes by their overall activity, it highlights the structure better.
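
One rough way to pick out frozen nodes from a recorded run is to look for columns that never change once the initial transient has passed. This is a sketch only: frozen_nodes and its skip parameter are made-up names, the history below is hand-written rather than a real RBN run, and "frozen" here only means "did not change in the rows we looked at".

```python
import numpy as np

def frozen_nodes(State, skip):
    """Boolean mask of nodes whose value never changes after the first `skip` rows."""
    tail = State[skip:]
    return np.all(tail == tail[0], axis=0)

# Tiny hand-made history (rows are timesteps, columns are nodes), for illustration only.
State = np.array([[1, 0, 1, 0],
                  [0, 0, 1, 1],
                  [1, 0, 1, 0],
                  [0, 0, 1, 1],
                  [1, 0, 1, 0]])
mask = frozen_nodes(State, skip=1)
print(mask)
```

Here the two middle columns are frozen (one always off, one always on), while the outer two keep flipping.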

So, there's a little bit of extra code, sitting just before the loop updating the state.
SRun = 5     # sorting runs
ST = 200     # sorting timesteps
State = zeros((ST+1,N),dtype=int)
Totals = State[0]

for r in range(SRun):
    for t in range(ST):
        State[t+1] = Bool[:, sum(Pow * State[t,Con],1)].diagonal()
        Totals = Totals + State[t+1]
    State[0] = random.randint(0, 2, N) # new initial random state
    
Index = argsort(Totals)    # permutation indexes for sorted order
Bool = Bool[Index]         # permute the boolean functions
Con = Con[Index]           # permute the connections

InvIndex = argsort(Index)  # inverse permutation
Con = InvIndex[Con]        # relabel the connections
This extra code runs the RBN several times (from different initial conditions, each potentially leading to different attractor cycles involving different patterns of node activity), totalling up the number of times each node is active.  Sorting this array puts more inactive nodes towards the start, and more active nodes towards the end.  argsort() doesn't return the sorted array, however; it returns a permutation of the indexes corresponding to this sort.  This Index array can then be used to sort the Con and Bool arrays into the same order.  This results in something like:

Having done this, we need to relabel the nodes and the connection indexes.  This requires using the inverse sort permutation, InvIndex.  So we get
Running the modified code gives a much clearer picture of the RBN's dynamic behaviour:
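
The sort-then-relabel step is easy to get backwards, so here is a toy check with four nodes. The Totals and Con values are made up, and NewCon is a name introduced just for this sketch. Index lists the old labels in sorted order; InvIndex gives each old node's new label.

```python
import numpy as np

Totals = np.array([30, 5, 50, 10])   # made-up activity counts for 4 nodes
Index = np.argsort(Totals)           # [1 3 0 2]: old labels listed in sorted order
InvIndex = np.argsort(Index)         # [2 0 3 1]: new label of each old node

Con = np.array([[1, 2],              # made-up connections, in old labels
                [0, 3],
                [3, 0],
                [2, 1]])
NewCon = InvIndex[Con[Index]]        # reorder the rows, then rename the entries

print(Index, InvIndex)
print(NewCon)
```

As a sanity check, new node 0 is old node 1, whose inputs were old nodes 0 and 3; those are now called 2 and 1, which is the first row of NewCon.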


So, after all this, what do I think of NumPy?

It's excellent.  Everything I needed (random permutations, sorting, applying functions across arrays, indexing arrays with other arrays, whatever), it's all there.  The code produced is very compact. (So compact that I've commented it quite liberally in the source file.)

The online documentation is mostly adequate, and whenever I puzzled over how to do something, a quick Google usually got me to a forum where my question had already been answered.

NumPy also has a big advantage over Matlab (in addition to the price!).  With Matlab, many functions and operations (such as array indexing) can be applied only to array variables, not to array expressions.  This makes it hard to build up compound operations without having to have a lot of intermediate variables.  With NumPy, you can just build up the expression in one go.  That makes for a more natural style of programming (although I suspect it could also make for some spectacular write-only code).

Anyhow, I'm pleased with my experiment, and will be delving further into NumPy in the future.

Monday 14 January 2013

first snow

The first snow of the year fell last night: only a few centimetres, enough to make things pretty but not cause any problems.  More snow is forecast, so the problems are to come!


a view along the path, through the bamboo

Harry Lauder's Walking Stick

Sunday 13 January 2013

not mailing but drowning

Email takes out a large slice of my day.  On my first serious day back at work after the Christmas break, I spent all morning on the backed up emails (despite having been monitoring and keeping up to some degree); at lunchtime I had one more email in my inbox than I'd had first thing: I wasn't even keeping up!

Someone at work has suggested that we all adopt the Email Charter.  It's got much excellent advice, and a few things I disagree with, which I'll explain here.

1. Respect Recipients' Time. This is the fundamental rule. As the message sender, the onus is on YOU to minimize the time your email will take to process. Even if it means taking more time at your end before sending.
This is the fundamental rule of all writing. Don't just dash something off, expecting your reader to pick through the verbiage to extract your meaning.  This is particularly true if your writing is to be read by more than one person: the total reading time is multiplied by the number of readers.  Hence the important role of reviewers and editors in published writing: to reduce the amount of time people waste reading nonsense.  Be your own editor: proofread and edit down your emails. Don't fall back on Pascal's excuse: "I am sorry for the length of my letter, but I had not the time to write a short one."

2. Short or Slow is not Rude.  Let's mutually agree to cut each other some slack. Given the email load we're all facing, it's OK if replies take a while coming and if they don't give detailed responses to all your questions. No one wants to come over as brusque, so please don't take it personally. We just want our lives back! 
Way back in the day, before email was ubiquitous, before personal computers were common, we used to get our business letters typed up by secretaries.  I heard back then (I'm talking mid 1980s here) via the grapevine that one of the secretaries thought that my business letters were "short, to the point of rudeness".  I always forget to put in all that initial polite handshaking part, and just cut to the chase.  So email is perfect for me!

Slow, though?  Well, that depends on what you mean by "slow".  More than 10 minutes, that's not slow.  More than 24 hours -- I'm beginning to worry that your spam filter ate my email, and might think about sending it again.  A simple "ack" can help allay this worry.

3. Celebrate Clarity. Start with a subject line that clearly labels the topic, and maybe includes a status category [Info], [Action], [Time Sens] [Low Priority]. Use crisp, muddle-free sentences. If the email has to be longer than five sentences, make sure the first provides the basic reason for writing. Avoid strange fonts and colors.
This confuses two issues: the subject line, and the body, of the email. The body part is really covered by point 1. Subject lines should be a clear "title" about the topic of the email. There are a few points I think are important about subject lines.
  1. Your reader doesn't have the same context that you do, so the subject line that's best for you isn't necessarily the best for them. Let's say you are a project manager where all your emails are currently about project BettaWidget.  You email financial support with a query.  Subject "Financial query" is great for you, but pretty useless for them, as most of their email is about financial queries!  Subject "BettaWidget" is more informative to them, but won't mean much to you when you get the reply.  So, try "BettaWidget financial query".  It's essentially the union of your tag set and your recipient's tag set. 
  2. Good subject lines mean one topic per email: the one related to the subject line.  Two topics lead to lack of clarity, and the danger that one topic will get overlooked.  In the past, it also meant the irritation of not being able to file it in the right place. But now that we have tags rather than hierarchical folders, this isn't so pressing.
  3. Change the topic, change the subject line!  If your conversation has drifted off onto something else, change the subject line to reflect this.  This links to point 6 below. 

4. Quash Open-Ended Questions. It is asking a lot to send someone an email with four long paragraphs of turgid text followed by "Thoughts?". Even well-intended-but-open questions like "How can I help?" may not be that helpful. Email generosity requires simplifying, easy-to-answer questions. "Can I help best by a) calling b) visiting or c) staying right out of it?!"
Maybe a further "d) other (please specify)" is needed to allow for the creative solution that you are secretly hoping for!

5. Slash Surplus cc's.  cc's are like mating bunnies. For every recipient you add, you are dramatically multiplying total response time. Not to be done lightly! When there are multiple recipients, please don't default to 'Reply All'. Maybe you only need to cc a couple of people on the original thread. Or none.
It can be difficult to know who needs to see an email. Sometimes there are a few people who need to see it and do something, and a few who need it for information only.  The first should be recipients, the second should be the "cc"s.  But it can be hard to distinguish these categories: Gmail doesn't do this unless you look at "details", for example.  It might be worth having a couple of lines of intro:
  1. Hi X, Y, Z (for action)
  2. cc A, B, C (fyi)
I haven't tried this (yet), but it might help address the problem.  If you have people in the list who are neither for action nor for info, why are they there?

The "to all" message that starts out "To all of you who haven't done X" is the most problematic.  I confess that I have been guilty of this on occasion, usually when pressed for time, but I wince when I do it.  I hate receiving these, as I'm left thinking "have I forgotten to do X?", and have to waste even more time checking that, yes, I have done X.  Don't send to "all" with the disclaimer; send it just to those who haven't done X.   It may take less extra time than you think---you will be saved reading all those aggrieved emails from people telling you that actually, they have done X---and it saves the time of everyone who has done X (not least because they won't be sending said aggrieved emails!).

6. Tighten the Thread. Some emails depend for their meaning on context. Which means it's usually right to include the thread being responded to. But it's rare that a thread should extend to more than 3 emails. Before sending, cut what's not relevant. Or consider making a phone call instead. 
This isn't so much of a problem -- we can read as far through the historical provenance as we need to.  It saves deleting something that's actually needed.  And the academic in me feels bad at removing references: it's almost verging on plagiarism.

"Or make a phone call instead."  No.  No, no, no.  Absolutely not. In case I wasn't clear about this: no, do not make a phone call instead.  The glorious, wonderful thing about emails is that the recipient is in control of when, and whether, to respond. If I don't respond to your email, it's because:
  1. I'm busy in a meeting. Some of my colleagues are bemused that I don't answer my phone when I'm in a meeting with them.  When I do answer, the person calling often asks if I am free to talk: yes, I am, if I wasn't, I wouldn't have answered the phone.
  2. I'm busy doing some real work, and don't want to be disturbed.  It takes 15-20 minutes to get into a concentration "flow" needed for technical work; it takes a second to be broken out of it.  My phone is on such a soft ring I can ignore it in flow.
  3. I need to do something, find something out, contact someone else, read a document, even just think a bit, before I can give you a meaningful response, so it will take some time.
  4. I'm ignoring you for now.
  5. (unlikely) It's lodged in my spam filter.
The phone is ideal for a conversation that needs some back and forth discussion.  An example happened last week.  A colleague emailed to say that their group had been having a planning discussion, and she needed to tell me about it, and it was easier to talk through than write down.  Using email, we set up a time to talk.  The call lasted 50 minutes (it would indeed have been hard to write down!)  If you want to talk to me on the phone, email me and ask me to call you (tell me when, so that I don't disturb you in turn).

7. Attack Attachments.  Don't use graphics files as logos or signatures that appear as attachments. Time is wasted trying to see if there's something to open. Even worse is sending text as an attachment when it could have been included in the body of the email. 
Absolutely!  Central admin in particular seems to delight in circulating emails that say "please read the enclosed", which results in you spending 10-15 seconds reading that instruction, scrolling down to the attachment, downloading it, and opening it, only to be confronted with a few lines of text that could easily have been pasted into the email.  10-15 seconds, multiplied by every recipient, multiplied by every such email, amounts to a lot of wasted effort.  Especially when the attachment has been forgotten in the first place!  (Maybe those helpful little "did you mean to send an attachment?" popups, which sometimes appear when you forget a promised attachment, should be complemented with "do you really need to send this attachment?", when you don't!)

8. Give these Gifts: EOM NNTR.  If your email message can be expressed in half a dozen words, just put it in the subject line, followed by EOM (= End of Message). This saves the recipient having to actually open the message. Ending a note with "No need to respond" or NNTR, is a wonderful act of generosity. Many acronyms confuse as much as help, but these two are golden and deserve wide adoption.
EOM isn't even needed if you have an email client that shows the beginning of the body text, or not, if there isn't any.  OTOH, NNTR seems to add extra acronym burden when there's already the well-known and perfectly acceptable FYI.

9. Cut Contentless Responses. You don't need to reply to every email, especially not those that are themselves clear responses. An email saying "Thanks for your note. I'm in." does not need you to reply "Great." That just cost someone another 30 seconds.
This is another slightly tricky one.  Again, sometimes there's a need for an "ack", or at least the recipient feels the need to say "thanks", whatever.  Maybe an extra "RSVP regrets" or some explicit opt out scheme is needed in such cases?

10. Disconnect! If we all agreed to spend less time doing email, we'd all get less email! Consider calendaring half-days at work where you can't go online. Or a commitment to email-free weekends. Or an 'auto-response' that references this charter. And don't forget to smell the roses.

tl;dr: let's follow the Email Charter (except that phone call business, of course!) and make our, and our recipients', inboxes hold only short, content-full, attachment-less, relevant subject-lined, pertinent emails!

This may be a short term problem, though.  Apparently, teens don't use email. The next generation will need a Text Charter instead.

steamy success

On Friday I finished preparing a large document that had been consuming most of my time over the Christmas "vacation" (when I wasn't reading, watching TV, or eating, of course).  I was looking forward to a relaxing weekend, so was pleased to see that a new level of Cut the Rope was ready for my phone.

This has a new "Steam Box", with little 3-state steam valves that you can use to hold a candy in vertical place, or give it a horizontal push.  For some reason the steam doesn't burst the floaty bubbles...

In between bits of (other) pottering around yesterday, I completed all but three of the 25 new levels.  Then I got stuck, so put it to one side for the evening.  Picking it up again this morning, I stared at the tricky level, went, "oh, of course", and polished off the whole thing.



Relaxation accomplished!

Tuesday 8 January 2013

films I watched on my hols

This Christmas there were a bunch of watchable films on TV -- so we watched them.
That's probably more films than we'll watch in the rest of the year put together.

Sunday 6 January 2013

all gone

A faithful cohort of four ducks stayed with us over the New Year.  But yesterday evening none returned.  So we are, again, a duck free zone.  No more early morning quacking to wake us!

Tuesday 1 January 2013

wet 2012

April was wet. Very wet.  For April.

But what about the whole of 2012?  It certainly felt wet.  How wet was it?

Well, since records began (our records, that is, which began in mid-2005), 2012 was indeed the wettest year ever:

annual rainfall, in mm
It certainly bucked the trend of the previous two "drought" years.  No hosepipe bans for a few months more, I expect.  So, April was wet; 2012 was wet.  Was April a particularly wet month in 2012?

mean, median, and 2012 monthly rainfall, in mm
So, April was only the third wettest month, beaten by December, and washed away by July. Strangely, when the north of England was being flooded out in September, down our way (near Cambridge) it was one of the driest months of the year!

Those mean/median figures don't tell anything like the whole story, though.  Here's more information:

min, lower quartile, median, upper quartile, max, and 2012 monthly rainfall, in mm
The box plots summarise the 2005-2011 data, and the blue bars are the 2012 data.  A blue bar within the lower/upper quartile box is nothing special: half the data falls there.  So May, September (the flooding month!), and November were fairly typical.

A blue bar outside the quartile box but between the min/max lines is a little bit special, as only a quarter of the data falls in each of the first and last quartiles (hence the name!).  So January and March were a bit on the dry side, while June, October and December were a bit wet.  (Well, December was quite a bit wet, being nearly at the maximum.)

I love that difference between the May and June data: very similar medians, minima and maxima, but wildly different lower and upper quartiles.  May is essentially bimodal -- wet or dry (this year was one of the dry ones) -- whilst June is middling damp with a couple of outliers (this year was an outlier, too).

A blue bar outside the min/max lines is a driest/wettest seen so far.  Four months in 2012 managed this (but since we have only 8 years of prior data, I'm not going to over-interpret the significance of this).  2012 had the driest February (despite it being a longer leap-year month!) and August, and the wettest April and July (by far!) since our records began.
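
The reading of the box plots above can be written down as a small classification rule. This is a sketch only: the history and this_year figures are invented, not the real rainfall records, and classify is a made-up helper. It also assumes NumPy's default way of interpolating percentiles; box-plot conventions for quartiles vary a little between tools.

```python
import numpy as np

# Hypothetical rainfall totals (mm) for one calendar month over seven earlier years.
history = np.array([12.0, 25.0, 31.0, 40.0, 44.0, 58.0, 90.0])
this_year = 95.0   # hypothetical new reading

lo, q1, med, q3, hi = np.percentile(history, [0, 25, 50, 75, 100])

def classify(x):
    """Place a reading relative to the box plot of earlier years."""
    if x < lo:
        return "driest so far"
    if x > hi:
        return "wettest so far"
    if q1 <= x <= q3:
        return "typical"
    return "a bit dry" if x < q1 else "a bit wet"

print(lo, q1, med, q3, hi, classify(this_year))
```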

So, a bit difficult to make any general statements about the rainfall, then.  This is why in the UK we talk about the weather all the time.  It changes, all the time.  All we can conclude from above is the lovely statement I saw in a newspaper many years ago: "it's usual to have unusual weather this time of year" -- or any time, really!