Sunday 2 June 2013

my first d3

Way back when I was an undergraduate, I stumbled across a book called Curve Tracing, by Percival Frost.  The college library had a first edition, published in 1872.  Several years later, I came across a newer edition in a bookshop, and grabbed at it.

This beautiful little book has tons of curves defined by equations, and shown as graphs.

Plate IV from Curve Tracing
This was all originally done in the days before computers.  It's been a goal of mine to reimplement some of this work, in an interactive form, because many of these curves have parameters that affect their detailed shape.  So, I've been looking for a suitable tool.

Also, I'm interested in data visualisation (and have long been a disciple of Tufte).  I've tried to follow the "no chartjunk" ethos in my own work.  For example, to plot our rainfall statistics, I had to struggle with Excel charts to remove most of the garish "ink" provided by default. But it's still not perfect.  Additionally, I have some specific work I want to do with modified parallel coordinates, for which there is no existing library.  So, I've been looking for a suitable tool.

Last week a colleague of mine mentioned d3, a JavaScript library for Data-Driven Documents. I browsed the gallery for a while, and fell in love.  I spent yesterday evaluating it on simple charts, to see what it could do.  This meant reading Scott Murray's d3 tutorial, and implementing a few simple charts to show the rainfall data. (Oh, and learning my first JavaScript.)

First off, I tried a simple bar chart, starting from Murray's tutorial example (when learning a new language, I usually find it easier to modify existing code than start from an empty file). After fiddling around to get the ordinal x-axis working, I got:
d3 chart of 2012 rainfall in mm
I think that looks suitably chartjunk-free and minimal.  It also has the nice feature of combining the actual numerical values into the bars, giving what Tufte calls both a macro-reading (the bars: gosh, July was wet!) and a micro-reading (the numbers: July had 113mm of rain) in one chart.

What's nice about d3 is the way the axes scale automatically.  Exactly the same code produced these charts (the only difference is the July data value: 13, 113, 233):

automatic axis scaling with changing data values
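
The automatic scaling presumably comes from deriving the scale's input range from the data, rather than hard-wiring it. Here is a minimal sketch of that idea in plain JavaScript. It is not d3's own code: `makeLinearScale` and `barHeights` are hypothetical names, and the first two data values are placeholders, not real rainfall figures. Only the three July values come from the charts above. In d3 itself the same effect is typically achieved by setting a linear scale's domain from `d3.max` of the data.

```javascript
// Build a linear scale that maps [0, max(data)] onto [0, rangeMax] pixels.
// Assumes data holds at least one positive number.
function makeLinearScale(data, rangeMax) {
  const domainMax = Math.max(...data);
  return (value) => (value / domainMax) * rangeMax;
}

// Hypothetical helper: bar heights for a three-month chart.
// 40 and 60 are placeholder values; julyValue is the one that changes.
function barHeights(julyValue, chartHeight) {
  const data = [40, 60, julyValue];
  const scale = makeLinearScale(data, chartHeight);
  return data.map(scale);
}

// Same code, different July value: the tallest bar always fills the chart.
for (const july of [13, 113, 233]) {
  console.log(july, barHeights(july, 100));
}
```

Because the domain is recomputed from the data each time, the tallest bar always fills the available height and the other bars shrink or grow to match.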
But, of course, we have several years of rainfall data.  A little more hacking, er, coding gave me a grouped bar chart:
rainfall, June 2005 -- May 2013
Now it's possible to see how wet April 2012 was, compared to normal Aprils, even if it wasn't as wet as that July!  My implementation of this is a little kludgy, with too much hard-wiring, since my goal was to evaluate the capability, not (yet) learn the entire language; my next task is to code it more elegantly.

I've always found grouped bar charts rather cluttered, and so I wondered if there was a better way to show the data.  Rather than use some sort of surface plot, I decided to try a projection where the size of a spot is related to the amount of rainfall.  With a surprisingly small change to the code, the grouped bar chart metamorphosed into a "blob" chart:

(left): blob area proportional to monthly rainfall; (right) blob radius proportional to monthly rainfall
This enables comparisons in both dimensions (years, or months), depending on whether you view rows or columns.
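
The difference between the two blob charts is just the formula for the radius. A minimal sketch in plain JavaScript follows. The function names and the `maxValue`/`maxRadius` parameters are assumptions for illustration, not taken from my actual script.

```javascript
// Area proportional to value: area = pi * r^2, so the radius must grow
// with the square root of the value.
function radiusForArea(value, maxValue, maxRadius) {
  return maxRadius * Math.sqrt(value / maxValue);
}

// Radius proportional to value: the area then grows with the square
// of the value, which exaggerates the wet months.
function radiusForRadius(value, maxValue, maxRadius) {
  return maxRadius * (value / maxValue);
}

// A month with a quarter of the maximum rainfall, maximum radius 20 pixels:
console.log(radiusForArea(25, 100, 20));   // half the maximum radius
console.log(radiusForRadius(25, 100, 20)); // a quarter of the maximum radius
```

With radius proportional to rainfall, doubling the rainfall quadruples the blob's area, which is why the right-hand chart shows much stronger contrast between wet and dry months.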

So, based on a day's work, I'm very impressed with d3.  However, there are a lot more d3 facilities I need to get up to speed with before I can start my reFrost project in earnest:

  • csv data import -- currently the data is hard-wired into the scripts (ugh)
  • data manipulation -- to calculate medians and quartiles for box and whisker plots
  • lines -- to draw graphs rather than charts
  • maths -- to calculate the functions; presumably I'll need a lot more JavaScript for that
  • interaction -- so parameter values can be chosen by the user
  • transitions -- so the graphs will smoothly change as parameters are varied
  • more -- stuff I don't know about yet, but will need
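
For the data manipulation item, the five numbers behind a box and whisker plot can be sketched in plain JavaScript as below. This is an assumption about how I might do it, not existing code: `quantile` and `boxStats` are hypothetical names, and the quantile uses linear interpolation between neighbouring sorted values, which is only one of several common conventions. d3 also provides a `d3.quantile` function that could replace the hand-written one.

```javascript
// Quantile of an already-sorted array, for p in [0, 1], interpolating
// linearly between the two nearest values.
function quantile(sortedValues, p) {
  const h = (sortedValues.length - 1) * p;
  const lo = Math.floor(h);
  const hi = Math.ceil(h);
  return sortedValues[lo] + (sortedValues[hi] - sortedValues[lo]) * (h - lo);
}

// Five-number summary for a box and whisker plot.
// Assumes values is a non-empty array of numbers.
function boxStats(values) {
  const sorted = values.slice().sort((a, b) => a - b); // numeric sort on a copy
  return {
    min: sorted[0],
    q1: quantile(sorted, 0.25),
    median: quantile(sorted, 0.5),
    q3: quantile(sorted, 0.75),
    max: sorted[sorted.length - 1],
  };
}

// Made-up values, just to exercise the function.
console.log(boxStats([5, 1, 3, 2, 4]));
```

Note the explicit comparison function in `sort`: without it, JavaScript sorts numbers as strings, so 113 would come before 13.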

So, a way to go, but I think I might have identified the tools I need.
