Sunday 23 June 2013

boxes, whiskers, and violins

My experiments with the JavaScript charting library d3 continue.  I'm looking at different ways to display book review statistics.  For example, I'm interested in how long books sit on the shelf between being bought and being read.  It can be a long time; but how long?

A scatter plot shows all the data:


Here I have colour-coded book types as purple for non-fiction, green for science fiction, and orange for other fiction.  The reason for the line is clear: I can't read books before I acquire them.  (The green dot below the line in 1998 is an anomaly: either the acquisition date or the review date is wrong.  I have investigated, but can't determine which, so have left it as it is.)
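As a minimal sketch of how such anomalies could be flagged automatically: compute the delay between the two dates for each book, and pick out any negative values.  The record layout, the field names (`acquired`, `reviewed`) and the sample dates below are all made up for illustration; they are not the real review data.

```javascript
// Hypothetical records: field names, titles and dates are invented for this sketch.
const sampleBooks = [
  { title: "Book A", type: "non-fiction", acquired: "2010-01-01", reviewed: "2012-06-15" },
  { title: "Book B", type: "sf",          acquired: "1998-03-01", reviewed: "1998-02-01" },
  { title: "Book C", type: "fiction",     acquired: "2013-01-01", reviewed: "2013-01-31" }
];

const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Days between acquisition and review; negative means "read before bought".
// Date-only ISO strings are parsed as UTC, so the difference is a whole number of days.
function delayInDays(book) {
  return Math.round((Date.parse(book.reviewed) - Date.parse(book.acquired)) / MS_PER_DAY);
}

// Books whose review date precedes their acquisition date.
function findAnomalies(books) {
  return books.filter(book => delayInDays(book) < 0);
}

console.log(findAnomalies(sampleBooks).map(book => book.title));
```

This only says that one of the two dates must be wrong; it can't say which.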

The scatter plot shows all the data, and provides a visual clue that maybe the time between acquisition and reading isn't too long.  (Of course, this only shows the books I've acquired and read, not the ones I've acquired that are still languishing on the unread shelves!)

In order to better visualise the time delay, I plotted the data as box-and-whisker plots (showing median, quartiles, outliers, and here also the mean), overlaid with violin plots (showing a more detailed estimate of the underlying distribution):
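For reference, the numbers behind a box-and-whisker plot can be sketched in a few lines of plain JavaScript.  This is an illustration rather than the code behind the chart: it assumes quartiles by linear interpolation and the common convention that anything more than 1.5 times the interquartile range beyond the quartiles counts as an outlier.  The function names are hypothetical, and the input is assumed to be a non-empty array of numbers.

```javascript
// Quantile of an already sorted array, using linear interpolation between neighbours.
function quantileSorted(sorted, p) {
  const pos = (sorted.length - 1) * p;
  const lo = Math.floor(pos);
  const hi = Math.ceil(pos);
  return sorted[lo] + (sorted[hi] - sorted[lo]) * (pos - lo);
}

// Summary statistics for one box: quartiles, median, mean, whisker ends, outliers.
// Assumption: outliers lie more than 1.5 * IQR beyond the quartiles.
function boxStats(values) {
  const sorted = [...values].sort((a, b) => a - b);
  const q1 = quantileSorted(sorted, 0.25);
  const median = quantileSorted(sorted, 0.5);
  const q3 = quantileSorted(sorted, 0.75);
  const iqr = q3 - q1;
  const lowFence = q1 - 1.5 * iqr;
  const highFence = q3 + 1.5 * iqr;
  const inside = sorted.filter(v => v >= lowFence && v <= highFence);
  const mean = sorted.reduce((sum, v) => sum + v, 0) / sorted.length;
  return {
    q1, median, q3, mean,
    whiskerLow: inside[0],                   // smallest value that is not an outlier
    whiskerHigh: inside[inside.length - 1],  // largest value that is not an outlier
    outliers: sorted.filter(v => v < lowFence || v > highFence)
  };
}

console.log(boxStats([1, 2, 3, 4, 100]));
```

With a long tail of slow reads, the mean gets dragged well above the median, which is why it is worth showing both.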


This was all relatively easy to do using the d3 chart library, and Jason Davies' science.js library for the kernel density estimator needed for the violin plot.
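To give a feel for what a kernel density estimator does, here is a minimal hand-rolled sketch.  It does not use science.js, and the choices in it (a Gaussian kernel, a bandwidth passed in by hand, the function names) are assumptions made for illustration, not a description of how that library works.

```javascript
// Standard normal density, used as the smoothing kernel (an assumed choice).
function gaussianKernel(u) {
  return Math.exp(-0.5 * u * u) / Math.sqrt(2 * Math.PI);
}

// Estimate the density of `sample` at each x in `points`.
// Each data value contributes a small bump of width `bandwidth`;
// the estimate is the average of those bumps.
function kernelDensity(sample, bandwidth, points) {
  return points.map(x => {
    let sum = 0;
    for (const v of sample) {
      sum += gaussianKernel((x - v) / bandwidth);
    }
    return [x, sum / (sample.length * bandwidth)];
  });
}

// Made-up delays in years, evaluated at a few points along the axis.
const density = kernelDensity([0.1, 0.3, 0.4, 2.5], 0.5, [0, 1, 2, 3]);
console.log(density);
```

A violin is then the list of [x, density] pairs drawn as an area and mirrored about the box's axis.  A smaller bandwidth gives a lumpier violin; a larger one gives a smoother shape.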

So from this I can see that I tend to read fiction almost as soon as I buy it, SF quite soon after, and that non-fiction sticks around on the shelves for longer.  This probably indicates that I buy non-fiction partly as an investment (it's my pension fund!).

Maybe I should spend more time reading them, and less time learning new languages so I can analyse how fast I'm reading them?
