Saturday 25 November 2017

then four come along at once

One of the tasks of an academic is examining research students.  This involves reading a thesis, then discussing it with the student in a viva.  In the UK the viva is held behind closed doors, and usually involves only two examiners, one internal (same university as the student) and one external.  The scheme is different on the continent: the defence is public, following a lecture, and there may be multiple “opponents”, or even an entire panel, of examiners.

If every research student needs (a minimum of) two examiners, that implies for every student I supervise, I should examine at least two others (one as internal, one as external, to keep the numbers balanced).  In fact, since many universities like to have “senior” external examiners, I should probably do a few more than that.

Each year, I usually take on one or two new students.  So, each year, I should probably examine about three to five students.  So far this year I have examined only two: both external continental defences, one in Trondheim in April, one in Lyon in July.

That implies I should have two or three more to do this year (if it is an average year).  And sure enough, in November, theses started raining down on my desk in a seemingly never-ending stream.


So the total number is about right, but all at once is a bit of a nightmare.  Two internal students, two external students, all to be examined before the end of January.  I know what I’m going to be reading over Christmas.


Thursday 23 November 2017

Fun with Python generator pipelines

When I was first looking at Python generators, I discovered David Beazley’s wonderful presentation on Generator Tricks for Systems Programmers, where he gives a lovely example of using generators in a Unix-pipeline fashion, each generator piping its output into the next.

Before I start using this approach, I’m going to need some debugging help. I like to write-a-line-test-a-line, and I use print() statements a lot (I was weaned on Fortran IV). But printing a generator’s content consumes that content, messing up the following code (unless you comment out and rerun, which is just nasty).

We can use tee() to build a generator debugging printer. Tee the generator, print from the first copy, and return the second (untouched) copy (see my previous post on generators for the definition of print_for()):
from itertools import * 

def print_debug(gen, n=10, sep=', ', end=", ...\n", enum=False):
    gen_tee = tee(gen)
    if enum: # add a count to the items
        print(*list(islice(enumerate(gen_tee[0],1),n)), sep=sep, end=end)
    else:
        print(*list(islice(gen_tee[0],n)), sep=sep, end=end)
    return gen_tee[1]

counter = count(1)
print_for(counter)
print_for(counter)

counter = count(1)
counter = print_debug(counter)
print_for(counter)
1, 2, 3, 4, 5, 6, 7, 8, 9, 10, ...
11, 12, 13, 14, 15, 16, 17, 18, 19, 20, ...
1, 2, 3, 4, 5, 6, 7, 8, 9, 10, ...
1, 2, 3, 4, 5, 6, 7, 8, 9, 10, ...
Okay, back to Unix pipelines. There’s a lovely little paper from three decades ago.
Bentley, J., Knuth, D. & McIlroy, D., 1986. Programming Pearls - A Literate Program. CACM, 29(6), pp.471–483.
Available at: https://www.cs.princeton.edu/courses/archive/spring17/cos333/knuth-mcilroy.pdf.
In that paper, the following problem is posed:
Given a text file and an integer k, print the k most common words in the file (and the number of their occurrences) in decreasing frequency.
This is a deliberately vague problem specification from Bentley. Knuth tightens up the definition, and then gives a several-page (42 numbered paragraphs!) literate programming solution in Pascal, including a custom data structure, a hash trie.  McIlroy then critiques Knuth’s solution, and suggests a six-stage Unix pipeline to do the same job:
(1) tr -cs A-Za-z '
    ' |
(2) tr A-Z a-z |
(3) sort |
(4) uniq -c |
(5) sort -rn |
(6) sed ${1}q
accompanied by the following helpful gloss for those unfamiliar with Unix shell arcana:
  1. Make one-word lines by transliterating the complement (-c) of the alphabet into newlines (note the quoted newline), and squeezing out (-s) multiple newlines.
  2. Transliterate upper case to lower case.
  3. Sort to bring identical words together.
  4. Replace each run of duplicate words with a single representative and include a count (-c).
  5. Sort in reverse (-r) numeric (-n) order.
  6. Pass through a stream editor; quit (q) after printing the number of lines designated by the script’s first parameter (${1}).
Using Beazley’s ideas, Python generators can give us a middle way between extreme Pascal verbosity and extreme Unix terseness, as follows.

Let’s use some online text files for our test input.
textlorum = "http://www.sample-videos.com/text/Sample-text-file-10kb.txt"
textgreatexpectations = "http://www.gutenberg.org/files/1400/1400-0.txt"
The lorem ipsum file has the advantage of containing only words without hyphens (no words split over lines) or apostrophes (all words are simply strings of letters) or numbers or other weirdnesses, but there are commas and semicolons and full stops (periods). The Gutenberg Great Expectations has more clutter, but since this isn’t an exercise in parsing text, I will take the simpler properties for granted.

Let’s open the file, and print out the first few lines, to see what we have. (The raw input is binary, so we use decode() to convert it to strings; again, doing all this properly is a bit more work, but that is not the point of this discussion.)
import urllib.request

infile = urllib.request.urlopen(textlorum)
lines = (line.decode() for line in infile)

# have a peek at the first 3 lines, numbered
lines = print_debug(lines,n=3, sep='\n', end='\n', enum=True) 
(1, 'Lorem ipsum dolor sit amet, consectetur adipiscing elit. Vivamus condimentum sagittis
lacus, laoreet luctus ligula laoreet ut. Vestibulum ullamcorper accumsan velit vel 
vehicula. Proin tempor lacus arcu. Nunc at elit condimentum, semper nisi et, condimentum 
mi. In venenatis blandit nibh at sollicitudin. Vestibulum dapibus mauris at orci maximus 
pellentesque. Nullam id elementum ipsum. Suspendisse cursus lobortis viverra. Proin et 
erat at mauris tincidunt porttitor vitae ac dui.\r\n')
(2, '\r\n')
(3, 'Donec vulputate lorem tortor, nec fermentum nibh bibendum vel. Lorem ipsum dolor sit 
amet, consectetur adipiscing elit. Praesent dictum luctus massa, non euismod lacus. 
Pellentesque condimentum dolor est, ut dapibus lectus luctus ac. Ut sagittis commodo arcu.
Integer nisi nulla, facilisis sit amet nulla quis, eleifend suscipit purus. Class aptent 
taciti sociosqu ad litora torquent per conubia nostra, per inceptos himenaeos. 
Aliquam euismod ultrices lorem, sit amet imperdiet est tincidunt vel. Phasellus dictum 
justo sit amet ligula varius aliquet auctor et metus. Fusce vitae tortor et nisi pulvinar 
vestibulum eget in risus. Donec ante ex, placerat a lorem eget, ultricies bibendum purus. 
Nam sit amet neque non ante laoreet rutrum. Nullam aliquet commodo urna, sed ullamcorper 
odio feugiat id. Mauris nisi sapien, porttitor in condimentum nec, venenatis eu urna. 
Pellentesque feugiat diam est, at rhoncus orci porttitor non.\r\n')
So each paragraph in the text file is a single line terminated with '\r\n', and paragraphs are separated by blank lines. What the first stage of McIlroy’s Unix pipeline does is output single-word lines, by turning everything that is not a letter (here, only spaces, '.,;', and newlines) into newlines, and squashing multiple newlines into single ones.
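As a quick aside (a sketch that is not part of the walkthrough below), stage (1) on its own could be mimicked with re.sub(), replacing each run of non-letters with a single newline; the pipeline below instead uses re.finditer() to pick out the words directly. The sample sentence here is just an arbitrary illustration.
import re

# one-off demonstration of stage (1): each run of non-letters (spaces,
# punctuation, newlines) becomes a single newline, giving one word per line
# (plus a trailing newline from the final full stop)
print(re.sub(r'[^A-Za-z]+', '\n', 'Lorem ipsum dolor sit amet, consectetur.'))
Lorem
ipsum
dolor
sit
amet
consectetur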

We can do the same by making use of the Python re (regular expression) module. We build a generator that takes in a line of data, and yields the words (defined using a regular expression as any sequence of upper and lower case letters) one at a time, translated to lower case (using lower()). We then embed that in a second generator, to iterate over all the lines of data.
import re
words = (w.group(0).lower() for line in lines for w in re.finditer(r"[a-zA-Z]+", line) )

# have a peek at the first 100 words
words = print_debug(words,100) 
lorem, ipsum, dolor, sit, amet, consectetur, adipiscing, elit, vivamus, condimentum, 
sagittis, lacus, laoreet, luctus, ligula, laoreet, ut, vestibulum, ullamcorper, accumsan,
velit, vel, vehicula, proin, tempor, lacus, arcu, nunc, at, elit, condimentum, semper,
nisi, et, condimentum, mi, in, venenatis, blandit, nibh, at, sollicitudin, vestibulum,
dapibus, mauris, at, orci, maximus, pellentesque, nullam, id, elementum, ipsum,
suspendisse, cursus, lobortis, viverra, proin, et, erat, at, mauris, tincidunt,
porttitor, vitae, ac, dui, donec, vulputate, lorem, tortor, nec, fermentum, nibh,
bibendum, vel, lorem, ipsum, dolor, sit, amet, consectetur, adipiscing, elit, praesent,
dictum, luctus, massa, non, euismod, lacus, pellentesque, condimentum, dolor, est, ut,
dapibus, lectus, luctus, ac, ...
So this is generating the words, ignoring newlines and blank lines properly. We have now done the first two lines of the Unix pipe, with three lines of (in the last case admittedly rather complex!) Python.

Next in McIlroy’s pipeline is the sort. This is not a generator-friendly operation, as it needs the entire file of words in memory to be sorted. But we can do the job of the following stage, uniq -c, without having to sort first, by building a dictionary of the occurrences. We still need to read the entire file to do this, but because we iterate over the word generator, we can do it incrementally. So we don’t need to store the entire file in memory, which becomes important if the file is large.

We can use a default dictionary to save us needing special code when a new word is discovered.
from collections import defaultdict
def build_word_dict(word_gen):
    # construct and return a dictionary of (word,count) items
    wdict = defaultdict(int) 
    for word in word_gen:
        wdict[word] += 1
    return wdict

word_dict = build_word_dict(words)
Let's have a look at some properties of the constructed dictionary:
# print words with more than 15 occurrences
print({k:v for k,v in word_dict.items() if v>15}) 
print('total number of words =', sum([v for k,v in word_dict.items()])) 
print('number of unique words =', len(word_dict))
{'sit': 19, 'sed': 25, 'ut': 18, 'tincidunt': 19, 'metus': 16, 'nulla': 21, 'et': 30, 
'amet': 19, 'at': 19, 'ipsum': 21, 'non': 21, 'lorem': 21, 'in': 25, 'pellentesque': 17}
total number of words = 1368
number of unique words = 175
Opening up the input file in Chrome and searching for pellentesque gives 17 hits, so that’s looking good.

The dictionary (175 items) is much smaller than the entire file (1368 words), drastically saving on memory. We can now sort on the dictionary values to get a list of words in frequency order, then slice off the top k of them.
k = 10  # the "integer k" in the spec; k most common words
top_word_keys = sorted(word_dict, key=word_dict.get, reverse=True)
for key in islice(top_word_keys,k):
    print(word_dict[key], key)
30 et
25 in
25 sed
21 non
21 lorem
21 nulla
21 ipsum
19 sit
19 tincidunt
19 at
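As an aside, not part of the original solution: the standard library’s collections.Counter can collapse the dictionary building and the sort-and-slice into a couple of calls. A minimal sketch, assuming a fresh words generator (the one above has already been consumed by build_word_dict()):
from collections import Counter

# Counter consumes the word generator incrementally, like build_word_dict();
# most_common(k) then does the sorting and slicing in one go
word_counter = Counter(words)
for wrd, cnt in word_counter.most_common(k):
    print(cnt, wrd)
The counts come out the same; only the ordering of tied counts might differ from the sorted() version above.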
And we’re done! Putting it all together, and running on Great Expectations:
from itertools import *
import urllib.request
import re
from collections import defaultdict

textsource = "http://www.sample-videos.com/text/Sample-text-file-10kb.txt"
textgreatexpectations = "http://www.gutenberg.org/files/1400/1400-0.txt"
k = 20  # the "integer k" in the spec; k most common words

def build_word_dict(word_gen):
    # construct and return a dictionary of (word,count) items
    wdict = defaultdict(int) 
    for word in word_gen:
        wdict[word] += 1
    return wdict

infile = urllib.request.urlopen(textgreatexpectations)
lines = (line.decode() for line in infile)
words = (w.group(0).lower() for line in lines for w in re.finditer(r"[a-zA-Z]+", line) )

word_dict = build_word_dict(words)
top_word_keys = sorted(word_dict, key=word_dict.get, reverse=True)
for key in islice(top_word_keys,k):
    print(word_dict[key], key)
8321 the
7166 and
6666 i
5235 to
4557 of
4112 a
3090 that
3083 in
2837 was
2811 it
2382 you
2263 he
2093 had
2070 my
1998 me
1860 his
1807 with
1786 as
1654 at
1432 on
So that’s the problem solved, using generators in many places. It’s not as compact as the Unix shell script, but it’s still remarkably compact. And it improves the memory usage by not requiring the whole file to be read into memory at once in order to sort the individual words:
print('total number of words =', sum([v for k,v in word_dict.items()])) 
print('number of unique words =', len(word_dict))
total number of words = 192016
number of unique words = 10989
Although that solves the problem as stated, it might not be what you were expecting. All the most common words are … common words. Any large text will give similar results. We can remove the ‘uninteresting’ common words – called stop words – from the dictionary first:
# get full stop word list from https://tinyurl.com/yce6ttyc
stop_words = [ "a", "about", "above", "after", "again", "against", "all", "am", ... ];

word_dict_stopped = { k:v for k,v in word_dict.items() if k not in stop_words }

print('number of stop words =', len(stop_words))
print({k:v for k,v in word_dict_stopped.items() if v>600}) 
print('total number of words after stop =', sum([v for k,v in word_dict_stopped.items()]))
print('number of unique words after stop =', len(word_dict_stopped))
number of stop words = 153
{'said': 1349, 't': 682, 'no': 655, 'joe': 747, 'not': 1088, 's': 1194, 'mr': 711}
total number of words after stop = 89529
number of unique words after stop = 10870
We can then look at the most frequent words after the stop words have been removed:
top_word_keys = sorted(word_dict_stopped, key=word_dict.get, reverse=True)
for key in islice(top_word_keys,k):
    print(word_dict[key], key)
1349 said
1194 s
1088 not
747 joe
711 mr
682 t
655 no
514 one
453 now
392 know
383 miss
375 come
374 time
371 little
368 upon
341 pip
327 like
325 looked
321 man
318 havisham
That is a bit more interesting: we can see pip and havisham, which are specific to this particular text. Also, there is clearly a lot of conversation: said is the most common non-stop word.

We also see a couple of rather strange ‘words’: s and t. This is because the original assumptions of no apostrophes (and also of no hyphens) don’t actually hold for the Great Expectations file, so a more sophisticated definition of a ‘word’ is needed. But that’s another story for another time.
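As a hint of that story, here is a minimal sketch (my own refinement, not the original definition) of a word pattern that allows internal apostrophes, so that don’t and Joe’s each count as a single word; hyphenated and line-split words would need yet more work. It assumes a fresh lines generator.
import re

# letters, optionally followed by (apostrophe + letters) groups;
# both straight and curly apostrophes are allowed
word_re = r"[a-zA-Z]+(?:['’][a-zA-Z]+)*"
words = (w.group(0).lower() for line in lines for w in re.finditer(word_re, line))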



Wednesday 22 November 2017

view from a hotel window

I’m in Bristol, attending a “community consultation workshop” on the Review of Complexity Science as an EPSRC Research Area.  I travelled down by train last night, and walked to the hotel.  Google maps wanted me to cut across some weird footbridge over a river, and wander through back streets.  Since it was 10 pm, pitch dark, and very windy, I thanked it kindly, and walked straight up the well-lit main road.

This is the lovely autumnal view that greeted me this morning:


Now off to talk about complexity science, complex systems, systems thinking, and cross-disciplinarity.

Tuesday 21 November 2017

Fun with Python generator expressions

In the past, I have rhapsodised over Python’s list comprehensions. I had also heard of its generators, but never looked at them seriously. Recently I have been thinking about a problem in stream programming, where I will need the generators’ lazy evaluation. So I have been looking at them in more detail. And discovered that (in Python 3 at least) the list comprehension is just syntactic sugar for a generator expression output turned into a list. That is: [ <expr> for i in <iter> if <cond> ] is just syntactic sugar for list( <expr> for i in <iter> if <cond> ). Let’s look at this in more detail.

With a list comprehension, I can construct, say, a list of the first 10 square numbers.
squares = [n*n for n in range(1,11)]
print(squares)
[1, 4, 9, 16, 25, 36, 49, 64, 81, 100]
Since this is just syntactic sugar, it is the same as writing:
squares = list(n*n for n in range(1,11))
print(squares)
[1, 4, 9, 16, 25, 36, 49, 64, 81, 100]
So far, so trivial. But the advantage of using the generator form is that it lazily iterates (it "generates") its items one at a time, as they are asked for. If you do not ask, it does not produce. This is useful if you don’t know how many items you want, so cannot specify the stop value of the range(). A generator can be unbounded.
from itertools import *
squares = (n*n for n in count(1))
for s in squares:
    print(s, end=", ")
    if s >= 100:
        print('...')
        break
1, 4, 9, 16, 25, 36, 49, 64, 81, 100, ...
Here count(), imported from itertools, is an iterable like range(), except that it does not have a stop value. It is an iterator that just keeps incrementing until you stop asking it for more values.

The module itertools has many such useful functions you can use with potentially infinite generators. For example, we can islice() off the first few items of a potentially unbounded generator, to give a bounded generator that returns only those items. Here we slice off the first 10 values:
squares = (n*n for n in count(1))
firstfew = islice(squares,10)
print(list(firstfew))
[1, 4, 9, 16, 25, 36, 49, 64, 81, 100]
This is fine if we know how many items we want. Sometimes instead we want all the items up to a particular value. We can use takewhile() for this, here to get the squares less than 150.
squares = (n*n for n in count(1))
firstfew = takewhile(lambda n: n<=150, squares)
print(list(firstfew))
[1, 4, 9, 16, 25, 36, 49, 64, 81, 100, 121, 144]
Let’s use islice() and takewhile() to define a couple of "pretty print" functions:
# print the first n items of the generator, comma separated
def print_for(gen,n=10):
    print(*list(islice(gen,n)), sep=', ', end=", ...\n")

# print the generator up to value nmax, comma separated
def print_while(gen,nmax=250):
    print(*list(takewhile(lambda n: n<=nmax, gen)), sep=', ', end=", ...\n")

print_for(n*n for n in count(1))
print_while(n*n for n in count(1))
1, 4, 9, 16, 25, 36, 49, 64, 81, 100, ...
1, 4, 9, 16, 25, 36, 49, 64, 81, 100, 121, 144, 169, 196, 225, ...
squares = (n*n for n in count(1))
print_for(squares)

squares = (n*n for n in count(1))
print_while(squares)
1, 4, 9, 16, 25, 36, 49, 64, 81, 100, ...
1, 4, 9, 16, 25, 36, 49, 64, 81, 100, 121, 144, 169, 196, 225, ...
There are analogous functions dropwhile(), which drops items while its predicate is true (then yields everything after), and filterfalse(), which keeps only those items for which its predicate is false.
print_for(dropwhile(lambda n: n<=5, count(1)))  # drop items while they are <= 5
print_for(filterfalse(lambda n: n%3, count(1))) # filter out items not divisible by 3
6, 7, 8, 9, 10, 11, 12, 13, 14, 15, ...
3, 6, 9, 12, 15, 18, 21, 24, 27, 30, ...

Restarting generators

I have been quite careful above to keep redefining squares. This is because once an item is consumed, it is gone. If I take a generator instance, like squares, and slice off the first ten items, then slice again, I get the next ten items. For example:
squares = (n*n for n in count(1))

print_for(squares)
print_for(squares)  # continues from next value of same generator

print_for(n*n for n in count(1))
print_for(n*n for n in count(1)) # restarts, with new generator
1, 4, 9, 16, 25, 36, 49, 64, 81, 100, ...
121, 144, 169, 196, 225, 256, 289, 324, 361, 400, ...
1, 4, 9, 16, 25, 36, 49, 64, 81, 100, ...
1, 4, 9, 16, 25, 36, 49, 64, 81, 100, ...
You do need to ensure you consume the generated items for this to occur. Consider:
squares = (n*n for n in count(1))
# slice off the first 10?
islice(squares, 10)
print_for(squares)  # maybe not what is expected
1, 4, 9, 16, 25, 36, 49, 64, 81, 100, ...
The islice() produced a generator. But since nothing accessed that generator, nothing actually got sliced off squares. Compare:
squares = (n*n for n in count(1))
# slice off and consume the first 10
list(islice(squares, 10))
print_for(squares)
121, 144, 169, 196, 225, 256, 289, 324, 361, 400, ...

tee() for two

Once it’s gone, it’s gone. But what if you want it back? You could always iterate again. But what if the items are expensive to compute? For example, you may be reading a large file, and don’t want to read it all again. Here tee() is useful for "remembering" earlier items. (Note: this doesn’t make n copies of an iterator, rather it gives n pointers into a single iterator.)
sq_ptr = tee((n*n for n in count(1)))

print_for(sq_ptr[0])
print_for(sq_ptr[0],5)
print_for(sq_ptr[1])
print_for(sq_ptr[1])
print_for(sq_ptr[1],5)
print_for(sq_ptr[0])
1, 4, 9, 16, 25, 36, 49, 64, 81, 100, ...
121, 144, 169, 196, 225, ...
1, 4, 9, 16, 25, 36, 49, 64, 81, 100, ...
121, 144, 169, 196, 225, 256, 289, 324, 361, 400, ...
441, 484, 529, 576, 625, ...
256, 289, 324, 361, 400, 441, 484, 529, 576, 625, ...
One application is building a "sliding window" over the iterated data. Here is an example for a window of size 3:
def triples(iterable):
    a, b, c = tee(iterable,3)
    next(b)
    list(islice(c,2))
    return zip(a, b, c)

print_for(triples(count(1)))
print_for(triples('abcdefghijklmnopqrstuvwxyz'))
(1, 2, 3), (2, 3, 4), (3, 4, 5), (4, 5, 6), (5, 6, 7), (6, 7, 8), (7, 8, 9), (8, 9, 10),
(9, 10, 11), (10, 11, 12), ...
('a', 'b', 'c'), ('b', 'c', 'd'), ('c', 'd', 'e'), ('d', 'e', 'f'), ('e', 'f', 'g'), 
('f', 'g', 'h'), ('g', 'h', 'i'), ('h', 'i', 'j'), ('i', 'j', 'k'), ('j', 'k', 'l'), ...
and here it is for a user-defined window size:
def sliding_window(iterable,n=2):
    windows = tee(iterable,n)
    for i in range(n):
        list(islice(windows[i],i))
    return zip(*windows)

print_for(sliding_window(count(1),4))
print_for(sliding_window('abcdefghijklmnopqrstuvwxyz',3))
(1, 2, 3, 4), (2, 3, 4, 5), (3, 4, 5, 6), (4, 5, 6, 7), (5, 6, 7, 8), (6, 7, 8, 9), 
(7, 8, 9, 10), (8, 9, 10, 11), (9, 10, 11, 12), (10, 11, 12, 13), ...
('a', 'b', 'c'), ('b', 'c', 'd'), ('c', 'd', 'e'), ('d', 'e', 'f'), ('e', 'f', 'g'), 
('f', 'g', 'h'), ('g', 'h', 'i'), ('h', 'i', 'j'), ('i', 'j', 'k'), ('j', 'k', 'l'), ...

Sums and products

The itertools module has a function accumulate() that takes an iterator, and returns a generator of the running totals of the values seen so far. We could use this to implement a simple version of count() "the hard way", by using repeat(), which simply repeats its argument endlessly:
print_for(repeat(3))
print_for(accumulate(repeat(1)))
3, 3, 3, 3, 3, 3, 3, 3, 3, 3, ...
1, 2, 3, 4, 5, 6, 7, 8, 9, 10, ...
The triangle numbers are 1, 1+2, 1+2+3, ... We can use count() and accumulate() to generate these:
print_for(accumulate(count(1)))
1, 3, 6, 10, 15, 21, 28, 36, 45, 55, ...
accumulate() defaults to summing the items, but other functions can be used.
import operator
factorial = accumulate(count(1), operator.mul)
print_for(factorial)
1, 2, 6, 24, 120, 720, 5040, 40320, 362880, 3628800, ...
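As one more small example (not in the original post), the combining function does not have to be arithmetic at all; a running maximum works just as well:
# running maximum of a (finite) list; accumulate applies max cumulatively
print(list(accumulate([3, 1, 4, 1, 5, 9, 2, 6], max)))
[3, 3, 4, 4, 5, 9, 9, 9]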

Arithmetic series and products

We can use accumulate() to calculate infinite sums and products (or at least, give us the first however many terms).

sum of reciprocal squares

$$ \sum_{n=1}^\infty \frac{1}{n^2} = \frac{\pi^2}{6} $$
import math

recip_squares = (1/(n*n) for n in count(1))
print_for(accumulate(recip_squares))

# or all in one go, now with 30 terms:
print_for(accumulate(1/(n*n) for n in count(1)),30)

# it converges rather slowly
print('limit =', math.pi*math.pi/6)
1.0, 1.25, 1.3611111111111112, 1.4236111111111112, 1.4636111111111112, 1.4913888888888889,
1.511797052154195, 1.527422052154195, 1.5397677311665408, 1.5497677311665408, ...
1.0, 1.25, 1.3611111111111112, 1.4236111111111112, 1.4636111111111112, 1.4913888888888889,
1.511797052154195, 1.527422052154195, 1.5397677311665408, 1.5497677311665408, 
1.558032193976458, 1.5649766384209025, 1.5708937981842162, 1.5759958390005426, 
1.580440283444987, 1.584346533444987, 1.587806741057444, 1.5908931608105303, 
1.5936632439130234, 1.5961632439130233, 1.5984308176091684, 1.6004969333116477, 
1.6023872924798896, 1.6041234035910008, 1.6057234035910009, 1.6072026935318293, 
1.6085744356443121, 1.6098499458483937, 1.6110390064904865, 1.6121501176015975, ...
limit = 1.6449340668482264

sum of reciprocal powers of 2

$$ \sum_{n=1}^\infty \frac{1}{2^n} = 1 $$
print_for(accumulate(1/(2**n) for n in count(1)),20)
0.5, 0.75, 0.875, 0.9375, 0.96875, 0.984375, 0.9921875, 0.99609375, 0.998046875, 
0.9990234375, 0.99951171875, 0.999755859375, 0.9998779296875, 0.99993896484375, 
0.999969482421875, 0.9999847412109375, 0.9999923706054688, 0.9999961853027344, 
0.9999980926513672, 0.9999990463256836, ...

factorial

$$ \prod_{i=1}^n i = n! $$
print_for(accumulate((n for n in count(1)), operator.mul))
1, 2, 6, 24, 120, 720, 5040, 40320, 362880, 3628800, ...

a product for π

$$ \prod_{n=1}^\infty \left( \frac{4n^2}{4n^2-1} \right) = \frac{\pi}{2} $$
print_for(accumulate((1/(1-1/(4*n**2)) for n in count(1)), operator.mul),20)

# it also converges rather slowly
print('limit =', math.pi/2)
1.3333333333333333, 1.422222222222222, 1.4628571428571429, 1.4860770975056687, 
1.5010879772784533, 1.51158509600068, 1.5193368144417092, 1.525294998027755, 
1.5300172735634447, 1.533851903321749, 1.5370275801402207, 1.539700671583943, 
1.5419817096159192, 1.5439510349155563, 1.5456684442981097, 1.5471793616434646, 
1.5485189108743247, 1.549714678373069, 1.5507886317191353, 1.5517584807696163, ...
limit = 1.5707963267948966
This is all very neat and nifty, and we have barely scratched the surface of what can be done. But that's probably (more than) enough for now.


Saturday 18 November 2017

jumping Jehosaphat!

I for one welcome our new jumping robot overlords.




For all my social networking posts, see my Google+ page

Monday 13 November 2017

Inspired by Nature

Hard copies of our new book have just arrived!  Now to send them off to each of the chapter authors (and one to Julian Miller, of course!).

Happy Birthday, Julian!

Here are the bibliographic details, and the preface, of this Festschrift:

Susan Stepney, Andrew Adamatzky (eds).  Inspired by Nature: Essays Presented to Julian F. Miller on the Occasion of his 60th Birthday, Springer, 2018
This book is a tribute to Julian Francis Miller’s breadth of ideas and achievements in computer science, evolutionary algorithms and genetic programming, electronics, unconventional computing, artificial chemistry, and theoretical biology. Well-known for both Cartesian Genetic Programming and evolution in materio, Julian has further interests from quantum computing to artificial chemistries. He has over 200 refereed publications; here, we highlight just a few of his major accomplishments.

Julian started his life in science as a mathematical physicist working on the interaction of solitons in various nonlinear partial differential equations, such as the sine-Gordon equation and the modified Korteweg-de Vries equation. He entered classical computer science with his paper on the synthesis and optimisation of networks implemented with universal logic modules. Julian’s interest in optimisation led him to genetic algorithms, which he employed for optimisation of field-programmable arrays, Reed-Muller logical functions, finite-state machines, and evolving combinatorial logic circuits and non-uniform cellular automata.

Julian combined his interests in physics and computer science in work on a constant-complexity algorithm for solving Boolean satisfiability problems on quantum computers, and a quantum algorithm for finding multiple matches. Julian’s ideas in optimisation of circuits and quantum computing are reflected in Younes’ Chapter “Using Reed-Muller Expansions in the Synthesis and Optimization of Boolean Quantum Circuits”.

Julian’s interest in combining natural processes and computation expanded from physics to include the exciting world of biological processes, such as evolution and morphogenesis. He used principles of morphogenesis to evolve computing circuits and programs. These aspects of Julian’s work are reflected in Chapters “Evolvable Hardware Challenges: Past, Present and the Path to a Promising Future” by Haddow and Tyrrell, “Artificial Development” by Kuyucu et al., and Banzhaf’s “Some Remarks on Code Evolution with Genetic Programming”.

In 2000, Julian, together with Peter Thomson, presented a fully developed concept of Cartesian Genetic Programming (CGP). There, a program is genetically represented as a directed graph, including automatically defined functions and self-modifying operators. This approach has become very popular, because it allows the discovery of efficient solutions across a wide range of mathematical problems and algorithms. Several chapters of the book manifest the success of CGP in diverse application areas: “Designing Digital Systems Using Cartesian Genetic Programming and VHDL” by Henson et al.; “Breaking the Stereotypical Dogma of Artificial Neural Networks with Cartesian Genetic Programming” by Khan and Ahmad; “Approximate Computing: An Old Job for Cartesian Genetic Programming?” by Sekanina; “Medical Applications of Cartesian Genetic Programming” by Smith and Lones; “Multi-step Ahead Forecasting Using Cartesian Genetic Programming” by Dzalbs and Kalganova; “Cartesian Genetic Programming for Control Engineering” by Clarke; “Bridging the Gap Between Evolvable Hardware and Industry Using Cartesian Genetic Programming” by Vasicek; “Combining Local and Global Search: A Multi-objective Evolutionary Algorithm for Cartesian Genetic Programming” by Kaufmann and Platzner.

In 2001, Miller and Hartmann published “Untidy evolution: Evolving messy gates for fault tolerance”. Their ideas of exploiting “messiness” to achieve “optimality”—“natural evolution is, par excellence, an algorithm that exploits the physical properties of materials”—gave birth to a new field of unconventional computing: evolution in materio. The evolution in materio approach has proved very successful in discovering logical circuits in liquid crystals, disordered ensembles of carbon nanotubes (Chapter “Evolution in Nanomaterio: The NASCENCE Project” by Broersma), slime mould (Chapter “Discovering Boolean Gates in Slime Mould” by Harding et al.), living plants (Chapter “Computers from Plants We Never Made: Speculations” by Adamatzky et al.), and reaction-diffusion chemical systems (“Chemical Computing Through Simulated Evolution” by Bull et al.).

Julian’s inspiration from nature has not neglected the realm of chemistry: he has exploited chemical ideas in the development of a novel form of artificial chemistry, used to explore emergent complexity. Chapter “Sub-Symbolic Artificial Chemistries” by Faulkner et al. formalises this approach.

The book will be a pleasure to explore for readers from all walks of life, from undergraduate students to university professors, from mathematicians, computer scientists, and engineers to chemists and biologists.


Sunday 12 November 2017

film review: Blade Runner 2049

The IMDB plot summary for the original 1982 Blade Runner starts: “In the futuristic year of 2019, Los Angeles has become a dark and depressing metropolis, filled with urban decay.” It is now thirty years later: the entire planet has undergone ecological collapse, and is a wasteland populated with Wallace’s synthetic protein farms to feed the millions living in the cities. (Are there also carb, fibre, and vitamin farms?) The urban decay has also worsened, post the mysterious Blackout that wiped or corrupted most digital data, and the city is filled with those who cannot afford to migrate to a better life off-world. Despite the bankruptcy of the Tyrell corporation after the previous trouble with replicants, Wallace [Jared Leto] has started up a new line of more obedient replicants. Police agent KD6-3.7 [Ryan Gosling] is one of these new replicants, a Blade Runner hunting down the earlier, dangerous models. His latest mission finds an old replicant living on a protein farm, but what he discovers there leads to an explosive revelation that could destroy humanity. He needs to find Deckard [Harrison Ford] to solve the mystery.

K walks away from shot
The new film is long (2hr 44mins) and has a rather dream-like quality. There are a few action scenes, but this is mostly K uncovering dark secrets, moving slowly through huge bleak strangely-lit exteriors, or densely-detailed strangely-lit interiors. This is surprisingly effective at generating a mood of quiet desperation. Gosling plays the part of a seemingly unemotional and obedient replicant very well: without cracking a facial expression, he nevertheless manages to convey the loneliness and misery of being a despised un-person. And Wallace, as both original saviour and now potential destroyer of humanity, is an interesting villain.

Joshi and Luv
Does it pass the Bechdel test? Well, K’s boss police Lieutenant Joshi [Robin Wright] and Wallace’s spooky replicant enforcer Luv [Sylvia Hoeks] do talk about recent events at the forensics lab, but that’s the only brief scene I can recall; the short conversation between prostitute Mariette [Mackenzie Davis] and sex hologram Joi [Ana de Armas] is about K. And, of course, a major female character is “refrigerated” before the film even starts.

While watching, the film is very engaging, pulling you along with the plot, despite its slow unfolding. It also manages a great plot twist that subverts a lot of the initial setup. It is only afterwards that some of the more ridiculous elements become clear. Why is Wallace blind? He owns replicant technology; surely he can make himself some eyes? The Blackout doesn’t seem to be a drastic event properly integrated into the past history; it seems to be there only to make tracking down details of earlier replicants more difficult. The deeper systemic effect of a digital purge is handled more effectively in Dark Angel. Given replicants are so dangerous and difficult to spot, why make their only identifying mark a serial number in one eye? Why not just give them, say, a bright blue skin, and tattoo their serial number across their foreheads? The one scene that did jar at the time was the comment that “no two humans have the same DNA”; I guess they’ve never heard of identical twins? Yet we were supposed to think twins, despite them being a boy and a girl.

car in dark city
However, for all these niggles (which occur in essentially any SF), the good more than outweighs the bad, and I enjoyed the film. With the evocation of a decaying future, and the underlying questions about what makes someone human, this makes a good sequel to the original masterpiece.




For all my film reviews, see my main website.

Saturday 11 November 2017

sequestering carbon, several books at a time LXXVII

The latest batch (although there are some covert deliveries of books being smuggled into hidden parts of the house, in preparation for slightly-post-solstice celebrations):




Thursday 9 November 2017

why not use trees?

BBC news headline: Plant captures CO2 out of the air.  Well, yes, that’s what they do!

Oh, okay.  So it’s not quite as snappy as the apocryphal newspaper headline Shell found on beach, but it requires the same sort of anti-default processing to understand.




For all my social networking posts, see my Google+ page

Wednesday 8 November 2017

pendulous apples

Best use of “golden delicious” apples ever.
High school physics teacher shows his awesome home made marble tracks




For all my social networking posts, see my Google+ page

Tuesday 7 November 2017

predictive biography

Apparently, typing “I was born” into a predictive text system, and then letting it write the rest of your life is a thing.  So I tried it in Gmail on my phone:
I was born in the future of Dr Who changed so much more like a football match and the fist to three months in a bubble of people needed to maintain knowledge ...
It starts off very plausibly, and the Dr Who bit I can understand.  The football match, less so.




For all my social networking posts, see my Google+ page

Wednesday 1 November 2017

Weinberg anecdotes

Gerald Weinberg on Project Mercury: We had to create an entire sub-project to locate Australia.

Ah, those were the days!




For all my social networking posts, see my Google+ page