I have great difficulty identifying tunes, even if I know them well. I have no way of asking for help, since all notes I sing are the same, and flat, even though I can hear the tune perfectly well inside my head.
So, many years ago I was excited to hear of a book that would help: it lists all tunes in order of whether their notes go up, down, or stay the same. I didn't know the author or title, but in 1990 I found a copy --- it's "The Directory of Tunes" by Denys Parsons --- whilst browsing bookshelves. I excitedly bought it, thinking all my troubles were over (provided that the tune had been written before 1975, the book's publication date, of course).
How does it work? Well, consider a well-known song, like "Do You Know the Way to San Jose".
Write a star for the first note: "*". Then, since the next note goes down, write a D: "*D". The third note goes up, so that's a U: "*DU". Keep going: "*DUUUD", etc. Then look up the resulting string in the book:
Perfect! Problem solved.
Except for one tiny thing. I can hear the tune in my head. But I can't tell if the notes go up, down, or stay the same (unless they change a lot). So I can't construct the string! The book has sat, unused, on my shelf for the last 20 years...
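For anyone who can tell up from down, the string-building step described above is easy to sketch in code. This is only an illustration, not anything taken from the book: the function name `parsons_code` is made up, the tune is assumed to be given as a list of pitch numbers (MIDI note numbers here), and using "R" for a note that stays the same is my assumption, since only "U" and "D" appear in the example above.

```python
def parsons_code(pitches):
    """Return the contour string for a sequence of pitch numbers.

    '*' marks the first note; each later note contributes
    'U' (up), 'D' (down) or 'R' (repeat, i.e. stays the same).
    The 'R' symbol is an assumption for the "stay the same" case.
    """
    if not pitches:
        return ""
    symbols = ["*"]
    # Compare each note with the one before it.
    for previous, current in zip(pitches, pitches[1:]):
        if current > previous:
            symbols.append("U")
        elif current < previous:
            symbols.append("D")
        else:
            symbols.append("R")
    return "".join(symbols)


# Opening of "Twinkle, Twinkle, Little Star" as MIDI notes: C C G G A A G
twinkle = [60, 60, 67, 67, 69, 69, 67]
print(parsons_code(twinkle))  # *RURURD
```

Only the direction of each step matters, not its size, which is exactly why the scheme works for people who can hear direction and is no use to me.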
Last week I was round at some friends, and talk moved to discussing the Web, as it does. One of them was saying how much easier it was now that so many things were available to be looked up. I mentioned an anecdote told by Bertrand Meyer in 1999, about using the web to identify an opera he was listening to, and said that would only help if you have lyrics (and it would help me only if they were in English, and if I didn't fall foul of a Mondegreen).
The daughter of the house looked at me pityingly, and told me about Shazam. You point your phone in the direction of the music and "tag" it; the app records for about 10 seconds, sends it off, and the answer comes back. Perfect!
So, naturally, I downloaded the app then and there (well, after having a brisk discussion about whether it was Superman or Captain Marvel who said Shazam!), and we all spent the next few minutes playing random bits of music at it, and seeing what it could identify (despite the background noise of an excited budgie). At first I wondered if it was using a similar system to the book, but quickly realised it had to be quite different: you don't have to start at the beginning of the tune, and it not only tells you the song title, but also the artist -- and in the case of a piece of Mozart, the orchestra and conductor. So it must be matching against the actual recording. How does it work?
I found an article in the August 2006 issue of CACM which gives a brief explanation -- more technical detail can be found following the links from the Wikipedia article. Essentially it looks for spectrogram peaks, takes adjacent pairs of these, does some hashing to increase the entropy, and matches the results against the music database. Many of these peaks are just noise, and so don't match. But enough do, with the added constraint that different pairs have to match at the right time intervals, to get a high quality matching system. So, a combination of a really clever algorithm and a massive database gives a fantastic ability to match tunes.
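To convince myself how the pairing and timing constraint fit together, here is a toy sketch of the idea. It is emphatically not Shazam's code: it assumes the spectrogram peaks have already been extracted as `(time, frequency)` tuples, all the names (`pair_hashes`, `build_index`, `best_match`, `fan_out`) are invented for illustration, and a plain Python tuple stands in for whatever compact hash the real system uses.

```python
from collections import Counter, defaultdict


def pair_hashes(peaks, fan_out=3):
    """Yield (key, anchor_time) for each peak paired with the next few peaks.

    The key combines both frequencies and the time gap between them,
    so it is far more specific than a single peak would be.
    """
    peaks = sorted(peaks)
    for i, (t1, f1) in enumerate(peaks):
        for t2, f2 in peaks[i + 1 : i + 1 + fan_out]:
            yield (f1, f2, t2 - t1), t1


def build_index(songs):
    """Map each pair key to every (song_id, time) at which it occurs."""
    index = defaultdict(list)
    for song_id, peaks in songs.items():
        for key, t in pair_hashes(peaks):
            index[key].append((song_id, t))
    return index


def best_match(index, query_peaks):
    """Return (song_id, votes) for the song with the most consistent matches.

    A matching pair votes for (song, time offset). Real matches agree on
    one offset; chance matches from noise scatter across many offsets.
    """
    votes = Counter()
    for key, t_query in pair_hashes(query_peaks):
        for song_id, t_song in index.get(key, ()):
            votes[(song_id, t_song - t_query)] += 1
    if not votes:
        return None
    (song_id, _offset), count = votes.most_common(1)[0]
    return song_id, count


# Two made-up "songs", and a query cut from the middle of song "a"
# (times shifted to start at 0) with one extra noise peak at (1, 99).
songs = {
    "a": [(0, 10), (1, 20), (2, 15), (3, 30), (4, 25), (5, 40)],
    "b": [(0, 11), (1, 21), (2, 16), (3, 31), (4, 26), (5, 41)],
}
index = build_index(songs)
query = [(0, 15), (1, 30), (1, 99), (2, 25), (3, 40)]
print(best_match(index, query))
```

The noise peak produces pairs that match nothing, while the genuine pairs all vote for the same offset into song "a", which is why the query need not start at the beginning of the tune.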
But it's matching, not "recognition" as such. So it doesn't work with live music, including amateur singing (and I don't consider the noises I make to be singing as such). It's not quite the perfect system. But it's still mind-bogglingly amazingly useful.