Xylophone gag project
From enfascination
I wrote a small program to help you blow people up on the piano. This project was inspired by the classic cartoon "Xylophone Gag".
YouTube broadcast, starting at 5:45 and running until 6:51: http://www.youtube.com/watch?v=YnJl6qAYdLs (if this specific video gets pulled, search for any of the cartoon titles identifying the gag in this article)
The goal was to apply machine learning techniques to digital audio to identify a player's mistakes while playing a known melody. This is a course project[3] in only about 500 lines of code, so there are severe restrictions on its general utility. It works on monophonic, single-instrument melodies, and works better if the melody is short and played on a wind instrument. It does not identify timing errors and wasn't tested on skipped notes, only wrong notes. To stay close to the roots of the gag, I only benchmarked the code on the classic tune for the Irish poem, "Believe me, if all those endearing young charms."[4]
However, I do demonstrate the robustness of the code to identify mistakes across the melody, as played in different keys, at different tempos, and across a few instruments. For my simple test examples, and given the limits of the HMM that I built the code on, this system avoids both false positives and negatives (thinking there is a mistake when there isn't and thinking there is not a mistake when there is).
I modified a simple HMM implementation[5][6] written in R to have rudimentary score following. I then introduced an obnoxious screech into the audio to indicate identification of each mistake.
the tune
 
  Here is my representation of the melody in R, from the score, using the cheat sheet. I didn't actually use the timing info:
eyc_orig_notes <- c(64, 62, 60, 62, 60, 60, 64, 67, 65, 69, 72, 72)
timing <- c(0.25, 0.25, 0.75, 0.25, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.75)
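To make the MIDI numbers easier to check against a score, here is a minimal sketch that converts them to note names. The helper `midi_to_name` is hypothetical (it is not part of the project code) and assumes the common convention that MIDI note 60 is C4.

```r
# Melody as MIDI note numbers, copied from the representation above.
eyc_orig_notes <- c(64, 62, 60, 62, 60, 60, 64, 67, 65, 69, 72, 72)

# Hypothetical helper: MIDI number -> note name, assuming MIDI 60 = C4.
midi_to_name <- function(midi) {
  pitch_classes <- c("C", "C#", "D", "D#", "E", "F",
                     "F#", "G", "G#", "A", "A#", "B")
  octave <- midi %/% 12 - 1                       # octave number under the C4 = 60 convention
  paste0(pitch_classes[midi %% 12 + 1], octave)   # R vectors are 1-indexed
}

eyc_names <- midi_to_name(eyc_orig_notes)
print(eyc_names)
```

Under that convention the melody reads E4 D4 C4 D4 C4 C4 E4 G4 F4 A4 C5 C5, so the final note is a C.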
code
the code
xylophone gag project code [7]
modifications
I added a few things to the HMM implementation here [8]
- I changed the highest and lowest "possible" notes, because my samples covered a range of keys, and tended to be higher.
- I recorded on a bunch of instruments (the vanilla HMM was tested almost entirely on bass oboe data). Instruments include guitar, penny whistle, recorder, and the Bugs Bunny and Yosemite Sam audio from the YouTube video above.
- I made a near-note bias, editing the transition table to make it more likely that you will go from any note to a note within 6 half-steps of the current note.
- I shortened the minimum number of frames per note to four, to catch the shorter notes in the sample.
- I created another window length, the minimum number of frames that a note must be sustained to be considered an error. The error detection is only as good as the HMM it is built on. Noisy attacks, like on a piano, are very hard to identify correctly. To keep from labeling these identification errors as player mistakes, I made a minimum size of eight frames, below which a pitch cannot be labeled as an error. In my code, one frame is about two hundredths of a second.
- I added simple score following and an "error state"
- The error state allowed me to catch two kinds of mistakes. Sometimes a player realizes they made a mistake and plays the right note after playing the wrong note. Sometimes they don't realize, or they accept the mistake and continue on to the next note, skipping the correct note entirely. These two possibilities make it hard to follow a score, because you can't know which note a mistake mistook. My solution in the code works for many kinds of mistakes on this tune, but certainly doesn't generalize.
- The score following takes a simple MIDI representation of the tune, identifies the key of the audio, and follows the tune as it is being played. It can handle silence cleanly. If a note occurs that should not have, for long enough to not be a bad attack or a squeaky note onset, the system enters an error state, looking for a return either to the current note, the next note, or the one after that.
- Best details on actual code modifications are in the well-commented code above.
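The recovery rule for the error state (return to the current note, the next note, or the one after that) can be sketched as a small lookup. This is an illustration only, not the project's actual code: `resume_position` is a hypothetical helper, and it assumes the heard pitch has already been transposed into the key of the score.

```r
eyc_orig_notes <- c(64, 62, 60, 62, 60, 60, 64, 67, 65, 69, 72, 72)

# Hypothetical helper: after a mistake at score index `pos`, decide where the
# player has rejoined the tune. Returns an index into `score`, or NA if the
# heard pitch matches none of the three candidate notes.
resume_position <- function(score, pos, heard_pitch) {
  # Candidates: the current note, the next note, and the one after that,
  # clamped so we never read past the end of the score.
  candidates <- pos:min(pos + 2, length(score))
  matches <- candidates[score[candidates] == heard_pitch]
  if (length(matches) == 0) NA_integer_ else matches[1]
}

resume_position(eyc_orig_notes, 3, 62)   # player moved on to the next note
resume_position(eyc_orig_notes, 3, 67)   # still lost: NA
```

When two candidates share a pitch (as with the repeated notes in this tune), the sketch picks the earliest one, which is one arbitrary way of settling the ambiguity about which note the mistake replaced.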
 
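The near-note bias from the list above can be illustrated with a toy transition table. This is a sketch under stated assumptions: the pitch range (`low_note`, `high_note`) and the size of the boost (`near_weight`) are made-up values, and the real HMM's transition table may be structured differently. Only the 6 half-step window comes from the text.

```r
# Hypothetical pitch range; the real bounds live in the project code.
low_note  <- 55
high_note <- 84
notes <- low_note:high_note

near_span   <- 6   # half-steps counted as "near" (from the text)
near_weight <- 4   # assumed boost for near notes; not taken from the project

# Distance in half-steps between every pair of notes.
dist <- abs(outer(notes, notes, "-"))

# Near notes get the boosted weight, everything else weight 1.
weights <- ifelse(dist <= near_span, near_weight, 1)

# Normalize each row so it is a probability distribution over next notes.
trans <- weights / rowSums(weights)
dimnames(trans) <- list(as.character(notes), as.character(notes))

trans["60", "62"]   # near transition, more likely
trans["60", "70"]   # far transition, less likely
```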
examples
In the images below, the horizontal axis is time (in frames of 2/100 of a second) and the vertical axis is recognized MIDI pitch (which you will see is jumpy). These are all graphs of the same tune, which rises in pitch. Lines on the bottom edge of a plot represent silence. The canonical mistake, the one Bugs makes, is that the very last note should be C but ends up too high or too low.
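Because the recognized pitch is jumpy, short off-score blips should not count as mistakes. The eight-frame minimum described in the modifications can be sketched with run-length encoding. The frame data and `expected_pitches` below are invented for illustration, and 0 is assumed to stand for silence. The project code may organize this differently.

```r
frame_sec        <- 0.02   # about two hundredths of a second per frame (from the text)
min_error_frames <- 8      # minimum sustained length for a mistake (from the text)

# Hypothetical recognizer output: one MIDI pitch per frame, 0 = silence.
recognized <- c(rep(0, 5), rep(69, 20), rep(70, 3), rep(72, 25), rep(74, 12))
expected_pitches <- c(69, 72)   # hypothetical notes the score allows here

# Collapse the frame sequence into runs of identical pitch.
runs <- rle(recognized)

# A run is a mistake only if it is off-score AND sustained long enough.
off_score <- !(runs$values %in% c(0, expected_pitches))
is_error  <- off_score & runs$lengths >= min_error_frames

# First frame of each run (1-indexed).
run_start <- cumsum(c(1, head(runs$lengths, -1)))

errors <- data.frame(
  pitch       = runs$values[is_error],
  start_frame = run_start[is_error],
  seconds     = runs$lengths[is_error] * frame_sec
)
print(errors)
```

Here the 3-frame blip at pitch 70 is ignored as a likely recognition glitch, while the 12-frame run at pitch 74 is reported as a mistake.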
caught mistakes
canonical mistake (on penny whistle)
canonical mistake with detection error
other mistake (on recorder)
The second note is intentionally wrong. The others are detected as errors due to mis-recognition from the original audio (or maybe I played it wrong on my recorder and can't tell).
good versions (no mistakes played and no mistakes found)
Yosemite Sam's correct, lethal attempt at the song
bad identifications (Bugs Bunny playing)
Deferring to the holy law of triples, Bugs Bunny attempts the tune twice before Sam gets frustrated and gets it right (wrong) the third time. My attempt at catching Bugs Bunny's mistakes failed due to the limits of the simple recognizer that I built on top of, evident in the screenshots. This may be due to recording quality, or maybe the fact that it is played on piano. I am tempted to rule that out though, because I recognized Yosemite Sam fine, and he played on the same piano.
Guitar was problematic, probably due to the attacks.









