Xylophone gag project

From enfascination

I wrote a small program to help you blow people up on the piano. This project was inspired by the classic cartoon "Xylophone Gag".

YouTube clip, from 5:45 to 6:51:
http://www.youtube.com/watch?v=YnJl6qAYdLs
(if this specific video gets pulled, search for any of the cartoon titles identifying the gag in this article)

The goal was to apply machine learning techniques to digital audio to identify a player's mistakes when playing a known melody. This is a course project[3] of only about 500 lines of code, so there are severe restrictions on its general utility. It works on monophonic, single-instrument melodies, and works better if the melody is short and played on a wind instrument. It does not identify timing errors, and it was tested only on wrong notes, not on skipped notes. To stay close to the roots of the gag, I only benchmarked the code on the classic tune for the Irish poem "Believe me, if all those endearing young charms."[4]

However, I do demonstrate that the code robustly identifies mistakes across the melody, as played in different keys, at different tempos, on a few instruments, and by different performers (thank you Nathaniel, Travis, Bugs Bunny and Yosemite Sam). For my simple test examples, and given the limits of the HMM that I built the code on, this system successfully identifies mistakes, avoiding both false positives (thinking there is a mistake when there isn't) and false negatives (thinking there is not a mistake when there is).

To implement this, I modified a simple HMM [5][6] written in R to add rudimentary score following. I then introduced an obnoxious screech into the audio to indicate identification of each mistake.
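
As a rough illustration of that last step, here is a minimal sketch of overwriting flagged stretches of audio with a loud tone. It is not taken from the project code: the function name add_screech, the 8000 Hz sample rate, the 3000 Hz tone and the gain are all assumptions made for the example. The audio is assumed to be a plain numeric vector of samples, with one TRUE/FALSE mistake flag per frame of about 0.02 seconds.

```r
# Hypothetical helper: replace the samples of every flagged frame with a sine "screech".
# sample_rate, freq and gain are illustrative values, not the project's settings.
add_screech <- function(samples, error_frames, sample_rate = 8000,
                        frame_seconds = 0.02, freq = 3000, gain = 0.8) {
  frame_len <- round(sample_rate * frame_seconds)
  # Expand the per-frame flags to per-sample flags, cut or padded to the audio length.
  marks <- rep(error_frames, each = frame_len)[seq_along(samples)]
  marks[is.na(marks)] <- FALSE
  t <- (seq_along(samples) - 1) / sample_rate
  screech <- gain * sin(2 * pi * freq * t)
  samples[marks] <- screech[marks]
  samples
}

# Five silent frames, with the third one flagged as a mistake.
quiet <- numeric(800)
marked <- add_screech(quiet, c(FALSE, FALSE, TRUE, FALSE, FALSE))
```

Only the third frame (samples 321 to 480 at these settings) is touched, so the rest of the recording is left as it was.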

the tune

First page of the Endearing Charms score[1]. This project uses the first five measures.
midi cheat sheet[2]

Here is my representation of the melody in R, from the score, using the cheat sheet. I didn't actually use the timing info:

eyc_orig_notes <- c(64, 62, 60, 62, 60, 60, 64, 67, 65, 69, 72, 72)
timing <- c(0.25, 0.25, 0.75, 0.25, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.75)
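
To make the numbers easier to read against the score, here is a small sketch that turns the midi values into note names and into intervals. The helper midi_to_name is hypothetical, and it assumes the common convention that midi 60 is C4. The intervals are what stay the same when the tune is played in another key.

```r
eyc_orig_notes <- c(64, 62, 60, 62, 60, 60, 64, 67, 65, 69, 72, 72)

# Hypothetical helper, assuming midi 60 = C4 (some cheat sheets call it C3).
midi_to_name <- function(midi) {
  pitch_classes <- c("C", "C#", "D", "D#", "E", "F",
                     "F#", "G", "G#", "A", "A#", "B")
  octave <- midi %/% 12 - 1
  paste0(pitch_classes[midi %% 12 + 1], octave)
}

eyc_names <- midi_to_name(eyc_orig_notes)

# Half-step distance between consecutive notes; unchanged by transposition.
eyc_intervals <- diff(eyc_orig_notes)

# The same tune three half-steps lower has different notes but the same intervals.
eyc_lower <- eyc_orig_notes - 3
same_shape <- identical(diff(eyc_lower), eyc_intervals)
```

Under that naming convention the melody starts on E4 and ends on C5.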

code

the code

xylophone gag project code [7]

modifications

I added a few things to the HMM implementation here [8]

  • I changed the highest and lowest "possible" notes, because my samples covered a range of keys, and tended to be higher.
  • I recorded on a bunch of instruments (the vanilla HMM was tested almost entirely on bass oboe data). My samples include guitar, penny whistle and recorder, plus the Bugs Bunny and Yosemite Sam piano audio from the YouTube video above.
  • I added a near-note bias, editing the transition table to make it more likely that you will go from any note to a note within 6 half-steps of the current note.
  • I shortened the minimum number of frames per note to four, to catch the shorter notes in the sample.
  • I created another window length, the minimum number of frames that a note must be sustained to be considered an error. The error detection is only as good as the HMM it is built on. Noisy attacks, like on a piano, are very hard to identify correctly. To keep from labeling these identification errors as player mistakes, I made a minimum size of eight frames, below which a pitch cannot be labeled as an error. In my code, one frame is about two hundredths of a second.
  • I added simple score following and an "error state"
    • The error state allowed me to catch two kinds of mistakes. Sometimes a player realizes they made a mistake and plays the right note after playing the wrong note. Sometimes they don't realize, or they accept the mistake and continue on to the next note, skipping the correct note entirely. These two possibilities make it hard to follow a score, because you can't know which note the player was aiming for. My solution in the code works for many kinds of mistakes on this tune, but certainly doesn't generalize.
    • The score following takes a simple midi representation of the tune, identifies the key of the audio and follows the tune as it is being played. It can handle silence cleanly. If a note occurs that should not have, for long enough to not be a bad attack or a squeaky note onset, the system enters an error state, looking for a return either to the current note, the next note, or the one after that.
    • The best details on the actual code modifications are in the well-commented code above.
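
Two of the modifications above, the near-note bias and the score following with an error state, can be sketched roughly as follows. This is not the project code. The names near_note_bias and mark_errors, the boost weight, and the use of 0 for silence are assumptions for illustration. The sketch also assumes the HMM states are consecutive half-steps, that the score is already in the key of the audio, and that repeated notes can be treated as one held note because timing is ignored.

```r
# Assumption: states 1..n are consecutive half-steps, so |i - j| is the jump size.
# `boost` is a made-up weight for jumps of at most max_jump half-steps.
near_note_bias <- function(trans, max_jump = 6, boost = 2) {
  n <- nrow(trans)
  near <- abs(outer(seq_len(n), seq_len(n), "-")) <= max_jump
  biased <- trans * ifelse(near, boost, 1)
  biased / rowSums(biased)  # renormalize so each row sums to 1 again
}

# frames: recognized midi pitch per frame, with `silence` (assumed 0) for rests.
# score:  midi notes of the tune, already transposed to the key of the audio.
mark_errors <- function(frames, score, min_error_frames = 8, silence = 0) {
  score <- rle(score)$values  # collapse repeated notes; timing is ignored
  runs <- rle(frames)         # stretches of constant recognized pitch
  run_is_error <- logical(length(runs$values))
  pos <- 1                    # index of the score note being followed
  in_error <- FALSE
  for (i in seq_along(runs$values)) {
    pitch <- runs$values[i]
    if (pitch == silence) next
    # Normal state: current or next note. Error state: also the one after that.
    candidates <- if (in_error) pos + 0:2 else pos + 0:1
    candidates <- candidates[candidates <= length(score)]
    hit <- candidates[score[candidates] == pitch]
    if (length(hit) > 0) {
      pos <- hit[1]
      in_error <- FALSE
    } else if (runs$lengths[i] >= min_error_frames) {
      # Long enough that it is not just a bad attack: flag it.
      run_is_error[i] <- TRUE
      in_error <- TRUE
    }
  }
  rep(run_is_error, times = runs$lengths)  # back to one flag per frame
}

eyc_orig_notes <- c(64, 62, 60, 62, 60, 60, 64, 67, 65, 69, 72, 72)
clean_frames <- rep(eyc_orig_notes, each = 10)
bad_frames <- rep(c(eyc_orig_notes[1:11], 74), each = 10)  # last note too high

clean_flags <- mark_errors(clean_frames, eyc_orig_notes)
bad_flags <- mark_errors(bad_frames, eyc_orig_notes)
```

With these toy inputs the clean take produces no flags, and the take with the wrong last note is flagged only on its final ten frames. Wrong pitches shorter than min_error_frames are ignored, which is the role the eight-frame minimum plays in the description above.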

examples

In the images below, the horizontal axis is time (in frames of 2/100 of a second) and the vertical axis is recognized midi pitch (which you will see is jumpy). These are all graphs of the same tune, which rises in pitch. Lines on the bottom edge of a plot represent silence. The canonical mistake, the one that Bugs makes, is that the very last note should be C, but ends up too high or too low.
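
For reading the plots, frame numbers can be converted to seconds with a little arithmetic. The sketch below assumes a frame is exactly 0.02 seconds (the text only says "about") and that silence is coded as pitch 0, which is why it would sit on the bottom edge. The function names are hypothetical, and the plotting helper puts seconds rather than frames on the horizontal axis.

```r
frame_seconds <- 0.02  # approximate frame length; assumed exact here

frames_to_seconds <- function(frame) frame * frame_seconds

# Hypothetical helper: step plot of recognized pitch against time in seconds.
plot_recognized <- function(pitches) {
  plot(frames_to_seconds(seq_along(pitches)), pitches, type = "s",
       xlab = "time (s)", ylab = "recognized midi pitch")
}

# Frame 200 on the horizontal axis is about four seconds into the recording.
t200 <- frames_to_seconds(200)

# Example call (draws a plot): silence, two notes, silence.
# plot_recognized(c(rep(0, 10), rep(64, 25), rep(62, 25), rep(0, 10)))
```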

caught mistakes

canonical player error (on penny whistle)

  1. Original recording
  2. Recognized representation
  3. Error-marked representation
canonical mistake (on penny whistle)

canonical player error with detection mistake

  1. Original recording
  2. Recognized representation
  3. Error-marked representation
note detection error at time 200. Representation is flat.

other error and mistake (on recorder)

The second note is intentionally wrong. The others are detected as errors due to misrecognition of the original audio (or maybe I played it wrong on my recorder and can't tell).

  1. Original recording
  2. Recognized representation
  3. Error-marked representation
notice wrong key (pitch shifted down 3 midi levels)

good versions (no mistakes played and no mistakes found)

successful penny whistle (best suited to Irish tunes anyway)

  1. Original recording
  2. Recognized representation
  3. Error-marked representation (notice that there are no high pitches, which would indicate errors; this is the same as the recognized representation)
FinalNoErrors01Converted.jpg

  1. Original recording
  2. Recognized representation
  3. Error-marked representation (notice that there are no identified errors)
FinalNoErrors02Converted.jpg

Yosemite Sam

Yosemite Sam's correct, lethal attempt at the song

  1. Original recording
  2. Recognized representation
  3. Error-marked representation (notice little errors on piano attack)
notice much lower key and faster play

bad identifications

Bugs Bunny playing

Deferring to the holy law of triples, Bugs Bunny attempts the tune twice before Sam gets frustrated and gets it right (wrong) the third time. My attempt at catching Bugs Bunny's mistakes failed due to the limits of the simple recognizer that I built on top of, as is evident in the screenshots. This may be due to recording quality, or maybe to the fact that it is played on piano. I am tempted to rule out the piano, though, because I recognized Yosemite Sam fine, and he played on the same piano.

  1. Original recording
  2. Recognized representation
  3. Error-marked representation
notice all the detected silence, and sparsity of detected notes

  1. Original recording
  2. Recognized representation
  3. Error-marked representation
FinalBugsBunny01Converted.jpg

guitar playing

Guitar was problematic, probably due to the attacks.

  1. Original recording
  2. Recognized representation
  3. Error-marked representation
FinalGuitarConverted.jpg