Xylophone gag project

From enfascination

I wrote a small program to help you blow people up on the piano. This project was inspired by the classic cartoon "Xylophone Gag".

YouTube clip, from 5:45 to 6:51:
http://www.youtube.com/watch?v=YnJl6qAYdLs
(If this specific video gets pulled, search for any of the cartoon titles identifying the gag in this article.)

The goal was to apply machine learning techniques to digital audio to identify a player's mistakes while playing a known melody. This is a course project[1] of only about 500 lines of code, so there are severe restrictions on its general utility. It works on monophonic, single-instrument melodies, and works better if the melody is short and played on a wind instrument. It identifies only wrong notes: it does not detect timing errors and was not tested on skipped notes.

To stay close to the roots of the gag, I benchmarked the code only on the classic tune for the Irish poem "Believe me, if all those endearing young charms."[2] However, I do demonstrate that the code robustly identifies mistakes across the melody, as played in different keys and on a few instruments. For my simple test examples, and given the limits of the HMM that I built the code on, the system avoids both false positives (thinking there is a mistake when there isn't) and false negatives (thinking there is no mistake when there is).

I modified a simple HMM implementation[3][4] written in R to add rudimentary score following. I then introduced an obnoxious screech into the audio to mark each identified mistake.
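Marking a mistake amounts to mixing a loud, unpleasant sound into the samples over the offending span. The following is a minimal Python sketch of that idea, not the project's R code. It assumes mono audio held as floats in [-1, 1]; the function name, the 3 kHz sine tone standing in for the screech, and the gain are all illustrative choices rather than values taken from the project.

```python
import math

def add_screech(samples, rate, start_s, dur_s, freq_hz=3000.0, gain=0.8):
    """Return a copy of `samples` with a sine tone mixed in over one span.

    samples : sequence of floats in [-1, 1] (mono audio, assumed format)
    rate    : sample rate in Hz
    start_s : where the marked span begins, in seconds
    dur_s   : how long the marked span lasts, in seconds
    """
    out = list(samples)
    first = max(0, int(start_s * rate))
    last = min(len(out), first + int(dur_s * rate))
    for i in range(first, last):
        tone = gain * math.sin(2.0 * math.pi * freq_hz * (i - first) / rate)
        # Clip so the mixed signal stays inside the valid sample range.
        out[i] = max(-1.0, min(1.0, out[i] + tone))
    return out
```

For example, `add_screech(audio, 44100, 2.0, 0.5)` would overlay half a second of tone starting two seconds into `audio`, leaving every other sample untouched.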

the tune

sheet music

[5]

MIDI representation

Midi notes key.GIF
[6]
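In a MIDI representation each pitch is an integer note number, one per half-step. As a reminder of how those numbers relate to pitch and to key changes, here is a minimal Python sketch (the project itself is in R). It uses the standard equal-temperament convention that note 69 is A4 at 440 Hz. The function names and the note fragment are hypothetical and are not the tune data used in the project.

```python
A4_MIDI = 69      # MIDI note number of A4
A4_HZ = 440.0     # standard tuning reference

def midi_to_hz(note):
    """Frequency in Hz of a MIDI note number in 12-tone equal temperament."""
    return A4_HZ * 2.0 ** ((note - A4_MIDI) / 12.0)

def transpose(notes, half_steps):
    """Shift every note by the same number of half-steps (a key change)."""
    return [n + half_steps for n in notes]

def intervals(notes):
    """Half-step differences between successive notes."""
    return [b - a for a, b in zip(notes, notes[1:])]

# Hypothetical fragment, for illustration only.
fragment = [60, 62, 64, 65, 67]
same_tune_higher = transpose(fragment, 5)
```

Because transposition adds the same offset to every note, `intervals(fragment)` and `intervals(same_tune_higher)` are identical, which is why one stored score can be matched against recordings in different keys once the offset is known.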

code

the code

xylophone gag project code [7]

modifications

I added a few things to the HMM implementation here [8]:

  • I changed the highest and lowest "possible" notes, because my samples covered a range of keys and tended to be higher.
  • I recorded on a bunch of instruments (the vanilla HMM was tested almost entirely on bass oboe data). Instruments include guitar, penny whistle, and recorder, plus the Bugs Bunny and Yosemite Sam audio from the YouTube video above.
  • I added a near-note bias, editing the transition table to make it more likely that you will go from any note to a note within 6 half-steps of the current note.
  • I shortened the minimum number of frames per note to four, to catch the shorter notes in the sample.
  • I created another window length: the minimum number of frames that a note must be sustained to be considered an error. The error detection is only as good as the HMM it is built on. Noisy attacks, like on a piano, are very hard to identify correctly. To keep from labeling these identification errors as player mistakes, I set a minimum length of eight frames, below which a pitch cannot be labeled as an error. In my code, one frame is about two hundredths of a second.
  • I added simple score following and an "error state".
    • The error state allowed me to catch two kinds of mistakes. Sometimes a player realizes they made a mistake and plays the right note after playing the wrong note. Sometimes they don't realize, or they accept the mistake and continue on to the next note, skipping the correct note entirely. These two possibilities make it hard to follow a score, because you can't know which note a mistake replaced. My solution in the code works for many kinds of mistakes on this tune, but certainly doesn't generalize.
    • The score following takes a simple MIDI representation of the tune, identifies the key of the audio, and follows the tune as it is being played. It can handle silence cleanly. If a note occurs that should not have, for long enough that it cannot be a bad attack or a squeaky note onset, the system enters an error state, looking for a return either to the current note, the next note, or the one after that.
    • The best details on the actual code modifications are in the well-commented code above.
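Two of the modifications above, the near-note bias and the error state, can be sketched in a few lines. This is a minimal Python illustration, not the project's R code. The function names, the `boost` weight, and the frame encoding (one MIDI note number per frame, `None` for silence) are assumptions made for the example. The score is assumed to be already transposed to the key of the audio and to contain no immediately repeated notes. The six half-step reach and the eight-frame threshold are the values described in the list.

```python
from itertools import groupby

def near_note_transitions(low, high, reach=6, boost=4.0):
    """Row-normalized note-to-note transition table favoring nearby notes.

    `boost` (hypothetical value) is how much more weight a target within
    `reach` half-steps gets compared with a more distant one.
    """
    notes = range(low, high + 1)
    table = []
    for a in notes:
        weights = [boost if abs(a - b) <= reach else 1.0 for b in notes]
        total = sum(weights)
        table.append([w / total for w in weights])
    return table

def follow_score(frames, score, min_error_frames=8):
    """Return (start_frame, length) spans judged to be wrong notes."""
    errors = []
    pos = 0            # index of the last score note that was matched
    in_error = False
    start = 0
    for pitch, run in groupby(frames):
        length = len(list(run))
        run_start, start = start, start + length
        if pitch is None:
            continue   # silence never advances the score or counts as an error
        # Normally accept the current or next note; after an error also
        # accept the note after that, in case the player skipped one.
        reach = 2 if in_error else 1
        last = min(pos + reach, len(score) - 1)
        match = next((i for i in range(pos, last + 1) if score[i] == pitch), None)
        if match is not None:
            pos, in_error = match, False
        elif length >= min_error_frames:
            errors.append((run_start, length))
            in_error = True
        # Shorter unexpected runs are treated as bad attacks and ignored.
    return errors
```

With the score `[60, 62, 64, 65]`, a two-frame blip between correct notes is ignored, while a ten-frame wrong note is flagged whether the player then corrects it or skips ahead to the following note.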

examples

caught mistakes

canonical mistake (on penny whistle)

  1. Original recording
  2. Recognized representation
  3. Error-marked representation

canonical mistake with detection error

  1. Original recording
  2. Recognized representation
  3. Error-marked representation

other mistake (on recorder)

  1. Original recording
  2. Recognized representation
  3. Error-marked representation

  1. Original recording
  2. Recognized representation
  3. Error-marked representation

FinalBugsBunny01Converted.tiff FinalBugsBunny01Converted.wav FinalBugsBunny01MarkedUp.wav FinalBugsBunny01Original.wav FinalBugsBunny02Converted.tiff FinalBugsBunny02Converted.wav FinalBugsBunny02MarkedUp.wav FinalBugsBunny02Original.wav


FinalNoErrors01Converted.tiff FinalNoErrors01Converted.wav FinalNoErrors01MarkedUp.wav FinalNoErrors01Original.wav FinalNoErrors02Converted.tiff FinalNoErrors02Converted.wav FinalNoErrors02MarkedUp.wav FinalNoErrors02Original.wav FinalNoErrors03Converted.wav FinalNoErrors03MarkedUp.tiff FinalNoErrors03MarkedUp.wav FinalNoErrors03Original.wav FinalYosemiteConverted.tiff FinalYosemiteConverted.wav FinalYosemiteMarkedUp.wav FinalYosemiteOriginal.wav

good versions (no mistakes played and no mistakes found)

bad identifications