Ph.D. Theses

Algorithms for Error-Tolerant Information Retrieval from Music Databases using Vocal Input

By Richard Lewis Kline
Advisor: Ephraim Glinert
June 25, 2002

We present a system for searching a database of music using vocal queries, i.e., humming a few bars of a desired song. To ensure that the system performs well for the average person, we conducted a study of human humming skills to augment and extend the results of previous studies in music perception, recognition, and reproduction. We quantified the nature and frequency of errors typically introduced into vocal renditions of familiar and unfamiliar tunes, as well as the differences in performance between those with musical training and those without. The results of this study formed the basis of a series of algorithms designed to match an input query to its intended song stored in a database of music.

Algorithms developed for existing music information retrieval systems were evaluated against our collection of 172 hummed input query phrases and found to be inadequate in recognition accuracy. We created and tested more than 30 additional algorithms based in part on results obtained from our experimental study. New representations of music data such as duration contours and duration intervals were devised. An algorithm to extract tempo information from sparse and imprecise user data was developed.
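The abstract does not define duration contours or duration intervals, so the following is only a minimal sketch of one plausible reading, by analogy with pitch contours and pitch intervals. It assumes a duration contour labels each note as longer, shorter, or about the same as the previous note, and a duration interval is the base-2 logarithm of the ratio between consecutive note durations. The function names, the `tolerance` parameter, and the label characters are hypothetical and are not taken from the thesis.

```python
import math


def duration_contour(durations, tolerance=0.15):
    """Label each note relative to the previous one (assumed definition).

    'L' = longer, 'S' = shorter, '=' = roughly the same length.
    `tolerance` is a hypothetical parameter that absorbs small timing
    errors of the kind a hummed query might contain.
    """
    labels = []
    for prev, cur in zip(durations, durations[1:]):
        ratio = cur / prev
        if ratio > 1 + tolerance:
            labels.append("L")
        elif ratio < 1 / (1 + tolerance):
            labels.append("S")
        else:
            labels.append("=")
    return "".join(labels)


def duration_intervals(durations):
    """Return log2 ratios of consecutive note durations (assumed definition).

    A value of 1.0 means the note is twice as long as the previous one,
    and -1.0 means it is half as long.
    """
    return [math.log2(cur / prev) for prev, cur in zip(durations, durations[1:])]


# Note durations in seconds for a short hummed phrase (made-up example data).
query = [0.5, 0.5, 1.0, 0.25]
print(duration_contour(query))
print(duration_intervals(query))
```

Because both representations depend only on ratios between neighbouring notes, they are unchanged when the whole phrase is hummed faster or slower, which is one reason relative representations of this kind suit queries whose tempo is uncertain.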

Aspects of these individual efforts were eventually combined into an effective matching algorithm named SWRPD. Across the 172 experimental trials, the algorithm correctly identified the intended song from a hummed input query in 68% of the trials for subjects with average vocal skills, and the correct song appeared in the top ten reported results in 79% of those queries. Results for test subjects less proficient at humming were lower but still quite acceptable, at 46% and 58%, respectively. Based on our test data, the SWRPD algorithm delivers, in real time, higher matching accuracy than any other published system.
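The two figures reported for each group correspond to what is commonly called top-1 and top-10 accuracy. As an illustration only, the sketch below shows how such figures could be computed from ranked result lists. The function name `top_k_accuracy` and the sample data are hypothetical, and the code does not reproduce SWRPD or the thesis's evaluation procedure.

```python
def top_k_accuracy(ranked_results, intended_songs, k):
    """Fraction of queries whose intended song is among the first k results.

    ranked_results: one list of song identifiers per query, best match first.
    intended_songs: the song each query was meant to retrieve.
    """
    if len(ranked_results) != len(intended_songs):
        raise ValueError("need exactly one intended song per query")
    hits = sum(
        1
        for ranking, target in zip(ranked_results, intended_songs)
        if target in ranking[:k]
    )
    return hits / len(intended_songs)


# Made-up rankings for three queries that were all meant to find song "a".
rankings = [["a", "b", "c"], ["b", "a", "c"], ["c", "b", "a"]]
targets = ["a", "a", "a"]
print(top_k_accuracy(rankings, targets, k=1))
print(top_k_accuracy(rankings, targets, k=2))
```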
