
Audio CAPTCHAs easy to crack, according to researchers

Written by Andrew Nusca, Contributor

A great story over at Ars Technica today details the efforts of the Carnegie Mellon University team behind the reCAPTCHA service, which has turned its attention to the audio CAPTCHAs used by the visually impaired.

These audio CAPTCHAs consist of a string of spoken characters, typically masked and distorted by a form of background noise.

The scientists looked into the security of existing audio CAPTCHAs used by Google and Digg, and found that they are relatively easy to crack. Ars Technica's John Timmer describes the process in detail:

The work involved gathering 1,000 audio CAPTCHAs from Google, Digg, and the reCAPTCHA service. 900 of these were used as a training set and the remaining 100 were set aside to test the system when done. The software first did a rough audio analysis, dividing each item into equal-sized chunks, each sufficiently long to fit any spoken character. Those segments with the highest energy peaks, which are considered most likely to contain actual letters, were set aside for analysis.
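To make the segmentation step concrete, here is a minimal sketch of the idea: cut the audio into equal-sized windows and keep the loudest ones. It is not the researchers' code. The function name `top_energy_windows`, the window size, and the synthetic signal are all assumptions made for illustration, and summed squared amplitude stands in for whatever energy measure the team actually used.

```python
import math


def top_energy_windows(samples, window_size, count):
    """Split samples into equal-sized windows and return the start indices
    of the `count` highest-energy windows, in time order.

    Energy here is the sum of squared amplitudes in the window (an
    assumption; the article only says "highest energy peaks").
    """
    if window_size <= 0:
        raise ValueError("window_size must be positive")
    energies = []
    # Step through the signal one full window at a time, ignoring any
    # trailing partial window.
    for start in range(0, len(samples) - window_size + 1, window_size):
        window = samples[start:start + window_size]
        energies.append((sum(s * s for s in window), start))
    loudest = sorted(energies, reverse=True)[:count]
    return sorted(start for _, start in loudest)


# Hypothetical signal: faint background noise with two loud "spoken" bursts.
signal = [0.01 * math.sin(i) for i in range(1000)]
for i in range(200, 300):
    signal[i] = math.sin(0.3 * i)
for i in range(700, 800):
    signal[i] = math.sin(0.3 * i)

starts = top_energy_windows(signal, window_size=100, count=2)
print(starts)
```

In this toy signal the two selected windows are the ones containing the bursts. Real CAPTCHA audio would need a window long enough to hold any spoken character, as the excerpt notes.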

The authors tested a number of methods used to extract features from recordings of speech (for the curious, these are mel-frequency cepstral coefficients and two forms each of perceptual linear prediction and relative spectral transform-PLP). These features were then subjected to analysis using machine learning programs, which were trained on the identification of individual characters. Three methods—AdaBoost, support vector machines (SVM), and k-nearest neighbor (k-NN)—were trained using the 900 audio CAPTCHAs that had been processed manually. The result of this pairing of processing and analysis methods was a total of 15 different attempts at cracking each of the 100 test audio CAPTCHAs.
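The figure of 15 attempts follows from pairing five feature sets with three classifiers. The sketch below enumerates those pairings and adds a toy k-nearest-neighbor classifier to show what "trained on the identification of individual characters" can look like in the simplest case. The "form 1" and "form 2" labels are placeholders, since the article does not name the variants. The two-dimensional feature vectors are invented, and `knn_predict` is a hypothetical helper, not the researchers' implementation.

```python
from collections import Counter
from itertools import product
import math

# Five feature sets from the article; "form 1"/"form 2" are placeholder labels.
FEATURES = [
    "MFCC",
    "PLP (form 1)",
    "PLP (form 2)",
    "RASTA-PLP (form 1)",
    "RASTA-PLP (form 2)",
]
CLASSIFIERS = ["AdaBoost", "SVM", "k-NN"]

# Every (feature set, classifier) combination: 5 x 3 = 15.
pairings = list(product(FEATURES, CLASSIFIERS))
print(len(pairings))


def knn_predict(train, query, k=3):
    """Label `query` by majority vote among its k nearest training points.

    `train` is a list of (feature_vector, label) pairs; distance is Euclidean.
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Invented 2-D "features" for two spoken digits, purely for illustration.
train = [
    ((0.0, 0.1), "3"), ((0.1, 0.0), "3"), ((0.2, 0.1), "3"),
    ((1.0, 1.1), "7"), ((0.9, 1.0), "7"), ((1.1, 0.9), "7"),
]
prediction = knn_predict(train, (0.95, 1.0))
print(prediction)
```

Real feature vectors such as mel-frequency cepstral coefficients have many more dimensions, but the voting logic is the same.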

Apparently, Google's audio CAPTCHAs, which consist of a series of the digits 0 through 9 recited over background noise of speech played backwards, were nowhere near consistent enough to fool the researchers' software: the SVM technique got the CAPTCHA right about two-thirds of the time, and AdaBoost wasn't far behind, with k-NN performing poorly in the test. Digg's audio CAPTCHA uses both digits and letters, but plays them over "a less complex background that sounds like flowing water." AdaBoost failed the test, but SVM was able to clear 70 percent accuracy, with k-NN trailing by a significant margin.

There's more detail in the article, but the bottom line is this: Based on the results, audio CAPTCHAs need more of just about everything: more speakers, more characters, more distortion, and longer strings of tokens.

As a result, reCAPTCHA has expanded its own service to include all numbers from 0 to 99.
