
Gmail, Yahoo and Hotmail's CAPTCHA broken by spammers

Breaking Gmail, Yahoo and Hotmail's CAPTCHAs has been an urban legend for over two years now, with do-it-yourself CAPTCHA-breaking services and proprietary underground tools helping spammers, phishers and malware authors register hundreds of thousands of bogus accounts for spamming and fraudulent purposes.
Written by Dancho Danchev, Contributor



This post intends to make it official by covering an underground service that offers thousands of already-registered Gmail, Yahoo and Hotmail accounts for sale, with new ones registered every second, a clear indication of how successfully it breaks the CAPTCHAs at these services.

Monitoring the service for over a month revealed that its "inventory of automatically registered email accounts" repeatedly emptied and then refilled back to its current level, in the thousands, with 1 to 2 new accounts registered per second. It's also worth pointing out that, unlike cases where scammers scam other scammers, these people "deliver the goods" they promise. Last week they also started offering Hotmail and Yahoo email accounts, again in the thousands. At the time of writing, there are 134,670 Gmail accounts available for purchase, as well as 42,893 Hotmail and 10,847 Yahoo accounts. Volume-based price discrimination naturally applies: for up to 10k Gmail accounts the price is $6 per 1k, from 10k to 100k it drops to $5 per 1k, and for over 100k accounts it falls to $4 per 1k.
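The tiered pricing and registration rate reported above can be turned into a quick back-of-the-envelope calculator. This is a hypothetical sketch: the function names are ours, and it assumes that the tier rate applies to the whole order, that the boundary quantities of exactly 10k and 100k fall into the more expensive tier (the post's wording is ambiguous there), and that the 1 to 2 registrations per second refer to Gmail accounts alone.

```python
from decimal import Decimal


def price_per_thousand(n_accounts: int) -> int:
    """USD per 1,000 Gmail accounts, following the tiers quoted in the post.

    Assumption: exactly 10,000 falls in the $6 tier and exactly 100,000 in the
    $5 tier; the post's wording leaves these boundaries ambiguous.
    """
    if n_accounts <= 10_000:
        return 6
    if n_accounts <= 100_000:
        return 5
    return 4


def order_cost(n_accounts: int) -> Decimal:
    """Total USD cost, assuming the tier rate applies to the whole order."""
    if n_accounts < 0:
        raise ValueError("n_accounts must be non-negative")
    # Decimal keeps cents exact (e.g. 134,670 * $4 / 1,000 = $538.68).
    return Decimal(n_accounts) * price_per_thousand(n_accounts) / 1000


def hours_to_restock(n_accounts: int, accounts_per_second: float) -> float:
    """Hours needed to register n_accounts at a constant registration rate."""
    if accounts_per_second <= 0:
        raise ValueError("accounts_per_second must be positive")
    return n_accounts / accounts_per_second / 3600


if __name__ == "__main__":
    gmail_inventory = 134_670  # Gmail stock reported in the post
    print(order_cost(gmail_inventory))  # 538.68
    for rate in (1, 2):  # observed 1 to 2 registrations per second
        print(rate, round(hours_to_restock(gmail_inventory, rate), 1))
```

Under these assumptions, buying the entire Gmail inventory of 134,670 accounts would cost about $539, and at 1 to 2 registrations per second the service could rebuild that inventory from scratch in roughly 19 to 37 hours.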

Considering that researchers already achieve a recognition rate of nearly 90% for Gmail's CAPTCHA, 58% for Yahoo's and over 92% for Microsoft's, the incentives for malicious parties to break these CAPTCHAs efficiently and build a business model on top of it seem to have prevailed. Here's an excerpt from a paper by Microsoft's research team, outlining some of its findings on the weaknesses of these CAPTCHAs in general:

"The Google HIP is unique in that it uses only image warp as a means of distorting the characters. Similar to the MSN/Passport and Yahoo version 2 HIPs, it is also two color. The HIP characters are arranged close to one another (they often touch) and follow a curved baseline. The following very simple attack was used to segment Google HIPs: Convert to grayscale, up-sample, threshold and separate connected components.

This very simple attack gives an end-to-end success rate of 10.2% for segmentation; the recognition rate was 89.3%, giving (0.102) × (0.893)^6.5 ≈ 4.89% total probability of breaking a HIP. Average Google HIP solution length is 6.5 characters. This can be significantly improved upon by judicious use of a dilate-erode attack. A direct application doesn't do as well as it did on the Ticketmaster and Yahoo HIPs (because of the shear and warp of the baseline of the word). More successful and complicated attacks might estimate and counter the shear and warp of the baseline to achieve better success rates."
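The "very simple attack" in the quoted passage, plus the paper's back-of-the-envelope probability, can be sketched in plain Python. This is an illustrative reconstruction, not the researchers' code: the BT.601 grayscale weights, the 2x nearest-neighbour up-sampling, the cutoff of 128 and 8-connectivity are all our assumptions, and a real attack would still need a character classifier for each segmented blob. The probability function assumes per-character recognition is independent and, like the paper, plugs the average length of 6.5 characters straight into the exponent.

```python
from collections import deque
from typing import List, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each 0-255
Component = List[Tuple[int, int]]  # (row, col) coordinates of one blob


def to_grayscale(img: List[List[Pixel]]) -> List[List[float]]:
    """Luminance with ITU-R BT.601 weights (an assumed choice of formula)."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row] for row in img]


def upsample(gray: List[List[float]], factor: int = 2) -> List[List[float]]:
    """Nearest-neighbour up-sampling: each pixel becomes a factor x factor block."""
    out: List[List[float]] = []
    for row in gray:
        wide = [v for v in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out


def threshold(gray: List[List[float]], cutoff: float = 128.0) -> List[List[bool]]:
    """Mark dark pixels (assumed to be character ink) as foreground."""
    return [[v < cutoff for v in row] for row in gray]


def connected_components(mask: List[List[bool]]) -> List[Component]:
    """Collect 8-connected foreground regions with a breadth-first search."""
    h = len(mask)
    w = len(mask[0]) if h else 0
    seen = [[False] * w for _ in range(h)]
    components: List[Component] = []
    for y in range(h):
        for x in range(w):
            if not mask[y][x] or seen[y][x]:
                continue
            seen[y][x] = True
            queue = deque([(y, x)])
            blob: Component = []
            while queue:
                cy, cx = queue.popleft()
                blob.append((cy, cx))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            components.append(blob)
    return components


def segment(img: List[List[Pixel]], factor: int = 2, cutoff: float = 128.0) -> List[Component]:
    """Grayscale -> up-sample -> threshold -> connected components, left to right."""
    mask = threshold(upsample(to_grayscale(img), factor), cutoff)
    blobs = connected_components(mask)
    # Order blobs by leftmost column as a rough stand-in for reading order.
    return sorted(blobs, key=lambda blob: min(col for _, col in blob))


def end_to_end_success(p_segment: float, p_char: float, avg_length: float) -> float:
    """P(break) = P(segmentation) * P(char)^length, assuming independent characters."""
    return p_segment * p_char ** avg_length


if __name__ == "__main__":
    # Paper's Google HIP figures: 10.2% segmentation, 89.3% per character, 6.5 chars.
    print(f"{end_to_end_success(0.102, 0.893, 6.5):.2%}")  # 4.89%
```

Characters that touch end up in a single blob under this scheme, which is likely one reason the paper reports only 10.2% segmentation success on Google's tightly packed, warped text.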

Abusing the clean IP reputation of these reputable email providers results in floods of spam coming from legitimate domains, and also makes it easy to register bogus Blogspot accounts for splogs (spam blogs) used in blackhat search engine optimization and even malware distribution. The Storm Worm, for instance, has diversified its propagation vector to Blogspot accounts, presumably by buying already-registered accounts.

With bogus email accounts continuously supplied by efficiently breaking the CAPTCHAs at these services, isn't it time for major web companies to start considering replacements for text-based CAPTCHAs like these, or at least put more effort into slowing down the currently efficient automated recognition of their CAPTCHAs?
