Google + reCAPTCHA could raise bar in anti-bot, anti-spam battle
Summary: Google buys an excellent crowd-sourcing tool and, by default, gets to raise the bar significantly in the fight against bots and spam.
Locked in a cat-and-mouse game with spammers who use bots to defeat anti-fraud mechanisms and create fake accounts, Google today announced a deal to acquire reCAPTCHA, a company that provides those squiggly words at login screens.
The reCAPTCHA deal isn't exactly a security transaction. Strategically, it gives Google an excellent crowd-sourcing tool to beef up its already impressive machine-vision algorithms (think book-scanning and maps). In the long run, though, the ability to use CAPTCHAs that are near-impossible for bots to decipher allows Google to raise the bar significantly in the fight against bots and spam.
Adam O'Donnell, director of emerging technologies at anti-spam firm Cloudmark, believes this is a very smart purchase by Google.
"Google already has the best computer-vision techniques. The way ReCAPTCHA works, this means that Google will only be presenting CAPTCHA words that are very difficult for a bot to defeat," O'Donnell explained.
"By pushing up that boundary, it will make CAPTCHA technology much better."
The words presented by the reCAPTCHA service come from scanned printed material (archival newspapers and old books). As Google explains, computers find it hard to recognize these words because the ink and paper have degraded over time, but by typing them in as a CAPTCHA, crowds teach computers to read the scanned text:
In this way, reCAPTCHA’s unique technology improves the process that converts scanned images into plain text, known as Optical Character Recognition (OCR). This technology also powers large scale text scanning projects like Google Books and Google News Archive Search. Having the text version of documents is important because plain text can be searched, easily rendered on mobile devices and displayed to visually impaired users. So we'll be applying the technology within Google not only to increase fraud and spam protection for Google products but also to improve our books and newspaper scanning process.
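One way to picture how typed answers could turn into usable text is a simple agreement vote across several people who saw the same scanned word. The sketch below is only an illustration under stated assumptions: the function name `agreed_transcription` and the `min_votes` and `min_share` thresholds are hypothetical and are not taken from reCAPTCHA or Google.

```python
from collections import Counter
from typing import Iterable, Optional


def agreed_transcription(
    answers: Iterable[str],
    min_votes: int = 3,       # assumed minimum number of human answers
    min_share: float = 0.75,  # assumed fraction that must agree
) -> Optional[str]:
    """Return the word most typists agreed on, or None if agreement is weak."""
    # Normalize so that "Morning " and "morning" count as the same answer.
    normalized = [a.strip().lower() for a in answers if a.strip()]
    if len(normalized) < min_votes:
        return None
    word, count = Counter(normalized).most_common(1)[0]
    return word if count / len(normalized) >= min_share else None
```

A word that does not reach agreement would simply be shown to more people before being accepted.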
CAPTCHAs have served to slow down spammers and phishers, but in many cases they are easily defeated by bots or by humans hired to manually solve the text in the squiggly-lined images.
[ Dancho Danchev: Google's CAPTCHA experiment and the human factor ]
Earlier this year, researchers at Google released a paper detailing a new CAPTCHA system based on rotating images to their correct orientation (Socially Adjusted CAPTCHAs). Its main purpose is to make the challenge easier for humans and much harder for bots to solve.

Talkback
What about for the deaf-blind?
prevent deaf-blind from blogging and commenting
in blogs. Those who are deaf and blind (not
just deaf or blind) can only use braille
displays. Deaf can use CAPTCHA okay, but what
about those who have vision loss? What about
those with ZoomText or other magnifiers that
pixelate (not sure of correct spelling) when
zoomed in to like 2x or higher? Now for the
blind, if they can hear audio, great, but what
about for those with hearing loss?
I didn't read the entire article, but think
about it!
Well what are other options?
Hidden Text Boxes Only Visible To Bots
to bots? If any of the hidden text boxes have been
entered, then the submission will fail.
Will this work?
The bot can learn which boxes cause a failure.
True, but maybe a hidden-field submission could serve-up a fake "reward"...
When the server detects the improper submission it can "reward" the bot with a fake response that suggests the submission has been successful. The fake response would need to be identical in every way to what humans would see.
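A minimal sketch of the hidden-field idea with a fake "reward", assuming a server that receives the form as a dictionary. The field name `website`, the function `handle_submission`, and the success message are hypothetical names chosen for illustration.

```python
from typing import Mapping, Tuple

# Hypothetical trap field: hidden from humans with CSS, so a real visitor
# leaves it empty while a form-filling bot is likely to populate it.
HONEYPOT_FIELD = "website"
SUCCESS_PAGE = "Thanks, your comment was posted."


def handle_submission(form: Mapping[str, str]) -> Tuple[str, bool]:
    """Return (page shown to the client, whether the comment is stored)."""
    filled_trap = bool(form.get(HONEYPOT_FIELD, "").strip())
    # Humans and suspected bots get the identical page; only the server-side
    # decision to store the comment differs.
    return SUCCESS_PAGE, not filled_trap
```

Because both cases return the same page, a bot gets no direct signal about which field triggered the rejection.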
Good idea but...
Want to send them something? How about an error 404 page, or a disconnect, and/or block their IP for an hour?
hidden fields
1) put it as <input type="hidden"
2) put it inside a <div> and hide the div.
Both of these will show up in the html and thus would be easily spotted by any bot that can parse code.
-Bucky24
You are wrong.
You don't need graphics either.
Also, Captcha is WAY too expensive.
reCaptcha? reMistake, reWaste of money... reFAIL!
You need to handle visually impaired, hearing impaired, and the intelligence of a first grader. The last one should cover most of the people here. :)
A simple random text question would suffice. It can be rendered in sound as a question. Braille can present it too. An audio Captcha gives away the answer, since bots can run speech-to-text on it. OCR is getting smarter, and you can't use pictures of cats and dogs to select, as that can't be represented as sound or Braille.
Bots will NOT be able to figure out a simple, random, plain-text question/answer that even a 4-year-old can do. Like "spell out the number that comes after four." Any language can be used, as well. No huge libraries of images, and no bandwidth needed to send them.
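The kind of plain-text question described above could be generated roughly like this. It is a sketch only: `make_question` and `check_answer` are hypothetical helper names, and the code does not test the claim that bots cannot answer such questions.

```python
import random
from typing import Tuple

NUMBER_WORDS = [
    "zero", "one", "two", "three", "four", "five",
    "six", "seven", "eight", "nine", "ten",
]


def make_question(rng: random.Random) -> Tuple[str, str]:
    """Return a (question, expected answer) pair as plain text."""
    n = rng.randrange(0, 10)  # 0..9, so n + 1 is always a valid index
    question = f"Spell out the number that comes after {NUMBER_WORDS[n]}."
    return question, NUMBER_WORDS[n + 1]


def check_answer(expected: str, given: str) -> bool:
    """Compare answers while ignoring case and surrounding whitespace."""
    return given.strip().lower() == expected
```

Since the question is ordinary text, a screen reader or braille display can present it without any image or audio file.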
Captcha is a fail.
A waste of money, and difficult for many to use.
Another option to captcha is easier, cheaper, better.
Why is it that you have to throw a ton of money at the problem with marginal results, like Captcha?
The answer is just too simple for the egotists to see.
Sound files give away the answer.
Image selection can't be rendered with sound.
Hard to read distortions piss people off.
Just ask a simple question in simple text. The answer is not provided in sound representation, nor can it be figured out through OCR, or bots.
One person could create and feed the question of the hour and the bots would never break it.
No special software needed.
Captcha is like using a bulldozer to plant your roses.
I like your thinking but?
But please keep brain storming for an answer.
But reCaptcha fails in too many ways.
reCaptcha has an audio feature. nt
Add deafness to blindness and no audio.
than 140 dB in order for the deaf/blind to hear
it.
Well what's your answer?
The answer is too easy.
You have to eliminate images and not give the answer in sound files.
Captcha is expensive, incomplete and overkill.
The correct answer can provide the process for all the impaired, and it isn't captcha.
See my other posts for the correct answer.
And it gives the answer. FAIL nt
Not to be cynical
Maybe that sounds a little cruel but there are also the illiterate - who are shut out from the whole experience. Or any number of groups which lack the required facilities to do what is required.
do the Helen Keller
Can't please everyone, so just try to do what works for the vast majority. It's gotten so bad that golf clubs with no wheelchair-bound members have to build ramps into their bathrooms.
Exactly my point
has captcha AND has a valid point a blind, deaf
and dumb person would express?
"Two and half men needs more blind, deaf and
dumb actors!"
Besides, if they feel they cannot express
themselves they can always get a free website
and express themselves there free of captcha.
Simple Solution
captcha.
Otherwise you have a perfect method for spammers.
Hearing-impaired wouldn't be a problem unless you
are blind, deaf and dumb and I don't think that
many are a serious concern. And they would
definitely have a caretaker to blog for them.