
Zero Day

Ryan Naraine, Emil Protalinski and Dancho Danchev

Google + reCAPTCHA could raise bar in anti-bot, anti-spam battle

By | September 16, 2009, 12:54pm PDT

Summary: Google buys an excellent crowd-sourcing tool and, by default, gets to raise the bar significantly in the fight against bots and spam.

Locked in a cat-and-mouse game with spammers who use bots to defeat anti-fraud mechanisms and create fake accounts, Google today announced a deal to acquire reCAPTCHA, a company that provides those squiggly words at login screens (see image at right).

The ReCAPTCHA deal isn’t exactly a security transaction.  Strategically, it gives Google an excellent crowd-sourcing tool to beef up its already impressive machine-vision algorithms (think book-scanning and maps) but, in the long run, the ability to use CAPTCHAs that are near-impossible for bots to decipher allows Google to raise the bar significantly in the fight against bots and spam.

Adam O’Donnell, director of emerging technologies at anti-spam firm Cloudmark, believes this is a very smart purchase by Google.

“Google already has the best computer-vision techniques.  The way ReCAPTCHA works, this means that Google will only be presenting CAPTCHA words that are very difficult for a bot to defeat,” O’Donnell explained.

“By pushing up that boundary, it will make CAPTCHA technology much better.”

The words presented by the ReCAPTCHA service come from scanned printed material (archival newspapers and old books).   As Google explains here, computers find it hard to recognize these words because the ink and paper have degraded over time, but by typing them in as a CAPTCHA, crowds teach computers to read the scanned text.

In this way, reCAPTCHA’s unique technology improves the process that converts scanned images into plain text, known as Optical Character Recognition (OCR). This technology also powers large scale text scanning projects like Google Books and Google News Archive Search. Having the text version of documents is important because plain text can be searched, easily rendered on mobile devices and displayed to visually impaired users. So we’ll be applying the technology within Google not only to increase fraud and spam protection for Google products but also to improve our books and newspaper scanning process.
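The crowd-sourcing loop described above can be sketched roughly as follows. This is a minimal illustration, not Google's or reCAPTCHA's actual code: it assumes each challenge pairs a control word whose answer is already known with a scanned word that OCR could not read, and that a reading is accepted once enough people who passed the control word agree on it. The function names and the thresholds (`min_votes`, `min_share`) are hypothetical.

```python
from collections import Counter

def normalize(word):
    """Compare answers case-insensitively and ignore stray whitespace."""
    return word.strip().lower()

def record_answer(votes, control_truth, control_answer, unknown_answer):
    """Count a vote for the unknown word only if the control word was typed correctly."""
    if normalize(control_answer) != normalize(control_truth):
        return False  # failed the challenge, so this transcription is ignored
    votes[normalize(unknown_answer)] += 1
    return True

def accepted_transcription(votes, min_votes=3, min_share=0.6):
    """Return the consensus reading, or None while agreement is still too weak."""
    total = sum(votes.values())
    if total < min_votes:
        return None
    word, count = votes.most_common(1)[0]
    return word if count / total >= min_share else None

votes = Counter()
record_answer(votes, "morning", "morning", "upon")
record_answer(votes, "morning", "Morning ", "upon")
record_answer(votes, "morning", "mourning", "spam")  # control word wrong, discarded
record_answer(votes, "morning", "morning", "upon")
print(accepted_transcription(votes))
```

Only answers from people who solved the control word are trusted, which is what lets the same challenge act as both a bot filter and a transcription tool.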

CAPTCHAs have served to slow down spammers and phishers but in many cases, they are easily defeated by bots or humans hired to manually solve text in the squiggly-lined images.

[ Dancho Danchev: Google's CAPTCHA experiment and the human factor ]

Earlier this year, researchers at Google released a paper detailing a new CAPTCHA system based on correct image rotation (Socially Adjusted CAPTCHAs), whose main purpose is to make the challenges easier for humans and much harder for bots to solve.


Ryan Naraine is a journalist and social media enthusiast specializing in Internet and computer security issues.

Disclosure

Ryan Naraine

The most important disclosure is of my employment with Kaspersky Lab as a member of the global research and analysis team. Kaspersky Lab is a global company specializing in anti-malware and secure content management technologies. I do not own stocks or other investments in any technology company.

Biography

Ryan Naraine

Ryan Naraine is a journalist and social media enthusiast specializing in Internet and computer security issues. He is currently security evangelist at Kaspersky Lab, an anti-malware company with operations around the globe. He is taking a leadership role in developing the company's online community initiative around secure content management technologies.

Prior to joining Kaspersky Lab, Ryan was Editor-at-Large/Security at eWEEK, leading the magazine's and Web site's coverage of Internet and computer security issues and managing the popular SecurityWatch blog, covering the daily threats, vulnerabilities and IT security technologies. He also covered IT security, hacker attacks and secure content management topics for Jupiter Media's internetnews.com.

Ryan can be reached at naraine SHIFT 2 gmail.com. For daily updates on Ryan's activities, follow him on Twitter.

70 Comments

RE: Google reCAPTCHA could raise bar in anti-bot, anti-spam battle
birumut Updated - 2nd May 2011
Great!!! thanks for sharing this information to us!
seslisohbet seslichat
0 Votes
+ -
What about for the deaf-blind?
Grayson Peddie 16th Sep 2009
Implement CAPTCHA or reCAPTCHA, and you will
prevent deaf-blind from blogging and commenting
in blogs. Those who are deaf and blind (not
just deaf or blind) can only use braille
displays. Deaf can use CAPTCHA okay, but what
about those who have vision loss? What about
those with ZoomText or other magnifiers that
pixelate (not sure of correct spelling) when
zoomed in to like 2x or higher? Now for the
blind, if they can hear audio, great, but what
about for those with hearing loss?

I didn't read the entire article, but think
about it!
0 Votes
+ -
Well what are other options?
storm14k 16th Sep 2009
I'm not knocking your concern because it is a valid point. But does that mean we just leave everything open to bots and spammers? What could be done for the disabled? I thought reCAPTCHA had some ways of handling this.
0 Votes
+ -
Hidden Text Boxes Only Visible To Bots
Grayson Peddie 16th Sep 2009
How about hidden text boxes that are only visible
to bots? If any of the hidden text boxes have been
entered, then the submission will fail.

Will this work?
0 Votes
+ -
Any method can be broken.
... for the bots. For example: a bot submits a form by filling-in data in the hidden fields.

When the server detects the improper submission it can "reward" the bot with a fake response that suggests the submission has been successful. The fake response would need to be identical in every way to what humans would see.
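A rough sketch of the hidden-field idea with a decoy response, for anyone who wants to try it. It assumes the form carries extra inputs that are hidden from human visitors with CSS; the field names (`website`, `fax_number`) and the `handle_submission` helper are made up for illustration and do not come from any particular framework.

```python
HONEYPOT_FIELDS = ("website", "fax_number")  # hypothetical names, hidden from humans via CSS

def is_probable_bot(form):
    """A human never sees the honeypot inputs, so any value in them is suspicious."""
    return any(form.get(name, "").strip() for name in HONEYPOT_FIELDS)

def handle_submission(form, save):
    """Return the same response either way, but only store what looks human."""
    if not is_probable_bot(form):
        save(form)
    return {"status": 200, "body": "Thanks, your comment was posted."}

stored = []
human = {"name": "Ann", "comment": "Nice post", "website": "", "fax_number": ""}
bot = {"name": "x", "comment": "buy pills", "website": "http://spam.example", "fax_number": "1"}
human_reply = handle_submission(human, stored.append)
bot_reply = handle_submission(bot, stored.append)
```

Because both replies are identical, the bot gets no signal that its submission was dropped.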
0 Votes
+ -
Good idea but...
jrbeaman 18th Sep 2009
That would mean they would put your page on the 'good' list and every bot on the planet would pound your server ending in a DOS attack brought on by yourself.

Want to send them something? How about an error 404 page, or a disconnect, and/or block their IP for an hour?
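The "block their IP for an hour" option could look something like this. It is only a sketch that keeps the block list in memory; the names `block` and `is_blocked` are invented here, and a real site would more likely do this in its firewall or web server.

```python
import time

BLOCK_SECONDS = 3600  # one hour, as suggested above
blocked_until = {}    # maps IP address -> time when the block expires

def block(ip, now=None):
    """Start (or restart) a one-hour block for this address."""
    now = time.time() if now is None else now
    blocked_until[ip] = now + BLOCK_SECONDS

def is_blocked(ip, now=None):
    """True while the block is active; expired entries are cleaned up lazily."""
    now = time.time() if now is None else now
    expiry = blocked_until.get(ip)
    if expiry is None:
        return False
    if now >= expiry:
        del blocked_until[ip]
        return False
    return True

block("203.0.113.7", now=0)  # address taken from a documentation-only range
```

The optional `now` argument is there only so the expiry logic can be checked without waiting an hour.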
0 Votes
+ -
hidden fields
Bucky24 23rd Sep 2009
there are two ways I know of to hide a field.
1) put it as input type="hidden"
2) put it inside a <div> and hide the div.

Both of these will show up in the html and thus would be easily spotted by any bot that can parse code.

-Bucky24
0 Votes
+ -
You are wrong.
jrbeaman Updated - 18th Sep 2009
Bots can be easily fooled in this situation.

You don't need graphics either.
Also, Captcha is WAY too expensive.
reCaptcha? reMistake, reWaste of money... reFAIL!

You need to handle visually impaired, hearing impaired, and the intelligence of a first grader. The last one should cover most of the people here. :)

A simple random text question would suffice. It can be rendered in sound as a question. Braille can present it too. Captcha's audio gives away the answer, which bots can pick up with speech to text. OCR is getting smarter, and you can't use pictures of cats and dogs to select, as that can't be represented as sound, or Braille.

Bots will NOT be able to figure out simple, random, plain text question/answer that even a 4 year can do. Like "spell out the number that comes after four." Any language can be used, as well. No huge libraries of images, and the bandwidth to send them.

Captcha is a fail.

A waste of money, and difficult for many to use.
Captcha is over complex and too expensive and has too many shortcomings. There are simpler methods that resolve all the problems mentioned in these comments, that require very simple processes without images, sound files, or huge files of data, and the simplest moron surfer can use them.

Why is it that you have to throw a ton of money at the problem with marginal results, like Captcha?

The answer is just too simple for the egotists to see.

Sound files give away the answer.
Images selection can't be rendered with sound.
Hard to read distortions piss people off.

Just ask a simple question in simple text. The answer is not provided in sound representation, nor can it be figured out through OCR, or bots.

One person could create and feed the question of the hour and the bots would never break it.
No special software needed.

Captcha is like using a bulldozer to plant your roses.
0 Votes
+ -
I like your thinking but?
clareJ 21st Sep 2009
I like your creative thinking but sound files are not a cure-all. Many times the user's sound system is not working or the audio volume is turned down. They don't even know that they missed a clue.


But please keep brain storming for an answer.
0 Votes
+ -
But reCaptcha fails in too many ways.
jrbeaman 18th Sep 2009
It's too expensive when a simple random text question/answer will solve all the problems.
0 Votes
+ -
reCaptcha has an audio feature. nt
T1Oracle 17th Sep 2009
nt
0 Votes
+ -
Add deafness to blindness and no audio.
Grayson Peddie 17th Sep 2009
Sure there's audio, but you need to bring up more
than 140 dB in order for the deaf/blind to hear
it.
0 Votes
+ -
Well what's your answer?
Wintel BSOD 17th Sep 2009
Do nothing?
0 Votes
+ -
The answer is too easy.
jrbeaman 18th Sep 2009
Deaf and/or just blind have braille.

You have to eliminate images and not give the answer in sound files.

Captcha is expensive, incomplete and overkill.

The correct answer can provide the process for all the impaired, and it isn't captcha.

See my other posts for the correct answer.
0 Votes
+ -
And it gives the answer. FAIL nt
jrbeaman 18th Sep 2009
0 Votes
+ -
Not to be cynical
rernst99@... 17th Sep 2009
about what percentage of society are you talking about? And how much are those folks into the Internet altogether - I mean - how many devices are out there that can read any web page and translate it into a braille device.

Maybe that sounds a little cruel but there are also the illiterate - who are shut out from the whole experience. Or any number of groups which lack the required facilities to do what is required.
0 Votes
+ -
do the Helen Keller
ilyab 17th Sep 2009
and talk with your (ah you know the rest)

Can't please everyone, so just try to do what works for the vast majority. It's gotten so bad that golf clubs with no wheelchair bound members, have to build ramps into their bathrooms.
0 Votes
+ -
Exactly my point
gammaworld@... 17th Sep 2009
Aside from politics and religion really what
has captcha AND has a valid point a blind, deaf
and dumb person would express?

"Two and half men needs more blind, deaf and
dumb actors!"

Besides, if they feel they cannot express
themselves they can always get a free website
and express themselves there free of captcha.
0 Votes
+ -
Simple Solution
gammaworld@... 17th Sep 2009
They have their friend/caretaker/milkman enter the
captcha.

Otherwise you have a perfect method for spammers.

Hearing-impaired wouldn't be a problem unless you
are blind, deaf and dumb and I don't think that
many are a serious concern. And they would
definitely have a caretaker to blog for them.
0 Votes
+ -
OK this is getting STUPID
GAXXIS 17th Sep 2009
As a Person with Learning disabled and sight Problems I'm really getting ANNOYED at the LEVEL of STUPIDITY of some of the posters on this topic.

That's all I can say without going off and getting Myself Banned from ZDNet.

'Shakes Head in Utter Amazement'

Gaxxis
0 Votes
+ -
Stupid is as Stupid does.
jrbeaman 18th Sep 2009
try: "spell the number 5"
or: "what number comes after three"

All can be rendered in plain text, displayed in braille, and do not give the answer in a sound file. Even a single digit IQ can answer it.

No big files or complex software needed.

Only 2 out of 50 people here have suggested it.
Shows how stupid the audience is, and how foolish google is.
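For what it's worth, the plain-text question idea takes only a few lines. This is a sketch under the assumption that the question is generated at random from a small word list; `make_challenge` and `check_answer` are hypothetical names, and a question pool this small could be enumerated by a determined bot author, so treat it as an illustration rather than a hardened defence.

```python
import random

NUMBER_WORDS = ["zero", "one", "two", "three", "four", "five",
                "six", "seven", "eight", "nine", "ten"]

def make_challenge(rng=random):
    """Pick a number from zero to nine and ask for its successor, spelled out."""
    n = rng.randrange(0, 10)
    question = f"Spell out the number that comes after {NUMBER_WORDS[n]}."
    return question, NUMBER_WORDS[n + 1]

def check_answer(expected, given):
    """Accept the answer regardless of capitalisation or surrounding spaces."""
    return given.strip().lower() == expected

question, expected = make_challenge(random.Random(0))
print(question)
```

The question is ordinary text, so a screen reader or braille display can present it, and no audio file containing the answer is ever produced.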


0 Votes
+ -
Know whats next for the hackers?
jrbeaman 18th Sep 2009
They will ignore the graphics and use voice recognition on the sound bite.
0 Votes
+ -
Another failure for an answer...
jrbeaman Updated - 18th Sep 2009
The "have their friend/caretaker/milkman enter the captcha." is a total failure because he has the same problems that many people already have. The friend/caretaker/milkman can't read it even with a 100 IQ and 20/20 vision, and all their fingers.

Captcha is expensive overkill with too many limitations. Bad choice to thwart bots.

You have to do it without images. Period.
You have to do it without sound files.
You have to do it so OCR doesn't apply.
You have to do it in multiple languages.
You have to do it so a 4 year old can figure it out,
AND a blind person too.

All of this can be done with existing tools, and still thwart bots. The answer is way too simple, and that is the beauty of it.

And, you people are supposed to be smart?

Total FAIL!

What's the correct answer?

Read my other posts here for the solution.
0 Votes
+ -
We Need A Better Way
shakethebabyass2011 16th Sep 2009
I hate Captcha but it is a reality and if it helps kill the bots then so may it be.

I like and do not mind the ones that ask a question

EX: What is the presidents wife name?

EX: Type the last word of this sentence love?

I dont understand why they cannot get a better way than the codes, I have literally tried for 10 min on some websites and I am not a bot!

It would be cool if they could do something to make captcha kinda fun instead of dreadful, maybe like solving a short very easy puzzle or a small connect the dots picture.. childish yes but it would save some time =)
0 Votes
+ -
What do YOU really hate about it?
Shmuel 17th Sep 2009
It is not good enough to say "i don't like it"...
Why do you not like it? Are you having trouble deciphering the text presented?
Posting negative perspective (or positive, for the matter) would be much more useful if a reason is given...
0 Votes
+ -
Too hard to see
mswift@... 17th Sep 2009
The poster you are answering said it took him 10 minutes sometimes to get past a CAPTCHA.

My average is about 4 tries before I get it to work. If I don't get it by then, I never go to that site again. The industry says it would be tremendous overhead to verify that anything coming in comes from an IP address that matches the purported URL, i.e., if the incoming says it comes from google.com, does it have a Google IP? If that were verified then Google would drop spamming googlers, etc.

I agree that a question like which box is (red, green, blank, dotted, colored, spotted, etc) would work for a couple of weeks, so change the base question every week. Today, which lines are parallel, tomorrow, which line has the smallest dots.
sorry.

Captcha is too expensive and other methods fail.
0 Votes
+ -
Asirra Dogs vs Cats
rpayne@... 17th Sep 2009
Here's an idea from the Evil Empire that is fairly easy and kind of fun.

http://research.microsoft.com/en-us/um/redmond/projects/asirra/

This idea of having humans pick the cats out of a set of animals is not strong enough for commercial application, but might be a step in the right direction.
0 Votes
+ -
I like your method
gammaworld@... 17th Sep 2009
They only have to add more pieces to make it
harder. Captcha could simply be a randomization of
simple puzzles - like from a childrens book.

It encourages smarter behaviour in the masses as
well.
LOL
0 Votes
+ -
Excellent suggestion.
jrbeaman 18th Sep 2009
But then you will get the 'idiots consortium' to complain that their people can't accomplish the task.

No, really, "what is two plus two, spelled out" is a great bot killer.

I see the captcha concept, though interesting, to be overkill.
Always liked reCAPTCHA, though admittedly it was annoying on occasions when I couldn't figure out what it displayed. I always thought it was a cool approach and nicely done. For some reason, I sort of assumed it would be a successful project that remained independent.

Now, I can't get excited over the commercialization of reCAPTCHA.

Another company that scans tons of books is Amazon. Amazon's scan is just amazing, preserving the book format while making the text "highlightable" and searchable. I would venture to guess Amazon and Google are the more advanced commercially viable companies in OCR when it comes to books.

On the note about Amazon, I came across an interesting table that details the discounts on Amazon.

It is at http://www.uberi.com

Maybe someone will find it useful too. While you are there, I would suggest checking out the "Amazon Filler Item" among other things there when you get a chance. It's quite amusing.
Captcha is expensive overkill with too many faults. OCR is not needed, nor are images.
It can all be done with plain text and be
easily presented in many forms for impaired
people.

Think KISS.

The simplest answer is the most cost-effective, and many times the most bulletproof.
0 Votes
+ -
$1/year for an account.
CobraA1 17th Sep 2009
How about this: $1/year for an account, non-refundable.

Places virtually no financial burden on people. Places a
large financial burden on those wanting to open thousands
of accounts daily.
0 Votes
+ -
Failure
T1Oracle 17th Sep 2009
With web 2.0 every site needs an account these days. Not every site is going to be able to convince people to give up credit card data.
0 Votes
+ -
Require ISP email addresses
Stan57 17th Sep 2009
Want to stop spammers? Require real ISP email addresses to sign up for free email accounts, with return receipt authorization. Anything else is just a waste of time and money, it's just that simple.
0 Votes
+ -
Whose ISP?
cgarrett 17th Sep 2009
What about net café users? What about library Internet users? What
about wi-fi hotspots? Teens? I fail to believe that most people know
how to set up an additional ISP email, even if it is no extra cost.

Not every "real" email address comes from an ISP either. Microsoft
owns microsoft.com and that's where their email addresses are @. Are
you saying that their fiber provider should be providing them email
addresses? And if we're going to allow "real" email addresses. Where
do we stop? What if the address is joe@manhoodpill.ru?

And who wants to volunteer to take the expense to maintain a list of
ISP's and their email domains?
0 Votes
+ -
Library mail servers?
mswift@... 17th Sep 2009
No, they would be sending mail from their Gmail or Yahoo or other ISP accounts and the mail would have the IP of their email ISP. If they did have a mail server and there was real time cross checking, the librarian would get a message that someone using that mail server was CURRENTLY trying to send spam.

There are people who do connect IP addresses to URLs. ICANN and the registrars do it for a living.
0 Votes
+ -
I think you're confused
cgarrett 17th Sep 2009
I'm sure there's a message in there somewhere. But the subject matter
in this thread is requiring an ISP email account in order to sign up for
GMail or Yahoo. I'm pretty sure that whatever you're saying is way off
topic.
0 Votes
+ -
My thoughts exactly
SkaldedKat 17th Sep 2009
If you cannot verify your identity, then go play in your own back yard and let me use my "PAID FOR" internet access without your childish, mindless, morally bankrupt and useless little games.
I agree. I have been using computers since the 5th grade and get paid to evaluate structures with my eyes and I STILL can't get past the occasional captcha - I just get so fed up I quit trying. admittedly that is very rare but it does happen. Answering a question would be better.
0 Votes
+ -
Is this a good enough argument?...
Shmuel 17th Sep 2009
As you indicate "this is a rare occasion"...

Most of the time, the login process will give you another 'combination' if you fail the first...

It seems to be a small price to pay for the reduction in harvesting of users log in data...
0 Votes
+ -
Put a pron page then use the user to interpret the captcha from another site (for example, using a iframe).


0 Votes
+ -
Doesn't work
gammaworld@... 17th Sep 2009
The captcha's cycle. So it would be successfully
interpreted and when your bot 'types' it in it's
no go.
You have to make it so expensive for the Spam Bots to "captcha" that they change their methods or quit.
Or, train your software to recognize Bot registrations-maybe through origin- and remove them.
The ISP Convention I proposed some years ago would accomplish this by refusing traffic from ISP's that generate Bot programs.
Similarly, "metered" bandwidth would force Mass Marketing Emailers (AKA Spammers) to utilize Direct Marketing principles of Qualifying, Testing, and Measurement to afford the expense of generating Metered traffic.
he said sometimes it takes him up to 10 mins to decipher.
too hard for sight impaired now
CAPTCHA is most effective techniques to prevent
your mail
from spam. it power full anti-spam

i would like to publish this useful information on
my blog -
http://www.amitin.co.cc
0 Votes
+ -
CAPTCHA is overkill.
jrbeaman 18th Sep 2009
"CAPTCHA is most effective techniques to prevent your mail from spam. it power full anti-spam"

Questions with easy answers are the easiest and most effective way to do it. Blurred images or distorted characters only make it harder for the people, and they abandon the entry process, and many times the whole web site. Questions like "what color is the sky" can be handled with almost any impairment, are easy to render, require no 'technology', and can be programmed or used by almost anyone. Show me a bot that can get past that? Image selection such as "which picture is of a cat?" is not visually impaired friendly.

CAPTCHA is overkill. You have been conned!

"what number comes after seven" is a great way to tell a human from a machine, is easily done with sound, you can use text to speech to render it, and it does not give the answer, which could be hacked with voice recognition.

Like I said, pay your millions because you got sold on an incomplete and over complex method.
