Google + reCAPTCHA could raise bar in anti-bot, anti-spam battle

Summary: Google buys an excellent crowd-sourcing tool and, by default, gets to raise the bar significantly in the fight against bots and spam.

TOPICS: Google

Locked in a cat-and-mouse game with spammers who use bots to defeat anti-fraud mechanisms and create fake accounts, Google today announced a deal to acquire reCAPTCHA, a company that provides those squiggly words at login screens (see image at right).

The reCAPTCHA deal isn't exactly a security transaction. Strategically, it gives Google an excellent crowd-sourcing tool to beef up its already impressive machine-vision algorithms (think book-scanning and maps). In the long run, though, the ability to use CAPTCHAs that are near-impossible for bots to decipher allows Google to raise the bar significantly in the fight against bots and spam.

Adam O'Donnell, director of emerging technologies at anti-spam firm Cloudmark, believes this is a very smart purchase by Google.

"Google already has the best computer-vision techniques. The way reCAPTCHA works, this means that Google will only be presenting CAPTCHA words that are very difficult for a bot to defeat," O'Donnell explained.

"By pushing up that boundary, it will make CAPTCHA technology much better."

The words presented by the reCAPTCHA service come from scanned printed material (archival newspapers and old books). As Google explains, computers find it hard to recognize these words because the ink and paper have degraded over time, but by typing them in as a CAPTCHA, crowds teach computers to read the scanned text.

In this way, reCAPTCHA’s unique technology improves the process that converts scanned images into plain text, known as Optical Character Recognition (OCR). This technology also powers large scale text scanning projects like Google Books and Google News Archive Search. Having the text version of documents is important because plain text can be searched, easily rendered on mobile devices and displayed to visually impaired users. So we'll be applying the technology within Google not only to increase fraud and spam protection for Google products but also to improve our books and newspaper scanning process.
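
The crowd-teaching loop described above can be sketched in a few lines. This is a minimal illustration, not Google's or reCAPTCHA's actual code: it assumes the publicly described design in which each challenge pairs a control word (answer already known) with a word the OCR could not read, and the agreement threshold, class name and word IDs are all hypothetical.

```python
from collections import Counter, defaultdict

# Assumed agreement threshold; the real service's value is not given in the article.
VOTES_NEEDED = 3


class WordPairChallenge:
    """Pairs a known control word with an unknown scanned word (hypothetical sketch)."""

    def __init__(self):
        # unknown word id -> tally of human transcriptions
        self.votes = defaultdict(Counter)
        # unknown word id -> transcription accepted once enough humans agree
        self.resolved = {}

    def submit(self, control_answer, control_typed, unknown_id, unknown_typed):
        """Return True if the user passes the CAPTCHA."""
        # Only the control word decides pass/fail.
        if control_typed.strip().lower() != control_answer.lower():
            return False

        # A user who got the control word right is trusted enough to vote
        # on the word the OCR could not read.
        guess = unknown_typed.strip().lower()
        if guess and unknown_id not in self.resolved:
            self.votes[unknown_id][guess] += 1
            word, count = self.votes[unknown_id].most_common(1)[0]
            if count >= VOTES_NEEDED:
                self.resolved[unknown_id] = word
        return True
```

In this sketch a failed control word discards the accompanying guess, so a bot that cannot read the control word never gets to influence the transcription of the scanned text.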

CAPTCHAs have served to slow down spammers and phishers, but in many cases they are easily defeated by bots or by humans hired to manually solve the text in the squiggly-lined images.

[ Dancho Danchev: Google's CAPTCHA experiment and the human factor ]

Earlier this year, researchers at Google released a paper detailing a new CAPTCHA system based on correct image rotation (Socially Adjusted CAPTCHAs), whose main purpose is to make CAPTCHAs easier for humans and much harder for bots to solve.

  • What about for the deaf-blind?

    Implement CAPTCHA or reCAPTCHA, and you will
    prevent deaf-blind from blogging and commenting
    in blogs. Those who are deaf and blind (not
    just deaf or blind) can only use braille
    displays. Deaf can use CAPTCHA okay, but what
    about those who have vision loss? What about
    those with ZoomText or other magnifiers that
    pixelate (not sure of correct spelling) when
    zoomed in to like 2x or higher? Now for the
    blind, if they can hear audio, great, but what
    about for those with hearing loss?

    I didn't read the entire article, but think
    about it!
    Grayson Peddie
    • Well what are other options?

      I'm not knocking your concern because it is a valid point. But does that mean we just leave everything open to bots and spammers? What could be done for the disabled? I thought reCAPTCHA had some ways of handling this.
      • Hidden Text Boxes Only Visible To Bots

        How about hidden text boxes that are only visible
        to bots? If any of the hidden text boxes have been
        entered, then the submission will fail.

        Will this work?
        Grayson Peddie
        • The bot can learn which boxes cause a failure.

          Any method can be broken.
          • True, but maybe a hidden-field submission could serve up a fake "reward"...

            ... for the bots. For example: a bot submits a form by filling in data in the hidden fields.

            When the server detects the improper submission it can "reward" the bot with a fake response that suggests the submission has been successful. The fake response would need to be identical in every way to what humans would see.
          • Good idea but...

            That would mean they would put your page on the 'good' list and every bot on the planet would pound your server, ending in a DoS attack you brought on yourself.

            Want to send them something? How about an error 404 page, or a disconnect, and/or block their IP for an hour?
          • hidden fields

            There are two ways I know of to hide a field.
            1) make it an <input type="hidden">
            2) put it inside a <div> and hide the div.

            Both of these will show up in the HTML and thus would be easily spotted by any bot that can parse code.

          • You are wrong.

            Bots can be easily fooled in this situation.

            You don't need graphics either.
            Also, Captcha is WAY too expensive.
            reCaptcha? reMistake, reWaste of money... reFAIL!

            You need to handle visually impaired, hearing impaired, and the intelligence of a first grader. The last one should cover most of the people here. :)

            A simple random text question would suffice. It can be rendered in sound as a question. Braille can present it too. Captcha's audio gives away the answer, which bots can pick up with speech-to-text. OCR is getting smarter, and you can't use pictures of cats and dogs to select, as that can't be represented as sound or Braille.

            Bots will NOT be able to figure out a simple, random, plain-text question/answer that even a 4-year-old can do. Like "spell out the number that comes after four." Any language can be used, as well. No huge libraries of images, and no bandwidth to send them.

            Captcha is a fail.

            A waste of money, and difficult for many to use.
      • Another option to captcha is easier, cheaper, better.

        Captcha is overly complex, too expensive and has too many shortcomings. There are simpler methods that resolve all the problems mentioned in these comments, that require very simple processes without images, sound files or huge files of data, and that the simplest moron surfer can use.

        Why is it that you have to throw a ton of money at the problem with marginal results, like Captcha?

        The answer is just too simple for the egotists to see.

        Sound files give away the answer.
        Image selection can't be rendered with sound.
        Hard to read distortions piss people off.

        Just ask a simple question in simple text. The answer is not provided in sound representation, nor can it be figured out through OCR, or bots.

        One person could create and feed the question of the hour and the bots would never break it.
        No special software needed.

        Captcha is like using a bulldozer to plant your roses.
        • I like your thinking, but...

          I like your creative thinking, but sound files are not a cure-all. Many times the user's sound system is not working or the audio volume is turned down. They don't even know that they missed a clue.

          But please keep brainstorming for an answer.
      • But reCaptcha fails in too many ways.

        It's too expensive when a simple random text question/answer will solve all the problems.
    • reCaptcha has an audio feature. nt

      • Add deafness to blindness and no audio.

        Sure there's audio, but you need to bring up more
        than 140 dB in order for the deaf/blind to hear
        Grayson Peddie
        • Well what's your answer?

          Do nothing?
          Wintel BSOD
          • The answer is too easy.

            Deaf and/or just blind have braille.

            You have to eliminate images and not give the answer in sound files.

            Captcha is expensive, incomplete and overkill.

            The correct answer can provide the process for all the impaired, and it isn't captcha.

            See my other posts for the correct answer.
      • And it gives the answer. FAIL nt

    • Not to be cynical

      about what percentage of society are you talking about? And how much are those folks into the Internet altogether - I mean - how many devices are out there that can read any web page and translate it for a braille device?

      Maybe that sounds a little cruel but there are also the illiterate - who are shut out from the whole experience. Or any number of groups which lack the required facilities to do what is required.
      • do the Helen Keller

        and talk with your (ah you know the rest)

        Can't please everyone, so just try to do what works for the vast majority. It's gotten so bad that golf clubs with no wheelchair-bound members have to build ramps into their bathrooms.
      • Exactly my point

        Aside from politics and religion really what
        has captcha AND has a valid point a blind, deaf
        and dumb person would express?

        "Two and half men needs more blind, deaf and
        dumb actors!"

        Besides, if they feel they cannot express
        themselves they can always get a free website
        and express themselves there free of captcha.
    • Simple Solution

      They have their friend/caretaker/milkman enter the

      Otherwise you have a perfect method for spammers.

      Hearing-impaired wouldn't be a problem unless you
      are blind, deaf and dumb and I don't think that
      many are a serious concern. And they would
      definitely have a caretaker to blog for them.
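
Two of the alternatives raised in the discussion above, the hidden "honeypot" field (with the suggested 404 and one-hour IP block) and the plain-text question, can be sketched as follows. This is a rough illustration under stated assumptions: the field name `website`, the question bank, the status codes and both function names are hypothetical, and the thread itself disputes whether either approach holds up against a determined bot.

```python
import time

HONEYPOT_FIELD = "website"  # hypothetical field, hidden from humans with CSS
BLOCK_SECONDS = 3600        # "block their IP for an hour", as suggested in the thread

# Hypothetical question bank: id -> (question shown as plain text, expected answer)
QUESTIONS = {
    "q1": ("Spell out the number that comes after four.", "five"),
}

blocked_until = {}  # ip -> unix time at which the block expires


def check_text_answer(question_id, typed):
    """Return True if the typed answer matches the plain-text question."""
    _, expected = QUESTIONS[question_id]
    return typed.strip().lower() == expected


def handle_submission(form, ip, now=None):
    """Return an HTTP-style status code for a form submission."""
    now = time.time() if now is None else now

    # Refuse clients that tripped the honeypot within the last hour.
    if blocked_until.get(ip, 0) > now:
        return 403

    # Humans never see the honeypot field, so any value in it suggests a bot.
    if form.get(HONEYPOT_FIELD, "").strip():
        blocked_until[ip] = now + BLOCK_SECONDS
        return 404

    # The question is plain text, so a screen reader or braille display can present it.
    if not check_text_answer(form.get("question_id", "q1"), form.get("answer", "")):
        return 400

    return 200
```

Because the honeypot check is done on the server, it does not matter whether the field is hidden with `type="hidden"` or a hidden `<div>`; as one commenter notes, though, a bot that parses the HTML can learn to leave such fields empty.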