Google buys reCAPTCHA: Digitize old books and fight spam

Written by Andrew Mager

CAPTCHAs are annoying, but necessary.

They try to distinguish humans from bots when form data is submitted. One of the most terrifying problems with hosting your own content on the web is spam. Spammers will do anything to get you to click on something, and most of it seeps through into blog comments.

reCAPTCHA does the best job of preventing this kind of spam. It takes words from scanned books that OCR software could not read and displays them to the user. Once a few users agree on what a word says, reCAPTCHA confirms the word, and the book is one step closer to being digitized. From their website:

reCAPTCHA improves the process of digitizing books by sending words that cannot be read by computers to the Web in the form of CAPTCHAs for humans to decipher. More specifically, each word that cannot be read correctly by OCR is placed on an image and used as a CAPTCHA. This is possible because most OCR programs alert you when a word cannot be read correctly.
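To make the agreement step concrete, here is a minimal Python sketch of how several human readings of one unreadable word could be tallied until enough of them match. It is only an illustration of the idea described above: the class name `WordConsensus`, the `required_votes` threshold of 3, and the lowercase normalization are my own assumptions, not reCAPTCHA's actual implementation.

```python
from collections import Counter


class WordConsensus:
    """Collects human readings of one word that OCR could not read.

    Hypothetical helper for illustration; not part of any reCAPTCHA API.
    """

    def __init__(self, required_votes=3):
        # required_votes is an assumed threshold, not reCAPTCHA's real value.
        self.required_votes = required_votes
        self.votes = Counter()

    def submit(self, answer):
        """Record one user's reading; return the confirmed word or None."""
        normalized = answer.strip().lower()
        if normalized:
            self.votes[normalized] += 1
        return self.confirmed()

    def confirmed(self):
        """Return the reading that reached the threshold, if any."""
        if not self.votes:
            return None
        word, count = self.votes.most_common(1)[0]
        return word if count >= self.required_votes else None


# Four users type what they see; one of them misreads the word.
word = WordConsensus(required_votes=3)
for reading in ["morning", "Morning ", "mourning", "morning"]:
    result = word.submit(reading)
print(result)
```

In this sketch a single misreading does not block confirmation, because only the most common reading has to reach the threshold.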

Read more here. I am really impressed that Google bought these guys (Techmeme). Google itself isn't very good at handling CAPTCHAs, but now it will use something more standard, so I'm excited. And with Google's huge reach on the web, it will probably digitize books a lot faster than other sites.

Facebook is known for using reCAPTCHA as well.

What do you think of this? Will it help prevent spam, or will the hackers find another way? I'm interested to hear your thoughts.
