CAPTCHAs now being leveraged to digitize the world's print books

Summary: Through an audacious crowdsourcing strategy, anyone who buys a ticket to an event or conducts an online transaction is helping to convert classic books to digital format -- to the tune of 100 million words a day.

We've all encountered the challenge-response test when ordering things online -- where a bunch of strange words in strange fonts are displayed and need to be retyped to verify that you are a living, breathing human being and not a bot. That's called a CAPTCHA, which stands for "Completely Automated Public Turing test to tell Computers and Humans Apart."

Luis von Ahn, associate professor of computer science at Carnegie Mellon University and original creator of the CAPTCHA challenge screen, had a brainstorm a couple of years back -- why not harness all that time and energy people are putting into re-typing CAPTCHA codes, and put it to good use?

Now it is -- many of the CAPTCHA codes presented to verify human end-users are actually words from scanned classic print books that optical character recognition (OCR) could not read, farmed out to people for conversion to digital format.

As von Ahn put it at a recent TED presentation, there's a lot of potential energy and brainpower that can be harnessed out there:

"It turns out that approximately 200 million CAPTCHAs are typed every day by people around the world. When I first heard this, I was quite proud of myself. I thought, look at the impact that my research has had. But then I started feeling bad. See, here's the thing: each time you type a CAPTCHA, essentially you waste 10 seconds of your time. And if you multiply that by 200 million, you get that humanity as a whole is wasting about 500,000 hours every day typing these annoying CAPTCHAs. So then I started feeling bad."
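
The arithmetic behind that figure is easy to check. The short Python sketch below uses only the two numbers quoted in the talk (200 million CAPTCHAs a day, 10 seconds each); the variable names are ours, chosen for illustration.

```python
# Figures quoted in the talk
captchas_per_day = 200_000_000   # CAPTCHAs typed worldwide each day
seconds_per_captcha = 10         # time spent typing each one

SECONDS_PER_HOUR = 3600

# Total time spent per day, converted from seconds to hours
total_seconds = captchas_per_day * seconds_per_captcha
hours_per_day = total_seconds / SECONDS_PER_HOUR

print(f"{hours_per_day:,.0f} hours per day")  # prints "555,556 hours per day"
```

The result comes to roughly 556,000 hours, which von Ahn rounds to "about 500,000 hours every day."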

Von Ahn and his team launched the "reCAPTCHA" project, which engages libraries and publishers to deliver images of scanned words to websites that use CAPTCHAs for security, essentially using the wisdom of the crowd to convert the words into text. While OCR technology automatically converts many words into digital text, about 30% of the words in printed works more than 50 years old are unrecognizable to the system. "So the next time you type a CAPTCHA, these words that you're typing are actually words that are coming from books that are being digitized that the computer could not recognize," he says.
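
The article does not spell out how reCAPTCHA combines the answers it collects. As a rough illustration of what "wisdom of the crowd" could mean here, the Python sketch below assumes that several people type the same word image and that the most common answer is accepted once enough of them agree. The function name `resolve_word`, the thresholds, and the sample answers are all hypothetical and not taken from the reCAPTCHA project.

```python
from collections import Counter

def resolve_word(transcriptions, min_votes=3, min_share=0.6):
    """Return the agreed-upon transcription, or None if there is no consensus.

    Hypothetical rule: require at least `min_votes` answers, and the most
    common answer must account for at least `min_share` of them.
    """
    # Normalize answers (a simplification: ignores case and stray spaces)
    cleaned = [t.strip().lower() for t in transcriptions if t.strip()]
    if len(cleaned) < min_votes:
        return None  # not enough people have typed this word yet

    word, count = Counter(cleaned).most_common(1)[0]
    if count / len(cleaned) >= min_share:
        return word
    return None  # answers disagree too much; keep showing the word

# Four made-up answers for one scanned word the OCR could not read
answers = ["morning", "morning", "moming", "Morning "]
print(resolve_word(answers))  # prints "morning"
```

A word with too few answers, or with answers that disagree, would simply stay in circulation until more people have typed it.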

Currently, reCAPTCHA is helping to digitize 100 million words a day, or the equivalent of about two and a half million books a year, von Ahn says.

"Every time you buy tickets on Ticketmaster, you help to digitize a book. Facebook: Every time you add a friend or poke somebody, you help to digitize a book. Twitter and about 350,000 other sites are all using reCAPTCHA."

This post was originally published on Smartplanet.com


About

Joe McKendrick is an author and independent analyst who tracks the impact of information technology on management and markets. Joe is co-author, along with 16 leading industry leaders and thinkers, of the SOA Manifesto, which outlines the values and guiding principles of service orientation. He speaks frequently on cloud, SOA, data, and...
