Crowdsourcing unleashed: 25,000 join effort to digitize one library's historical collections

The National Library of Finland is digitizing millions of pages of historical texts via optical character recognition, which tends to miss a lot of words. More than 25,000 global volunteers are filling in the gaps, and even having fun with it. Here's how.

We already know, through projects such as SETI@home, that compute jobs involving massive amounts of data can be broken down into tiny little pieces, spread out to systems across the globe, and reassembled. Can the same apply to more manual types of work?

Games without frontiers: Volunteers are challenged to enter the correct text, contributing to a global digitization effort

Apparently so. Organizers of Digitalkoot (Digital Volunteers) report that more than 25,000 volunteers from across Europe and the globe have taken part in the digitization of historical collections at the National Library of Finland.

The Digitalkoot program enlists online volunteers, via crowdsourcing, to help digitize millions of pages of archive material. Through two online games, volunteers complete small portions of work, or microtasks, to help correctly digitize historical content. The national library reports that the volunteers have already completed more than two million individual tasks, totaling 1,700 hours of work.

The games, configured by Microtask, prompt volunteers to catch text that optical character recognition (OCR) missed or misread. The National Library of Finland has millions and millions of pages of historically and culturally valuable magazines, newspapers and journals online. The challenge is that the OCR output often contains errors and omissions, which hamper searches. Manual correction is needed to weed out these mistakes so that the texts become reliably searchable by machine.

Most of the volunteers come from Finland, but there are also volunteers from the US, UK and Sweden.

Microtask says it accomplishes such projects by splitting dull repetitive tasks into tiny microtasks and distributing them over the Internet. After being carried out by interested microworkers around the world, the results are put back together into a completed assignment.
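The split, distribute and reassemble cycle described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Microtask's actual system: the function names, the round-robin assignment, the worker names and the simulated worker are all hypothetical.

```python
def split_into_microtasks(ocr_words):
    """Turn a list of OCR words into (position, word) microtasks."""
    return [(index, word) for index, word in enumerate(ocr_words)]


def distribute(tasks, workers):
    """Assign microtasks to workers round-robin (an assumed strategy)."""
    assignments = {worker: [] for worker in workers}
    for n, task in enumerate(tasks):
        assignments[workers[n % len(workers)]].append(task)
    return assignments


def simulated_worker(task):
    """Stand-in for a human volunteer: fixes one known OCR error."""
    index, word = task
    fixes = {"Sanornat": "Sanomat"}  # hypothetical misread: "rn" for "m"
    return index, fixes.get(word, word)


def reassemble(results):
    """Put results back in page order, whatever order they arrived in."""
    return " ".join(word for _, word in sorted(results))


tasks = split_into_microtasks(["Helsingin", "Sanornat", "1905"])
assignments = distribute(tasks, ["worker_a", "worker_b"])
results = [simulated_worker(t) for batch in assignments.values() for t in batch]
corrected_text = reassemble(results)
print(corrected_text)  # Helsingin Sanomat 1905
```

Keeping the position with each word is what lets the pieces come back in any order and still be stitched into the original page.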

The national library reports that to date, four million pages of different types of texts from the 18th to 20th centuries have been digitized, but there still remains a huge bulk of cultural heritage archived only in paper files.

In the first phase, the program consists of two online games. In ‘Mole Hunt’ (Myyräjahti), the player is shown two words and must determine as quickly as possible whether they are the same. This uncovers erroneous words in archived material. In ‘Mole Bridge’ (Myyräsilta), players must correctly type the words appearing on the screen. Correct answers help moles build a bridge across a river. In the next phase, the library says, the program will be expanded to target history buffs.
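One way same-or-different verdicts like those in Mole Hunt could be turned into a list of suspect words is sketched below. The article does not say how answers are actually aggregated, so the vote threshold, the minimum number of votes and the function name are assumptions made purely for illustration.

```python
def flag_suspect_words(verdicts, min_votes=3, threshold=0.5):
    """Return OCR words that most players judged to be a mismatch.

    verdicts maps an OCR word to a list of booleans, where True means
    a player said the two words shown were the same. Both min_votes
    and threshold are hypothetical tuning values.
    """
    suspects = []
    for word, votes in verdicts.items():
        if len(votes) < min_votes:
            continue  # too few answers to judge this word yet
        mismatch_share = votes.count(False) / len(votes)
        if mismatch_share > threshold:
            suspects.append(word)
    return suspects


verdicts = {
    "Sanornat": [False, False, True],   # two of three players saw a mismatch
    "Helsingin": [True, True, True],    # everyone agreed it matched
    "1905": [True],                     # only one answer so far
}
suspects = flag_suspect_words(verdicts)
print(suspects)  # ['Sanornat']
```

Words flagged this way could then be passed to a typing task such as Mole Bridge for correction.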
