Google backs character-recognition research

Ocropus, an open-source project based in Germany, aims to develop advanced handwriting recognition for e-library creation.
Written by Caroline McCarthy, Contributor
Google is sponsoring an artificial-intelligence research group's work to develop advanced technologies for character recognition.

The open-source project, called Ocropus, has several goals: developing a high-level, easy-to-use handwriting recognition system that can convert handwritten documents to computer text, assisting in the creation of electronic libraries, analyzing historical documents and helping vision-impaired people access information. The "ocr" in Ocropus stands for optical character recognition.

The project is headquartered at the Image Understanding and Pattern Recognition (IUPR) research group at the German Research Center for Artificial Intelligence (DFKI) in Kaiserslautern, Germany. DFKI Professor Thomas Breuel is leading the project.

Breuel made the announcement on Monday through a post on the Google Code blog. In addition to Google's sponsorship, Ocropus is receiving funding from several German government agencies and other public and private entities.

The Ocropus team expects the project to last three years and to support three Ph.D. students or postdoctoral researchers. IUPR is basing the software primarily on two research projects: a handwriting recognition system developed in the mid-1990s for use by the U.S. Census Bureau, and newer layout analysis methods for character recognition.

Other resources include Tesseract, a decades-old engine for optical character recognition originally developed by Hewlett-Packard Labs and re-released by Google last year as an open-source system.

A preview of the Ocropus system is available on the project's Web site under an Apache license, and the IUPR is soliciting open-source contributions in order to complete a number of goals. These include creating a desktop application for the system, adding third-party tools and adapting Ocropus to a variety of languages. It's currently English-only.
