
An alternative to Google's book-scanning project

With a $1 million grant, the Internet Archive hopes to create an open source alternative for digitizing the world's books

Written by ZDNET Editors, Contributor Dec. 22, 2006 at 9:43 a.m. PT

The Internet Archive received a $1 million grant from the New York-based Sloan Foundation to digitize the collections of the Boston Public Library, the Getty Research Institute, and the Metropolitan Museum of Art, AP reports.

The grant marks a challenge to Google, which has been working with universities to digitize books and make them available through Google's search engine, but is facing accusations of copyright infringement.

"You are talking about the fruits of our civilization and culture. You want to keep it open and certainly don't want any company to enclose it," said Doron Weber, program director of public understanding of science and technology for the Alfred P. Sloan Foundation.

The works to be scanned include the personal library of John Adams, thousands of images from the Metropolitan Museum, a collection of anti-slavery material from Johns Hopkins University Libraries, and documents about the Gold Rush from a library at the University of California at Berkeley.

Internet Archive director Brewster Kahle says an open effort is crucial.

"They [Google] don't want the books to appear in anyone else's search engine but their own, which is a little peculiar for a company that says its mission is to make information universally accessible," Kahle said.

The Archive's efforts are part of the Open Content Alliance. Kahle won't scan copyrighted content without the permission of the copyright owner. Most of the roughly 100,000 books that the alliance has scanned so far are works whose copyrights have expired.

Although the Open Content Alliance depends on the Internet Archive to host its digital copies, other search engines are being encouraged to index the material too.

All but one of the libraries contributing content to Google so far are part of universities. They are: Harvard, Stanford, Michigan, Oxford, California, Virginia, Wisconsin-Madison, and Complutense of Madrid. The New York Public Library also is relying on Google to scan some of its books.

The University of California, which also belongs to the Open Content Alliance, has no regrets about allowing Google to scan at least 2.5 million of the books in its libraries. "We felt like we could get more from being a partner with Google than by not being a partner," said university spokeswoman Jennifer Colvin.

But some of the participating libraries may have second thoughts if Google's system isn't set up to recognize some of their digital copies, said Gregory Crane, a Tufts University professor who is currently studying the difficulty of accessing some digital content.

For instance, Tufts worries Google's optical reader won't recognize some books written in classical Greek. If the same problem were to crop up with a digital book in the Open Content Alliance, Crane thinks it would be more easily addressed because the group is allowing outside access to the material. Google "may end up aiming for the lowest common denominator and not be able to do anything really deep" with the digital books, Crane said.
