The Internet Archive's ten petabyte celebration
The Internet Archive has been housed in a former Christian Science church since 2009. On the night of its Ten Petabyte Party, power went out across the entire neighborhood. PG&E restored it only at the end of the party, so all the speeches and presentations were held with improvised lighting and power.
A petabyte is a thousand terabytes, or a million gigabytes. The Internet Archive uses custom-made petabox servers to store its data (a petabox comprises ten racks, each holding thirty-eight three-terabyte hard drives).
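To put those petabox numbers in perspective, here is a quick back-of-the-envelope sketch. It uses only the figures quoted above and assumes decimal units (1 PB = 1,000 TB) and raw drive capacity, ignoring any redundancy or formatting overhead; the variable and function names are made up for illustration.

```python
import math

# Figures quoted in the article; everything else is an assumption.
RACKS_PER_PETABOX = 10
DRIVES_PER_RACK = 38
TB_PER_DRIVE = 3
TB_PER_PB = 1000  # decimal units: a petabyte is a thousand terabytes


def petabox_capacity_tb(racks=RACKS_PER_PETABOX,
                        drives_per_rack=DRIVES_PER_RACK,
                        tb_per_drive=TB_PER_DRIVE):
    """Raw capacity of one petabox in terabytes (no redundancy overhead)."""
    return racks * drives_per_rack * tb_per_drive


capacity_tb = petabox_capacity_tb()
capacity_pb = capacity_tb / TB_PER_PB

# How many petaboxes of raw capacity it would take to hold ten petabytes.
petaboxes_for_ten_pb = math.ceil(10 * TB_PER_PB / capacity_tb)

print(f"One petabox: {capacity_tb} TB ({capacity_pb:.2f} PB)")
print(f"Petaboxes needed for 10 PB: {petaboxes_for_ten_pb}")
```

By this rough count a single petabox holds a little over one petabyte of raw disk, so ten petabytes would need at least nine of them before allowing for any backup copies.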
The atmosphere was joyous in the huge, dark former church, and the pews were packed with supporters, fans and volunteers.
Around the corner from the Archive, in the former Christian Science reading room, is the Archive's book scanning room.
The Archive's repurposed reading room makes lovely, modernized use of the antique fixtures.
The Archive wants to create the world's largest library and uses a scanning system called Scribe. The San Francisco scanning room is one of many worldwide that contribute to the Archive, all of which combine to scan over 1,000 books a day (roughly 42 books an hour, or one book about every 86 seconds).
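For the curious, the scanning rate is easy to check. This is a minimal sketch that assumes a flat 1,000 books a day spread evenly over 24 hours (an assumption; the Archive's figure is "over 1000", and the scanning rooms surely don't all run around the clock).

```python
# Assumed round figure; the article only says "over 1000 books a day".
BOOKS_PER_DAY = 1000

HOURS_PER_DAY = 24
SECONDS_PER_DAY = HOURS_PER_DAY * 60 * 60  # 86,400 seconds

# Average throughput if scanning were spread evenly across the whole day.
books_per_hour = BOOKS_PER_DAY / HOURS_PER_DAY
seconds_per_book = SECONDS_PER_DAY / BOOKS_PER_DAY

print(f"{books_per_hour:.1f} books an hour")        # 41.7 books an hour
print(f"one book every {seconds_per_book:.1f} seconds")  # one book every 86.4 seconds
```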
The Internet Archive's goal is to make and preserve one copy of every published work it is able to attract or acquire - books, movies, records, everything.
The Archive's Scribe scanning system is available as a service, and is non-destructive. The software that powers it is available on SourceForge. FYI, the Scribe software hasn't been updated in a while and was engineered specifically for the custom hardware IA uses in its scanning room.
Fun fact: the Internet Archive is known for archiving copies of the world's websites with its Wayback Machine. But did you know it is a bigger source of public domain e-books than Google? It has over one million torrents, too.
The scanning room at the end of the party. The Archive's scanning services offer open and free online access, permanent storage and lifetime file management. It was heartwarming to see how much the volunteers love their work, and how they geeked out with big smiles whenever someone asked a question.
The downstairs office at the Internet Archive. The Archive was established in 1996, is a non-profit, believes in free and open access to knowledge for all, and is dedicated to preserving the internet. I liked seeing an Iron Man mask at one Archive employee's desk.
The Archive is making 80 terabytes of archived web crawl data available for research. Keep up with more of the Archive's history-making archival activities by watching the Archive's blog or following the Internet Archive on Twitter.