GitHub: We're storing your open-source code in the frozen Arctic for 1,000 years

GitHub will take a snapshot of all public code repositories, save it on film, and archive it in an old Norwegian mine.
Written by Liam Tung, Contributing Writer

GitHub has unveiled the Arctic Code Vault, a new project to archive the planet's open-source software and ensure it's usable in a future world that may not have the machines or knowledge to read it. 

The code-sharing site will put the vault in the Arctic World Archive (AWA), a decommissioned coal mine in Norway's Svalbard archipelago, close to the North Pole and a mile from the Global Seed Vault.

The Microsoft-owned company plans to take its first snapshot of every active repository on February 2, 2020, and store it on 3,500-foot film reels from Norwegian long-term storage company Piql.

Normally film has a lifespan of about 500 years, but Piql's film is meant to last for 1,000 years. The company also makes the piqlReader to read offline data. 

GitHub announced the plan at its GitHub Universe conference in San Francisco.

The Svalbard mine offers a few key advantages for very long-term data archiving: it's a demilitarized zone and is one of the most remote and geopolitically stable places on Earth where humans live. And it's really, really cold, which is good for a storage medium like film. 

The film reels will be stored in a steel-walled container inside a sealed chamber within the old mine. And the GitHub snapshot will be kept alongside other historical data from places including Italy, Norway, and the Vatican. 

The 2020 GitHub snapshot will include public code repositories as well as "significant dormant repos as determined by stars, dependencies, and an advisory panel", according to GitHub.         

"The snapshot will consist of the HEAD of the default branch of each repository, minus any binaries larger than 100kB in size. Each repository will be packaged as a single TAR file," it adds. 
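
As a rough illustration of that packaging rule, here is a minimal Python sketch. It is not GitHub's tooling: the function names, the reading of "100kB" as 100,000 bytes, and the NUL-byte test for "binary" are all assumptions made for the example, and it presumes the default branch's HEAD has already been checked out into a local directory.

```python
import tarfile
from pathlib import Path

# Assumption: "100kB" is read as 100,000 bytes; the source does not define it.
SIZE_LIMIT = 100 * 1000


def looks_binary(path: Path, probe: int = 8192) -> bool:
    """Heuristic (an assumption): a NUL byte in the first bytes means binary."""
    with path.open("rb") as fh:
        return b"\0" in fh.read(probe)


def package_snapshot(checkout: Path, tar_path: Path) -> list[str]:
    """Write the files under `checkout` to a single TAR, skipping large binaries.

    Returns the archive member names in the order they were added.
    `tar_path` should live outside `checkout` so the archive is not
    swept into itself.
    """
    added = []
    with tarfile.open(tar_path, "w") as tar:
        for path in sorted(checkout.rglob("*")):
            rel = path.relative_to(checkout)
            # Skip Git's own metadata and anything that is not a regular file.
            if ".git" in rel.parts or not path.is_file():
                continue
            # Drop only files that are both over the limit and binary.
            if path.stat().st_size > SIZE_LIMIT and looks_binary(path):
                continue
            tar.add(path, arcname=rel.as_posix())
            added.append(rel.as_posix())
    return added
```

Calling `package_snapshot(Path("repo"), Path("repo.tar"))` on a checkout would produce one TAR per repository, with large text files kept and large binary files left out.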

"For greater data density and integrity, most of the data will be stored QR-encoded. A human-readable index and guide will itemize the location of each repository and explain how to recover the data."

The advisory panel will include experts from a range of fields, including anthropology, archaeology, history, linguistics, archival science, and futurism. 

Other participants in the GitHub archiving project include Stanford Libraries, the Long Now Foundation, the Internet Archive, the Software Heritage Foundation, Microsoft Research, and Oxford University's Bodleian Library.

GitHub sees the Arctic World Archive as part of its cold storage strategy, which also includes Microsoft Research's Project Silica, its recently unveiled effort to store the Superman movie on a credit-card-sized piece of quartz glass. Cold storage data will be updated every five or so years.

At the other end, hot storage data includes all the continuously backed-up data of Gits, Issues, and Pull Requests. Oxford's Bodleian Library provides redundancy for the Arctic Code Vault. It will house duplicate Piql film reels containing the 10,000 most-starred and most relied-upon repositories.
