GitHub just buried a giant open-source archive in an Arctic vault for 1,000 years

GitHub has shipped a snapshot of all active public repositories to an archive vault in the Arctic.


Microsoft-owned GitHub has finally moved its snapshot of all active public repositories on the site to a vault in Norway.

GitHub announced the archiving plan last November and followed through on February 20 with a 21-terabyte snapshot, which was written to 186 reels of film.
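For a sense of scale, the two figures above imply a rough average per reel. The short Python sketch below does that arithmetic. It assumes decimal units (1 TB = 1,000 GB) and an even spread of data across reels, neither of which is stated by GitHub, and the function name is purely illustrative.

```python
# Figures quoted in the article.
SNAPSHOT_TERABYTES = 21
REEL_COUNT = 186


def gigabytes_per_reel(total_terabytes, reels):
    """Average data per reel, assuming 1 TB = 1,000 GB and an even split."""
    return total_terabytes * 1000 / reels


average = gigabytes_per_reel(SNAPSHOT_TERABYTES, REEL_COUNT)
print(f"About {average:.1f} GB per reel on average")
```

Under those assumptions the average works out to roughly 113 GB per reel; the real distribution across reels may differ.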

GitHub cancelled plans for a team to "personally escort the world's open-source code to the Arctic" due to the coronavirus pandemic, leaving the job to local partners who received the boxed films and deposited them in an old coal mine on July 8.   


The archive is stored in Svalbard, Norway, an archipelago that is also home to the Global Seed Vault.

"The code landed in Longyearbyen, a town of a few thousand people on Svalbard, where our boxes were met by a local logistics company and taken into intermediate secure storage overnight," said Julia Metcalf, director of strategic programs at GitHub.

"The next morning, it traveled to the decommissioned coal mine set in the mountain, and then to a chamber deep inside hundreds of meters of permafrost, where the code now resides fulfilling their mission of preserving the world's open-source code for over 1,000 years."

The archive includes public code repositories and significant dormant repos. The snapshot consists of the HEAD of the default branch of each repository, minus any binaries larger than 100 kB. Each repository is then packaged as a single TAR file, and for efficiency's sake, most of the data is stored as QR codes.
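The packaging rule described above can be approximated in a few lines of Python. This is a minimal sketch, not GitHub's actual tooling: it assumes the default branch is already checked out in a local directory, reads "100kB" as 100,000 bytes, and treats any file with a NUL byte in its first 8 KB as binary. The names `looks_binary` and `package_snapshot` are hypothetical.

```python
import os
import tarfile

# Assumption: the article's "100kB" limit is read as 100,000 bytes.
MAX_BINARY_BYTES = 100 * 1000


def looks_binary(path, probe_size=8192):
    """Heuristic (an assumption, not GitHub's rule): a NUL byte in the
    first chunk of the file marks it as binary."""
    with open(path, "rb") as handle:
        return b"\x00" in handle.read(probe_size)


def package_snapshot(checkout_dir, tar_path):
    """Write the files under checkout_dir to a single TAR file,
    skipping binaries larger than MAX_BINARY_BYTES.

    Returns the sorted list of relative paths that were skipped.
    """
    skipped = []
    with tarfile.open(tar_path, "w") as archive:
        for root, dirs, files in os.walk(checkout_dir):
            # Archive only the checked-out tree, not the .git history.
            dirs[:] = sorted(d for d in dirs if d != ".git")
            for name in sorted(files):
                full_path = os.path.join(root, name)
                rel_path = os.path.relpath(full_path, checkout_dir)
                if os.path.islink(full_path):
                    # Store symlinks as links without inspecting the target.
                    archive.add(full_path, arcname=rel_path, recursive=False)
                    continue
                too_big = os.path.getsize(full_path) > MAX_BINARY_BYTES
                if too_big and looks_binary(full_path):
                    skipped.append(rel_path)
                    continue
                archive.add(full_path, arcname=rel_path, recursive=False)
    return sorted(skipped)
```

Calling `package_snapshot("my-repo", "my-repo.tar")` on a local checkout would produce one TAR per repository and return the paths left out. The output file should live outside the checkout directory so the archive does not try to include itself.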

A human-readable index and guide will itemize the location of each repository and explain how to recover the data.


The Internet Archive separately kicked off its archive of GitHub public repositories on April 13. Its Wayback Machine is archiving raw GitHub data as Web ARChive (WARC) files and so far has archived 55TB of data. 

Later this month the Internet Archive will use "git clone" to keep repositories available while also ensuring repo comments, issues, and other metadata can be accessed on the web.