Between 1660 and 1669, a resident of London, Samuel Pepys, kept a diary. At the time, he was working for the Royal Navy as an administrator and ultimately became one of the most important civil servants of his age.
That diary has become one of the most important historical documents of British history. What makes his diary important (and fascinating) to historians and scholars is that it forms a unique account of day-to-day life of someone who lived through that period.
But it almost didn't happen. Pepys didn't keep his diary with any intention of having historians, scholars, and regular people read it some 350 years after he wrote it. He wrote large sections of it in code. Given that it was never intended to be published, someone had to find it, realise its importance, and then publish it.
If that part hadn't happened, as a global society we would have lost something that aided our understanding of how we developed.
A problem with history is that you never know who's going to be important, famous, or infamous before they are. And when they become so, we generally want to know everything about them.
So doesn't that argue the case for a system whereby we copy everything — every single private and public bit that goes over the internet for every man, woman, and child, in every country — such that future generations can benefit from a total and complete archive of humanity's digital life, just in case we can dig out some sociologically important nuggets?
To be clear, although this piece has been inspired by events/revelations surrounding the NSA, I'm taking a totally neutral stance on what may or may not be happening. It's a fast-moving, evolving story, and I don't know enough about it to comment. (Other than to say, "you're surprised? Really?")
One of the stories discussed around the NSA builds on the idea that AT&T operates a secret room containing equipment rigged to copy fibre-optic-borne voice communications. In this scenario, light passed over incoming fibre-optic voice channels could be "split by a prism" and the raw data (containing the actual content of the communication) could be copied, stored, and processed by one set of equipment, whilst the other beam could be sent on to the normal equipment and down to the caller as if nothing had happened.
You can, obviously, do the same thing with packets of TCP/IP data. You don't even necessarily need it to come in over fibre-optic lines.
What's unattractive about a government agency doing that is that it's typically regarded as an invasion of privacy.
Modern communications are extremely ephemeral. For example, we know now that Charles Darwin was an extremely important individual. If he'd had access to the communications technology that we have now, would it add to his story, and to our understanding of it, to be able to pull out every email he sent, every website he visited, every IM he sent? Is there something important in the digital detritus that we're now creating that we'll, as a society, be the worse for missing?
I think there is. And without knowing ahead of time who's going to be important, can it be preserved in a safe way?
Of course, there are already people who maintain archives of public data, and this is important to the story, but what percentage of the data that becomes relevant is public from the outset? For this idea to work, you'd have to capture the lot, in exactly the same way that we're now learning some government organisations either can do or are doing.
Enter the idea of the "Beneficent Archivist," some entity that's entrusted to copy every piece of digital data that we create as a society, public and private, every day and store it, safely, for future generations. At a later date, when society identifies individuals that we want to learn more about, we can go back to the archive and start digging.
But rather than being done by a shady government agency, it's done in plain view by some organisation that we trust to do it, and in full knowledge of why it's being done: for the good of society.
(And you'd have to have some pretty nifty de-duping software so that you weren't grabbing thousands of copies of the same SpongeBob SquarePants episodes every day, but let's assume we can do that at scale, and whilst we're at it, let's assume storage is infinite.)
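For the curious, the de-duping part is the least exotic piece of the puzzle. Here's a minimal sketch, assuming a toy in-memory store keyed by a SHA-256 hash of each payload. The `DedupStore` class and the viewer names are made up for illustration; a real archive would need chunking, persistent storage, and a lot more besides.

```python
import hashlib


class DedupStore:
    """Toy content-addressed store: identical payloads are kept only once."""

    def __init__(self):
        self.blobs = {}       # digest -> payload bytes (one copy per unique payload)
        self.references = []  # (source label, digest), one entry per capture

    def put(self, source: str, payload: bytes) -> str:
        # The hash of the content acts as its address in the store.
        digest = hashlib.sha256(payload).hexdigest()
        # Keep the payload only if this digest has not been seen before.
        self.blobs.setdefault(digest, payload)
        # Always record who sent or received it, so no capture is lost.
        self.references.append((source, digest))
        return digest


store = DedupStore()
episode = b"pretend this is one SpongeBob SquarePants episode"
for viewer in ("alice", "bob", "carol"):  # hypothetical viewers
    store.put(viewer, episode)
store.put("dave", b"a private email")

stored_blobs = len(store.blobs)
captures = len(store.references)
print(stored_blobs, captures)
```

Four captures go in, but only two distinct payloads are actually stored. The record of who saw what survives; the duplicate bytes don't.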
We know that SSL traffic is encrypted, and we also know that SSL is not crackable today by regular people with regular equipment. We also know that, over time, what's strong encryption today tends to become weak encryption tomorrow as computing horsepower becomes more plentiful and new techniques develop. Wait long enough and even heavily encrypted data may become plaintext.
So it could be that we can just do this organically. Store everything today, and assume that the technology to crack data from a given point in time arrives N decades after that data was created.
But maybe that doesn't offer enough control. Maybe we only want the archive open when the people involved die? Or when some committee decides it should be opened? And where do you find a group of enlightened humans to manage it? They'd be keepers of the most valuable-slash-powerful information in history.
So, do you build a technical solution to opening the archive such that mathematically it can't be opened before the expiration date? Or do you rely on the committee?
(On the maths front, we could ask the NSA. Some of their bods are really good at maths, I hear.)
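As it happens, one published idea in this direction is the "time-lock puzzle" described by Rivest, Shamir, and Wagner: hide a key behind a long chain of modular squarings that, as far as anyone knows, has to be done one step after another. Whoever builds the puzzle knows the secret primes and can take a shortcut; everyone else has to grind through the chain. What follows is a toy sketch only. The function names are mine, the primes are small and publicly known, and the number of squarings is tiny. It also bounds computing effort rather than calendar time, so it's nowhere near a real "expiration date".

```python
def make_puzzle(secret: int, p: int, q: int, t: int, a: int = 3):
    """Hide `secret` behind t sequential squarings modulo n = p * q."""
    n = p * q
    phi = (p - 1) * (q - 1)
    # Shortcut available only to someone who knows p and q:
    # reduce the huge exponent 2**t modulo phi(n) first.
    e = pow(2, t, phi)
    b = pow(a, e, n)            # equals a ** (2 ** t) mod n when gcd(a, n) == 1
    locked = (secret + b) % n   # mask the secret with that value
    return n, a, t, locked


def solve_puzzle(n: int, a: int, t: int, locked: int) -> int:
    """Recover the secret without p and q by doing all t squarings in order."""
    b = a % n
    for _ in range(t):
        b = (b * b) % n
    return (locked - b) % n


# Toy parameters: two well-known Mersenne primes, far too weak for real use.
p, q = 2**31 - 1, 2**61 - 1
secret = int.from_bytes(b"archive key", "big")  # must be smaller than p * q

n, a, t, locked = make_puzzle(secret, p, q, t=10_000)
recovered = solve_puzzle(n, a, t, locked)
print(recovered.to_bytes(11, "big"))
```

In a serious version the primes would be large, random, and thrown away after the puzzle is built, and the number of squarings would be chosen to match how long you want the archive to stay shut. Even then, faster hardware moves the date, which is why the committee question doesn't go away.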
Of course, all of these are difficult questions, and I'm just riffing on the idea. But I do think that right now, today, we have the technology to make a historical archive of all digitally-based human endeavours that would undoubtedly be of immeasurable value to future generations.
And I totally think we should do it.
What do you think? Post a comment, or talk to me on Twitter: @mbrit.
Image credit: Wikimedia