Does the NSA make the case for a "Beneficent Archivist"?

Summary: One PRISM theory runs that the NSA copies everything "in case they need it later." Whether this is true or not, is there a case for archiving every global citizen's private data for the benefit of future historians?

TOPICS: Emerging Tech
Samuel Pepys kept a diary in the 1660s. He would have totally dug Twitter.

Between 1660 and 1669, a resident of London, Samuel Pepys, kept a diary. At the time, he was working for the Royal Navy as an administrator and ultimately became one of the most important civil servants of his age.

That diary has become one of the most important historical documents of British history. What makes his diary important (and fascinating) to historians and scholars is that it forms a unique account of day-to-day life of someone who lived through that period.

But it almost didn't happen. Pepys didn't keep his diary with any intention of having historians, scholars, and regular people read it some 350 years after he wrote it. He wrote large sections of it in code. Given that it was never intended to be published, someone had to find it, realise its importance, and then publish it.

Had that not happened, we as a global society would have lost something that aids our understanding of how we developed.

A problem with history is that you never know who's going to be important, famous, or infamous before they are. And when they become so, we generally want to know everything about them.

So doesn't that argue the case for a system whereby we copy everything — every single private and public bit that goes over the internet for every man, woman, and child, in every country — such that future generations can benefit from a total and complete archive of humanity's digital life, just in case we can dig out some sociologically important nuggets?

National security

To be clear, although this piece has been inspired by events/revelations surrounding the NSA, I'm taking a totally neutral stance on what may or may not be happening. It's a fast-moving, evolving story, and I don't know enough about it to comment. (Other than to say, "you're surprised? Really?")

One of the stories discussed around the NSA builds on the idea that AT&T operates a secret room containing equipment rigged to copy fibre-optic-borne voice communications. In this scenario, light arriving over fibre-optic voice channels could be "split by a prism": the raw data from one beam (containing the actual content of the communication) could be copied, stored, and processed by one set of equipment, whilst the other beam travelled on to the normal equipment and on to the recipient as if nothing had happened.

You can, obviously, do the same thing with packets of TCP/IP data. You don't even necessarily need it to come in over fibre-optic lines.
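The software analogue of that prism is a simple "tee": every chunk of a byte stream is delivered to its real destination unchanged, while an identical copy lands in an archive. Here's a minimal sketch; the class and variable names are purely illustrative, not any real interception system's API.

```python
# A minimal sketch of "splitting the beam" in software: each chunk of a
# byte stream is forwarded to its destination untouched, while an exact
# copy is appended to an archive. All names here are illustrative.

class StreamTee:
    def __init__(self, forward, archive):
        self.forward = forward    # callable: deliver chunk to the recipient
        self.archive = archive    # list standing in for the archival store

    def write(self, chunk: bytes) -> None:
        self.archive.append(chunk)   # copy one "beam" to storage...
        self.forward(chunk)          # ...and pass the other on unchanged

delivered = []
store = []
tap = StreamTee(delivered.append, store)
for part in (b"hello ", b"world"):
    tap.write(part)

assert b"".join(delivered) == b"hello world"   # recipient sees everything
assert delivered == store                      # archive holds an exact copy
```

The point of the sketch is that neither endpoint can tell the tap is there: the forwarded stream is byte-for-byte identical to what was sent.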

What's unattractive about a government agency doing that is that it's typically regarded as an invasion of privacy.


Modern communications are extremely ephemeral. For example, we know now that Charles Darwin was an extremely important individual. If he'd had access to the communications technology that we have now, would it add to his story and to our understanding of it to be able to pull out every email he sent, every website he visited, every IM he exchanged? Is there something important in the digital detritus that we're not keeping, something we as a society will be the worse for missing?

I think it is. And without knowing ahead of time who's going to be important, can that be done in a safe way?

Of course, there are already people who maintain archives of public data, and this is important to the story. But what percentage of the data that becomes relevant is public from the outset? For this idea to work, you'd have to capture the lot, in exactly the same way that we're now learning some government organisations either can do or are doing.

Enter the idea of the "Beneficent Archivist," some entity that's entrusted to copy every piece of digital data that we create as a society, public and private, every day and store it, safely, for future generations. At a later date, when society identifies individuals that we want to learn more about, we can go back to the archive and start digging.

But, rather than it being done by a shady government agency, it's done in plain view by some organisation that we trust to do it, and in full knowledge of why it's being done — i.e. for the good of society.

(And you'd have to have some pretty nifty de-duping software so that you weren't grabbing thousands of copies of the same SpongeBob SquarePants episodes every day, but let's assume we can do that at scale, and whilst we're at it, let's assume storage is infinite.)
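The standard trick for that kind of de-duping is content addressing: store each blob under a hash of its bytes, so a million identical copies cost the storage of one. A toy sketch, with made-up names, might look like this:

```python
import hashlib

# A toy content-addressed store: identical blobs hash to the same key,
# so a thousand copies of the same episode occupy the space of one.
# Class and field names are illustrative only.

class DedupStore:
    def __init__(self):
        self.blobs = {}      # digest -> content actually kept on disk
        self.ingested = 0    # how many blobs we were asked to store

    def put(self, data: bytes) -> str:
        self.ingested += 1
        digest = hashlib.sha256(data).hexdigest()
        self.blobs.setdefault(digest, data)   # keep only the first copy
        return digest

store = DedupStore()
episode = b"spongebob-s01e01"
keys = {store.put(episode) for _ in range(1000)}
store.put(b"a-unique-email")

assert len(keys) == 1            # 1,000 identical uploads, one key
assert store.ingested == 1001    # we were offered 1,001 blobs...
assert len(store.blobs) == 2     # ...but kept only two distinct ones
```

Real systems chunk files before hashing so that near-duplicates also de-dupe well, but the principle is the same.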

We know that SSL traffic is encrypted, and we also know that it's not crackable today by regular people with regular equipment. We also know that, over time, today's strong encryption becomes tomorrow's weak encryption as computing horsepower becomes more plentiful and new techniques develop. Wait long enough and even the most heavily encrypted data becomes plain text.

So perhaps we can just let this happen organically: store everything today, and assume that the technology to crack data from a given point in time always arrives N decades after its creation.
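To see what that "N decades" model looks like, here's a back-of-the-envelope calculation. The figures are pure assumptions for illustration: suppose a 56-bit key is brute-forceable today, and feasible key length grows by one bit every two years (a Moore's-law-style guess, not a prediction).

```python
# Purely illustrative arithmetic for the "crackable N decades later"
# idea. Assumptions (not predictions): a 56-bit key is brute-forceable
# today, and feasible key length grows one bit every two years.

def years_until_crackable(key_bits: int, feasible_today: int = 56,
                          years_per_bit: float = 2.0) -> float:
    return max(0.0, (key_bits - feasible_today) * years_per_bit)

assert years_until_crackable(56) == 0.0     # already feasible
assert years_until_crackable(64) == 16.0    # 8 extra bits -> 16 years
assert years_until_crackable(128) == 144.0  # over a century, even then
```

Even under these generous assumptions, a 128-bit key stays out of reach for well over a century — which is roughly the timescale a historical archive would want anyway.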

But maybe that doesn't offer enough control. Maybe we only want the archive open when the people involved die? Or when some committee decides it should be opened? And where do you find a group of enlightened humans to manage it? They'd be keepers of the most valuable-slash-powerful information in history.

So, do you build a technical solution to opening the archive such that mathematically it can't be opened before the expiration date? Or do you rely on the committee?
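On the mathematical side, one well-known building block is secret sharing: split the archive's master key into shares so that no single keeper can open it alone. A real design would likely use a threshold scheme such as Shamir's secret sharing (any k of n trustees suffice); the sketch below is the simplest all-or-nothing XOR variant, with illustrative names throughout.

```python
import secrets

# Sketch of "the committee" done mathematically: split the archive's
# master key into XOR shares so that ALL trustees must cooperate to
# reconstruct it. Real designs would prefer a threshold scheme (e.g.
# Shamir's) so k-of-n trustees suffice; names here are illustrative.

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def split_key(key: bytes, trustees: int) -> list[bytes]:
    shares = [secrets.token_bytes(len(key)) for _ in range(trustees - 1)]
    final = key
    for share in shares:          # final = key XOR r1 XOR ... XOR r(n-1)
        final = xor(final, share)
    return shares + [final]

def combine(shares: list[bytes]) -> bytes:
    out = bytes(len(shares[0]))   # all zeros; XOR of every share = key
    for share in shares:
        out = xor(out, share)
    return out

key = b"archive-master-key"
shares = split_key(key, 5)
assert combine(shares) == key      # all five trustees: key recovered
assert combine(shares[:4]) != key  # any missing trustee blocks access
```

The appeal of this kind of scheme is that it converts "trust the committee" into "trust that the committee won't all collude" — a strictly weaker requirement.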

(On the maths front, we could ask the NSA. Some of their bods are really good at maths, I hear.)

Of course, all of these are difficult questions, and I'm just riffing on the idea. But I do think that right now, today, we have the technology to make a historical archive of all digitally-based human endeavour that would undoubtedly be of immeasurable value to future generations.

And I totally think we should do it. 

What do you think? Post a comment, or talk to me on Twitter: @mbrit.

Image credit: Wikimedia




  • Not for sure . . .

    (broken into several pieces due to broken spam filter in ZDNet)

    "We also know that over time, what's strong encryption today becomes weak encryption tomorrow as computing horsepower becomes more plentiful and new techniques develop."

    Not for sure: Thanks to the exponential nature of key lengths, modern encryption algorithms are well ahead of Moore's law, and there's a better chance that Moore's law will end than that the encryption will be cracked.

    If you double the key length, you don't just double the time to crack it. *Each bit* you add doubles the time to crack it.

    Take, for example, going from 128 bits to 256 bits: You're doubling the time to crack the algorithm 128 times. When you go from 256 to 512, you're doubling the time to crack 256 times.

    In order to "keep up" with Moore's law, we only need to add a single bit to the key length every two years. But we're not doing that: We're doubling the key length every time we need a stronger algorithm.

    This well outpaces Moore's "law." We're easily getting to the point where turning every atom in the Earth into a supercomputer won't be able to brute force an encryption key. Moore's "law" will be dead long before we get to that point.
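    The arithmetic in this comment can be worked through directly: each extra key bit doubles the brute-force search space, so the gap between key growth and hardware growth is enormous. (The two-years-per-doubling figure is the usual Moore's-law rule of thumb, used here only for illustration.)

```python
# Worked version of the comment's arithmetic: each extra key bit
# doubles the brute-force search space. The "one bit per two years"
# figure is the usual Moore's-law rule of thumb, for illustration only.

def search_space(bits: int) -> int:
    return 2 ** bits

assert search_space(129) == 2 * search_space(128)       # one bit doubles it
assert search_space(256) == search_space(128) * 2**128  # 128 doublings
# If capability doubles every two years, keeping pace needs one extra
# bit every two years; jumping from 128 to 256 bits banks 128 bits:
assert (256 - 128) * 2 == 256   # years of Moore's-law headroom gained
```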
    • next piece

      "Enter the idea of the 'Beneficent Archivist,'"

      Except that if knowledge is power, and power corrupts, having access to every bit that ever exists could lead to quite a dangerous situation. If the reports on PRISM are correct, we're getting pretty close to a situation where if one person with enough power is a bad apple, that person has a turnkey solution to an instant dictatorship.

      The reason why Samuel Pepys's diary isn't dangerous is because it's irrelevant to our government. It's long in the past. A historical curiosity. It's not as if we can throw him in jail or oppress him: He's dead. And so is everybody he knew.

      But an out of control government *can* take actions against other people with real time data. That's certainly possible.

      Besides, we're in an era where far more extensive records are kept anyway: Between Wikipedia and the like, the chances of us losing an important part of history are really nil, even without PRISM-like collection of raw data.

      I'm not really convinced by your argument: In that era, most of that stuff was likely lost. Today, it's almost impossible to lose something online. And it's also the case that more people are recording events than ever before, making it actually likely that there are several people recording any particular event, rather than just one.
      • another piece

        Here's another reason I'm not convinced:

        "What makes his diary important (and fascinating) to historians and scholars is that it forms a unique account of day-to-day life of someone who lived through that period."

        Guess what? There are millions of people on Facebook and Twitter now. You want an account of the day-to-day life of somebody today? Not a problem. It'll likely stay there forever, too.

        Also, while it's perhaps historically important, it's not practically important: Technology won't come to a standstill if his writings were lost to time, and life goes on with or without them.

        Is "historically important" more important than "this could potentially be abused on an enormous scale?" I'd say no.
        • It appears as if the spam filter hates long messages.

          Okay, that's the second time that doing nothing except breaking the post into several pieces fixed the spam filter.

          Except I haven't seen any "really long post spam." From observing the spam that gets by the filters, the latest trick is to use weird Unicode characters to foil bad link detection. Who is designing these spam filters?