In honor of the Presidential Center dedication, ZDNet Government is proud to present part 4 of our exclusive, four-part, in-depth special report on the George W Bush Presidential Center and the 200-million-email archive project.
I really think that for this treasure trove of historical information to become useful, it's going to need some machine filtering. Back in the day, most government agencies "archived" email by printing it all out. That was their archiving mechanism, fully supported by law and regulation.
The idea, to meet the requirements of both the Federal Records Act and the Presidential Records Act, was to print out email messages, stick them in great big paper piles, and shove them into an Indiana Jones-style warehouse.
That approach might be acceptable according to the law, or even from the point of view of some sad professor somewhere who decides to devote his life to sifting through email messages. But it doesn't really provide tangible use. For realistic and practical use, this stuff has to be machine readable, machine addressable, and machine searchable.
What we need, from a historian's perspective, is the ability, for example, to take a Google-like engine, type in queries, and see what comes back out of the data stream. I'd like to see that level of transparency. Again, for policy reasons, it's probably not going to reach that level, but as administrations use digital messaging technology more and more, we're going to see increasing amounts of traffic that needs to be sifted through.
To make the full cache of presidential records useful to the populace — which is obviously never the priority of any White House — some sort of machine analysis is going to have to be a key part of the solution.
More to the point, hand sifting and hand managing all of that paper is going to become extremely expensive. Unless we decide to outsource sorting through America's most confidential documents to a third-world nation where the pay is cheaper, we'll need to turn to machine-based analytics.
The issue of availability in machine form is important. For example, just being able to search, Google-like, on a message archive is a far different sort of capability than having the entire dataset and being able to subject that to advanced heuristics.
So there's also the question of whether the raw data is made available to researchers versus being able to retrieve individual messages. Different kinds of research projects are going to need different kinds of things.
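To make that distinction concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the sample messages, the field names (`sender`, `recipient`, `body`), and the helper functions are hypothetical, and nothing here reflects how the actual archive is stored or accessed. The first two functions show message-level retrieval through a simple keyword index. The third shows the kind of question you can only ask when you hold the whole dataset.

```python
from collections import Counter, defaultdict

# Hypothetical sample data; the real archive's format is not described here.
MESSAGES = [
    {"id": 1, "sender": "alice", "recipient": "bob",
     "body": "Budget meeting moved to Tuesday"},
    {"id": 2, "sender": "bob", "recipient": "alice",
     "body": "Tuesday works for the budget review"},
    {"id": 3, "sender": "alice", "recipient": "carol",
     "body": "Please draft the press statement"},
]


def build_index(messages):
    """Map each lowercase word to the set of message ids containing it."""
    index = defaultdict(set)
    for msg in messages:
        for word in msg["body"].lower().split():
            index[word].add(msg["id"])
    return index


def search(index, query):
    """Retrieval: return ids of messages containing every word in the query."""
    words = query.lower().split()
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


def traffic_counts(messages):
    """Whole-dataset analysis: count messages per (sender, recipient) pair."""
    return Counter((m["sender"], m["recipient"]) for m in messages)


if __name__ == "__main__":
    index = build_index(MESSAGES)
    print(search(index, "budget tuesday"))
    print(traffic_counts(MESSAGES).most_common())
```

A search interface can only ever hand back the messages that match a query. The traffic count, trivial as it is, needs every message at once, which is why access to the raw data matters for some kinds of research.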
Politics becomes an issue, again, sadly. Opposition researchers, searching for political nuggets of joy, will want to search for various words and see if anybody says anything interesting, inappropriate, illegal, or even just out-of-context explosive.
Outside of politics, we should be able to look at what the whole dataset can tell us, and what kind of knowledge we can derive by observing, and even modeling, the interactions of a White House over the space of eight years.
In that light, releasing the entire dataset to academic analysis is something that I'd really like to see. For the political reasons I've mentioned, that's probably not going to happen.
Wrapping this up, one of the things that always exists in the minds of current presidents — as well as former presidents — is the question of their legacy. A president's legacy is often defined not by the true historical record, not by deep analysis, but by sound bites.
President George W Bush, like most presidents, was very controversial in his time. And, like most presidents, he's certainly going to want to be sure that his legacy is presented in the best possible light.
In that context, archivists are likely to want to go through all of those 200 million messages, examine each very carefully, and determine how they will fit with the legacy that President Bush wants to leave with future generations of Americans.
Presenting all those messages in the best light could take some time.
Our best wishes go out to all members of the Bush administration, the Bush family, and all the Americans who served in the White House, past and present. Thank you for your service.