Special Report: Innovative application of modern analytics techniques to presidential email

Summary: In Part 3 of our 4-part Special Report, our resident presidential scholar David Gewirtz (who wrote the book on White House email) explores how applying modern analytics techniques to President Bush's archive of 200 million email messages could help governance.

TOPICS: Storage, Government

In honor of the Presidential Center dedication, ZDNet Government is proud to present Part 3 of our exclusive, 4-part in-depth special report on the George W. Bush Presidential Center and the 200 million email archive project.

Hand-processing 200 million messages

The problem will be analyzing all those email messages as part of the archiving process, especially if the goal is to separate out what can be made publicly available and what can't. Unfortunately, political interests, along with national security interests, will probably prevent this from being simply a machine processing problem.

Doing it without machine analytics assistance is going to be an epic problem.

If it were simply a machine processing problem, even one that used complex heuristics or artificial intelligence, the archive could probably be processed in a month or two with high-performance hardware.

When it becomes a question of making sure that every single message is thought through in terms of its political and national security implications, that thinking-through process is going to take a while.

Worse, each message may not be thought through by just one archivist. Each message (or at least the questionable ones) may have to be routed through an entire workflow process for approval to release. That, in turn, might be dependent on committee discussions, and all the normal foolishness that Washington is so good at, making sure nothing gets done.

These messages could be in limbo for a very, very long time.
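To put rough numbers on the gap between a machine pass and hand review, here is a back-of-the-envelope sketch. Only the 200 million figure and the "month or two" window come from the article; the reviewer count, minutes per message, and working calendar are purely hypothetical placeholders, not estimates of how archivists actually staff such a project.

```python
SECONDS_PER_DAY = 86_400
ARCHIVE_SIZE = 200_000_000  # message count cited in the article


def required_rate(messages, days):
    """Messages per second needed to finish in `days` of nonstop processing."""
    return messages / (days * SECONDS_PER_DAY)


def hand_review_years(messages, reviewers, minutes_per_message,
                      hours_per_day=8, working_days_per_year=250):
    """Calendar years for a team to read every message once.

    All staffing parameters are hypothetical inputs, not real figures.
    """
    total_minutes = messages * minutes_per_message
    team_minutes_per_year = reviewers * hours_per_day * 60 * working_days_per_year
    return total_minutes / team_minutes_per_year


if __name__ == "__main__":
    for days in (30, 60):
        rate = required_rate(ARCHIVE_SIZE, days)
        print(f"{days} days of machine time needs {rate:.1f} messages/second")
    # Hypothetical team: 100 reviewers, one minute per message, one pass only.
    years = hand_review_years(ARCHIVE_SIZE, reviewers=100, minutes_per_message=1)
    print(f"Hand review under these assumptions: about {years:.1f} years")
```

Under those made-up staffing figures, a single human read-through works out to roughly 16.7 years, while a one-month machine pass needs only about 77 messages per second. Routing questionable messages through an approval workflow would add to the human side of that comparison.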

What can you do with all this email data?

The most obvious (and the most likely) reason you're going to see this stuff delayed will be to prevent the opposing political party from digging through all of it in the hopes of finding something they can use as a mallet with which to beat their opposition. Opposition research is not necessarily the best use of a historical archive, but it is certainly going to be both the best-funded use and the highest priority for those in politics.

When you move beyond politics and into governance, this stuff becomes interesting. For example, historians can look for clusters of emails around various events and see, perhaps, the discussions that went on and the thinking and the mindset of individuals in the White House during the various stages of those big events.
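One simple way a historian's tooling might surface such clusters is to count messages per day and flag runs of days with unusually high volume. The sketch below is only an illustration under stated assumptions: the function name `find_volume_clusters`, the two-standard-deviation cutoff, and the idea of working from send dates alone are all hypothetical choices, not a description of how the archive is actually analyzed.

```python
from collections import Counter
from datetime import timedelta
from statistics import mean, pstdev


def find_volume_clusters(sent_dates, threshold_sigmas=2.0):
    """Return (start, end) date ranges where daily volume is unusually high.

    `sent_dates` is an iterable of datetime.date objects, one per message.
    A day is "unusual" when its count exceeds mean + threshold_sigmas * stdev
    of all daily counts (an assumed, adjustable cutoff).
    """
    counts = Counter(sent_dates)
    if not counts:
        return []
    first, last = min(counts), max(counts)
    # Include days with zero messages so quiet days pull the baseline down.
    days = [first + timedelta(days=i) for i in range((last - first).days + 1)]
    volumes = [counts.get(day, 0) for day in days]
    cutoff = mean(volumes) + threshold_sigmas * pstdev(volumes)

    clusters = []
    start = end = None
    for day, volume in zip(days, volumes):
        if volume > cutoff:
            if start is None:
                start = day
            end = day
        elif start is not None:
            # The busy run just ended; record it and reset.
            clusters.append((start, end))
            start = None
    if start is not None:
        clusters.append((start, end))
    return clusters
```

Each returned range marks a stretch of days a researcher could then line up against the historical record. Volume is a crude signal; it says nothing about what the messages discuss.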

The eight years of Bush 43 were a very volatile stretch of history, one that would be fascinating to explore at the granularity of individual email messages.

Of course, as we move forward, the years with our current administration have also been very interesting. If we can see what goes on in White Houses now and going forward into the future, that becomes quite educational from a historical perspective.

Even more important is the question, "What can we learn to help us better manage the nation as we move forward?"

That, too, may benefit from machine help.

For example, we could do sentiment analysis. We could go through and process all those email messages and run analytics to see if certain events changed word usage. We might be able to predict stress levels before even the members of the White House know that things are heating up, and use analytics systems that can provide early alerts to certain kinds of situations.
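As a minimal sketch of what "changed word usage" could mean in practice, the snippet below measures the share of words drawn from a small stress vocabulary and compares it before and after an event. The word list `STRESS_TERMS` is a hypothetical, hand-picked lexicon for illustration only; real sentiment analysis would use far richer models than simple word counting.

```python
import re

# Hypothetical lexicon, chosen only to illustrate the idea.
STRESS_TERMS = frozenset({"urgent", "crisis", "immediately", "emergency", "asap"})


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def stress_score(messages):
    """Fraction of all tokens that appear in STRESS_TERMS (0.0 if no tokens)."""
    total = hits = 0
    for text in messages:
        tokens = tokenize(text)
        total += len(tokens)
        hits += sum(1 for token in tokens if token in STRESS_TERMS)
    return hits / total if total else 0.0


def usage_shift(before, after):
    """Change in stress score between two batches of message bodies."""
    return stress_score(after) - stress_score(before)
```

A positive `usage_shift` would suggest stress vocabulary became more common after the event. With a lexicon this small, the result is only as meaningful as the words chosen.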

That sort of thing could be very, very helpful as we look at crisis management in the future. For example, let's say that a situation is getting stressful to the point where mistakes might be made, or there might be unusual pressures going on in the White House.

The people serving there every day, in the full force of the activity, might not realize that a situation has actually heated up or stepped up to the next level of crisis. Think of the frog who doesn't notice that things are heating up as it sits in the ever-warming water. The same kind of slow boil happens when you're in the crucible of the White House.

But if, behind the scenes, you can have systems watching behavior through email messages, they might be able to pop up an alert, for example, to the Chief of Staff saying, "You may not have noticed it, but things have heated up rather further than you might expect. Use some caution, or be aware of your messaging." The alerts might offer specific historical examples and important cultural cues, and suggest potential courses of action.
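The "slow boil" idea can be sketched as a comparison between a recent trailing average and an earlier baseline. In the example below, `daily_scores` stands for any per-day stress measure derived from email; the function name and the parameters `baseline_days`, `window`, and `factor` are hypothetical tuning knobs, and nothing here reflects an actual White House system.

```python
from statistics import mean


def slow_boil_alerts(daily_scores, baseline_days=14, window=3, factor=1.5):
    """Return indices of days whose trailing average exceeds factor x baseline.

    The baseline is the mean of the first `baseline_days` scores, treated as
    "normal". Each later day is judged on the mean of the last `window` days,
    so one noisy day alone is less likely to trigger an alert.
    """
    if len(daily_scores) <= baseline_days:
        return []  # not enough history to compare against
    baseline = mean(daily_scores[:baseline_days])
    threshold = baseline * factor
    alerts = []
    for i in range(baseline_days, len(daily_scores)):
        recent = mean(daily_scores[max(0, i - window + 1): i + 1])
        if recent > threshold:
            alerts.append(i)
    return alerts
```

Because the baseline is fixed at the start, a gradual climb that no single day makes obvious still crosses the threshold eventually. That is the frog-in-warming-water case the alert is meant to catch.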

This sort of analytics could apply to any number of things that officials may not have realized went from a lower level of concern to a higher level of impending crisis, where people may start making mistakes.

Later this week, in honor of the dedication of the Bush Presidential Center, Part 4 of our Special Report will explore how curators will manage 200 million presidential email messages and the question of a president's legacy.

About

David Gewirtz, Distinguished Lecturer at CBS Interactive, is an author, U.S. policy advisor, and computer scientist. He is featured in the History Channel special The President's Book of Secrets and is a member of the National Press Club.

Talkback

1 comment
  • An important data quality metric: Completeness

    David, you covered this in an earlier article:

    http://www.zdnet.com/special-report-g-w-bushs-103-6-million-missing-email-messages-and-the-it-archiving-challenge-7000013975/

    Approximately 1/3 of the emails are missing. Any conclusions made from the 200 million email messages that are available need to take this into account.
    Rabid Howler Monkey