Real-time big data in government: Big data or Big Brother?

The challenge is distinguishing between data collected for protection and data collection that violates our privacy, all while respecting the very core of our Constitution.
Written by David Gewirtz, Senior Contributing Editor

It goes without saying that governments — all governments — are the canonical users of big data.

Governments, going back to the times of the ancient Romans and Egyptians, required extensive record keeping to operate their empires, both for the management of extremely large civic works projects (like building the pyramids), and for the collection of revenue to fund those projects.

Later governments, both fair and oppressive, have found the gathering of data in vast volumes to be a functional necessity and competitive advantage. Both the old Soviets and the Nazis were infamous for their obsession with recording data about their citizenry. In those two examples, of course, that data collection would result in horrors and human rights abuses we hope to never again see practiced by so-called civilized nations.

Join me on Thursday

Live webcast: Data go vroom! How to keep up with the volume, velocity, and variety of big data in real time
  • Thursday, March 28, 2013
  • 2:00pm ET / 11:00am PT / 18:00 GMT
  • It's free!

So the gathering and processing of vast amounts of data is not new. What is new is the speed at which we can now process it. By hosting databases entirely in RAM, rather than on disk (or even on faster solid-state storage), processing operations can speed up by a factor of a million or so.

Queries that once took a day and a half to resolve on disk-based databases can now return in a tenth of a second when the data sits entirely in a few terabytes of directly addressable RAM.
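To make the idea concrete, here is a minimal sketch comparing the same SQLite query against a disk-backed file and an in-memory (`:memory:`) database. The schema and query are invented for illustration, and on a toy dataset the gap will be modest (the operating system caches the file); production in-memory platforms get their dramatic speedups from scale, columnar layouts, and avoiding disk I/O entirely.

```python
import os
import sqlite3
import tempfile
import time

ROWS = 100_000  # toy dataset; real workloads are terabytes

def build(conn):
    """Create and populate an identical table in the given database."""
    conn.execute("CREATE TABLE readings (id INTEGER PRIMARY KEY, value REAL)")
    conn.executemany(
        "INSERT INTO readings VALUES (?, ?)",
        ((i, i * 0.5) for i in range(ROWS)),
    )
    conn.commit()

# Disk-backed database: queries may touch the filesystem.
path = os.path.join(tempfile.mkdtemp(), "readings.db")
disk = sqlite3.connect(path)
build(disk)

# The same schema hosted fully in RAM.
ram = sqlite3.connect(":memory:")
build(ram)

query = "SELECT AVG(value) FROM readings WHERE id % 7 = 0"

for label, conn in (("disk", disk), ("ram", ram)):
    start = time.perf_counter()
    (avg,) = conn.execute(query).fetchone()
    elapsed = time.perf_counter() - start
    print(f"{label}: avg={avg:.1f} in {elapsed:.4f}s")
```

The point isn't the exact timings, which vary by machine; it's that the only difference between the two connections is where the data lives.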

We saw real-time analytics deployed in the last US election. President Obama's election team was able to dynamically analyze the global state of pre-election sentiment, and deploy advertising resources and human volunteers to the areas that needed the most attention, virtually in real time.

By contrast, Mitt Romney's analytics team famously fed incomplete and inaccurate information to the Romney central command, so election resources were deployed to areas that bore little relation to actual need.

Now, we all know that elections aren't won solely by analytics. The policies of the two candidates contributed to the results, and some epic political (rather than computational) mistakes on the part of the challenging party didn't help matters.

Here we see not just big data in action, but fast big data in action. Had the president's data analytics operation taken months instead of days, or even days instead of minutes, his team might have missed key clues until the election was long over.

The challenge, of course, is how we handle this power

For example, the American Society of Civil Engineers said that one quarter of all American bridges are "deficient"; 17,000 bridges didn't meet inspection criteria, including 3 percent of all freeway bridges.

Want a scary statistic? The average age of America's bridges is 43 years. The average lifespan of America's bridges: 50 years. This means, unless something changes, we should all avoid pretty much all river crossings after the year 2020.

But my point here isn't to scare you (much). My point is that real-time analytics can help government and drivers alike. We all know about the spending reductions forced on American government agencies as a result of sequestration. So the challenge (even after the parties get past their sequestration protestation infatuation) is how we can do more and more with less and less expense.

The bridge situation is an ideal example. The University of Texas is working on sensor technology that can report dynamic telemetry on a bridge's condition. They're working on sensors that can survive the constant vibration and weather, and even send and receive data through all the steel that would normally make radio transmission nearly impossible.

A little imagination can help us see how all this could work. Terabytes of sensor data stream into a central analytics engine straight from the bridges. Real-time analysis filters the signal from the noise, so the bridges needing the most urgent attention get resources first (and, when a crisis alert fires, immediately).
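A toy sketch of that triage loop might look like the following. Everything here is hypothetical: the bridge names, the vibration readings, the window size, and the alert threshold are all invented to illustrate the shape of the pipeline, not UT's actual system.

```python
import random
import statistics
from collections import defaultdict, deque

# Simulated telemetry stream of (bridge_id, vibration_reading) tuples.
# A real deployment would ingest these from field sensors.
random.seed(42)

def telemetry_stream(n=2000):
    for _ in range(n):
        bridge = random.choice(["I-35W", "SR-520", "US-101"])
        base = 1.0
        # Give one bridge an elevated signal so something stands out.
        if bridge == "SR-520":
            base = 1.6
        yield bridge, random.gauss(base, 0.2)

# Keep only a rolling window of recent readings per bridge,
# so old data ages out and the picture stays "real time".
windows = defaultdict(lambda: deque(maxlen=200))

for bridge, reading in telemetry_stream():
    windows[bridge].append(reading)

# Rank bridges by mean vibration; the highest gets attention first.
ranked = sorted(
    ((statistics.mean(w), b) for b, w in windows.items()), reverse=True
)
ALERT_THRESHOLD = 1.3  # arbitrary cutoff for this sketch
for score, bridge in ranked:
    flag = "  <-- inspect first" if score > ALERT_THRESHOLD else ""
    print(f"{bridge}: mean vibration {score:.2f}{flag}")
```

In a real system the scoring would be far richer than a rolling mean, but the core idea is the same: continuously summarize each asset's recent readings and let the summary, not a human skimming raw telemetry, decide where crews go first.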

We can also see how this sort of telemetry can help fight terrorist threats. By sifting through vast amounts of data in real time, analytics systems can identify potential sources of threats, and mitigation teams can be dispatched to investigate.

Ah, but there's the rub. Did you see it? Did you feel it?

Did you notice how we suddenly went from big data to the possibility of Big Brother?

Clearly, we want and need to protect Americans from the constant threats against our security. Whether digitally or in meatspace, the threat level is dangerously high. The American government must provide threat management or baaaad things will happen.

But the challenge is distinguishing between data collected for protection and data collection that violates our privacy, all while respecting the very core of our Constitution. A further challenge: a "potential" threat isn't an actual threat, and if we act against our citizens because some Minority Report-style analytics system assigned a threat score to someone who hasn't yet done anything, we're trading our Constitution for a dystopian future.

Congress isn't helping matters

There is a real need for corporations and government to share data that might help protect our infrastructure. And, in the worst case, that data may need to be de-anonymized so law enforcement can be dispatched to stop some bad guys from doing some very bad things.

But Congress tends to confuse national security with media-industry preference. In its ongoing, and vaguely futile, effort to prevent customers from exercising fair use of the media they've purchased, Congress keeps conflating security with DRM, and so we wind up with CISPA and SOPA and all the rest.

So where does this leave us?

For the ZDNet IT audience, there are two things you need to keep in mind. First, you will need to understand real-time big data: what it means, how it works, its strengths and limitations, and what it can do for you.

To that end, I invite you to a free webcast I'm giving on Thursday at 2pm ET, where Dan Kearnan, senior director of SAP HANA Marketing, and I will discuss keeping up with the volume, velocity, and variety of big data in real time.

Second, it's important to keep an eye on legislative activities, and to understand when our privacy rights are being violated versus when our security is being protected. That distinction is quite clear to the rank-and-file investigators in America's famous three-letter agencies, but seems quite lost on members of Congress more devoted to their lobbyist friends than to their own constituents.

Keep reading ZDNet and stay up on these issues. This is only going to get more interesting as we move further into the future.
