This guest post is from Vishwas Lele, CTO at Applied Information Sciences, a provider of software and systems engineering services to government agencies.
By Vishwas Lele
At AIS we work on IT initiatives across several U.S. Federal agencies, including the Departments of Defense, Homeland Security and Justice. With that work in mind, I'd like to share some observations on the White House's recent Big Data announcement. While much has been said about the grand vision behind this initiative, my focus in this post is the usefulness of Big Data in medium-to-large tactical/operational IT projects.
Big Data is not just for canonical use cases (such as genome data analysis or video and image analysis); it is equally important for Federal agencies in accomplishing their core missions. The majority of the applications we build are targeted toward implementing new (or optimizing existing) business processes, and even these applications can generate a lot of data.
Strategic value amid tactical challenges
But there are challenges here. Because the primary driver for these initiatives is improving efficiency (and compliance), data analysis is often an afterthought. Even where due importance is given to data analysis, the data collection strategy flows directly from the existing requirements. For instance, the grain (the finest level of detail) of our data sets is determined by the level of drill-down that users have asked for today.
Similarly, the amount of historical data that is retained is governed by the parameters used for capacity planning. These decisions stem from limited resources (such as storage infrastructure) and from the traditionally non-trivial cost of preparing data for analysis. That cost arises because traditional BI tools require that data be organized in well-defined structures.
Bringing analysis within Federal reach
But now things may change. Big Data can bring tools for arbitrarily large data collection and analysis within the reach of Federal agencies, even resource-constrained ones like those described above. This is possible through the adoption of open source frameworks such as Hadoop or Storm, commodity hardware, and familiar SQL-like query constructs provided by tools such as Hive. Using an ODBC driver for Hive to import the results of a Hadoop query into Excel for further analysis extends the life and usefulness of the collected data, and can be done affordably.
The advent of “Hadoop-as-a-service” from public cloud providers such as Microsoft and Amazon can greatly lower costs as well. Such cloud offerings mean that agencies without a continuous need for Big Data can use Hadoop on an as-needed basis. And agencies that cannot move to a public cloud environment for security reasons can benefit from Hadoop-as-a-service offerings hosted in community clouds.
A private sector example
The CIO of travel services provider Orbitz decided to harness data that was going uncollected and unanalyzed. He initiated a Big Data strategy that allowed Orbitz to keep logs of user activity indefinitely (before this initiative, logs were kept only for a fixed number of days). This change caused the collection volume to grow from 7 TB to 750 TB. Big Data techniques made it possible for Orbitz not only to manage this volume but also to turn it into key insights about its customers.
A public sector commonality
AIS believes that Federal agencies can apply similar Big Data techniques to gain insight by harnessing currently uncollected data. For example, financial agencies can use Big Data to improve fraud detection. Similarly, law enforcement agencies can improve open-source intelligence collection and analysis.
Hopefully, the spotlight the recent White House announcement has put on Big Data will encourage Federal agencies to take notice of these opportunities.