Tech industry delivers report on government Big Data

Industry consortium TechAmerica Foundation delivers its report on what government is doing with Big Data, and what it should do going forward.

This afternoon, TechAmerica Foundation’s Federal Big Data Commission released its report, "Demystifying Big Data: A Practical Guide To Transforming The Business of Government."  The leadership and commissioners in the group hail from an array of companies, including Big Data and Business Intelligence companies Cloudera, Splunk and MicroStrategy; mega-vendors IBM, SAP and Microsoft; cloud services provider Amazon Web Services; hardware players Dell and HP; storage giants EMC and NetApp; and professional services firms Grant Thornton and CSC.

The report focuses on the value of what had until recently been referred to as "open data" but now, like many data-related areas, has been bucketed into the Big Data discussion.  Some deconstruction of the term "Big Data" is therefore necessary, and that’s exactly where the report begins.  Right away, in the foreword, the report's authors confront the definition problem:

…there remains a great deal of confusion regarding what the term [Big Data] really means, and more importantly, the value it will provide…This confusion may be due in part to the conversation being driven largely by the information technology community versus line of business community, and therefore centering primarily on technology.

The report then proceeds to present a definition of Big Data and its "business/mission" value; Big Data case studies; technology underpinnings; a suggested roadmap; and commentary on appropriate public policy.  Here are some highlights:

  • In the section covering Big Data’s definition, the report presents a table describing four major characteristics of Big Data. They include the usual “three Vs” (Volume, Velocity and Variety) as well as a fourth, Veracity, referring to data quality.
  • In the section on business/mission value, the report proclaims "Many key tenets for 'Good Government' and Big Data overlap."  This seems a useful context for the discussion, even if the "veracity" of the statement is not self-evident.
  • The case studies section profiles ten Big Data projects.  Most come from various US Federal Government agencies, but the section also includes projects from academia and the private sector, and from other countries, including Sweden, Denmark and Canada.  The projects focus on a mixture of administrative and scientific domains.
  • In the technology underpinnings section, the report suggests that in addition to MapReduce-based technologies like Hadoop, data warehousing and OLAP (OnLine Analytical Processing) should be considered relevant tools as well.  Oddly, the report describes DW and OLAP as if they were a single technology.
  • The roadmap section divides its approach into phases of defining, assessing, planning, executing and reviewing.  It also stresses the importance of proceeding in an iterative fashion.
  • In the policy section, the report explains that "there are over 40 laws that provide various forms of privacy protections."  It nonetheless recommends that the Federal Office of Management and Budget (OMB) provide further guidance on Big Data privacy and that this "could help accelerate the uptake of Big Data."

The report’s conclusion recommends that the Office of Science and Technology Policy (OSTP) develop a national R&D strategy for Big Data and that each federal agency name a Chief Data Officer, following the Federal Communications Commission’s example.  These seem like sensible and actionable recommendations.  It will be interesting to see how effective their enactment may be, and I’m eager to see if the Feds provide prescriptive guidance for state and local agencies in their Big Data endeavors as well.

I’m especially keen to see whether government agencies can attract data science talent or if much of that work will have to be contracted out.  I would think that successful Big Data outcomes in government will rely on policymakers’ involvement and empowerment.  Whether today’s Big Data technology can accommodate such a scenario is far from clear.
