US follows tech leaders into big data

The US government's 'Big Data Research and Development Initiative' will initially see six government departments invest $200m in big data research to improve the technology used to make sense of vast amounts of data.

The US has earmarked $200m to develop analytics for government data, hoping to encourage advanced 'big data' techniques.

The 'Big Data Research and Development Initiative' was announced by the Obama administration on Thursday with an initial war chest of $200m (£124m) in new investments split across six federal agencies, with more set to be included over time. The agencies will work on tools to analyse vast data sets to help advance environmental, health, defence and geological science.

To explain the scale of the big data problem, Kaigham Gabriel, acting director of the Defence Advanced Research Projects Agency (DARPA), said: "The Atlantic Ocean is roughly 350 million cubic kilometres in volume, or nearly 100 billion, billion gallons of water. If each gallon of water represented a byte or character, the Atlantic Ocean would be able to store, just barely, all the data generated by the world in 2010. Looking for a specific message or page in a document would be the equivalent of searching the Atlantic Ocean for a single 55-gallon drum barrel."

The administration compared the scale of the investment to previous schemes to advance supercomputing and help create the internet.

Big data push

Over the past few years, enterprise technology companies including EMC, Oracle and IBM have all started developing products to analyse large data sets from disparate sources, known as 'big data'. Other companies have made huge bets on the field, with HP spending £7.1bn on British data analysis company Autonomy.

Security company and EMC subsidiary RSA has recommended that companies develop big data analysis capabilities to help them spot cyber-threats before they are attacked.

Under the US government's plans, the Department of Energy will spend $25m to establish an institute, spanning six national laboratories and seven universities, that will build tools to analyse large data sets on the department's fleet of world-leading supercomputers.

The US Geological Survey will pay scientists to conduct big data research, while the National Science Foundation (NSF) will work with the National Institutes of Health to solicit projects looking at using big data for imaging, molecular and other biological sciences.

The NSF will also fund a $10m 'Expeditions in Computing' project at the University of California, Berkeley, and support statisticians exploring protein structures. The Department of Defence (DoD) will make $60m available for new research projects to help it create "truly autonomous [military] systems that can manoeuvre and make decisions on their own" and improve the ability of combatants to use sensor information to gain battlefield awareness.

DARPA will launch the XDATA programme, investing $25m per year for four years in tools and scalable algorithms for analysing semi-structured and unstructured data, such as message traffic or databases.

Finally, the National Institutes of Health has placed, at an undisclosed cost, 200TB of biological data from the 1000 Genomes Project on the Amazon Web Services cloud.

