As I’ve reported previously, this month is a huge one for the Big Data news cycle. Next week's Strata + Hadoop World New York will bring a slew of announcements, and so did the first two weeks of this month.
A few more items came in today: a new report from Gartner; a new Big Data study from the IBM Institute for Business Value and the Saïd Business School at the University of Oxford; and a new product announcement from Teradata.
Big Data Spending
First, the Gartner news: the tech analyst and market research firm released a report today that says Big Data will drive a total of US$28 billion of IT spending worldwide this year, and US$34 billion in 2013. The total Big Data spend through 2016 will be a formidable US$232 billion.
Gartner also says that by 2020, Big Data will be a completely mainstream and embedded technology, rather than the standalone, somewhat fetishized category that it is today. The firm specifically says Big Data technology will be "non-differentiating and routinely expected from traditional enterprise vendors and part of their product offerings." I would agree there and would add that consolidation of many of the pure-play Big Data vendors we have today with some of the mega-vendors seems inevitable.
Big Data Study
The IBM/Oxford study, "Analytics: The real-world use of big data: How innovative enterprises extract value from uncertain data," is a 20-page publication containing the results of a survey of 1,144 business and IT professionals in 95 countries. It covers Big Data’s definition; surveys common Big Data infrastructure; discusses Big Data’s top data sources, analytics capabilities and adoption stages (along with their sponsors, data availability and obstacles); and provides a set of recommendations for cultivating Big Data adoption.
Among other interesting facts, the study says:
The top four Big Data data sources are transactions, log data, "events" and emails
The top five Big Data capabilities are reporting, data mining, data visualization, predictive modeling and "optimization"
Only 6% of survey respondents are in the "Execute" phase of Big Data adoption (47% are still in the "Explore" phase)
63% of survey respondents report that the "use of information (including big data) and analytics" gives them competitive advantage
Big Data Appliance
On the product announcement front, Teradata had two pieces of news. First, the company announced its new Teradata Aster Big Analytics Appliance. This looks to be a very interesting hybrid appliance, combining Teradata Aster’s SQL-MapReduce technology, Teradata’s Massively Parallel Processing (MPP) hardware platform, and Hortonworks’ Hadoop distribution, known as the Hortonworks Data Platform (HDP).
The hybridism cuts two ways here, using both SQL-MapReduce and a Teradata Aster technology called SQL-H. The nuance can get confusing, so let me try to break it down:
SQL-MapReduce allows the creation of MapReduce functions written in Java, C# and other languages, which then become callable from SQL queries. This allows for what Teradata Aster terms "in-database analytics." The SQL queries, in turn, are applied to the Teradata Aster relational database.
SQL-H does roughly the reverse: it allows SQL querying of Hadoop data, both through Apache Hive and by using the Apache HCatalog metadata store (which unites querying of data through Hive, Pig and Hadoop’s MapReduce) to replicate the Hadoop data’s structure in the Aster data layer. Interestingly, from what I can tell, SQL-H queries can call SQL-MapReduce functions.
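To make the idea of a function that is written in a general-purpose language and then called from SQL more concrete, here is a minimal sketch in Python. It does not use Teradata Aster or its SQL-MapReduce syntax. As a stand-in, it assumes SQLite's user-defined function mechanism from Python's standard library, and the table, column and function names (messages, sender, domain_of) are all hypothetical. The Python function plays the role of the "map" step and the SQL GROUP BY plays the role of the "reduce" step.

```python
import sqlite3


def domain_of(address):
    """Map step: extract the lowercased domain from an e-mail address."""
    return address.rsplit("@", 1)[-1].lower()


# In-memory database standing in for the relational store (an assumption,
# not the Aster database).
conn = sqlite3.connect(":memory:")

# Register the Python function so SQL queries can call it by name.
conn.create_function("domain_of", 1, domain_of)

conn.execute("CREATE TABLE messages (sender TEXT)")
conn.executemany(
    "INSERT INTO messages (sender) VALUES (?)",
    [("ann@Example.com",), ("bob@example.com",), ("cy@sample.org",)],
)

# The query calls the custom function per row, then aggregates in SQL.
rows = conn.execute(
    "SELECT domain_of(sender), COUNT(*) "
    "FROM messages "
    "GROUP BY domain_of(sender) "
    "ORDER BY 1"
).fetchall()

for domain, count in rows:
    print(domain, count)
```

The point of the sketch is only the division of labor: custom logic lives in a function, and the query engine decides how to apply it across the data. How Aster distributes that work across its MPP nodes is not shown here.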
Teradata says the Big Analytics Appliance offers "up to 19 times better data throughput and performs analytics up to 35 times faster than a typical off-the-shelf commodity bundle." And if the word "appliance" worries you, Teradata counters with this: the product can store a maximum of 5 petabytes of uncompressed user data for Aster and up to 10 petabytes of uncompressed user data for Hadoop. There’s no appliance ceiling there; even for Big Data, that’s BIG.
And if the word "appliance" still worries you, perhaps because you're nervous that you can’t get hands-on with the technology before buying it, you might be interested in Teradata’s other news. The company also announced today that release 5.0 of its Aster Express database is available as a free download. Aster Express provides a trial version of the Aster SQL-MapReduce framework and the SQL-H technology.
Big Data Conference
Remember, there’s lots more news coming next week in conjunction with Strata + Hadoop World NY. Stay tuned.