Big Data’s big week

During a single week in usually humdrum July, a slew of companies threw a Big Data beach party bash. Big Data, BI, database and cloud companies went wild with partnership announcements, and a new version of an in-memory database was born.

In just one week, the Big Data world saw several major partnership announcements that collectively tie together an Internet search powerhouse, two Hadoop all-stars, a decades-old database company, several Business Intelligence players and the maker of a real-time database for Hadoop. As a kicker, an important Big Data in-memory database saw a new release.

Let’s review these announcements and what they mean for the Big Data market.

Where’s the Data?  Google it!
Search and advertising giant Google is a counterparty in two of these deals. The Mountain View, California-based company appears determined to make its cloud platform a serious contender in Big Data. On Tuesday, it announced partnerships with several database and BI companies around BigQuery, its cloud-based column store.

In one deal, Google and database veteran Pervasive Software are teaming up to let Pervasive's RushAnalyzer provide Extract, Transform and Load (ETL) functionality for BigQuery. Google built a full RESTful API over BigQuery with the clear intent that existing Business Intelligence (BI) and ETL tools would integrate with it, and that strategy seems to be working.

Maybe that's why open source ETL provider Talend announced a similar BigQuery partnership with Google as well. Talend's Open Studio for Big Data is an Eclipse-based graphical tool for loading and extracting data from Hadoop. It automates work with the Hadoop Distributed File System (HDFS), Hive, Pig, HBase and Sqoop. And now Open Studio for Big Data works with BigQuery, too. The other BigQuery partnerships include deals with Informatica and SQLStream for ETL, Jaspersoft for reporting and analytics, and QlikView for dashboards.

Don’t forget Hadoop
This week's partnerships are not all about BigQuery, though. For example, Pentaho's existing partnership with Cloudera got amped up on Wednesday. For some time, Cloudera's Distribution Including Apache Hadoop (CDH) has included two relevant components. Sqoop is an import/export facility for interfacing Hadoop with SQL-based relational databases. Oozie is used to create and schedule Hadoop workflows. Like many Hadoop components, though, neither offers much in the way of tooling or a graphical user interface (GUI). That's where this deal comes in. Pentaho's visual design studio now works with Sqoop and Oozie, providing a point-and-click GUI for both.

Back on Tuesday, San Francisco-based Drawn to Scale announced it will redistribute MapR's M3 distribution of Hadoop with Spire, Drawn to Scale's real-time database for Hadoop. MapR's Hadoop distribution embeds an HDFS-compatible network file system. Unlike files in standard HDFS, which are written once, files in MapR's file system can be updated in place.

And speaking of real-time Big Data, Terracotta announced version 3.7 of its BigMemory product on Wednesday. Terracotta is another San Francisco firm and a wholly owned subsidiary of German company Software AG. BigMemory resembles SAP HANA, another in-memory database from a German company: it is completely RAM-based and employs a scale-out architecture. Both can handle transactional and analytical workloads. Version 3.7 of BigMemory adds enhanced security, data compression and new search capabilities.

Big Data, BigQuery, BigMemory, Big Week
If it wasn't already clear that everyone wants a piece of the Big Data action, it should be now. The number of announcements during a mere two-day period this week was staggering. Now we just have to get these companies to take some time off in August.