My inbox is hit with so many Big Data product and news announcements every day that I could make a full-time job out of triaging them. Today, though, a set of announcements has arrived that bears witness to 2013 as the year of Big Data's Enterprise maturation.
I've spent most of my career in the Enterprise development and tools space, so I suppose I could be jaded about these announcements. But having been an avid Big Data watcher for more than a year, I must admit I wasn't sure I'd see the day when SQL, Enterprise security, monitoring and seamless provisioning would make up the headline fodder in this space.
And yet, that's exactly what's happening today. Let's take a look at the details...
Cloudera announces general availability of Impala 1.0
Back in October, Cloudera shocked me a little by telling me they believed MapReduce wasn't the solution for all Big Data problems, and that they were building a parallel SQL query engine that could work over data in Hadoop's Distributed File System (HDFS) and bypass Hadoop's MapReduce engine, permitting speedy iterative (non-batch mode) query over Hadoop data. Today that product, Impala, has reached GA. And since it's not only fast, but API-compatible with Apache Hive, lots of existing BI tools can work with it right away.
This fits nicely with Cloudera's announcement yesterday that it has formed an alliance with BI powerhouse SAS. That alliance is not just a business arrangement either, as SAS engineers have adapted their technology to deploy physically over Hadoop clusters and perform their analyses in a parallel fashion. This is a huge deal, as it avoids data movement between SAS and Hadoop, and analyses can be performed over full data sets, not just samplings of the source data.
Splunk releases version 2.4 of its App for Enterprise Security
The combination of Splunk Enterprise and Splunk's App for Enterprise Security brings statistical analysis of user-generated machine data to bear on discovering unknown threats to digital systems in real time. Rather than merely monitoring for known threat patterns, Splunk's suite uses search, dashboards and visualizations to detect anomalies and outliers that may indicate the presence of as-yet-unidentified threat patterns. Clearly, bringing hardcore statistical analysis to users, rather than just raw Big Data storage and query capability, is where real value gets added.
MongoDB gets a backup service
10gen, the company behind venerable NoSQL document store MongoDB, is today announcing the limited release of MongoDB Backup Service, joining the free, cloud-based MongoDB Monitoring Service (MMS) it launched previously. Given my own 20-year background in relational databases, the lack of a dedicated backup service seemed a bit strange to me for such a cloud-friendly product as Mongo. But the Big Data and NoSQL worlds are no strangers to gaps like that. The more important point is that the gaps are being filled very rapidly this year, and 10gen's announcement is part of that trend.
Basho and SoftLayer provide Riak in the cloud
Speaking of NoSQL and cloud-friendly, Basho, the company behind NoSQL key-value store Riak, has teamed with cloud provider SoftLayer to provide an easily provisioned option for running Riak in the cloud.
Customers can deploy either open source Riak or the more robustly supported Riak Enterprise on the SoftLayer cloud. Basho and SoftLayer have worked together to provide servers specifically tuned to run Riak and, quite interestingly, provide the servers in physical "bare metal" form, rather than as virtual cloud server instances. While that can make the deployment more complex, the companies say the process is nonetheless automated, and that servers are provisioned in under two hours.
They grow up fast
Clearly the BI and Big Data worlds are converging now, with Hadoop and NoSQL databases acquiring the accoutrements they need to work in the Enterprise. Before, companies in the space were, arguably, catering to what investors were impressed by. Integrating with SAS, offering bare-metal servers, providing cloud backup and protecting infrastructure are the things customers care about, and that's the stuff that profitability is made of.