Today was the first day of Strata + Hadoop World 2012 NYC, and the Big Data news just keeps coming. Here’s a roundup of today’s announcements:
Hadoop + in-memory BI
Platfora unveiled its In-Memory Business Intelligence platform for Hadoop. The product employs a user-friendly, HTML5-based front end and, using something Platfora calls the “Hadoop Data Refinery,” translates the UI-derived request to MapReduce code that is then dispatched to Hadoop. The results are then loaded into the company’s in-memory database engine and its “Fractal Cache.” This contrasts rather markedly with the typical BI-to-Hadoop connectivity pattern, which uses SQL and Apache Hive.
MapR M7 includes accelerated HBase
MapR’s Network File System-based distribution of Hadoop is available for on-premises use and is publicly available in the cloud from Amazon (and soon from Google). A new version of this distribution, dubbed M7, speeds up performance of Apache HBase, the HDFS-based Wide Column Store NoSQL database. By keeping columns in memory, eliminating the need for compactions, and using data structures that “minimize the read- and write-amplification factor,” M7 seeks to make HBase more enterprise-ready. This nicely complements MapR’s previous focus on optimizing HDFS itself.
(Digitally) reasoned partnerships
Digital Reasoning, whose Synthesys product automates “entity oriented” analytics and visualization of Big Data, announced a partnership with Opera Solutions and integration with Oracle’s Big Data Appliance. The company also explained that it will soon announce a partnership with Tableau, enabling unstructured data visualizations within Tableau's eponymous product.
Data Profiling for Hadoop
Talend, the prominent open source data integration/ETL vendor, announced that its Talend Platform for Big Data product now provides data profiling tools for Hadoop. Data profiling involves analysis of data quality; Talend’s profiling for Hadoop extends to data in raw HDFS files as well as in HBase databases. The profiling is performed in-place, avoiding the need to extract the data before profiling it. The product produces a “custom graphical report on the level of quality of organizations' data.” Talend’s provision of data profiling for Hadoop continues the fusion of BI and Big Data.
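To make the idea of data profiling a bit more concrete, here is a minimal Python sketch of the kind of per-column quality metrics a profiler might compute. It is only an illustration under stated assumptions: it is not Talend's implementation or API, it runs over plain in-memory records rather than HDFS or HBase, and the function name `profile_records` and the sample rows are hypothetical.

```python
def profile_records(records, columns):
    """Compute simple per-column quality metrics for dict-shaped rows.

    Hypothetical helper for illustration only; not part of any Talend product.
    A value counts as missing if the key is absent, None, or an empty string.
    """
    rows = 0
    missing = {col: 0 for col in columns}
    distinct = {col: set() for col in columns}

    for row in records:
        rows += 1
        for col in columns:
            value = row.get(col)
            if value is None or value == "":
                missing[col] += 1
            else:
                distinct[col].add(value)

    report = {}
    for col in columns:
        # Guard against division by zero when there are no rows at all.
        pct = 100.0 * missing[col] / rows if rows else 0.0
        report[col] = {
            "rows": rows,
            "missing": missing[col],
            "missing_pct": pct,
            "distinct": len(distinct[col]),
        }
    return report


# Made-up sample data: one empty city and one row with no city at all.
sample = [
    {"id": 1, "city": "NYC"},
    {"id": 2, "city": ""},
    {"id": 3},
]
report = profile_records(sample, ["id", "city"])
for column, stats in report.items():
    print(column, stats)
```

A real profiler would add checks such as type conformance, value ranges, and pattern matching, but the principle is the same: summarize how complete and consistent each field is without modifying the data.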
ParAccel merges MPP and Hadoop
Massively Parallel Processing (MPP) vendor ParAccel today announced an update to its Hadoop On Demand Integration (ODI) Module. The module utilizes Apache HCatalog and integrates at the MapReduce level to bring Hadoop-borne data into the ParAccel Analytics platform. This makes it possible to query Hadoop data via SQL, user-defined functions and downstream BI tools. The ODI Module is designed to be compatible with “all popular Hadoop distributions” and is certified on Cloudera's Distribution Including Apache Hadoop (CDH).
Zettaset adds enterprise-class features to Hadoop
Zettaset, based in Mountain View, CA, announced today that version 5 of its Orchestrator product would ship by the end of the year. This version of the product adds features to Hadoop that most enterprise IT and data center managers will see as crucial, and largely absent from the Hadoop ecosystem. Active Directory and LDAP integration, configuration management, logging, auditing, and role-based access control are all new to v5 of the product. Orchestrator is built to run on any Hadoop distribution, including Cloudera’s CDH, Hortonworks’ HDP, MapR's M3 and M5, and IBM InfoSphere BigInsights. It can even convert a cluster from one distro to another. Full disclosure: I have a consultant-client relationship with Zettaset via a third party.
Where it's all going
The common thread through all these announcements is the “enterprise-ification” of Hadoop. By integrating Hadoop with BI, Analytics, and Enterprise security products, and by further integrating and optimizing HBase, vendors are clearly pushing to make Hadoop “blend in” to enterprise environments by working with products and technologies already deployed in the data center.
There are still two days left to Strata + Hadoop World 2012 and there’s more news to come. I’ll be meeting with a slew of vendors on-site at the show tomorrow and will post my findings from those briefings in short order.