When I started this blog a bit over a year ago, I felt convinced that the separation between Big Data and Business Intelligence was arbitrary and contrived. It would seem that San Mateo, CA-based Platfora agrees, as it is today releasing for general availability its namesake product that seeks to bridge the BI-Big Data divide.
To your last dying day
The Big Data world, which has orbited around Hadoop, and the BI world, which has orbited around OLAP cubes, charts and dashboards, have mutually maintained their segregated status quo. Big Data and BI are the Jets and the Sharks of the technology industry: they live in the same neighborhood and have much in common, but thrive on obsessing over their differences.
Apache Hive has attempted to broker a peace between the warring clans, but has achieved détente at best. Yes, most BI tools now connect to Hadoop via Hive, but using that technology, Hadoop keeps working in its batch mode ways, to the chagrin of BI tools that want quick results, so they can fire off subsequent queries.
It would seem the folks at Platfora agree that a positive peace is possible, and that the mere truce brought about by Hive is at best a stopgap solution, and one that has outlived its usefulness. Platfora's namesake product is a business-user-friendly BI tool that combines an HTML 5-based user interface, a data visualization engine, and an in-memory analytical database — supporting what Platfora calls its "Fractal Cache" technology — that maintains in-memory "lenses."
Platfora is certified with the big three Hadoop distros: Cloudera's Distribution Including Apache Hadoop (CDH), the Hortonworks Data Platform (HDP), MapR, and even newcomer EMC/Greenplum Pivotal HD. My guess is that Platfora won't take long to obtain similar certification for Intel's Distribution for Apache Hadoop as well, although no such certification has been announced.
Same thing, only different
I met with Platfora's Founder and CEO, Ben Werther, at last week's GigaOM Structure:Data event at Chelsea Piers in New York City. Given my own opinion on the futility of separating Big Data and BI, I was very interested in what Mr. Werther had to say. But I was also skeptical. After all, Cloudera's Impala, Hadapt, and numerous Massively Parallel Processing (MPP) data warehouse products, all attempt to provide interactive (rather than batch) query capabilities over Hadoop, to enable BI-style analysis of Big Data. What makes Platfora different?
I'll withhold judgement until I see the product demoed in-depth, but for now I can at least convey what Werther told me: unlike various data warehouse solutions which must be supported by careful schema design and sometimes elaborate Extract, Transform and Load (ETL) processes, Platfora supports the same sort of just-in-time schema determination as Hadoop itself. Platfora also generates its own, optimized MapReduce jobs, providing a nice middle-ground between the tedium of hand-coded MapReduce and Hive's somewhat one-size-fits-all approach of generating MapReduce code by translating SQL queries.
Hadoop for the masses
Effectively, Platfora is taking the concept of Pervasive BI and bringing it forward to the Big Data world. But the vision of Pervasive BI was never quite realized, so can Platfora make Pervasive Big Data a reality?
I think it's a steep climb, but I'm glad someone's trying. And with Werther having served as a Senior Product Manager at both Siebel and Microsoft, Director of Product Management at Greenplum and VP of Products at DataStax, Platfora's certainly coming at it from the right vantage point.
BI, SQL, and Hadoop are converging, with momentum that appears unstoppable. The only questions are how long will the convergence take, and whose solution(s) will win out.