Business Intelligence (BI) and Big Data engage in a very flirtatious dance, coming close, then diverging, but remaining within mutual line of sight, in anticipation of eventual unification.
Datameer knows all about that. Its namesake product utilizes Apache Hadoop as a workhorse to perform analytics on large data sets. It has an array of connectors for almost any data source, and works with virtually every Hadoop manifestation, including Apache's own code, distributions from Cloudera, Hortonworks, MapR, IBM, EMC, Yahoo, Amazon's Elastic MapReduce and even Microsoft's Hadoop distribution for its Windows Azure cloud computing platform.
Taming Hadoop, not subjugating it
I've discussed here many times the issue that Hadoop is powerful but its Java-based MapReduce interface is hardly business user-friendly. Many BI companies combat this unfriendliness by connecting to Hadoop through Hive, effectively treating it as another relational data source, and leaving the rest of the traditional BI rubric intact.
Datameer attacks this problem differently. It wasn't until I had the chance to talk with the company's CEO, Stefan Groschupf, that I really understood how or why. Datameer's take is that Hive-based approaches retain the traditional, and less-than-agile, BI workflow of formal ETL and an a priori commitment to a particular schema before analysis can even begin. Meanwhile, Hadoop's great power is that it permits and encourages the imposition and interpretation of schema during analysis, rather than making schema a prerequisite for analysis.
So Datameer provides a spreadsheet-like UI to source data and identify how it should be shaped. Then, with the help of some 200 built-in analytic functions, it generates native MapReduce code to collect, shape and analyze data on the fly using Hadoop. And while it doesn't rely on Hive for Hadoop abstraction, Datameer does support Hive, both as a data source and a data export endpoint.
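To make the schema-during-analysis idea concrete, here is a minimal Python sketch. It is not Datameer's code and it does not use Hadoop's API; the sample records, the field names and the helper functions read_with_schema and revenue_by are all invented for illustration. The raw lines are stored untouched, and a schema is applied only at the moment a question is asked of them.

```python
from collections import defaultdict

# Hypothetical raw records, stored as-is with no schema enforced at load time.
RAW_RECORDS = [
    "2012-06-01,store-12,widget,3,9.99",
    "2012-06-01,store-07,gadget,1,24.50",
    "2012-06-02,store-12,widget,2,9.99",
    "malformed line without enough fields",
]

# A schema chosen at analysis time: (field name, conversion function) pairs.
SALES_SCHEMA = [
    ("date", str),
    ("store", str),
    ("product", str),
    ("quantity", int),
    ("unit_price", float),
]


def read_with_schema(lines, schema):
    """Interpret raw delimited lines with a schema supplied at read time."""
    for line in lines:
        fields = line.split(",")
        if len(fields) != len(schema):
            continue  # skip records that do not fit this schema
        try:
            yield {name: cast(value) for (name, cast), value in zip(schema, fields)}
        except ValueError:
            continue  # skip records whose values cannot be converted


def revenue_by(records, key):
    """Group records by one field and total quantity * unit_price per group."""
    totals = defaultdict(float)
    for record in records:
        totals[record[key]] += record["quantity"] * record["unit_price"]
    return dict(totals)


# The same raw data answers two different questions without being reloaded.
by_store = revenue_by(read_with_schema(RAW_RECORDS, SALES_SCHEMA), "store")
by_product = revenue_by(read_with_schema(RAW_RECORDS, SALES_SCHEMA), "product")
```

The point of the sketch is only the ordering: nothing about the data's shape is decided until read_with_schema runs, so a different question can bring a different schema to the same raw lines. A traditional ETL pipeline would have fixed the columns, and rejected the malformed line, before any analysis started.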
(Hadoop) power to the people
That's all well and good, and it's a much more progressive approach to BI on Big Data than shoehorning Hadoop into older data warehousing and OLAP paradigms. But if it costs $100,000 and up to license Datameer on a large Hadoop cluster, then how will the pervasive and self-service movements that have been energizing BI take part? Datameer has been exclusively an Enterprise BI tool, with individual and departmental BI scenarios -- and customers -- left unaccommodated. Given Hadoop's cluster-oriented focus, it's perhaps not surprising that this has been the case.
But that changes today with the introduction of version 2.0 of Datameer's product. And that's because, in addition to revving Datameer Enterprise, the company is introducing Datameer Workgroup, accommodating up to 50 users, running on a single server, and Datameer Personal, a single-user product that runs on a desktop PC. For a long time we've been hearing that Hadoop supports virtually unlimited scale-out. With today's announced introductory pricing of $2999 for its Workgroup edition and $299 for its Personal edition, Datameer is making Hadoop scale down too. Datameer Personal can run on Windows, Mac or Linux desktops and if you don't wish to install it "on the metal," it's available in the form of a VMware virtual machine image as well.
Groschupf explained to me that Hadoop's ability to perform sequential reads on huge volumes of data (akin to large table scans in a relational database), and do so very quickly, is highly beneficial for analytics work, even on a single node. So these new Datameer editions make the technology much more approachable for individual and departmental applications, and yet still leverage important capabilities of Hadoop. Plus, Datameer will make upgrades available between Personal and Workgroup, as well as between Workgroup and Enterprise editions.
Datameer 2.0 provides more than just these new entry-level editions. It also implements a fully HTML 5-based user interface, delivering compatibility with more platforms and device form factors (in other words, you can use it on an iPad), and its data visualization component graduates to become the Business Infographics Designer, though it still can produce more conventional charts and graphs. Here too, the technology is HTML 5-driven, and because it uses vector-based graphic techniques, the cross-device compatibility can be delivered without loss of visual fidelity, on very small or very large displays.
Datameer is a big name in the Big Data arena. And Datameer making Hadoop more accessible, both in terms of technology and project scope, is very significant for the industry. It's also significant to the new user populations that can now avail themselves of the power of Hadoop. Now individual and departmental BI users have an on-ramp to become Big Data users.
And the dancers suddenly draw closer together.