
Is this the age of Big OLAP?

There are now three vendors in the burgeoning OLAP-on-Hadoop sub-segment of the analytics world. Their architectures differ, but their goals are generally the same: make Hadoop into a scalable, distributed OLAP engine, compatible with numerous BI front ends.
Written by Andrew Brust, Contributor

There was a time when analytics was synonymous with BI, and BI was synonymous with OLAP -- Online Analytical Processing. The term was meant to contrast with the more common Online Transaction Processing (OLTP), and involved the creation of multi-dimensional "cubes" rather than 2-dimensional tables. Each dimension is a different category with which to perform drill-down analyses on numerical data, known as measures.
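To make the dimension/measure distinction concrete, here is a minimal sketch (the fact table, column names, and figures are invented for illustration): dimensions are the categories you slice by, and a measure is the number you aggregate.

```python
# Toy fact table: each row has two dimensions (region, quarter)
# and one measure (sales). All names and figures are invented.
facts = [
    {"region": "East", "quarter": "Q1", "sales": 100},
    {"region": "East", "quarter": "Q2", "sales": 150},
    {"region": "West", "quarter": "Q1", "sales": 80},
    {"region": "West", "quarter": "Q2", "sales": 120},
]

def rollup(facts, dimension, measure):
    """Aggregate a measure along one dimension -- one 'slice' of the cube."""
    totals = {}
    for row in facts:
        key = row[dimension]
        totals[key] = totals.get(key, 0) + row[measure]
    return totals

print(rollup(facts, "region", "sales"))   # {'East': 250, 'West': 200}
print(rollup(facts, "quarter", "sales"))  # {'Q1': 180, 'Q2': 270}
```

A cube, in effect, pre-computes roll-ups like these along every combination of dimensions, so drill-downs are instant.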

Now that you've had your BI 101 crash course, look around at numerous analytics products like Tableau, and you'll see the paradigm of dimensions and measures is alive and well. OLAP never died, even if its underlying technologies morphed a bit.

The justice of scale
What has dogged OLAP, though, is its scalability. Most OLAP engines run on a single, albeit beefy, server, which limits the parallelism that can be achieved and therefore imposes de facto limits on data volumes. Customers who hit these scalability ceilings may contemplate using Big Data technologies, like Hadoop and Spark, but those tend not to employ the dimensional paradigm to which OLAP users are accustomed.

What to do? Well, a few vendors have decided to take Hadoop and Spark and leverage them as platforms on which big OLAP cubes can be built and run. The vendors, namely AtScale, Kyvos Insights and Arcadia Data, have looked at Big Data adoption patterns in some enterprises and seen that momentum has stalled. Their approach has been to let people in those enterprises work in the OLAP environments they are comfortable with and, at the same time, make use of their Hadoop clusters.

AtScale's CEO, David Mariani, was the person behind a couple of the biggest Microsoft OLAP projects (at Yahoo and Klout). And while he was able to do big things with Microsoft's OLAP platform, SQL Server Analysis Services (SSAS), he definitely ran up against scalability limits, including cube re-processing times of as much as a week. His motivation for creating a more scalable OLAP platform was pretty clear, so he formed AtScale.

Three approaches
AtScale cubes can be made to appear to BI clients as if they were SSAS cubes. So compatibility is high. Meanwhile, when the measures and dimensions of an AtScale cube are queried, AtScale generates corresponding SQL queries to get the data from underlying tables in Hive and Spark SQL. In this sense, AtScale is implementing a type of ROLAP (relational OLAP) over tables stored in Hadoop.
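The essence of the ROLAP pattern can be sketched roughly: a dimensional query (some dimensions plus an aggregated measure) is rewritten as a GROUP BY over the underlying fact table. The table and column names below are invented, and AtScale's actual query generation is far more sophisticated; this only illustrates the core translation.

```python
def to_sql(table, dimensions, measure, agg="SUM"):
    """Translate a dimensional query into the equivalent relational GROUP BY.
    This is the heart of ROLAP: the cube is virtual, and every slice of it
    becomes a SQL query against tables in, say, Hive or Spark SQL."""
    dims = ", ".join(dimensions)
    return (f"SELECT {dims}, {agg}({measure}) AS {measure}_{agg.lower()} "
            f"FROM {table} GROUP BY {dims}")

print(to_sql("sales_facts", ["region", "quarter"], "sales"))
# SELECT region, quarter, SUM(sales) AS sales_sum FROM sales_facts GROUP BY region, quarter
```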

If it's a physical cube you desire, but you still want the distributed processing and storage of Hadoop, then Kyvos Insights' product may appeal to you. It implements a persistent cube, which acts as a dimensional cache of the underlying data, complete with stored aggregations. The cube itself is made up of multiple "cuboids," as Kyvos calls them, each of which is stored on a different node in the cluster.
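The distributed-cube idea can be illustrated schematically: a cube over n dimensions decomposes into 2^n cuboids (one pre-aggregated view per subset of dimensions), which can then be spread across cluster nodes. This sketch shows only the combinatorics and a naive hash placement; Kyvos's actual storage layout is its own, and the node names here are invented.

```python
from itertools import combinations

def cuboids(dimensions):
    """Every subset of the dimensions defines one cuboid (a pre-aggregated view)."""
    result = []
    for r in range(len(dimensions) + 1):
        result.extend(combinations(dimensions, r))
    return result

def place(cuboid_list, nodes):
    """Naive placement: hash each cuboid to one of the cluster's nodes."""
    return {c: nodes[hash(c) % len(nodes)] for c in cuboid_list}

dims = ("region", "quarter", "product")
all_cuboids = cuboids(dims)
print(len(all_cuboids))  # 8 -- a 3-dimension cube yields 2^3 cuboids
placement = place(all_cuboids, ["node-1", "node-2", "node-3"])
```

Spreading the cuboids across nodes is what lets both cube storage and cube queries scale out with the cluster.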

And what if you want the familiarity of OLAP without the need to model a cube explicitly (i.e. define all the measures and dimensions before doing any analysis)? Then you might take a look at Arcadia Data, which lets you perform ad hoc analyses that beget derived structures -- generated cubes, in effect -- that the engine can query. You can then refine the design of these derived structures into full-fledged cubes, and you can always design cubes in the more conventional model-first approach. Arcadia also provides its own visualization facilities, rather than relying on other BI tools as a front end.

Destination or transition?
There's a range of options here, each of them bound to help organizations struggling with Hadoop adoption. But while these technologies do take the familiar OLAP approach to analytics past its scaling limitations, they don't usher the customer into the use of Hadoop and Spark in their native "habitats."

Some customers may see that as a good thing. Others may see these products as bridging technologies, useful for a time. During the transition period, employees can get used to working with unstructured data, and doing analysis before modeling, rather than the other way around. Still other customers may wish to jump into working with Big Data in a more "indigenous" fashion. What's most important is the functionality users get, the return on investment that enterprises get, and the results they both generate.

Disclosure: the company I work for, Datameer, provides a business user tool and platform for working with Big Data that does not use the OLAP metaphor.
