Splice Machine 2.0 combines HBase, Spark, NoSQL, relational...and goes open source

RDBMS-on-Hadoop database Splice Machine onboards Apache Spark and goes open source. Is it trying to be all things to all people, or is it just combining a set of raw technologies and making them useful and readily available?
Written by Andrew Brust, Contributor

In the worlds of Big Data, NoSQL and relational databases, Splice Machine's name doesn't come up that often. But a closer look at the company's product, architectural approach and CEO put the company on my radar a while back. And version 2 of the product, which is being announced today, has made that radar dot much brighter.

Also read: The NoSQL community threw out the baby with the bath water

Also read: Full SQL on Hadoop? Splice Machine opens up its database for trials

Have RDBMS cake, eat NoSQL scaling, too
Before we look at version 2, let's cover the motivation behind v1. Specifically, Splice Machine looked long and hard at some pressing database conundrums:

  • The relational database model (along with SQL) works well -- best, in fact -- in many circumstances, but scaling it has always been hard.
  • NoSQL databases are much easier to scale but the schema-less model and lack of "ACID" (Atomicity/Consistency/Isolation/Durability) guarantees can be disorienting.
  • Hadoop scales well too, and its HDFS file system has become an important storage standard, but Hadoop's batch model can also cause dissonance for relational database professionals.

The solution: create an ACID-compliant, SQL relational database on top of Apache HBase -- a NoSQL database that uses HDFS as its storage layer. Now you've got SQL, the relational model, ACID/transactional consistency, horizontal scaling and HDFS, all in one product.
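To make the layering concrete, here is a minimal, illustrative sketch (not Splice Machine's actual storage code) of the general technique a SQL-on-HBase engine can use: relational rows are flattened into key-value cells, with the primary key encoded into a sortable row key and each non-key column stored as a value under that key. The table and column names below are hypothetical.

```python
def encode_row_key(table: str, primary_key: tuple) -> bytes:
    """Build a sortable row key: table name plus null-separated PK parts."""
    parts = [table.encode()] + [str(p).encode() for p in primary_key]
    return b"\x00".join(parts)

def to_kv(table: str, primary_key: tuple, columns: dict) -> dict:
    """Map one relational row to key-value cells, one per non-key column."""
    row_key = encode_row_key(table, primary_key)
    return {(row_key, col.encode()): str(val).encode()
            for col, val in columns.items()}

# One "orders" row becomes two cells keyed by the encoded primary key.
cells = to_kv("orders", (42,), {"customer": "acme", "total": "99.50"})
```

Because row keys sort lexicographically, range scans over a primary key fall out naturally from HBase's ordered storage, which is part of what makes this pairing workable.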

Also read: Splice Machine's SQL on Hadoop database goes on general release

Sparking v2
So version 1 is pretty cool, but version 2 of the product ups the ante considerably: it onboards another important data technology -- Apache Spark -- as an additional execution engine.

Splice Machine's CEO, Monte Zweben, gave me the lowdown on v2. Zweben is an alumnus of Stuyvesant High School, Carnegie Mellon, Stanford and the AI branch of NASA's Ames Research Center; he's also Rocket Fuel's Chairman of the Board.

Clearly no dummy, Zweben explained that the product employs a cost-based optimizer to enlist the services of Spark for queries that are long-running, have lots of scans and/or multiple phases of execution. Analytical queries often fit that profile, and will be well-handled by Spark. Simpler, operational queries will still be executed via HBase.

Gentlemen, you don't have to choose your engines
Splice Machine users need not concern themselves with these implementation details; they just query the database in SQL and Splice Machine handles the rest. And, by the way, Splice Machine will use the core Spark engine, rather than going through Spark SQL, which would just add an unnecessary layer.
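The routing Zweben describes can be sketched with a toy example. This is not Splice Machine's actual optimizer, and the thresholds below are made up for illustration; it just shows the shape of a cost-based decision that sends long-running, scan-heavy or multi-phase plans to Spark and short operational queries to HBase.

```python
from dataclasses import dataclass

@dataclass
class QueryPlan:
    est_rows_scanned: int   # optimizer's cardinality estimate for the scan
    phases: int             # e.g. scan -> join -> aggregate = 3 phases

def choose_engine(plan: QueryPlan,
                  scan_threshold: int = 1_000_000,
                  phase_threshold: int = 2) -> str:
    """Route analytical-looking plans to Spark, operational ones to HBase."""
    if plan.est_rows_scanned >= scan_threshold or plan.phases > phase_threshold:
        return "spark"
    return "hbase"

# A point lookup stays on HBase; a large multi-phase aggregate goes to Spark.
print(choose_engine(QueryPlan(est_rows_scanned=1, phases=1)))           # hbase
print(choose_engine(QueryPlan(est_rows_scanned=50_000_000, phases=3)))  # spark
```

The point of the abstraction is exactly what the article says: the user writes ordinary SQL, and the engine choice is an internal detail of the plan.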

Open source = Open Sesame?
Splice Machine is a well-kept secret, though; Zweben told me the company has about 10 customers. Although he hails from the world of commercial software, Zweben believes that open sourcing the Splice Machine product will help spread the word more widely. So version 2 of the product will be available in a free and open source Community Edition with the full database engine. A paid Enterprise Edition, which includes professional support and DevOps features such as LDAP and Kerberos integration as well as backup and restore, will provide the monetization model for the company.

Zweben believes that open sourcing the product will help build a community and an ecosystem around it, which is clearly needed. Nonetheless, Splice Machine does not see open sourcing the product as the only necessary step there. Accordingly, the company will be making major investments in ecosystem infrastructure, including a community Web site with tutorials and code, and an Amazon Web Services-based "sandbox" environment that allows for a low-friction setup of the product in the cloud, for evaluation, training and perhaps some development purposes.

Using open source as a vehicle for product evangelism is sensible. Open source community editions are in many ways analogous to free evaluation and developer editions offered for closed source software products.

Also read: Hadoop vendors are listening: Hortonworks gets pragmatic

Unintended consequences
Splice Machine Community Edition will be available on GitHub under an Apache open source license, but will not be an Apache Software Foundation project, at least not initially. Meanwhile, Apache Phoenix, which also offers a SQL relational-on-HBase database, is an ASF project. Will open sourcing Splice Machine thus expose it to competition it may not have directly faced before?

The reality is that ACID transactions in Phoenix are still a beta feature, and table JOINs in Phoenix are limited. This makes Phoenix more of a SQL-on-HBase component and less of a true relational database meant to be used in a standalone manner. But Phoenix is clearly looking to bridge those gaps, so some competition is inevitable.

Rubber, meet road
Splice Machine certainly has an uphill battle ahead: it must compete, build a community and add customers. But with a total of $31M in funding and a very experienced and knowledgeable CEO, the company has significant prowess. Going open source and adding support for Spark (which users can take advantage of without any special effort) makes a good thing better. Now it comes down to grit.
