In the worlds of Big Data, NoSQL and relational databases, Splice Machine's name doesn't come up that often. But a closer look at the company's product, architectural approach and CEO put them on my radar a while back. And Version 2 of the product, which is being announced today, has made that radar dot much brighter.
Have RDBMS cake, eat NoSQL scaling, too
Before we look at version 2, let's cover the motivation behind v1. Specifically, Splice Machine looked long and hard at some pressing database conundrums:
- The relational database model (along with SQL) works well -- best, in fact -- in many circumstances, but scaling it has always been hard.
- NoSQL databases are much easier to scale but the schema-less model and lack of "ACID" (Atomicity/Consistency/Isolation/Durability) guarantees can be disorienting.
- Hadoop scales well too, and its HDFS file system has become an important storage standard, but Hadoop's batch model can also cause dissonance for relational database professionals.
The solution: create an ACID-compliant, SQL relational database on top of Apache HBase -- a NoSQL database that uses HDFS as its storage layer. Now you've got SQL, the relational model, ACID/transactional consistency, horizontal scaling and HDFS, all in one product.
So version 1 is pretty cool but version 2 of the product ups the ante considerably: it on-boards another important data technology -- Apache Spark -- as an additional execution engine.
Splice Machine's CEO, Monte Zweben, gave me the lowdown on v2. Zweben is an alumnus of Stuyvesant High School, Carnegie Mellon, Stanford and the AI branch of NASA's Ames Research Center; he's also Rocket Fuel's Chairman of the Board.
Clearly no dummy, Zweben explained that the product employs a cost-based optimizer to enlist the services of Spark for queries that are long-running, have lots of scans and/or multiple phases of execution. Analytical queries often fit that profile, and will be well-handled by Spark. Simpler, operational queries will still be executed via HBase.
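To make that routing idea concrete, here is a minimal sketch of how a cost-based choice between two engines could look. It is not Splice Machine's optimizer: the `QueryProfile` fields, the two thresholds and the `choose_engine` function are all hypothetical names invented for illustration, and a real optimizer would weigh far more than two estimates.

```python
from dataclasses import dataclass


@dataclass
class QueryProfile:
    """Hypothetical cost estimates a planner might produce for one query."""
    estimated_rows_scanned: int  # how many rows the plan expects to read
    execution_phases: int        # e.g. scan, join, aggregate, sort


# Made-up cutoffs, chosen only so the example has something to compare against.
SCAN_THRESHOLD = 1_000_000
PHASE_THRESHOLD = 2


def choose_engine(profile: QueryProfile) -> str:
    """Send scan-heavy or multi-phase queries to Spark, everything else to HBase."""
    is_scan_heavy = profile.estimated_rows_scanned >= SCAN_THRESHOLD
    is_multi_phase = profile.execution_phases > PHASE_THRESHOLD
    if is_scan_heavy or is_multi_phase:
        return "spark"
    return "hbase"


# A single-row lookup versus a large multi-stage report.
lookup = QueryProfile(estimated_rows_scanned=1, execution_phases=1)
report = QueryProfile(estimated_rows_scanned=50_000_000, execution_phases=4)
print(choose_engine(lookup), choose_engine(report))
```

The point of the sketch is only that the decision is made from the estimated cost of the plan, not from anything the user writes in the query.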
Gentlemen, you don't have to choose your engines
Splice Machine users need not concern themselves with these implementation details; they just query the database in SQL and Splice Machine handles the rest. And, by the way, Splice Machine will use the core Spark engine, rather than going through Spark SQL, which would just add an unnecessary layer.
Open source = Open Sesame?
Splice Machine is a well-kept secret, though; Zweben told me the company has about 10 customers. Although he hails from the world of commercial software, Zweben believes that open sourcing the Splice Machine product will help spread the word more widely. So version 2 of the product will be available in a free and open source Community Edition with the full database engine. A paid Enterprise Edition, which includes professional support and DevOps features like integration with LDAP and Kerberos as well as backup and restore, will provide the monetization model for the company.
Zweben believes that open sourcing the product will help build a community and an ecosystem around it, which is clearly needed. Nonetheless, Splice Machine does not see open sourcing the product as the only necessary step there. Accordingly, the company will be making major investments in ecosystem infrastructure, including a community Web site with tutorials and code, and an Amazon Web Services-based "sandbox" environment that allows for a low-friction setup of the product in the cloud, for evaluation, training and perhaps some development purposes.
Using open source as a vehicle for product evangelism is sensible. Open source community editions are in many ways analogous to free evaluation and developer editions offered for closed source software products.
Splice Machine Community Edition will be available on GitHub under an Apache open source license, but will not be an Apache Software Foundation project, at least not initially. Meanwhile, Apache Phoenix, which also offers a SQL relational-on-HBase database, is an ASF project. Will open sourcing Splice Machine thus expose it to competition it may not have directly faced before?
The reality is that ACID transactions in Phoenix are only a Beta feature and table JOINs in Phoenix are limited. This makes Phoenix more of a SQL-on-HBase component and less of a true relational database meant to be used in a standalone manner. But Phoenix is clearly looking to bridge those gaps so some competition is inevitable.
Rubber, meet road
Splice Machine certainly has an uphill battle ahead, to compete, build a community and add customers. But with a total of $31M in funding and a very experienced and knowledgeable CEO, the company has significant prowess. Going open source and adding support for Spark (that users can take advantage of without any special effort) makes a good thing better. Now it comes down to grit.