Splice Machine doubles down on managing machine learning

Splice Machine, which positions itself as a database on which you can do operational machine learning, is adding a new tool that performs lifecycle management of ML models.


It's common practice for database and analytics products to AI-wash themselves. For instance, many platforms have added support for running Spark compute jobs, and because Spark supports ML, they can claim ML support too. As we reported a couple of years back, Splice Machine has already taken the first steps by integrating Spark analytics and Zeppelin notebooks.

Splice Machine has now taken the next step with a new ML Manager feature that provides lifecycle management for machine learning models. It bundles into the database the type of functionality that would otherwise require separate tools like DataRobot, Domino Data Lab, or Dataiku. The closest parallel would be Cloudera's Data Science Workbench, which plays a similar role with the company's Hadoop platform. While ML Manager allows models, notebooks, and all their attributes, such as features, hyperparameters, and data sources, to be tracked, it currently lacks some of the collaboration features, such as chat or annotation capabilities, that many of the third-party tools provide.
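To make the idea of tracking a model's attributes concrete, here is a minimal sketch in plain Python. It is not Splice Machine's ML Manager API; the `Run` and `ExperimentTracker` names, the table name, and the metric values are all hypothetical and made up for illustration. It only shows the kind of record a lifecycle tool keeps per experiment: features, hyperparameters, data source, and resulting metrics.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class Run:
    """One experiment: a model variant plus the attributes worth tracking."""
    run_id: int
    model_name: str
    features: List[str]
    hyperparameters: Dict[str, Any]
    data_source: str
    metrics: Dict[str, float] = field(default_factory=dict)


class ExperimentTracker:
    """Hypothetical in-memory tracker; a real tool would persist runs in a database."""

    def __init__(self) -> None:
        self._runs: List[Run] = []

    def start_run(self, model_name: str, features: List[str],
                  hyperparameters: Dict[str, Any], data_source: str) -> Run:
        # Copy the inputs so later changes by the caller do not alter the record.
        run = Run(len(self._runs) + 1, model_name, list(features),
                  dict(hyperparameters), data_source)
        self._runs.append(run)
        return run

    def log_metric(self, run: Run, name: str, value: float) -> None:
        run.metrics[name] = value

    def best_run(self, metric: str) -> Optional[Run]:
        # Only runs that actually logged this metric are candidates.
        scored = [r for r in self._runs if metric in r.metrics]
        return max(scored, key=lambda r: r.metrics[metric]) if scored else None


# Illustrative values only: two variants of the same model on a made-up table.
tracker = ExperimentTracker()
a = tracker.start_run("churn_rf", ["tenure", "plan"], {"n_trees": 50}, "CUSTOMERS")
tracker.log_metric(a, "auc", 0.81)
b = tracker.start_run("churn_rf", ["tenure", "plan", "region"], {"n_trees": 200}, "CUSTOMERS")
tracker.log_metric(b, "auc", 0.86)

best = tracker.best_run("auc")
print(best.run_id, best.hyperparameters)
```

Keeping every run's inputs next to its metrics is what lets a team later answer which variant of a model was promoted and why.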

The strong point of Splice Machine's ML Manager is the fact that it is built atop the database, meaning that data can be ingested without having to serialize it. After experiments testing different variations of the model are completed, Splice Machine's Spark integration makes it straightforward to populate data into a Spark DataFrame, paving the way for the models to be run.

With its Spark integration, there is some architectural similarity with Databricks Delta, which Databricks recently open sourced with a new data lake capability that would make updates to Delta transactional (i.e., enforcing ACID guarantees). But, as was pointed out to us on Twitter after our piece on Databricks Delta ran, the transaction guarantees are only ironclad on HDFS, not cloud storage. Splice Machine's Spark-integrated analytics can also run with data from HDFS (on which HBase runs) or cloud storage. The difference is that Databricks' ACID guarantees apply in batch mode, while for Splice Machine, the ACID support adds concurrency control at the cell (record) level.
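For readers unfamiliar with what record-level concurrency control means in practice, the sketch below uses per-record version numbers (optimistic versioning) in plain Python. This is one common technique chosen for illustration, not a description of how Splice Machine or Databricks Delta implements transactions; `VersionedTable`, `WriteConflict`, and the account keys are hypothetical names. The point is that conflicts are detected per record, so writers touching different records do not block or fail each other.

```python
class WriteConflict(Exception):
    """Raised when a writer's view of a record is stale."""


class VersionedTable:
    """Toy table that checks for conflicts one record at a time."""

    def __init__(self):
        self._rows = {}  # key -> (version, value)

    def read(self, key):
        # A missing record reads as version 0 with no value.
        return self._rows.get(key, (0, None))

    def write(self, key, value, expected_version):
        current_version, _ = self._rows.get(key, (0, None))
        if current_version != expected_version:
            # Someone else changed this record since the caller read it.
            raise WriteConflict(f"{key}: expected v{expected_version}, found v{current_version}")
        self._rows[key] = (current_version + 1, value)


table = VersionedTable()
table.write("acct-1", 100, expected_version=0)
table.write("acct-2", 50, expected_version=0)

# Two writers read their records at the same point in time.
v1, _ = table.read("acct-1")
v2, _ = table.read("acct-2")

# Writes to different records both succeed.
table.write("acct-1", 90, expected_version=v1)
table.write("acct-2", 60, expected_version=v2)

# A second write to acct-1 based on the old version is rejected.
try:
    table.write("acct-1", 80, expected_version=v1)
    conflicted = False
except WriteConflict:
    conflicted = True

print(conflicted, table.read("acct-1"), table.read("acct-2"))
```

A batch-oriented design would instead validate a whole set of changes as a single unit, which is simpler to reason about but coarser when many small updates arrive concurrently.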

From the get-go, Splice Machine has differentiated itself from other open source relational database platforms such as MariaDB or PostgreSQL with its roots in big data. As a hybrid transaction/analytic system, the OLTP side runs off Hadoop's HBase, while its analytics flavor can run on a variety of data sources, from file systems to cloud storage, as long as the data is in structured formats like Parquet. It has introduced its own managed database-as-a-service (DBaaS) in AWS and Azure, and we expect that in the next year, Google Cloud Platform will get added to the list.

Splice Machine's ace in the hole is very much tied to its machine learning capability. Accenture has made Splice Machine the core database for its AI platform. And as part of Splice Machine's Series B funding round back in February, Accenture's venture arm put some skin in the game.