Domino Data Lab adds autoscaling to MLOps

Domino Data Lab 5.0 focuses heavily on the deployment, autoscaling, and monitoring parts of the machine learning lifecycle.
Written by Tony Baer (dbInsight), Contributor

As Big on Data bro Andrew Brust reported last fall, Domino Data Lab has of late been taking a broader view of MLOps, from experiment management to continuous integration/continuous delivery of models, feature engineering, and lifecycle management. In the recently released 5.0 version, Domino focuses on obstacles that typically slow physical deployment.

Chief among the new capabilities is autoscaling. Before this, data scientists had to either play the role of cluster engineers or work with them to get models into production and manage compute. The new release allows this step to be automated, leveling the playing field with cloud services such as Amazon SageMaker and Google Vertex AI, which already offer autoscaling, and Azure Machine Learning, which offers it in preview. Further smoothing the way, Domino 5.0 is certified to run on the Nvidia AI Enterprise platform (Nvidia is one of the investors in Domino).
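To make the idea of automated scaling concrete, here is a minimal sketch of a proportional scaling rule of the kind documented for the Kubernetes Horizontal Pod Autoscaler. It is an illustration only: Domino's actual scaling policy is not described here, and the function name, parameters, and default limits below are all assumptions.

```python
import math


def desired_replicas(current_replicas, current_load, target_load,
                     min_replicas=1, max_replicas=10, tolerance=0.1):
    """Hypothetical proportional autoscaling rule (not Domino's algorithm).

    Scales the replica count by the ratio of observed load to target load,
    e.g. average CPU percent per replica.
    """
    if current_replicas < 1 or target_load <= 0:
        raise ValueError("need at least one replica and a positive target load")

    ratio = current_load / target_load

    # Leave the cluster alone when load is close to target, to avoid flapping.
    if abs(ratio - 1.0) <= tolerance:
        desired = current_replicas
    else:
        desired = math.ceil(current_replicas * ratio)

    # Clamp to the configured bounds.
    return max(min_replicas, min(max_replicas, desired))
```

For example, four workers running at 90% CPU against a 60% target would be scaled to six, while the same four workers at 30% would be scaled down to two. The tolerance band is a design choice that trades responsiveness for stability.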

The autoscaling features build on support for Ray and Dask (in addition to Spark) that was added in the previous 4.6 version; these frameworks provide APIs for building distributed computing into the code.
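For readers unfamiliar with what "building distributed computing into the code" looks like, the sketch below shows the submit-then-gather pattern using only Python's standard library. It is a stand-in, not Ray or Dask code: Dask's distributed client exposes a similar futures-style `submit`/`result` interface, and Ray uses remote functions with `ray.get`, but both schedule work across a cluster rather than local threads. The `score_partition` and `parallel_score` names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def score_partition(rows):
    """Hypothetical per-partition work: here, just a sum of squares."""
    return sum(x * x for x in rows)


def parallel_score(partitions, max_workers=4):
    """Submit one task per partition, then gather results in submission order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(score_partition, part) for part in partitions]
        return [f.result() for f in futures]


partitions = [[1, 2, 3], [4, 5], [6]]
totals = parallel_score(partitions)
```

The point of the pattern is that the code describes independent units of work and leaves placement to a scheduler, which is what lets a platform add or remove workers underneath it.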

Another new feature of 5.0 that tackles deployment is a new library of data connectors, so data scientists don't have to reinvent the wheel each time they connect to Snowflake, Amazon Redshift, or Amazon S3; other data sources will be added in the future.

Rounding out the 5.0 release is built-in monitoring. This integrates a capability that was previously standalone and had to be manually configured. With 5.0, Domino automatically sets up monitoring, capturing live prediction streams and running statistical checks of production vs. training data once a model is deployed. And for debugging, it captures snapshots of the model: the version of the code, data sets, and compute environment configurations. With a single click, data scientists can spin up a development environment of the versioned model to do debugging. The system, however, does not at this point automate detection or make recommendations on where models need to be repaired.
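As an illustration of what a statistical check of production vs. training data can look like, the sketch below computes a population stability index (PSI) for a single numeric feature. This is one common drift measure chosen for the example; which checks Domino actually runs is not specified here, and the function name, bin count, and smoothing constant are assumptions.

```python
import math


def population_stability_index(training, production, bins=10, eps=1e-6):
    """Hypothetical drift check: PSI between training and production samples.

    Bins are equal-width over the training range; production values outside
    that range are counted in the first or last bin.
    """
    if not training or not production:
        raise ValueError("both samples must be non-empty")
    lo, hi = min(training), max(training)
    if hi == lo:
        raise ValueError("training data must contain more than one distinct value")
    width = (hi - lo) / bins

    def proportions(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            idx = max(0, min(bins - 1, idx))  # clamp into the edge bins
            counts[idx] += 1
        # Floor each proportion at eps so the logarithm is always defined.
        return [max(c / len(values), eps) for c in counts]

    p_train = proportions(training)
    p_prod = proportions(production)
    return sum((q - p) * math.log(q / p) for p, q in zip(p_train, p_prod))
```

Identical distributions give a PSI of zero, and the value grows as production data shifts away from the training data. Any alerting threshold is a convention the team has to choose; the check flags that something changed but, as noted above, not what to repair.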

The spark (no pun intended) for the 5.0 capabilities is tackling the operational headaches that force data scientists to perform system or cluster engineering tasks themselves or rely on admins to handle them.

But there is also the data engineering bottleneck, as we found from research we performed for Ovum (now Omdia) and Dataiku back in 2018. From in-depth discussions with over a dozen chief data officers, we found that data scientists typically spend over half their time on data engineering. The 5.0 release tackles one major hurdle in data engineering, connecting to popular external data sources, but currently Domino does not address the setting up of data pipelines or, more fundamentally, the automation of data prep tasks. Of course, the latter (integration of data prep) is what drove DataRobot's 2019 acquisition of Paxata.

The 5.0 features reflect how Domino Data Lab, and other ML lifecycle management tools, have had to broaden their focus from the model lifecycle to deployment. That, in turn, reflects the fact that, as enterprises get more experienced with ML, they are developing more models more frequently and need to industrialize what had originally been one-off processes. We wouldn't be surprised if Domino next turned its attention to feature stores.
