DataRobot, a leading player in automated machine learning (ML) and artificial intelligence (AI), has acquired Paxata, one of the early self-service data preparation pure play vendors. DataRobot says the acquisition of Paxata will help it "bolster its end-to-end AI capabilities"; in fact, it headlined its press release on the subject with that very wording. Terms of the deal were not disclosed.
Paxata, on its own, was arguably more focused on data preparation for straight-up descriptive analytics than for AI. But AI platforms need data prep too, to help data scientists streamline and cleanse their data sets. Data prep can also be extremely helpful in so-called feature engineering work, which derives ML model inputs (the "features") as their own data columns from specific parts of the raw column data that exists before the prep work takes place.
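To make the feature engineering idea concrete, here is a minimal sketch in plain Python. It is not Paxata or DataRobot code; the column names (`order_timestamp`, `email`) and the `extract_features` helper are hypothetical, chosen only to show raw columns being turned into separate, model-ready feature columns.

```python
from datetime import datetime


def extract_features(row):
    """Derive feature columns from raw columns (hypothetical schema)."""
    # Parse the raw timestamp string so parts of it can become features.
    ts = datetime.strptime(row["order_timestamp"], "%Y-%m-%d %H:%M:%S")
    # Cleanse the raw email value before pulling the domain out of it.
    email = row["email"].strip().lower()
    return {
        "order_hour": ts.hour,
        "order_weekday": ts.weekday(),  # Monday is 0, Sunday is 6
        "is_weekend": ts.weekday() >= 5,
        "email_domain": email.split("@", 1)[1] if "@" in email else None,
    }


# Illustrative raw records, invented for this example.
raw_rows = [
    {"order_timestamp": "2019-12-14 21:05:00", "email": " Pat@Example.com "},
    {"order_timestamp": "2019-12-16 09:30:00", "email": "lee@corp.example"},
]

features = [extract_features(r) for r in raw_rows]
```

Each raw record goes in with two columns and comes out with four narrower ones, which is the kind of reshaping a data scientist would otherwise do by hand before training a model.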
I spoke with Phil Gurbacki, SVP of product development and customer experience at DataRobot, who told me every DataRobot user needs to do data prep in order to be successful with ML. As such, Gurbacki said that while the standalone Paxata product will remain available, the company is most enthusiastic about taking Paxata data prep and bringing it to every single DataRobot customer in an integrated fashion.
Gurbacki also explained that data prep workloads for AI and ML differ from those for BI and analytics. First, prep for AI is typically focused on a narrow set of columns that are transformed into the model features. Second, data prep is needed not just for training ML models, but also for prepping the data scored by those models as predictions are generated. Data prep on scoring data needs to happen with very low latency and is, by its nature, a frequent production process. This differs from BI data prep, which is conducted less frequently, on larger data volumes, against a broad set of columns.
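The training-versus-scoring distinction can be sketched as follows. This is an illustration under my own assumptions, not a description of how Paxata or DataRobot implements it: the `fit_prep` and `apply_prep` helpers and the `amount` column are hypothetical. The point is that prep parameters are learned once from the training data, then reapplied cheaply to each record at scoring time.

```python
from statistics import mean, pstdev


def fit_prep(training_rows, column):
    """Learn prep parameters from training data (batch, done infrequently)."""
    values = [r[column] for r in training_rows if r[column] is not None]
    # Fall back to 1.0 if the column is constant, to avoid dividing by zero.
    return {"column": column, "mean": mean(values), "std": pstdev(values) or 1.0}


def apply_prep(row, params):
    """Apply the same prep to one record (per-record, suited to scoring)."""
    value = row[params["column"]]
    if value is None:
        value = params["mean"]  # impute missing values with the training mean
    return (value - params["mean"]) / params["std"]


# Illustrative training data, invented for this example.
training = [{"amount": 10.0}, {"amount": 20.0}, {"amount": None}, {"amount": 30.0}]
params = fit_prep(training, "amount")

# Training time: prep the whole data set.
train_features = [apply_prep(r, params) for r in training]

# Scoring time: prep a single incoming record with the stored parameters.
score_feature = apply_prep({"amount": 20.0}, params)
```

Because `apply_prep` touches one record and a handful of stored numbers, it stays fast enough to run on every prediction request, while the heavier `fit_prep` step runs only when the model is retrained.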
Though the workloads differ, DataRobot sees the Paxata technology as being ready and able to accommodate both scenarios.
Prep, for the people
Paxata was founded in 2012 by a team that included seasoned veterans from the enterprise business intelligence (BI) world. Co-founder & chief product officer Nenshad Bardoliwalla is an alumnus of legacy CRM vendor Siebel's analytics team, as well as BI pioneer Hyperion, and SAP (Siebel and Hyperion were both acquired by Oracle). Co-founder and CEO Prakash Nanduri hailed from Tibco and SAP.
I met Bardoliwalla at a TDWI chapter meeting in NYC, where he presented when Paxata was still in stealth mode. He explained that he and others had the strong belief that data prep in the enterprise BI world was too hard and too reliant on IT specialists. This state of affairs, in turn, discouraged business users from pursuing analytics with enthusiasm and effectiveness.
If this were an analogy question on a standardized test, we might say [Paxata]:[data prep] as [DataRobot]:[AI and ML]. Both companies have sought to democratize their respective technology areas, by offering self-service platforms that empower business users and mitigate their reliance on rarefied specialists. With that in mind, the acquisition makes a great deal of sense, something Gurbacki confirmed when he told me that "DataRobot's mission is to build an enterprise AI platform that bridges the gap between raw data and business value."
Vendor category or feature set?
It's also the case that data prep as a pure play vendor category is getting whittled down, through diversification and, now, consolidation. Alteryx has significantly broadened its platform, through the acquisitions of Semanta and Yhat, in the data catalog and AI arenas, respectively. Datameer has done likewise with the introduction of its Neebo data virtualization platform. And while Trifacta remains independent, the company is highly focused on cloud data warehouse and data lake scenarios, and its technology is leveraged by Google for its Cloud Dataprep product. Meanwhile, companies like Microsoft, Informatica, Talend and Tableau have added home-grown self-service data prep to their own stacks and core products.
It's a natural course of events for innovation in a specific technology area (like self-service data prep for Big Data) to beget multiple pure play vendors who productize that innovation. And it's a natural consequence, as an area of innovation matures, for its vendors to be acquired, both by incumbents and by players in newer areas, like AI. We've seen this happen with BI and -- while one data point doesn't constitute a trend -- maybe now we'll see it with data prep.