IBM is today announcing new capabilities in the Watson Data Platform (WDP), part of the IBM Cloud. As ZDNet's Larry Dignan wrote in September, IBM's spin and strategy is to make WDP into a "data science operating system." Put another way, IBM wants it to be a foundation for AI.
Also read: IBM's Watson Data Platform aims to become data science operating system
The platform is composed of numerous services, including managed NoSQL databases, Watson Machine Learning and APIs for natural language understanding, recognition, chatbots and more. IBM is now updating it with new features.
Catalog, refinery and governance, oh my!
First off, IBM is adding to its Data Catalog and Data Refinery offerings with features intended to provide data enrichment and data cleansing capabilities. By doing so, the company believes, customers will be better able to create data sets of higher quality. When combined with feature engineering work (determining which columns in a data set are germane to predicting the values of others), such data sets can be used to create superior machine learning models.
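To make the cleansing and feature engineering idea concrete, here is a minimal sketch in plain Python. It does not use any IBM API and is not how Data Refinery works; the column names, the toy data and the helper functions (`clean`, `pearson`, `rank_features`) are all hypothetical. It drops incomplete rows, then ranks the remaining columns by how strongly each correlates with the column to be predicted, which is just one simple way of judging relevance.

```python
from math import sqrt
from statistics import mean


def clean(rows):
    """Cleansing step: drop any row that has a missing (None) value."""
    return [r for r in rows if all(v is not None for v in r.values())]


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / sqrt(var_x * var_y)


def rank_features(rows, target):
    """Rank non-target columns by absolute correlation with the target."""
    ys = [r[target] for r in rows]
    scores = {
        col: abs(pearson([r[col] for r in rows], ys))
        for col in rows[0]
        if col != target
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical toy data set: which column helps predict "sales"?
rows = [
    {"ad_spend": 10, "store_id": 3, "sales": 105},
    {"ad_spend": 20, "store_id": 1, "sales": 198},
    {"ad_spend": None, "store_id": 2, "sales": 150},  # incomplete row
    {"ad_spend": 30, "store_id": 2, "sales": 310},
    {"ad_spend": 40, "store_id": 3, "sales": 395},
]

cleaned = clean(rows)
ranked = rank_features(cleaned, "sales")

for column, score in ranked:
    print(f"{column}: {score:.3f}")
```

In this toy data, `ad_spend` tracks `sales` closely while `store_id` does not, so the former ranks first. Real feature engineering goes well beyond linear correlation, but the principle of separating germane columns from irrelevant ones is the same.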
Metadata gleaned from Data Catalog and Data Refinery can help with data governance policies. Such governance becomes more relevant each day as new data breaches surface, and regulations meant to mitigate them multiply. Accordingly, IBM is also adding new features to its Unified Governance Platform, including capabilities aimed squarely at compliance with regulations like the European Union's General Data Protection Regulation (GDPR).
Also read: IBM launches Apache Spark cloud service
As I alluded to earlier, while all of these components (especially Analytics Engine) are data lake technologies, IBM really sees them as the basis for doing better machine learning and artificial intelligence.
By combining the power of the Watson brand with AI-oriented marketing, IBM is making a bet that it can get real traction for its data platform, and compete with its major cloud rivals for cloud data platform dominance. That's a tall order but it's not impossible, especially given IBM's enterprise cred and the breadth and legacy of its on-premises data stack.