In the latest twist to the "data warehouse in a box" saga, IBM is taking the wraps off its Cloud Private for Data (a.k.a. ICP for Data) offering that it first announced back in March at Think. IBM promises in the new release a private cloud platform that goes well beyond data warehousing to support advanced data science and data engineering as well. It will be available as software on an IBM turnkey product and natively on the Red Hat OpenShift container application platform. The latter is key to reaching the IBM WebSphere installed base -- those customers won't have to buy a separate cluster to take advantage of the new capabilities.
ICP for Data blends and reengineers a mix of IBM tools and platforms into a cloud-native container and microservices architecture that runs with Kubernetes orchestration. It's safe to say that ICP for Data is cloud buzzword-compliant.
It will tie in multiple data stores, anchored by the well-known Db2, which provides the data warehouse. But it also includes IBM Db2 Event Store, a recently introduced platform designed for extremely high ingest of event data -- this will be especially useful for connecting to IoT devices and aggregators at the edge. And at launch, it will also support bundling and integration of several popular third-party open source databases, including MongoDB and EnterpriseDB.
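The high-ingest design point is worth a moment: event stores in this class typically absorb writes far faster than a row-at-a-time database by grouping incoming events into micro-batches before committing them. The Python sketch below illustrates only that general pattern; the class, field names, and batch sizes are hypothetical stand-ins, not Db2 Event Store internals.

```python
from dataclasses import dataclass, field

@dataclass
class EventBuffer:
    """Toy micro-batching buffer illustrating the high-ingest pattern.

    Hypothetical sketch: events accumulate in memory and are committed
    in batches, which is the general trick behind very high write rates.
    """
    batch_size: int = 1000
    _pending: list = field(default_factory=list)
    flushed_batches: list = field(default_factory=list)

    def ingest(self, event: dict) -> None:
        self._pending.append(event)
        if len(self._pending) >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        # Commit whatever has accumulated as one batch.
        if self._pending:
            self.flushed_batches.append(self._pending)
            self._pending = []

buf = EventBuffer(batch_size=500)
for i in range(1200):  # e.g. readings streaming in from edge sensors
    buf.ingest({"sensor": i % 16, "value": i * 0.1})
buf.flush()            # commit the partial final batch
print(len(buf.flushed_batches))  # 3 batches: 500 + 500 + 200
```

The trade-off, of course, is a small window of buffered data in exchange for far fewer, larger commits.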
It encompasses capabilities drawn from IBM's data management and analytics product portfolio, plus new cloud-oriented governance and data science capabilities. For data scientists and engineers, there is functionality adapted from Data Science Experience (DSX), including model management and deployment, along with IBM's Data Refinery data preparation capabilities. It also includes data profiling and ETL capabilities from IBM InfoSphere Information Analyzer and DataStage. For business analysts running BI workloads, there are dashboards drawn from IBM Cognos DDE (Dynamic Dashboard Embedded) that have been embedded into the workspace. For governance, ICP for Data includes a data catalog spanning a business glossary, security and access policies, and data lineage.
But as noted, this is designed as a private cloud offering, not an agglomeration of tools. That comes through an architecture built on containers, with functionality and data sets exposed as microservices and APIs. Befitting a private cloud, provisioning is automated under the covers.
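To make the microservices-and-APIs point concrete, the sketch below shows the general shape of exposing a data set behind a versioned endpoint rather than through a product-specific client library. The route table, paths, and sample data are hypothetical illustrations, not actual ICP for Data APIs.

```python
import json

# Hypothetical catalog of data sets published by the platform.
DATASETS = {
    "sales_2018": [{"region": "EMEA", "revenue": 1250}],
}

def handle(path: str) -> str:
    """Dispatch a GET-style request path to a JSON response.

    Stands in for an API gateway routing to a data microservice;
    the "/v1/datasets/" prefix is an invented convention.
    """
    prefix = "/v1/datasets/"
    if path.startswith(prefix):
        name = path[len(prefix):]
        if name in DATASETS:
            return json.dumps(DATASETS[name])
        return json.dumps({"error": f"unknown data set {name!r}"})
    return json.dumps({"error": "unknown route"})

print(handle("/v1/datasets/sales_2018"))  # [{"region": "EMEA", "revenue": 1250}]
```

The design point is that any consumer -- a notebook, a dashboard, an external application -- reaches the same data through the same contract.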
The user experience has a common look and feel that exposes functionality based on role. Data engineers gain access to functions such as choosing data sources (which can be stored within the ICP cluster or come from an external source), mapping transformations, and running ETL jobs. Data scientists get functions drawn from DSX: Jupyter notebooks, Git integration for version control, and model management capabilities that surface previously built models and support working with a model through the full lifecycle from development to testing and deployment. Business analysts get their familiar dashboards through the built-in dashboarding capabilities, while data stewards manage the data catalog and the business glossary by which data sets are classified and governed.
Data scientists can work across public and private clouds. They can kick off a project within ICP for Data's model management capabilities or work with the recently unveiled Watson Studio that runs in the IBM Public Cloud. Because IBM designed ICP with the same containerization as the IBM Public Cloud, data and modeling artifacts can move back and forth between the two environments. That means machine learning models developed on ICP for Data are portable and deployable on the deep learning neural networks supported by Watson Studio. We believe that at some point, it would make sense for IBM to bridge those worlds more tightly.
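The portability claim rests on a simple idea: what moves between environments is a serialized artifact, not the runtime that produced it. Below is a minimal Python sketch using pickle and an invented toy model class purely for illustration; IBM's actual artifact formats are not specified here.

```python
import pickle

class LinearModel:
    """Toy stand-in for a trained model; the artifact, not the
    environment that trained it, is what gets shipped around."""

    def __init__(self, weights):
        self.weights = weights

    def predict(self, features):
        return sum(w * f for w, f in zip(self.weights, features))

# "Train" in one environment and serialize the artifact...
artifact = pickle.dumps(LinearModel([0.5, 2.0]))

# ...then deserialize and score it in another environment.
restored = pickle.loads(artifact)
print(restored.predict([4, 1]))  # 0.5*4 + 2.0*1 = 4.0
```

In practice the artifact would travel with metadata (framework version, input schema) so the target environment can validate it before deployment, which is the kind of bookkeeping a model management layer handles.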
The launch also marks the beginnings of a third-party ecosystem. Beyond the database integrations with MongoDB and EnterpriseDB, partners include Datameer for exploratory analytics, plus Aginity, Lightbend, NetApp, Portworx, and Tata.
ICP for Data is a work in progress. The core bits are in place, but the data quality, master data, and reporting pieces will follow in upcoming releases. The core data visualization capability and the advanced modeling features will be there at launch. IBM is also still building the third-party ecosystem for integrating tools and databases. The initial version going live will be the higher-end enterprise edition; a smaller Cloud Native Edition and a freemium Community Edition will likely follow later in the year.
In the grand scheme of things, the private cloud is the latest spin on the age-old concept of the turnkey system, which evolved over the previous decade into special-purpose appliances such as Netezza. The idea was to buy a box with the software already pre-configured, turn the switch on, and go.
But as noted, private clouds are more than cluster appliances with a bunch of pre-integrated databases, tools, or applications. Private cloud offerings that embrace native cloud architectures differentiate with containers to make functionality portable, Kubernetes to make containers composable, and microservices and APIs to make functionality and data readily consumable.
IBM is hardly the only player in the private cloud arena. Oracle Cloud at Customer and Microsoft Azure Stack are prime evidence that there are enterprises demanding the flexibility of the cloud, but whose policies or regulatory mandates prevent them from putting data in the public cloud. With Oracle having just acquired DataScience.com, it also has the opportunity to assemble an offering that spans the lifecycle from analytics to data science and machine learning. But for now, with ICP for Data, IBM is first off the mark.