First disclosed in a company blog post several months ago, SAP's new suite of tools for governing and integrating data across highly federated environments is being officially unveiled at SAPPHIRE this week.
SAP does not own its own hardware infrastructure. That has had large implications for how it delivers the HANA database: SAP doesn't manufacture the database box, but instead certifies OEM partners to deliver it. And that's the design pattern that has driven SAP HANA Data Management Suite, which is aimed at integrating and managing data in federated environments, inside and outside SAP, and in and out of the cloud.
The release of the data management suite this week is hardly a surprise, as SAP disclosed its direction in a public blog post earlier this year. But the company is making it official at its annual SAPPHIRE event, which kicks off today in Orlando.
For now, this suite comprises a collection of existing products based on SAP HANA, SAP Data Hub, SAP Cloud Platform, Big Data Services, and SAP Enterprise Architecture Designer (EA Designer) offerings. While SAP HANA Data Management Suite includes some familiar products, the launch includes a number of new features. The headline is a feature in the SAP HANA database itself: HANA 2.3 adds support for the new Intel Optane DC persistent memory (previously code-named 3D XPoint), announced just last week. Because HANA is an in-memory database, Optane could significantly boost its effective storage capacity thanks to NVRAM technology that delivers almost the performance of memory but at SSD prices.
Before reviewing the key enhancements, it would help to understand what each of these SAP products actually delivers. SAP HANA, of course, is the in-memory database that powers transaction and analytic processing. SAP Data Hub handles the integration of real-time data pipelines involving enterprise databases (like HANA) and big data coming from feeds as varied as IoT devices, messaging, and logs; it also manages data governance. SAP Enterprise Architecture Designer handles the master data. In turn, SAP Cloud Platform Big Data Services provides a cloud implementation of a data lake using Hadoop.
Other new capabilities coming with this release include real-time anonymization for GDPR or other data privacy mandates, though only for data stored in SAP HANA. Rules governing which entities must be anonymized, and how, can be documented in EA Designer.
SAP Data Hub 2.3 adds metadata cataloging that can discover relationships between data in distributed environments, along with support for deploying SAP Cloud Platform, Big Data Services on public clouds, with containers orchestrated by Kubernetes. EA Designer has added the ability to move repository content between on-premises systems and the cloud, plus automatic generation of physical data models from conceptual ones. Another new feature is the ability to analyze blockchain data alongside enterprise data.
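To make the idea of documented anonymization rules concrete, here is a minimal sketch in Python. It is not SAP's implementation or API: the rule table, the method names ("hash", "mask", "generalize"), and the functions are all hypothetical, chosen only to show how a per-field rule set might drive what happens to each attribute of a record.

```python
import hashlib

# Hypothetical rule table: field name -> anonymization method.
# These names are illustrative assumptions, not SAP HANA or EA Designer syntax.
RULES = {
    "name": "hash",        # replace with a truncated one-way digest
    "email": "mask",       # replace with a fixed placeholder
    "age": "generalize",   # replace with a ten-year band
}


def anonymize_value(method, value):
    """Apply one anonymization method to a single value."""
    if method == "hash":
        # Note: hashing alone is pseudonymization, not full anonymization.
        return hashlib.sha256(str(value).encode("utf-8")).hexdigest()[:12]
    if method == "mask":
        return "***"
    if method == "generalize":
        low = (int(value) // 10) * 10
        return f"{low}-{low + 9}"
    raise ValueError(f"unknown method: {method}")


def anonymize_record(record, rules=RULES):
    """Return a copy of record with every field named in rules anonymized."""
    return {
        field: anonymize_value(rules[field], value) if field in rules else value
        for field, value in record.items()
    }
```

Fields that have no rule pass through unchanged, so the rule table alone decides which entities are touched and how.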
Of course, the big question here is, what makes this a suite rather than "markitecture"? At this point, the suite offers a common development platform for applications running real-time analytics, on premises and/or in the cloud. But as a fully integrated suite, it is very much a work in progress.
On the roadmap are unifying the user experience, converging the metadata catalog, containerizing Data Management Suite functions (to further ease the transition to the cloud), and applying machine learning to proactively tune and heal the underlying database.
The obvious competition comes from rivals like Informatica that specialize in all aspects of data integration and governance and, in Informatica's case, have already unified their offerings on a common metadata engine. SAP's ace in the hole is its ownership of the application pillar, and especially, its rich library of vertical industry information models that populate its master data, transaction, and analytic data stores.
Clearly, in assembling the SAP HANA Data Management Suite, SAP must balance competing needs. On one side is the existing installed base for the respective tools, which won't tolerate disruption. On the other is the need to make this a real product rather than a loose assemblage of data integration, master data, data quality, and data governance functions. For many customers, the flexibility of cloud-native services that hardly look like traditional siloed applications has raised expectations for a more modern architecture. More to the point, the data silos that customers are going to have to govern will include many in the cloud.
So we'd like to see SAP get more aggressive, at least by opening a separate track with a cloud-native edition of the SAP HANA Data Management Suite that exploits containerization and microservices and is integrated from the get-go. Much of the work in containerizing these offerings is already complete or underway, and by year end, all will be offered with subscription pricing. But they are still surfaced as separate, linked products. Instead, we would like to see SAP take an approach akin to what IBM just implemented with Cloud Private for Data, where it essentially deconstructed and mashed up capability from multiple products into containers with functionality and data sets exposed as microservices and APIs with a common user experience and back-end metadata services.
And SAP, we have one last feature on the wish list. Your metadata management is still based on relational technology. But, to paraphrase the tagline for Data Hub, as you "connect enterprise data with big data," relational approaches will prove too rigid. The situation gets compounded as your solution extends to non-SAP sources. Bite the bullet and embrace a graph rather than a relational engine for metadata. It's far more flexible, maintainable, and extensible for the cold, hard, heterogeneous world of data you intend to govern.
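To illustrate why a graph can suit heterogeneous metadata, here is a minimal sketch in Python. It is an assumption-laden toy, not a description of any SAP or Informatica product: the class, the relation labels, and the dataset names are all hypothetical. The point is that a new source or relationship type is just another edge, with no schema change, and that a lineage question becomes a simple traversal.

```python
from collections import defaultdict, deque


class MetadataGraph:
    """Toy metadata store: nodes are data assets, edges are labeled relations."""

    def __init__(self):
        # node -> list of (relation, target) pairs
        self._edges = defaultdict(list)

    def add_edge(self, source, relation, target):
        """Record that source is connected to target by the given relation."""
        self._edges[source].append((relation, target))

    def downstream(self, start):
        """Return every asset reachable from start, in breadth-first order."""
        seen = {start}
        order = []
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for _relation, target in self._edges.get(node, []):
                if target not in seen:
                    seen.add(target)
                    order.append(target)
                    queue.append(target)
        return order


# Hypothetical assets spanning SAP and non-SAP sources.
graph = MetadataGraph()
graph.add_edge("hana.orders", "feeds", "hub.pipeline")
graph.add_edge("s3.clickstream", "feeds", "hub.pipeline")
graph.add_edge("hub.pipeline", "writes", "hadoop.orders_raw")
lineage = graph.downstream("hana.orders")
```

In a relational catalog, the same question typically needs recursive joins over tables whose schema must anticipate each relationship type in advance.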