SAP's road to cloud-native data platforms in a hybrid world

While SAP's data portfolio has been available in the cloud for some time, the company is now making serious moves to make its SAP HANA, data integration, and analytic platforms cloud native. The question is how SAP will harmonize the on-premise and cloud experiences for a customer base that will keep its feet planted in both worlds.

At its 2018 TechEd conference, which has just wrapped up in Las Vegas, SAP made several announcements of general availability and betas for SAP HANA and data integration products with an eye toward the cloud. SAP has offered HANA in the cloud for some time; for instance, you can go to AWS Marketplace and order up an instance, bringing your existing license.

SAP also announced general availability of SAP Data Hub 2.3, which was part of our SAPPHIRE coverage back in the spring. To recap, Data Hub manages data pipelines, providing the flexibility to move data and/or push down compute to the data source. The notion with data pipelines is chaining together operations related to sourcing, ingesting, cleaning, integrating, and in some cases, executing ML processes and performing light in-line analytic and/or filtering operations on data. Version 2.3 takes important steps toward unifying metadata management and data exploration.
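
To make the idea of chaining concrete, here is a minimal sketch of a pipeline as a sequence of stages applied in order. It is not Data Hub's API; the names `clean`, `keep_large`, and `run_pipeline`, and the sample records, are hypothetical and chosen purely for illustration.

```python
from functools import reduce
from typing import Callable, Iterable, Iterator

# A record is a plain dict; a stage consumes records and yields records.
Record = dict
Stage = Callable[[Iterable[Record]], Iterator[Record]]


def clean(records: Iterable[Record]) -> Iterator[Record]:
    """Cleaning stage: drop records with no amount, normalize the region."""
    for r in records:
        if r.get("amount") is not None:
            yield {**r, "region": r["region"].strip().upper()}


def keep_large(min_amount: float) -> Stage:
    """Build a light in-line filtering stage (hypothetical threshold filter)."""
    def stage(records: Iterable[Record]) -> Iterator[Record]:
        for r in records:
            if r["amount"] >= min_amount:
                yield r
    return stage


def run_pipeline(source: Iterable[Record], stages: list[Stage]) -> list[Record]:
    """Chain the stages: the output of each one feeds the next."""
    return list(reduce(lambda data, stage: stage(data), stages, source))


# Illustrative input standing in for ingested source data.
raw = [
    {"region": " emea ", "amount": 120.0},
    {"region": "apj", "amount": None},
    {"region": "amer", "amount": 40.0},
]

result = run_pipeline(raw, [clean, keep_large(100.0)])
```

Because each stage only sees an iterable of records, stages can be reordered or swapped without touching the others, which is the flexibility that pipeline tools aim to provide at much larger scale.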

SAP is designing Data Hub with an eye on cloud deployment, support of multi-cloud data landscapes, and managed cloud service delivery. For now, it is a bring-your-own-license offering, not yet a managed service. But there is precedent with SAP Data Network, which provides a managed service for data discovery, enrichment, and analytics. The plot thickens with the announcement at TechEd last week of the beta for SAP HANA database as a service. As with Data Hub, you could already run HANA in the cloud, but that option was targeted at lift and shift, where you managed HANA yourself in the cloud rather than in your own data center.

There's little shock or awe about SAP's HANA as a service announcement. (Postscript: This was announced at SAPPHIRE last spring.) It's more of an inevitability, given that SAP's rivals are already there. We've been big proponents of managed cloud database services because, unlike infrastructure as a service (which is the modern, multi-tenanted equivalent of hosting), managed cloud services fully deliver on the promise of simplification. Set the instance, size of the footprint, and service level that you want, and leave the messy, non-value-added jobs of provisioning, patching, and maintenance to the database provider. And by the way, the same goes for any form of packaged software; we can all thank Salesforce.com for raising awareness of the benefits of SaaS (and PaaS) services when it comes to allowing IT and the business to focus on more value-added tasks.

Of course, the rub with managed cloud services is the lock-in factor, but that's where cloud challengers like SAP come in: depending on where they make their managed service available, they are not necessarily locking you into the usual suspects: AWS, Azure, or Google Cloud.

But that's where the balancing act comes in. If you're Amazon or Google, it is much easier to provide a consistent experience for your managed database service, whether it be Aurora, Cloud Spanner, DynamoDB, or Cloud Firestore, because it's delivered only in your own environment.

For incumbents like SAP (and Microsoft, for that matter), who have the on-premise legacy, there is the challenge and opportunity of giving customers the best of both worlds. Adapt your on-premise product to give a consistent experience with the cloud counterpart, and you have a powerful wedge against cloud-only providers, not to mention a ready answer for those demanding private cloud deployments. This is an area where IBM, Oracle, and Microsoft are staking preemptive claim.

But to deliver that common experience, the devil is in the details. At the base, most cloud-native platforms separate compute from storage to provide elasticity and ease of scaling. At first blush, that sounds like a nice idea for on-premise, except that data center deployments won't have the virtually infinite scalability and multi-availability zone replication for HA/DR (and improved local performance for reads and, for multi-master databases, writes). As we pointed out last week, Microsoft is beginning to perform this balancing act with SQL Server 2019 and Azure SQL Database.

The flip side of cloud native is containerization and microservices. Deconstructing monolithic on-premise databases and applications into smaller services lets them utilize cloud resources far more efficiently. If you are running a change data capture log, for instance, you don't have to fire up the whole database engine, which reduces the compute footprint you consume.

Containerization is not an all-or-nothing deal. You can put the entire database in a container, as SAP has already done with SAP Data Hub, which includes SAP Vora (this allows you to deploy seamlessly on different clouds using Kubernetes), or you can go further and deconstruct the platform into multiple containers that in the long run could really exploit the efficiencies and scale of cloud. For now, SAP has taken the first option, but it's still a key step forward in promoting its hybrid, cloud-agnostic advantage and in making HANA ready to be deployed as a container on Data Hub.

Now we're waiting for the other shoe to drop. SAP Data Hub is a product that's begging for cloud deployment because so much of the data that SAP customers will be working with will reside in the cloud. SAP has not announced any plans yet, but watch this space. Indirectly related to this is how SAP supports data scientists performing advanced analytics, modeling, and machine learning. Given SAP's all-in support for Leonardo, its venture to develop industry expertise in AI, and the embedding of the fruits of those labors in SAP enterprise applications, we are expecting that SAP will plant its stake there soon. And yes, a lot of that (especially IoT) will require running in the cloud. So, we're looking for SAP to do two things: bring Data Hub to the cloud as a managed service, and get the wheels in motion for a cloud-native data science platform that in turn will reuse much of the data engineering capabilities of Data Hub.