Microsoft Ignite postmortem: The underlying hybrid convergence

While Intelligent Cloud, Intelligent Edge, and AI grabbed the headlines at this year's Ignite conference, a hybrid cloud/on-premises message is lurking in the background. As Microsoft's competitive landscape is increasingly shaped by Amazon and Google Cloud, its hybrid on-premises/cloud strategy could become its prime differentiator -- and databases could be exhibit A.

As my colleague Mary Jo Foley noted during a live interview at Microsoft Ignite, IoT and intelligent edge processing dominated Satya Nadella's opening words at Microsoft's enterprise event, held this week in Orlando. Beyond the keynote, AI was front and center. As we noted in our dispatches from Build this past spring, AI is driving Microsoft's enterprise front and back office suites, not to mention how it manages the Azure cloud.

But as Microsoft increasingly focuses its business on Azure, there's an equally important message getting drowned out by the noise that, in the long run, could distinguish Microsoft from the pack. Unlike Amazon and Google, Microsoft has an on-premises presence and, to its credit, it has not let its on-premises legacy drag it down like a ball and chain. It is aspiring to deliver a consistent experience on premises and in the cloud. In all fairness, that is also the position that each of the household brands in enterprise IT is shooting for. Microsoft's advantage is that Azure, building on its Office 365 foothold, has had a head start over the IBM and Oracle clouds in building its footprint.

The latest announcements from Microsoft at Ignite about SQL Server 2019 and Azure SQL Database show how near, and how far, the brass ring of a consistent experience across on-premises and cloud is. A strong clue is the unification of the code bases of SQL Server and Azure SQL Database that came with last year's releases. The two were twins separated at birth: Azure SQL Database was patterned on SQL Server, but until last year followed its own code branch to support a cloud-native architecture.

Convergence means that SQL Server and Azure SQL Database will share some identical features and provide a consistent experience on similar features, while retaining operational differences that stem from the fixed nature of on-premises data centers vs. the elasticity and almost-limitless scale of the cloud.

Both of them traditionally placed storage and compute together. That is now changing, and the differences between on-premises and cloud are getting blurred, with SQL Server 2019's Big Data Cluster support and Azure SQL Database's new Hyperscale feature.

Previously, SQL Server queried Hadoop through the pushdown mode of PolyBase. With the 2019 release, SQL Server adds a new mode alongside its traditional relational table layout: a cloudlike scale-out mode that colocates SQL Server's database engine on the same compute nodes as Spark, which in turn adjoin Hadoop's HDFS data nodes. This allows SQL Server to run T-SQL queries on HDFS, with native Spark support as a bonus.

In big data clusters, SQL Server 2019 separates the database engine from the data, with the engine sitting in the compute node, along with Spark. It's a topology that very much resembles that of Impala, Cloudera's open source interactive SQL-on-Hadoop engine that stations daemons on every Hadoop compute node. The draw of this approach is that SQL Server can run T-SQL queries over terabytes or petabytes of data much faster than PolyBase could. It also draws SQL Server closer to what is possible in Azure.
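To make the appeal of stationing the engine next to the data more concrete, here is a minimal Python sketch of the general scatter-gather pattern that such topologies rely on. It is not Microsoft's or Cloudera's implementation; the `DataNode` class, the `(region, amount)` row layout, and the function names are all hypothetical, chosen only for illustration. The point is that each node filters and aggregates its own rows, so only a small partial result travels back to the coordinator.

```python
from dataclasses import dataclass


@dataclass
class DataNode:
    """Hypothetical storage node holding rows as (region, amount) tuples."""
    name: str
    rows: list

    def local_partial_sum(self, region):
        # Runs "next to" the data: only a (sum, count) pair leaves the node.
        amounts = [amt for reg, amt in self.rows if reg == region]
        return sum(amounts), len(amounts)


def scatter_gather_avg(nodes, region):
    """Fan the filter out to every node, then merge the small partial results."""
    total, count = 0.0, 0
    for node in nodes:
        part_sum, part_count = node.local_partial_sum(region)
        total += part_sum
        count += part_count
    # No matching rows anywhere: report None rather than dividing by zero.
    return total / count if count else None


# Illustrative data only; three nodes each hold a slice of the table.
nodes = [
    DataNode("node-1", [("east", 10.0), ("west", 4.0)]),
    DataNode("node-2", [("east", 30.0), ("east", 20.0)]),
    DataNode("node-3", [("west", 6.0)]),
]

print(scatter_gather_avg(nodes, "east"))
```

However large the slices grow, the coordinator in this toy only ever sees one pair of numbers per node, which is the intuition behind running the query engine where the data lives.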

While this is not a carbon copy of cloud architecture, one could imagine a companion cloud-native service from Azure SQL Database that runs SQL -- or Spark -- against data stored in Azure Blob storage or ADLS. One can always dream.

The cloud parallels for SQL Server 2019 Hadoop support further extend to containerization. The first trappings of container (and Kubernetes) support came in SQL Server 2017, but it was limited to dev/test sandboxes because it lacked high availability/disaster recovery capability. With that gap addressed, SQL Server 2019 can operate in Docker containers orchestrated by Kubernetes in high availability/disaster recovery (HADR) scenarios. One could imagine the parallels with the cloud, where Azure Service Fabric is used for HADR.

There's another piece that makes the experience of running SQL Server look like you're running in the cloud: Azure Data Studio. Known as SQL Operations Studio while it was in preview, Azure Data Studio this week entered general release. It provides a database developer's IDE for coding T-SQL or Spark that can be used with SQL Server, Azure SQL DB and Azure SQL Data Warehouse from Windows, macOS and Linux. You develop T-SQL or Spark, but it doesn't matter whether you're working in SQL Server or Azure SQL Database.

On the cloud side, Azure SQL Database introduces a new hyperscale capability that emphasizes how cloud databases differ from on-premises ones. As the name implies, hyperscale scales out the database, with Microsoft claiming support of up to 100 TB, at least for now. Faster cloud network speeds, in conjunction with the decoupling of compute, storage, and log writes, make hyperscale possible.

Hyperscale works by using a service fabric relying on a Paxos consensus approach for ACID consistency. Transactions are processed with the primary compute node writing logs to a log tier, while separately fetching data pages either from local cache (for hot data) or page servers (for cooler data). This means that hyperscale can support data tiering, exploiting the variety of storage options available in the cloud. The database automatically partitions itself for scale out, while snapshots address one of the big show-stoppers for large transaction databases: speeding database recovery operations from hours or days to minutes. In turn, elasticity allows you to scale compute for multi-terabyte databases up or down within minutes.
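The read and write paths described above can be sketched as a toy Python model. This is a simplified illustration, not Azure SQL Database's actual design: the `PageServer` and `ComputeNode` classes, the LRU eviction policy, the cache size, and the idea that the page server catches up from the log just before serving a miss are all assumptions made for the sketch. Consensus, partitioning, and snapshots are left out entirely.

```python
from collections import OrderedDict


class PageServer:
    """Hypothetical remote page store that serves cooler data."""

    def __init__(self, pages):
        self.pages = dict(pages)
        self.applied = 0   # number of log records already replayed
        self.fetches = 0   # counts remote fetches, for illustration

    def fetch(self, page_id, log_tier):
        # Assumption of this sketch: replay unseen log records before serving.
        for pid, value in log_tier[self.applied:]:
            self.pages[pid] = value
        self.applied = len(log_tier)
        self.fetches += 1
        return self.pages[page_id]


class ComputeNode:
    """Toy primary compute node: local LRU cache in front of a page server."""

    def __init__(self, page_server, log_tier, cache_size=2):
        self.page_server = page_server
        self.log_tier = log_tier   # plain list standing in for the log tier
        self.cache = OrderedDict()
        self.cache_size = cache_size

    def read(self, page_id):
        if page_id in self.cache:          # hot data: served locally
            self.cache.move_to_end(page_id)
            return self.cache[page_id]
        value = self.page_server.fetch(page_id, self.log_tier)  # cooler data
        self._cache_put(page_id, value)
        return value

    def write(self, page_id, value):
        # The change goes to the log tier first; the cache is updated after.
        self.log_tier.append((page_id, value))
        self._cache_put(page_id, value)

    def _cache_put(self, page_id, value):
        self.cache[page_id] = value
        self.cache.move_to_end(page_id)
        while len(self.cache) > self.cache_size:
            self.cache.popitem(last=False)  # evict least recently used page


log = []
server = PageServer({1: "a", 2: "b", 3: "c"})
node = ComputeNode(server, log)
node.read(1)         # miss: fetched from the page server
node.read(1)         # hit: served from the local cache
node.write(2, "B")   # logged, then cached
print(server.fetches, log)
```

Because the compute node holds only a cache and the durable state lives in the log and page tiers, swapping in a larger or smaller compute node in this model would not require moving the data, which is the intuition behind the elasticity claim.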

The hyperscale capability of Azure SQL Database provides a case in point of how cloud and on-premise databases can both differentiate and mimic one another. In the cloud, the abstracted architecture that separates compute, logs, and storage is useful for providing elasticity. On premises where compute and storage capacity are more finite, those same features could be used for accelerating maintenance, backups, and database commits if Microsoft were to extend those capabilities to SQL Server.

And while we're on the topic of extensibility for hyperscale, the modular way that Microsoft implemented this feature could also pave the way for extension to Azure's other relational database services like MySQL, MariaDB, and PostgreSQL. While Microsoft is not saying what it will do with hyperscale in the future, our take is never say never.