Microsoft yesterday announced major enhancements to its cloud data warehouse offering. Microsoft Corporate Vice President for Azure Data, Rohan Kumar, gave me the skinny in a private briefing in New York. If you want the TL;DR, the service will now be very competitive on performance with rival Amazon Redshift, while retaining some key configuration and pricing advantages over that offering.
But if you'd like to understand this announcement in full context, a little history lesson is in order...
We don't need no stinking DW
Once upon a time, Microsoft didn't have a dedicated data warehouse (DW) product. Instead, it relied on some highly optimized reference architectures and its flagship database product, SQL Server, as a data warehouse solution. But eventually, that became ill-advised as a competitive strategy, and Microsoft decided it needed a DW product.
Redmond's foray into the dedicated data warehouse world started with its acquisition of a company called DATAllegro, in 2008. Once the deal was complete, Microsoft charged the DATAllegro team with inaugurating "Project Madison": building a new massively parallel processing (MPP) DW product, based on SQL Server and Windows.
The Madison technology came to be known as SQL Server Parallel Data Warehouse (PDW) and is now known as Analytics Platform System (APS). The product was not exactly a smash hit. Nonetheless, Microsoft needed the product to fill out its portfolio, and the company continually improved the product until it became quite robust, even if it was a bit of a sleeper.
Enter the cloud -- and some hard choices
The real success for the Madison technology came when Microsoft implemented it as a cloud service. But rather than just porting PDW to Azure, Microsoft re-architected the product to work really well in the cloud. The biggest change Microsoft made was to standardize on Azure Blob Storage for SQL DW's storage layer, rather than solid state drive (SSD)-based local storage on the server nodes in the MPP cluster.
Decoupling compute and storage -- something SQL DW's chief competitor, Amazon Redshift, doesn't do -- allows the two to be scaled individually and also allows the MPP cluster to be "paused" and "resumed" since Blob storage persists even as the compute nodes are shut down. Both of these features make for economical operation of SQL DW versus Redshift, where storage and compute capacity must be increased in lock step, and whose MPP clusters must remain up and running at all times.
But there's a reason that Redshift uses node-based local SSDs for storage -- they're fast, especially compared to a cloud blob store. As a result, SQL DW has had to cache data in memory to be performance-competitive with Redshift. And in situations where that memory cache is overwhelmed, data has needed to "spill" back to disk storage (Azure Premium Storage, to be precise), causing intermittent, and thus unpredictable, performance issues.
Best of both?
But yesterday, Microsoft announced general availability of an enhanced service tier, designed to address this issue. The new offering's brand, a product of Microsoft's inimitable nomenclature approach, is "Azure SQL Data Warehouse Compute Optimized Gen2 Tier."
A great name it ain't, but a good enhancement it is. In short, SQL DW Gen 2 offers the ability to cache data to NVMe SSDs. But the real innovation here is that the durable storage layer is still Azure Storage Blobs. Thus SQL DW gains local SSD performance, while maintaining independently scalable compute and storage, and retaining the well-loved pause and resume capability.
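For readers who think in code, the general idea of a fast local cache sitting in front of slower durable storage can be sketched in a few lines of Python. To be clear, this is a toy illustration of a generic read-through cache with least-recently-used eviction, not Microsoft's implementation; the class name `ReadThroughCache`, the `pause` method, and the block names are all made up for the example, and a plain dictionary stands in for blob storage.

```python
from collections import OrderedDict


class ReadThroughCache:
    """Toy read-through LRU cache in front of a slower durable store (hypothetical)."""

    def __init__(self, durable_store, capacity):
        self.durable_store = durable_store  # stands in for remote blob storage
        self.capacity = capacity            # max blocks held in the local cache
        self.local = OrderedDict()          # stands in for the local SSD cache
        self.hits = 0
        self.misses = 0

    def read(self, key):
        if key in self.local:
            self.local.move_to_end(key)     # mark as most recently used
            self.hits += 1
            return self.local[key]
        self.misses += 1
        value = self.durable_store[key]     # slow path: fetch from durable storage
        self.local[key] = value
        if len(self.local) > self.capacity:
            self.local.popitem(last=False)  # evict the least recently used block
        return value

    def pause(self):
        """Shutting down compute discards the cache; the durable store is untouched."""
        self.local.clear()


# Illustrative data only: three blocks in durable storage, room for two in the cache.
blobs = {"block-1": "a", "block-2": "b", "block-3": "c"}
cache = ReadThroughCache(blobs, capacity=2)
for key in ["block-1", "block-2", "block-1", "block-3", "block-2"]:
    cache.read(key)
```

The point of the sketch is that the cache is disposable. Calling `pause()` throws away everything held locally, yet any block can still be read afterward because the durable copy never lived on the compute node in the first place.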
Microsoft says the NVMe devices used in Gen 2 deliver up to 2GB/sec of local I/O bandwidth, with multiple devices available per physical host. As a result, the company says it's observed an up-to-5x query performance improvement for certain workloads on SQL DW Gen 2 vs. Gen 1.
As a kicker
While the NVMe caching is the major news, one other announcement comes along for the ride: SQL DW, including the Gen 2 improvements, is now available in 20 Azure data center regions, up from just 6, previously. Microsoft says this geographic availability surpasses that of data warehouse offerings from any other cloud provider. Such availability is particularly useful to customers with data-residency and compliance needs, especially with the European Union's General Data Protection Regulation (GDPR) going into effect in three and a half weeks.
Current SQL DW customers who wish to take advantage of the Gen 2 architecture will need to take their Gen 1 instances through an upgrade process, which Microsoft says is recommended. For Redshift customers with pause-resume lust, the migration may be a bit more involved, but possibly well worth the trouble.