Snowflake introduces multi-cluster data warehouse

Not content with adding nodes to make queries faster, Snowflake lets customers of its cloud Elastic Data Warehouse service add entire clusters, to accommodate more concurrent users.

In many ways, the cloud data warehouse (DW) scene has two lobes. One encompasses the battle between the two leading public cloud providers, Amazon Web Services (AWS) and Microsoft's Azure. The other consists of IBM's dashDB and two cloud data warehouse pure plays: Cazena and Snowflake.

Big innovation comes from small companies
If you limit your analysis of the cloud DW space just to the MS-AWS lobe, you'll miss some important breakthroughs in the market. For example, Microsoft distinguishes its Azure SQL Data Warehouse from AWS' Redshift service by the ability to scale compute and storage independently of each other, and to pause the DW during times of zero demand. But Snowflake's Elastic Data Warehouse (EDW) actually brought that architecture to market first.

And today, just eight days after Microsoft put its DW platform into general availability (GA), Snowflake (whose CEO, Bob Muglia, used to run the entire Server and Tools Business at Microsoft) is adding an architectural breakthrough to EDW.

Clusters, for all my friends
While most DW platforms, both on-premises and in the cloud, are physically based on pooling the resources of a cluster of servers, Snowflake is introducing the ability to have the DW run on multiple such clusters. Snowflake says that this architecture will provide greater concurrency -- that is, the ability to handle more active simultaneous users.

In fact, because the allocation (and de-allocation) of additional clusters can be performed on an automated, zero-admin basis, Snowflake is saying the multi-cluster architecture will accommodate an unlimited number of simultaneous users.
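
To make the idea of automated cluster allocation concrete, here is a minimal sketch of the kind of sizing decision such a system has to make. It is not Snowflake's implementation, which the company has not detailed here. The function name, the per-cluster capacity figure and the minimum and maximum bounds are all hypothetical, chosen only for illustration.

```python
import math


def clusters_needed(active_queries: int,
                    per_cluster_capacity: int,
                    min_clusters: int = 1,
                    max_clusters: int = 10) -> int:
    """Return how many clusters to run for the current query load.

    Hypothetical sizing rule: every parameter here is an assumption
    made for illustration, not a documented Snowflake setting.
    """
    if per_cluster_capacity <= 0:
        raise ValueError("per_cluster_capacity must be positive")
    if active_queries < 0:
        raise ValueError("active_queries cannot be negative")

    # Enough clusters that none has to serve more than its capacity.
    wanted = math.ceil(active_queries / per_cluster_capacity)

    # Keep the result between the floor and the ceiling.
    return max(min_clusters, min(max_clusters, wanted))


if __name__ == "__main__":
    for load in (0, 8, 20, 1000):
        print(load, "queries ->", clusters_needed(load, per_cluster_capacity=8), "cluster(s)")
```

Clusters are added as concurrency climbs and released as it falls, with no administrator in the loop. The ceiling in this sketch is just a cost guard assumed for the example; an "unlimited" design would amount to removing it.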

Encore
In addition to the enhanced concurrency, Snowflake is announcing a new query caching technology that optimizes performance for dashboarding and reporting. It's also adding "data milestoning" to make data recovery easier, in the event of loss or corruption.

Then there's the new automated data distribution and metadata management which, Snowflake says, eliminate the manual tuning otherwise needed for performance scaling and query optimization.

Dayenu
That would suffice us, but there's a kicker: the same multi-cluster architecture that accommodates more users can also enhance disaster recovery and high availability. This almost comes as a "free" architectural consequence, since each of the multiple clusters can be located in physically separate availability zones. So add HA/DR to the new EDW feature list.

Snowflake tells me it has over 200 customers. For a product that itself GA'd a little over a year ago, that's a lot. Certainly, IBM and Cazena should take notice. But so too should Amazon and Microsoft.

Resting on your laurels is so pre-cloud.
