Microsoft buys Citus Data

Microsoft’s acquisition of Citus Data is meant to differentiate its Azure Database for PostgreSQL service and put skin in the game for the PostgreSQL open source project.
Written by Tony Baer (dbInsight), Contributor

Adding to its open source bona fides, Microsoft today announced the acquisition of PostgreSQL database provider Citus Data for an undisclosed sum. Given that Microsoft already offers a managed PostgreSQL service on Azure, the obvious question is why they would need to make the acquisition.

Microsoft's strategy is twofold. First, it deepens the talent base: Microsoft is acquiring PostgreSQL talent rather than having to develop it. That could better position Microsoft to contribute to, and have a stake in, the direction of the PostgreSQL open source project, while providing a faster ramp-up to supporting the latest versions.

Second, it is about the way that Citus Data has extended the PostgreSQL platform. Citus Data has always differentiated itself by supporting a scale-out configuration of PostgreSQL, and Microsoft could benefit from that, especially as it competes with Amazon Aurora's PostgreSQL-compatible edition.

But in and of itself, the fact that Citus Data carved its own flavor out of PostgreSQL is not unique. Others before it, like ParAccel (the technology that AWS acquired to start building Redshift), Greenplum, Netezza, EnterpriseDB, Vertica and many others have custom-built databases using the core PostgreSQL engine as the starting point. As we noted last year, the popularity of cloud-based pure vanilla PostgreSQL managed services has brought this white label database out from the shadows, and at Microsoft, the rapid growth in uptake of Azure Database for PostgreSQL put them on a course for pursuing Citus Data.

For its Azure Database for PostgreSQL service, Microsoft boasts that its implementation is pure open source and won't on its own cause vendor lock-in. But the pure open source/portability argument is a double-edged sword, as Azure's chief cloud rivals, AWS and Google Cloud, each sport their own pure open source PostgreSQL database services. At that point, differentiation comes through version number and ancillary cloud services, such as security, monitoring, and synergy with other services on the platform.

Since launching its PostgreSQL service a couple of years back, Microsoft has noted that many of its customers were looking for more scale. With SQL Server 2019 Big Data Clusters, Microsoft has introduced its own answer for scaling out a SQL database for analytics. But Microsoft did not want to force customers who had already expressed a preference for the PostgreSQL platform to migrate.

AWS already had its answer with Aurora, where the emphasis is on API compatibility, so the platform looks and acts like PostgreSQL to developers. Beneath the hood, Aurora implements PostgreSQL (along with MySQL) with optimization for its own intelligent, distributed storage infrastructure. Aurora targets large, multi-terabyte OLTP deployments where parallel processing can support high concurrency.

For Microsoft, having its own scalable PostgreSQL answer to Aurora is where Citus Data comes in. Admittedly, comparing the two is like comparing apples and oranges. Aurora is focused on OLTP, while Citus Data's scale-out architecture is meant to provide the horizontal scalability and auto-sharding capabilities associated with NoSQL databases like MongoDB, but with the ACID support associated with enterprise databases. It also offers massively parallel transaction processing for real-time analytics, OLTP, and multi-tenancy support for cloud or hybrid cloud deployments.
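To make the idea of auto-sharding more concrete, here is a minimal Python sketch of hash-based row distribution followed by a scatter-gather aggregate. It is an illustration of the general technique only, not Citus Data's actual implementation: the shard count, the function names (`shard_for`, `distribute`, `scatter_gather_sum`), and the sample tenant data are all hypothetical.

```python
import hashlib
from collections import defaultdict

SHARD_COUNT = 4  # hypothetical number of worker shards


def shard_for(distribution_key: str, shard_count: int = SHARD_COUNT) -> int:
    """Map a distribution key to a shard using a stable hash."""
    digest = hashlib.sha256(distribution_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


def distribute(rows, key_column, shard_count=SHARD_COUNT):
    """Group rows by the shard their distribution key hashes to."""
    shards = defaultdict(list)
    for row in rows:
        shards[shard_for(str(row[key_column]), shard_count)].append(row)
    return shards


def scatter_gather_sum(shards, value_column):
    """Compute a partial sum on each shard, then combine the partials."""
    partials = [sum(r[value_column] for r in rows) for rows in shards.values()]
    return sum(partials)


# Hypothetical multi-tenant data: the tenant ID is the distribution key.
rows = [
    {"tenant_id": "acme", "amount": 10},
    {"tenant_id": "acme", "amount": 5},
    {"tenant_id": "globex", "amount": 7},
    {"tenant_id": "initech", "amount": 3},
]
shards = distribute(rows, "tenant_id")
total = scatter_gather_sum(shards, "amount")
```

Because every row for a given tenant hashes to the same shard, a single-tenant query touches one node, while a cross-tenant aggregate can be computed in parallel per shard and then combined. That is the general pattern behind pairing multi-tenancy with scale-out analytics.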

Citus Data has always differentiated itself with a scale-out implementation of PostgreSQL, but a couple of years ago it refactored its platform so that it was no longer a fork of the core open source platform. Instead, the distributed functionality became an extension that is available through an API.

That change coincided with another that was directly related. Now that the Citus Data platform was no longer a fork, it could be offered as open source. The core platform is available under the standard PostgreSQL license, which is similar to the BSD or MIT open source licenses. However, the Citus Data extension is available under the AGPL, which is technically an open source license (the same one that MongoDB formerly used), but discourages third-party providers from commercializing it.

The refactoring also conveyed another key advantage for Citus Data. Now that its platform was no longer a fork, Citus Data could stay current with the latest PostgreSQL open source releases with a minimum of effort. And with the new version 10 -- which introduces native partitioning, improved parallel query support, logical replication, and full text search for JSON among others -- that's no small advantage.

The extension architecture will also allow Microsoft to offer its Azure Database for PostgreSQL in different versions: the vanilla PostgreSQL edition that is fully portable, and what we'll call an extended enterprise edition with the Citus Data extension for customers that demand sheer scale. Today, Citus Data makes its platform available both on-premises and through a managed service that is hosted on AWS. Going forward, we would expect that expansion of the service will move towards Microsoft Azure.

Postscript: At this point, comparing SQL Server 2019 Big Data Clusters with the Citus Data technology for Azure is a bit like comparing apples and oranges, because SQL Server is for on-premises deployment, whereas the Azure PostgreSQL service to which Citus Data technology will be applied is cloud-native. Nonetheless, we believe that the architecture of SQL Server 2019 Big Data Clusters could readily be adapted for a similar configuration in its cloud counterpart, Azure SQL Database.
