For nearly 30 years, PostgreSQL (a.k.a. Postgres) has arguably been the most common open source SQL database that you have never heard of. Call it the Zelig of databases: its technology either sat behind or acted as the starting point for nearly a dozen commercial database offerings, from EnterpriseDB to Amazon Redshift, Greenplum, Netezza, and a host of others. And PostgreSQL has a distinguished lineage as one of the brainchildren of Turing Award winner and database legend Dr. Michael Stonebraker, who started the PostgreSQL project based on the lessons learned from his previous database venture, Ingres.
But now there are commercial products that put PostgreSQL front and center. EnterpriseDB opened Pandora's box roughly a decade ago with a commercially supported platform designed as an Oracle replacement. More recently, cloud providers have stepped in with a raft of hosted offerings, beginning with Amazon Web Services, which offers PostgreSQL as one of the platforms supported through its managed Relational Database Service (RDS).
A few months back, Matt Asay made the case for PostgreSQL becoming hip again in the sense that boring (things just work) has come into vogue. The one point we'd argue with, however, is the "again" unless you count the moment PostgreSQL materialized in Stonebraker's mind.
But it all still raises the question: Has the time come for PostgreSQL to step out from the shadows and stand up as its own platform?
We had a chance to take in a day of the annual PostgreSQL conference (actually, they also hold a bunch of satellite gatherings) in Jersey City this week. For an event where AWS (along with Pivotal Greenplum) grabbed the only two keynote slots, it was quite a low-key affair of DBAs speaking of the maturity (or lack thereof) of features like data validation, sharding, change data capture, and so on. Given that Google announced the GA of its own cloud PostgreSQL service this week, we're sure that it would have hungered for a keynote slot of its own. And maybe the PostgreSQL community folks need to think bigger in cultivating commercial ties as their platform emerges from the shadows.
AWS's Mark Porter, general manager of Amazon Aurora PostgreSQL and Amazon RDS for PostgreSQL, had to do some fancy footwork making the case to an open source crowd about AWS's support for the open source community. While AWS has not been known for making major technology contributions to the PostgreSQL project, it has in fact contributed to the community through bug fixes, free testing accounts, and other forms of financial support. And there's a good reason why the PostgreSQL implementation on Aurora is not open source: it's designed explicitly for AWS's own cloud infrastructure. Of course, Microsoft and Google retort that their cloud managed services are in fact compliant with open source (but then again, so is Amazon's RDS PostgreSQL implementation).
The not-so-dirty little secret about PostgreSQL is that while the open source project is focused on a transaction database, many of the commercial products that have descended from it are MPP data warehouses. So, out of necessity, the Greenplums, Netezzas, and Redshifts of the world had to create their own forks of the open source project, even to add basic features like columnar tables.
Scott Yara, founder of Greenplum and SVP of products at Pivotal, described the company's full circle journey dealing with the analytic forking. During the early days of Greenplum (before it was acquired by EMC), the emphasis was on delivering a production MPP data warehouse, which dictated focusing on its own technology. An acquisition, spinoff, and pending IPO later, Pivotal has committed to 100% open source. But the Greenplum database remained on an older, now unsupported version of PostgreSQL. With the next major version (6.0) that is expected later this year, the Greenplum database will merge back with the open source trunk and get on a reasonably current, supported release of the open source project. That will allow Pivotal to eliminate some features that it developed that are now included in the more current open source platform. But as PostgreSQL remains a transaction database, the Greenplum database will still require its own analytic extensions.
A frequent theme of PostgreSQL is that it's the open source SQL relational database that's meant for enterprise workloads. That's a point that the MySQL and MariaDB folks would likely contest, but there remain real differences, such as PostgreSQL's support for more complex SQL functions and data types, encompassing arrays, joins, and windowing, among others.
Ultimately the "replace Oracle" theme comes up, given that PL/pgSQL was designed to resemble Oracle PL/SQL. It's a theme that has been promoted by EnterpriseDB for many years. And it was a theme reiterated by FINRA in one of the conference sessions. FINRA is most of the way through moving what had been roughly 650 Oracle instances to Amazon RDS for PostgreSQL. This is part of a larger corporate strategy to migrate its entire on-premises IT infrastructure to AWS. According to FINRA lead developer Steve Downs, features such as object/relational mappings, stored procedures, and the ability to form complex queries using view merging and predicate pushdown give Oracle DBAs familiar feels in PostgreSQL.
Nonetheless, as a different database (and SQL implementation), there are clear differences between PostgreSQL and Oracle. A few examples include how the databases handle numeric and variable character fields, synonyms, replication (which is not as mature as Oracle's), and refreshes of materialized views, among others.
If imitation is flattery, PostgreSQL has it in spades, as it has become the go-to open source platform for third parties seeking to deliver their own relational database products. That is directly attributable to the conservative nature of the open source project, which has prioritized stability and working nuts and bolts over bleeding edge flash. What's significant is that the latest 10.0 version, released last fall, addresses features that would otherwise be taken for granted with Oracle or SQL Server. Highlights include declarative table partitioning; refinements to replication such as publish/subscribe; and quorum commits (which are potentially very useful for global cloud deployments).
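To make two of those highlights concrete, here is a minimal Python sketch that generates the kind of statements involved. It builds declarative range-partitioning DDL in the form PostgreSQL 10 introduced, plus a quorum-style `synchronous_standby_names` setting. The table name `trades`, the column `trade_date`, the parent table's column list, and the standby names `s1` to `s3` are all hypothetical, and the helper functions are ours for illustration rather than part of any PostgreSQL tooling.

```python
from datetime import date
from typing import Iterable, List


def first_of_next_month(d: date) -> date:
    """Return the first day of the month following d."""
    if d.month == 12:
        return date(d.year + 1, 1, 1)
    return date(d.year, d.month + 1, 1)


def monthly_partition_ddl(parent: str, column: str, start: date, months: int) -> List[str]:
    """Build DDL for a range-partitioned parent table and its monthly children.

    The parent's column list (id, <column>) is an illustrative assumption.
    """
    statements = [
        f"CREATE TABLE {parent} (id bigint, {column} date NOT NULL) "
        f"PARTITION BY RANGE ({column});"
    ]
    lower = date(start.year, start.month, 1)  # snap to the start of the month
    for _ in range(months):
        upper = first_of_next_month(lower)
        child = f"{parent}_{lower:%Y_%m}"
        # Range bounds are inclusive at FROM and exclusive at TO.
        statements.append(
            f"CREATE TABLE {child} PARTITION OF {parent} "
            f"FOR VALUES FROM ('{lower}') TO ('{upper}');"
        )
        lower = upper
    return statements


def quorum_standby_setting(quorum: int, standbys: Iterable[str]) -> str:
    """Build a postgresql.conf line asking for acknowledgment from ANY `quorum` standbys."""
    names = list(standbys)
    if not 1 <= quorum <= len(names):
        raise ValueError("quorum must be between 1 and the number of standbys")
    return f"synchronous_standby_names = 'ANY {quorum} ({', '.join(names)})'"


if __name__ == "__main__":
    for statement in monthly_partition_ddl("trades", "trade_date", date(2018, 11, 15), 3):
        print(statement)
    print(quorum_standby_setting(2, ["s1", "s2", "s3"]))
```

With the `ANY 2 (...)` form, a commit waits for confirmation from any two of the listed standbys rather than from specific ones, which is why the feature is attractive when replicas are spread across regions.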
Yes, there is catch-up ball for PostgreSQL to play, and there are clear reasons for Oracle or SQL Server customers to continue with their platforms. But on the horizon, much of the differentiation will be in database implementation, not nuts and bolts features. And much of that differentiation will be with how databases natively exploit the elasticity, automation, choice of infrastructure, and global scale of the cloud.
The fact that AWS, Azure, and Google Cloud are now leading with PostgreSQL services, rather than white labeling them, is a sure sign that after 30 years, PostgreSQL might finally be coming out of the shadows.
Postscript: The day that this post published, Pivotal had its IPO, raising $555 million, with the stock closing up 5% for the day. As with sister company VMware, Dell EMC still holds majority ownership and control of the company post-IPO.