Amazon wants your enterprise database

Amazon's awaited release of PostgreSQL on Aurora sharply raises the stakes in its competition with Oracle. Aurora PostgreSQL is still in public preview; when will it go GA, and what can we look forward to on the roadmap? Here are some hints of what we expect.
Written by Tony Baer (dbInsight), Contributor


Amazon's steady march from cloud services to cloud platform provider is totally consistent with the company's M.O. It takes the long view in developing its business. Amazon's database business makes a good case in point.

After hitting the conference circuit last fall with Oracle OpenWorld and Amazon re:Invent, there was little doubt that there is no love lost between the two companies. And so it's been tit for tat. Amazon's announcement last fall of a public beta of a PostgreSQL engine on Aurora was a shot across the bow at Oracle. It's not just that Amazon offered a PostgreSQL managed service, but that it would be implemented on Aurora, Amazon's reimagining of the database in a cloud-native architecture.


Although the PostgreSQL open source community has been vehement that the platform is vendor-agnostic, providers like EnterpriseDB and others have long positioned PostgreSQL as an Oracle replacement. There are differences, but hold that thought for a moment.

Barely a couple of months after the PostgreSQL on Aurora unveiling, Oracle doubled the price of running its database on Amazon's managed RDS service to position Oracle's own Public Cloud as the place where Oracle database runs the cheapest and new versions come out first.

Meanwhile, Amazon is greasing the skids to get customers to move databases with its Database Migration Service (DMS), priced using a razor-and-razor-blades strategy: pay a nominal $3/TByte (plus the cost of running IOPS-optimized T2 or C4 compute instances) and move your Oracle, SQL Server, MySQL, MariaDB (and several other popular databases) to or from any of the Amazon RDS or Aurora databases. Since launching the service just over 18 months ago, Amazon DMS has moved 34,000 databases and counting.

This is not to say that migrating databases can be done with your eyes closed. Because each relational database has its own dialect of SQL and implements features such as stored procedures or materialized views differently, there will be impedance mismatches to resolve. Such differences can be so specific that the old adage, "replace your database, replace your DBA," has a grain of truth to it.

So along with DMS, Amazon offers an offline Schema Conversion Tool (SCT) that helps with conversions: it reports on mismatches and, optionally, handles them through Lambda serverless compute functions, written with Python libraries, that code the feature changes.
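SCT's internals aren't public, but the kind of dialect fix such a conversion function has to make can be sketched in a few lines of Python. Everything here -- the function names, the event shape, the two rewrite rules -- is a hypothetical illustration, not SCT's actual API; the two rules themselves (Oracle's NVL() maps to the ANSI-standard COALESCE(), and Oracle's SYSDATE maps to CURRENT_TIMESTAMP) are real Oracle-to-PostgreSQL mismatches.

```python
import re

# Hypothetical illustration of an Oracle-to-PostgreSQL dialect fixer,
# written as a Lambda-style handler. SCT's real internals are not
# public; the names and event shape here are invented for the sketch.

REWRITES = [
    # Oracle's NVL() maps to the ANSI-standard COALESCE() in PostgreSQL.
    (re.compile(r"\bNVL\s*\(", re.IGNORECASE), "COALESCE("),
    # Oracle's SYSDATE pseudocolumn maps to CURRENT_TIMESTAMP.
    (re.compile(r"\bSYSDATE\b", re.IGNORECASE), "CURRENT_TIMESTAMP"),
]

def convert_statement(sql: str) -> str:
    """Apply simple Oracle-to-PostgreSQL rewrites to one SQL statement."""
    for pattern, replacement in REWRITES:
        sql = pattern.sub(replacement, sql)
    return sql

def handler(event, context=None):
    """Lambda-style entry point: convert each statement in the event."""
    return {"statements": [convert_statement(s) for s in event["statements"]]}
```

Real conversions are far messier than string rewrites (stored-procedure languages, sequence semantics, and optimizer hints all need human review), which is why SCT pairs the automated pass with reporting tools.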

Now, using DMS doesn't necessarily mean that you are switching databases; for instance, many Amazon customers use it to migrate to newer versions of the same database. Amazon is obviously not alone in offering its own database migration tooling. Nonetheless, lowballing the cost of a service that would otherwise require more costly third-party tools is clearly a means-to-an-end strategy to get customers to move to Amazon.

Since Amazon's late 2016 announcement, PostgreSQL on Aurora has moved from limited to fully public beta and has drawn several thousand customers to kick the tires. So when is it finally going to go GA? For now, the most specific answer from Amazon is "soon."

At the high end, Amazon has a retailing customer with an extremely large 24-TByte point-of-sale database (it still keeps historical data) that has reported good results. And that customer is unusual -- most transaction databases top out at 1 or 2 TBytes.

We suspect that Amazon will focus on optimizing storage, given that the Aurora databases (MySQL and PostgreSQL) handle updates quite differently. Whereas MySQL (which until this year was the only Aurora engine) erases obsolete data during updating, PostgreSQL is an append-only system that uses more cloud-friendly MVCC methods to version-control updates. MVCC lends itself well to highly replicated cloud databases like Aurora, which carries a minimum of six copies of the data for resiliency.

So where will PostgreSQL on Aurora go from here? The obvious answer is PostgreSQL 10, which is due in October (Amazon is trying to commit updates within 30 days of the open source release). Among the highlights of the new release is declarative partitioning, an important feature that starts to address the gap with incumbents like Oracle.

Otherwise, Amazon has been predictably mum about the roadmap, but we expect that, as Aurora is a transaction database working with live data, there will be some synergies with its Kinesis streaming service. We'd also like to see automated tiering that utilizes the whole of Amazon's varied database and storage portfolio to put data in the right place, at the right time, at the right cost.

But most of all, there is the task of managing expectations. As we've noted above, different SQL relational databases are... well... different. And since Aurora is an Amazon-unique product, customers that adopt Aurora will have to undergo some form of conversion. In theory, the transition from similar sources like MySQL or PostgreSQL should be transparent, but if Amazon is serious about drawing Oracle or SQL Server customers, it will have to get its hands dirty with best practices and cookbooks that clarify the differences and deliver prescriptive advice on how to make the transition.
