Oracle's MySQL HeatWave goes on Autopilot

The second release of MySQL HeatWave adds a dose of machine learning-based automation to optimize performance.

At the end of last year, Oracle took MySQL in a new direction by introducing a cloud database-as-a-service (DBaaS) that added analytics. Eight months later, its next act is introducing a new "Autopilot" feature to its MySQL Database Service with HeatWave that employs machine learning to juice OLTP and OLAP performance of the database. This marks the second time that Oracle has applied machine learning to optimizing the operation of the database, but compared to the Autonomous Database, MySQL HeatWave's Autopilot capabilities are designed to introduce intelligent automation to a market working with small departmental databases that prefers open source, is cost-sensitive, and is more accustomed to manual knob turning.

Specifically, MySQL Autopilot uses internalized machine learning to assist users for some tasks, and applies closed-loop automation to handle others. It assists users by making recommendations on how to provision and load the database, and implements closed-loop automation for failure handling/error recovery and query execution. And with machine learning, Autopilot can iteratively improve query plans based on results from other similar, different queries. The latest release of MySQL HeatWave also increases cluster sizing to 64, more than doubling capacity limits to 32-TByte instances, and raises scalability by 20%.

For instance, when provisioning the database, Autopilot conducts a smart scan of the table using adaptive sampling techniques to predict the most optimal cluster sizing. This is based on predicting the amount of memory that will be required for holding the table and the intermediate results in memory. In this case, the output of the model is making recommendations, including an explanation for the recommendation to the customer, who ultimately decides on cluster sizing.

Conversely, in the case of query execution, the model automatically schedules queries using logic to reduce contention between short- and long-running queries characteristic of mixed workloads. Without such smart query scheduling, short queries, which are expected to be interactive, could get delayed by long workloads unless a database administrator takes action. With Autopilot, this scenario can be avoided as it predicts the execution time of each query, then prioritizes short queries over long ones to reduce or eliminate waits for interactive queries.

Other Autopilot functions include optimizations for loading and deciding where to physically place data. Based on query patterns, the algorithm strives to place rows that are most frequently joined closer together and within the same, or adjoining partitions on the same node. It then predicts the degree of improvement that the user can expect by comparing the proposed optimized placement vs. the default. Explaining the results will be useful for customers deciding whether to take HeatWave's data placement recommendations.

A related feature impacting query planning is auto query plan improvement where the system learns from queries and improves performance of subsequent queries which aren't necessarily identical with existing queries.

By comparison, query planners with conventional relational databases tend to be rules-based, optimizing for cost or performance. In turn, experienced DBAs typically take on the task of schema design to optimize for anticipated queries, and laying out the data (data loading to optimize access). MySQL Autopilot routines automate these tasks using ML, and the company claims that its algorithms can improve performance on benchmarks such as TPC-H (typically used for analytics) by up to 40%.

As we have noted with Oracle Autonomous Database, databases mark an ideal use case for applying ML as the problems are well-bounded, the optimizations can be well-defined (e.g., optimize for performance, throughput, or capacity, etc.) and can feed off a rich supply of operational data. Autopilot employs ML models for optimizing query planning and performance; cluster capacity; network overhead; and parallelizing loading operations.

Other enhancements in this release include boosts in capacity; while the initial release of Oracle MySQL HeatWave limited clusters to 24 nodes (holding up to 12 TBytes of data), the new release boosts that to 64 nodes and 32 TBytes. System scalability for end-to-end queries has been improved by about 20%.

Security is also tightened up; while data was already encrypted at rest, in the new release, it is also encrypted on the wire. That puts it in sync with MySQL offerings from Azure and Google Cloud; by comparison, encryption is optional with Amazon Aurora.

As we noted when Oracle first released the current MySQL Database Service with HeatWave at the end of last year, the Oracle service was not meant to be a copycat of rival cloud services that implemented the core open source MySQL database without modifications.

From that standpoint, Oracle's service is more comparable to Amazon Aurora, where AWS added its own enhancements to the storage engine to make MySQL handle larger, multi-terabyte transaction processing deployments supporting high concurrency. Oracle's service handles that, but also differentiates itself by adding analytics with the HeatWave in-memory query accelerator. We reviewed the differences, offering comparisons with Microsoft Azure (which extended PostgreSQL to support sharded transaction processing) and, of course Aurora. And we compared the analytic extensions with other customized platforms such as Actian Vector.

While Amazon Redshift has been on Oracle's radar screen for some time, fast forwarding to the present, Oracle is not surprisingly targeting Snowflake. Citing third party benchmarks that it commissioned and published, Oracle claims that MySQL HeatWave outperforms Snowflake; Amazon Redshift; Azure Synapse Analytics; and Google Cloud Platform's BigQuery. In contrast to most cloud database rivals, Oracle has long positioned its databases to handle mixed transaction and analytic workloads, and when MySQL HeatWave was first released at the end of last year, it extended that to MySQL. So, compared to AWS, for instance, Oracle positions MySQL HeatWave as transcending the capabilities of Amazon Redshift (for analytics); AQUA (analytics query acceleration); Aurora (for transactions); and Glue (for ETL).

When comparing Oracle MySQL HeatWave with other cloud data warehousing or "data lakehouse" platforms, there are still differences with regard to integration of AutoML or other machine learning services, and there isn't any capability for running Spark natively. That said, the initial target is organizations seeking to consolidate their MySQL instances and expanding them to analytics. We could see such features coming to HeatWave down the road.

HeatWave's machine learning -driven Autopilot features naturally invite comparisons with Oracle Autonomous Database. But each serves a very different audience. Built on Exadata, the Autonomous Database is intended as an enterprise consolidation platform for Oracle instances that can hit the petabyte range. As noted above, MySQL databases are typically much smaller, tend to be run at department level, with customers who are looking for low cost. In most organizations, there might be dozens, if not hundreds or more MySQL departmental databases floating around. That's the sweet spot that Oracle is targeting with the MySQL Database Service with HeatWave.

As we noted, given that others have beaten Oracle to the starting line with their own vanilla MySQL implementations, Oracle had to do something different. In this release, adding machine learning sends the message to developers that on Oracle Cloud Infrastructure, their MySQL instances should also require less care and feeding and perform better.

Disclosure: Oracle is a dbInsight client.