OtterTune sets out to auto-tune all the databases

Tuning databases is key to application performance and stability, but it's a hard job. Auto-tuning helps, but until now it was reserved for the Oracles and Microsofts of the world. OtterTune wants to democratize this capability.

Databases are the substrate on which most applications run. Although different applications have different needs served by different databases, they all have one thing in common: They are complex systems that need continuous fine-tuning to work optimally.

Databases come with a plethora of parameters that can be tuned by "turning knobs." Traditionally, this has been the job of Database Administrators (DBAs). Their job is a hard one, as they need to know the specifics of the database, the hardware it's running on, and the workloads it serves.

Some database vendors like IBM, Microsoft, and Oracle have taken steps to automate this work. OtterTune is a startup that wants to democratize this capability. Today, OtterTune is announcing the private beta of its new automatic database tuning service, as well as an initial $2.5 million seed funding round led by Accel. ZDNet caught up with OtterTune CEO and co-founder Andy Pavlo to find out more.

From research to the real world

Pavlo is a distinguished academic researcher in databases: Associate Professor of Databaseology in the Computer Science Department at Carnegie Mellon University, in his own words. Following in the footsteps of Mike Stonebraker, a prominent figure in the database world and one of his mentors, Pavlo is now setting out to apply his research in the real world.

What set the wheels in motion was work Pavlo and his team published in 2017 in SIGMOD, one of the biggest venues for database research. That work laid the foundations for how to use large-scale machine learning to automatically tune databases. It got attention in research, and the team was invited to share their work on the AWS Machine Learning Blog.

OtterTune wants to auto-tune all the databases

From that point on, requests from people interested in applying this to their databases started flowing in. Pavlo was not ready to accommodate them at the time, but this clearly showed they were on to something. Now he and his team say they are ready to take the next step and apply their ideas in real-world settings.

The "academic" version of OtterTune, as Pavlo referred to it, has already been used in a number of deployments, including at Société Générale and Booking.com. OtterTune does not have any paying customers yet, but with the private beta announced today, it is open for business. OtterTune works for both on-premises and cloud-based database deployments (PostgreSQL, MySQL, and Amazon RDS).

The academic version also supports Oracle, and Oracle support is on OtterTune's roadmap for later in 2021. This brings us to an interesting question, as Oracle already has its own flavor of an autonomous database. So how does OtterTune work, and how is it different from Oracle's implementation?

Under the hood

Pavlo's research focuses on two tracks under the umbrella of autonomous databases. One is black-box optimization for database systems: trying to optimally tune and manage a database through the APIs that the system exposes, without making any changes to its internals.

White-box optimization, on the other hand, means building a database system from scratch with the idea that it should be autonomous. Both OtterTune's and Oracle's approaches are black-box optimization efforts. Work in this field goes back to the 1970s, Pavlo noted:

"There was a big push into what was then called self adaptive systems, because people recognize that with a relational model, if you abstract away to a logical layer what the actual physical implementation or the physical manifestation of the database is, then someone needs to make a decision on how to optimize that system.

What's different now is that people are applying machine learning techniques to try to automate this. The work from the early 2000s from Oracle, IBM and Microsoft was really about advisory tools for human DBAs. What they have now is the same methods and tools at a high level, except instead of a human clicking OK, the software itself clicks OK and applies the changes."

OtterTune's workflow, relying on collecting database metrics and using machine learning to automatically optimize them

It's a reactive rather than a proactive approach, but so is OtterTune, Pavlo acknowledged. A key difference, however, is that OtterTune extends to other databases that don't necessarily have the kind of budget or team behind them that the Oracles of the world have.

While OtterTune refers to using machine learning, it also mentions that it does not need to examine an application's data or queries to do its magic, which left us scratching our heads. As Pavlo explained, however, OtterTune connects to a database and retrieves the current configuration of the system via standard SQL commands.

It also collects runtime metrics: internal performance counters that every database system maintains to keep track of the work it is doing. Metrics like pages read, pages written, locks held, number of queries, or latency are what OtterTune's machine learning relies on to work. OtterTune records these metrics at different frequencies and stores them in an internal repository that keeps track of all the training data, metrics, and configurations from every training session.
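As a rough illustration of this collection step, the Python sketch below pulls configuration knobs and a few counters from PostgreSQL's built-in pg_settings and pg_stat_database views. The choice of views, the four counters, and the names collect_observation, metric_delta, and run_query are assumptions made for illustration. The article only says that OtterTune uses standard SQL commands, so this should not be read as OtterTune's actual code.

```python
import time

# Assumed queries against PostgreSQL's built-in catalog and statistics views.
CONFIG_SQL = "SELECT name, setting FROM pg_settings"
METRIC_NAMES = ("blks_read", "blks_hit", "xact_commit", "xact_rollback")
METRICS_SQL = (
    "SELECT " + ", ".join(METRIC_NAMES) + " "
    "FROM pg_stat_database WHERE datname = current_database()"
)


def collect_observation(run_query, now=time.time):
    """Take one snapshot of knobs and counters.

    run_query is a hypothetical callable that takes a SQL string and
    returns a list of row tuples.
    """
    knobs = dict(run_query(CONFIG_SQL))          # rows of (name, setting)
    counters = run_query(METRICS_SQL)[0]         # a single row of counters
    metrics = dict(zip(METRIC_NAMES, counters))
    return {"timestamp": now(), "knobs": knobs, "metrics": metrics}


def metric_delta(earlier, later):
    """The counters are cumulative, so the work done in an interval
    is the difference between two snapshots."""
    return {
        name: later["metrics"][name] - earlier["metrics"][name]
        for name in METRIC_NAMES
    }
```

With a real driver, run_query would typically wrap a DB-API cursor's execute and fetchall calls. Nothing here touches application tables or query text, which is consistent with the claim that the tuner does not need to see an application's data or queries.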

Then that data is segmented based on the database system type, including context about the hardware. With that, statistical models are trained that can predict how the database system is going to perform as you change the values for the configuration knobs.
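To make the idea of segmenting data and predicting performance from knob values concrete, here is a minimal Python sketch. It swaps in a simple kernel regression as a stand-in for whatever statistical models OtterTune actually trains, which the article does not specify. The observations, the single buffer-size knob, the latency figures, and all function names are hypothetical.

```python
import math
from collections import defaultdict

# Hypothetical training data:
# (database system, hardware profile, buffer size in MB, measured latency in ms)
OBSERVATIONS = [
    ("postgres", "4cpu-16gb", 128, 42.0),
    ("postgres", "4cpu-16gb", 512, 30.0),
    ("postgres", "4cpu-16gb", 2048, 21.0),
    ("postgres", "4cpu-16gb", 8192, 27.0),
    ("mysql", "8cpu-32gb", 1024, 18.0),
]


def segment(observations):
    """Group observations by database system type and hardware context."""
    groups = defaultdict(list)
    for dbms, hardware, knob, latency in observations:
        groups[(dbms, hardware)].append((knob, latency))
    return groups


def predict(samples, knob, bandwidth=1.0):
    """Predict latency for a knob value with Nadaraya-Watson kernel
    regression on log2(knob): a weighted average of observed latencies,
    where nearby knob values get more weight."""
    x = math.log2(knob)
    weights = [
        math.exp(-((x - math.log2(k)) ** 2) / (2 * bandwidth ** 2))
        for k, _ in samples
    ]
    weighted_sum = sum(w * latency for w, (_, latency) in zip(weights, samples))
    return weighted_sum / sum(weights)


def recommend(samples, candidates):
    """Pick the candidate knob value with the lowest predicted latency."""
    return min(candidates, key=lambda knob: predict(samples, knob))


samples = segment(OBSERVATIONS)[("postgres", "4cpu-16gb")]
best = recommend(samples, [128, 512, 2048, 8192])
```

The model only ever sees knob values and measured outcomes for one segment at a time, so a recommendation for a PostgreSQL instance on one hardware profile is not skewed by MySQL data from another.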

OtterTune's roadmap

Even though that sounds quite abstract, there is in fact a certain degree of domain knowledge applied. Keeping knobs that are too dangerous to mess with out of the equation is a key concern. There is also nuance in the fact that OtterTune currently works only with parameters that are global to the database, not with table-level ones, for example.
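One way to picture these guardrails is as a filter applied to a recommendation before anything is changed. The Python sketch below is an assumption-laden illustration: the article does not say which knobs OtterTune excludes, so the blocklist (two PostgreSQL settings that can risk data loss when disabled), the "scope" field, and the function name are all hypothetical.

```python
# Assumed blocklist: PostgreSQL settings that can risk data loss when
# disabled. Which knobs OtterTune really excludes is not stated.
BLOCKED_KNOBS = {"fsync", "full_page_writes"}


def filter_recommendation(recommendation):
    """Split a recommendation into knobs that may be applied and knobs
    that are skipped because they are blocked or not global in scope."""
    allowed, skipped = {}, {}
    for name, knob in recommendation.items():
        if name in BLOCKED_KNOBS or knob["scope"] != "global":
            skipped[name] = knob
        else:
            allowed[name] = knob
    return allowed, skipped


# Hypothetical recommendation produced by a tuning model.
recommendation = {
    "shared_buffers": {"value": "4GB", "scope": "global"},
    "fsync": {"value": "off", "scope": "global"},
    "fillfactor": {"value": 70, "scope": "table"},
}
allowed, skipped = filter_recommendation(recommendation)
```

In this example only shared_buffers would be applied. The fsync change is dropped as dangerous, and fillfactor is dropped because it is a per-table setting.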

Overall, however, the approach should work for other systems beyond the ones currently supported, Pavlo said. Setting up the infrastructure to collect the data was the hard part. OtterTune will probably be targeting Amazon Aurora next and going from there depending on customer demand.

The $2.5 million in seed funding is just the beginning. Pavlo mentioned OtterTune is bound to raise more capital in 2021, and get an injection of business expertise in addition to growing its current headcount of 12, consisting mostly of Pavlo's research team alumni.