Databases were traditionally highly specialized data stores designed for specific tasks, and until recently, they had been getting even more specialized. Recall data warehouses? Somebody once quipped to us that they were "bug fixes" for the shortcomings of transaction databases in handling more data-intensive analytics, query, and reporting. After Y2K, SQL relational databases were supposedly the de facto enterprise standard until they got broken by weblog traffic coming from Internet applications. And from all that came NoSQL databases for handling operational use cases, like maintaining online user profiles or product catalogs, and big data stores like Hadoop for handling those ugly multi-terabyte and petabyte jobs.
The good news was that data platforms could satisfy a much wider range of use cases; the bad news was all these fit-for-purpose data stores were generating new data silos. So much for the data lake.
The backlash to the proliferation of data silos begat, not necessarily a new generation of databases, but more versatility. Back in 2014, we were asked at the last minute by the sponsors of a big data conference to throw together a session. So we wrote a presentation looking at the emerging trend toward database convergence. Transaction databases were adding in-memory column stores for analytics; SQL databases were adding support for querying JSON documents (and vice versa); while Hadoop was making itself accessible to interactive SQL query. Some analyst firms even devised a new category for transaction databases extending their reach to analytics with the term "hybrid."
While we took for granted that databases were overlapping, we were asked to deliver that presentation on enough occasions over the rest of the year that it became a sort of stump speech.
To be fair, a better term for this trend was not database convergence but overlap. Your installation of DB2 or Oracle supporting JSON wasn't necessarily going to lead to ripping out your MongoDB deployments. Neither would we expect your Couchbase installs with the N1QL query language to necessarily supplant MySQL or Oracle. If you're going to replace a database, such as moving from SQL to NoSQL, the move is likely to be driven by use cases demanding more flexible data models. Instead, your use of data platforms with extended capabilities, such as a SQL database with JSON extensibility, is more for edge cases, such as where you want to supplement customer transaction records with log data correlating their navigation of your website.
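To make that edge case concrete, here is a minimal sketch, not any vendor's actual JSON support. It uses Python's built-in sqlite3 module as a stand-in for a SQL database, stores clickstream events as JSON text alongside relational order records, and joins the two. The table names, sample data, and the json_get helper are all hypothetical, invented for illustration.

```python
import json
import sqlite3


def json_get(doc, key):
    """Hypothetical helper: return one top-level field of a JSON document stored as text."""
    return json.loads(doc).get(key)


def build_demo_db():
    """Create an in-memory database with relational orders and JSON clickstream events."""
    conn = sqlite3.connect(":memory:")
    # Register the helper so it can be called from SQL as json_get(column, 'field').
    conn.create_function("json_get", 2, json_get)
    conn.execute(
        "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
    )
    conn.execute("CREATE TABLE clickstream (customer_id INTEGER, event TEXT)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [(1, 101, 59.90), (2, 102, 120.00)],
    )
    events = [
        (101, {"page": "/shoes", "seconds": 42}),
        (101, {"page": "/checkout", "seconds": 15}),
        (102, {"page": "/hats", "seconds": 7}),
    ]
    conn.executemany(
        "INSERT INTO clickstream VALUES (?, ?)",
        [(customer, json.dumps(event)) for customer, event in events],
    )
    return conn


def pages_per_order(conn):
    """Join each transaction record to the pages its customer visited."""
    return conn.execute(
        "SELECT o.order_id, json_get(c.event, 'page') "
        "FROM orders o JOIN clickstream c ON c.customer_id = o.customer_id "
        "ORDER BY o.order_id, c.rowid"
    ).fetchall()


if __name__ == "__main__":
    for order_id, page in pages_per_order(build_demo_db()):
        print(order_id, page)
```

The transaction tables stay relational; the JSON only rides along as a supplement, which is the sense in which this is overlap rather than replacement.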
Nonetheless, over the past year, we've seen a new spin with the multi-model database. The poster child, Azure Cosmos DB, is a database defined by its varied APIs. Want a document database? You have a choice: you can use the native API (originally developed for Cosmos DB's ancestor, DocumentDB) or the MongoDB BSON flavor. Want a key/value store? Use the Azure Table Storage API. Do you want graph, or SQL? There's an API for that.
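The idea of one store wearing several API "skins" can be sketched in a few lines. The toy class below is purely illustrative and makes no claim about how Cosmos DB or any other product is built; the class name, method names, and sample data are our own assumptions. It keeps a single set of records and exposes key/value, document, and graph-style access over them.

```python
class ToyMultiModelStore:
    """Hypothetical in-memory store: one set of records, three access styles."""

    def __init__(self):
        self._records = {}  # record id -> dict of fields

    # Key/value style: opaque put and get by key.
    def put(self, key, value):
        self._records[key] = dict(value)

    def get(self, key):
        return self._records.get(key)

    # Document style: query by field values inside each record.
    def find(self, **criteria):
        return [
            key
            for key, doc in self._records.items()
            if all(doc.get(field) == wanted for field, wanted in criteria.items())
        ]

    # Graph style: treat records as nodes and store outgoing edges on the source node.
    def add_edge(self, src, dst):
        self._records[src].setdefault("_edges", []).append(dst)

    def neighbors(self, key):
        return list(self._records.get(key, {}).get("_edges", []))


if __name__ == "__main__":
    store = ToyMultiModelStore()
    store.put("alice", {"type": "customer", "city": "Oslo"})
    store.put("sku-1", {"type": "product"})
    store.add_edge("alice", "sku-1")
    print(store.get("alice"), store.find(type="product"), store.neighbors("alice"))
```

A real multi-model engine has to reconcile indexing, consistency, and wire protocols across those APIs; the sketch only shows why a single copy of the data can serve callers that think in different models.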
While multi-model is not Cosmos DB's only draw (globally distributed scalability and configurable consistency models certainly play in), as Andrew Brust reported, the platform has brought in over $100 million in annualized revenue in less than a year. Why the demand?
According to Rimma Nehme, group product manager for Cosmos DB, the most common use cases for multi-model deployments are scenarios that traditionally required multiple databases. Online commerce is a good case in point, where you have customer transaction records that are modeled as relational; website navigation that is best represented as JSON documents; and segmentation for generating next-best offers, represented as customer behavioral graphs. Not surprisingly, online commerce accounts for some of Cosmos DB's most prominent customer references.
And we're seeing more evidence of the trend coming out of the open source world. OrientDB exposes graph, document, key/value, reactive, object-oriented, and geospatial models in the same engine. In turn, Yugabyte has just introduced a multi-model database with a modest start but grand intentions. Designed as a distributed transaction database, the just-released 1.0 version is a work in progress, initially supporting Apache Cassandra and Redis APIs, and for now, a snapshot isolation ACID model. When the company finally paints its masterpiece, there will also be APIs for graph, full text search, and (cloud-oriented) object stores; for ACID, there will also be the option of serializable transaction consistency. While we're on the topic of APIs, Amazon's Neptune platform breaks the graph database mold by supporting both flavors of graph, Resource Description Framework (RDF) graphs and property graphs, through APIs.
The emergence of databases that use APIs to change their skins is the latest proof point that enterprises want to break out of their database silos.