Cloud native databases. Serverless databases. Whatever you want to call them, there's a new breed of databases on the rise, one that promises automatic scalability on a global scale: no more toiling over configuration, management, replication and the like, just spin up some instances in the cloud and go.
But although data is naturally gravitating to the cloud, not everyone is willing and able to move all data there. So databases these days are also increasingly expected to be able to handle workloads seamlessly both on premise and across a multitude of clouds.
And then, there's also what by now seems like an old dilemma: to SQL, or to NoSQL? While upending the traditional design of relational databases has brought benefits in terms of scalability, replacing SQL is not necessarily something people want.
So how does one combine SQL, cloud native, multi-cloud, and hybrid cloud?
Multi-cloud, hybrid cloud, meet SQL
Unsurprisingly, there are a few cloud native database offerings around from cloud vendors. Some of them, like Azure CosmosDB, Google Spanner, and AWS Aurora, also support SQL. Clearly, none of them is multi-cloud.
There are also a few multi-cloud databases around, the likes of DataStax Enterprise or MongoDB, making a play to capitalize on this strength, and adding serverless features. Being open source is a common trait among such offerings. They typically do not have SQL support though.
But are there options that are cloud native, support multi-cloud and hybrid cloud, SQL, and open source? A few, and CockroachDB is notable among them.
If you're wondering what's with the name, it's a nod to resilience, something for which cockroaches are notorious. Cockroach Labs, home of open source CockroachDB, was founded in 2015 by ex-Googlers Spencer Kimball, Peter Mattis, and Ben Darnell. While at Google, they had all used Bigtable and were acquainted with its successor, Spanner. Then, they set out to build something that can do what Spanner can, and more.
ZDNet had a Q&A with Kimball, Cockroach Labs CEO, to discuss where they are in their journey, and what's coming next. Since 2015, Cockroach Labs has grown to almost 100 employees, moved to a new NYC office, opened three new offices in Seattle, Boston, and San Francisco, and raised a total of $53.5M over three rounds.
Kimball said they spent much of the first three years architecting and implementing the core product, with roughly 80 percent of employees focused on R&D. More recently, they have built out customer support, marketing, people ops, and sales teams, and are now closer to 50 percent of headcount devoted to R&D.
These all sound like signs of growing up. But where does CockroachDB stand compared to the competition? Kimball thinks it fundamentally comes down to the capabilities of a geo-distributed SQL RDBMS, but offered in a way that provides flexibility to customers who either can't or don't want to go all-in and embrace a proprietary offering from one cloud vendor.
Geo-distribution for the win
But while differentiation from solutions offered by cloud and NoSQL vendors is clear, CockroachDB is not the only cloud-native, multi/hybrid cloud, SQL game in town. What sets it apart, according to Kimball, is geo-distribution:
"We've spent a lot of time implementing CockroachDB from the ground up to provide truly geo-distributed SQL. More recent entrants to the cloud-native SQL market are either not geo-distributed (TiDB, Citus), or the SQL aspect is a monolithic head that's been affixed to a distributed body (Yugabyte, FoundationDB, Aurora).
Geo-distributed yields two fundamental advantages: resilience, which can tolerate datacenter and even region-level failures, and data domiciling, which can keep data close to the customer for latency and privacy. Of the big vendors, only Spanner and Aurora provide the same resilience model, though Aurora's is limited to a single region. No other database vendors yet provide the data domiciling capabilities which CockroachDB offers."
Kimball noted that when they started the company, they weren't yet sure where CockroachDB would fit into the ecosystem, or what kinds of companies would be willing and able to move to a new RDBMS. He went on to add, however, that in 2018 they began to answer those questions and ended with an impressive first year of revenue:
"It turns out that much of the Fortune 2000 is struggling with often board-level mandates to embrace the benefits of the public cloud. That modernization process opens the door to consideration of alternatives to Oracle, especially databases better suited to exploiting the opportunities inherent in the cloud.
Where CockroachDB has a big strategic advantage over the likes of AWS Aurora or Google Cloud Spanner is that we offer a bridge from the reality of existing on-premise deployments to the desired outcome of using the public cloud wherever it makes sense. CockroachDB can be run on-premise, hybrid, and across arbitrary cloud vendors."
Business, meet open source. Open source, meet the cloud.
This brings us to an interesting topic: competition with cloud vendors, and the Commons Clause. This is something many open source software vendors are facing, as cloud vendors are taking their products and offering them as managed services, directly competing with said software vendors. In response, software vendors are modifying their licenses to prevent this. Kimball acknowledged this as a huge problem:
"Just when everyone thought a stable business model had evolved for open source businesses, AWS perfected their strip mining operation. We don't yet face the same set of conditions that Confluent does, both in terms of market adoption and direct competition from AWS with our core open source product. Additionally, stewardship of Kafka by the Apache Foundation introduces some of the complexity in Confluent's licensing scheme.
Nonetheless, we must address the same root problem. We were one of the first companies to introduce a source-available enterprise license and to date, have been careful to apply it only to features which are useful chiefly to companies which really should be paying us."
Kimball also noted that the competitive behavior of AWS will put pressure on Cockroach Labs to move features that traditionally would be pure open source into the "free" category of its enterprise license, and to add an exclusion for AWS-like behavior.
Business seems to be going well for Cockroach Labs, despite the competition, to which we would also add names such as NuoDB, although NuoDB is not an open source offering. CockroachDB comes in three flavors: Core, Enterprise, and Managed. Features such as geo-partitioning, distributed backup and restore, and extra security are part of the Enterprise version.
Names such as Comcast and Baidu are listed as CockroachDB users. Given the precedent of one of the BAT using open source Apache Flink and eventually acquiring data Artisans, the vendor offering support for it, we were curious about Cockroach Labs' relationship with Baidu. Kimball said Baidu turned to CockroachDB to replace its sharded MySQL with a distributed database that scales horizontally while providing the familiar SQL interface.
Apropos MySQL and familiar SQL interfaces: it's worth mentioning that CockroachDB is compatible with PostgreSQL. Cockroach Labs made this decision for a number of reasons, not the least of which was being able to act as a drop-in replacement. MySQL compatibility was also considered, but only one could work. Interestingly, both PostgreSQL and MariaDB, MySQL's offshoot, are now building up their cloud plays. A little open source SQL goes a long way.
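To make the drop-in idea a bit more concrete, here is a minimal Python sketch of what PostgreSQL compatibility can mean in practice: an application builds an ordinary PostgreSQL-style connection URL and simply points it at a CockroachDB node. The helper name `cockroach_url` and the host name are hypothetical and used only for illustration; port 26257, the `defaultdb` database, and the `root` user are assumed here as CockroachDB defaults and should be checked against the deployment at hand.

```python
from urllib.parse import quote, urlencode


def cockroach_url(host, database="defaultdb", user="root",
                  port=26257, sslmode="verify-full"):
    """Build a PostgreSQL-style connection URL for a CockroachDB node.

    Hypothetical helper for illustration. The defaults (port 26257,
    database "defaultdb", user "root") are assumptions, not values
    taken from the article.
    """
    # Percent-encode user and database so unusual characters stay valid.
    user_part = quote(user, safe="")
    db_part = quote(database, safe="")
    query = urlencode({"sslmode": sslmode})
    return f"postgresql://{user_part}@{host}:{port}/{db_part}?{query}"


if __name__ == "__main__":
    # The resulting URL could be handed to a PostgreSQL client driver.
    print(cockroach_url("db.example.internal", database="bank"))
```

The point of the sketch is that nothing in it is specific to CockroachDB apart from the port: the URL uses the same scheme a PostgreSQL client would expect.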
On being cloud native, Kubernetes, streaming, analytics, HTAP, and the future of CockroachDB
Inevitably, the discussion touched upon the impact of Kubernetes on databases as well. Cockroach Labs is a member of CNCF, and Kimball believes Kubernetes, more than any other CNCF project, illuminates a path to the desired outcome:
"It can be run on-premise, and natively on all the cloud vendors, giving operations a consistent control plane across environments. This is why it's one of the fastest growing open source projects of all time.
However, compared to Borg, the project within Google which inspired it, Kubernetes is still in its infancy. It's struggled recently to handle stateful services. Additional tools will be necessary to orchestrate multiple Kubernetes clusters across regions or cloud providers. These capabilities are critical to enabling CockroachDB's features such as geo-replication and geo-partitioning."
While CockroachDB has a big stake in Kubernetes, what about features such as support for streaming and analytics? Is going HTAP something we can expect to see? Kimball said they are tackling transactional use cases first and foremost, but intend to build on that:
"It's a $45 billion market, so a juicy target. Providing a cloud-native, geo-distributed system of record is a position of strength from which we will expand into a formidable HTAP offering. This is a key difference from the strategy employed by Snappy Data or Splice Machine.
Those products have chosen to tackle HTAP as a distinct product category, whereas we believe strongly that the product category that matters is actually the OLTP system of record, and adding better analytics to that is a downhill journey."
Kimball said they are in the process of vectorizing SQL execution and upgrading the underlying storage system to optimize for analytical workloads. In the meantime, distributed change data capture allows changes in the database to be transactionally streamed in real time to cloud storage or to Kafka.
This allows CockroachDB to be paired with data warehousing or BI solutions. Native integration with Kafka is uni-directional, but there's a JDBC sink connector for Kafka that CockroachDB is compatible with out of the box. Kimball concluded by mentioning features such as full-text and geo-spatial indexes as likely future additions, and noted they routinely consider adding graph capabilities as well.
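As a rough illustration of the change data capture to Kafka mentioned above, the Python sketch below assembles a `CREATE CHANGEFEED` statement that targets a Kafka sink. The helper `changefeed_sql`, the table name, and the broker address are hypothetical; the statement shape and the `updated` and `resolved` options follow CockroachDB's documented changefeed syntax as we understand it, so treat this as a sketch rather than a definitive recipe.

```python
import re


def changefeed_sql(table, kafka_broker, options=("updated", "resolved")):
    """Return a CREATE CHANGEFEED statement streaming a table to Kafka.

    Hypothetical helper for illustration: it only builds the SQL text
    and does not connect to any database.
    """
    # Accept only simple identifiers such as "orders" or "bank.orders".
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*)*", table):
        raise ValueError(f"unexpected table name: {table!r}")
    # Reject characters that could break out of the quoted sink URL.
    if "'" in kafka_broker or " " in kafka_broker:
        raise ValueError(f"unexpected broker address: {kafka_broker!r}")

    stmt = f"CREATE CHANGEFEED FOR TABLE {table} INTO 'kafka://{kafka_broker}'"
    if options:
        stmt += " WITH " + ", ".join(options)
    return stmt + ";"


if __name__ == "__main__":
    print(changefeed_sql("orders", "kafka.example.internal:9092"))
```

A statement like this would be run through any SQL client connected to the cluster, after which a data warehouse or BI tool could consume the resulting Kafka topic.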
In any case, CockroachDB is part of a growing class of databases that want to have it all, and is worth keeping an eye on.
NOTE: The post has been updated on 2/28/2019, to include reference to CockroachDB's PostgreSQL compatibility, and how this ties in with ongoing efforts by PostgreSQL and MariaDB.