Cockroach Labs: With new funding, its future belongs in the cloud

Cockroach Labs has just doubled its funding, on the heels of a year in which it multiplied its customer base. Its next challenge is making the case that its platform is not just for the elite usual suspects. That's where the cloud may play a key role.


As the company expands beyond its initial round of early-adopter customers, Cockroach Labs has secured an additional $55 million in Series C funding, taking its total funding beyond the $100 million level. The funding comes as the company enters a geometric growth phase, doubling revenue quarter over quarter and now multiplying its customer count. It plans to invest the funds in sales and marketing as well as product engineering.

The company was founded by, and is currently led by, CEO Spencer Kimball, a former Google employee who developed CockroachDB as a multi-platform answer to Google Cloud Spanner. Big on Data readers will recall George Anadiotis' lengthy interview with Kimball, in which he discussed Cockroach's design approach emphasizing geo-distribution and resiliency.

CockroachDB differentiates itself as one of the few geographically distributed cloud databases that supports the equivalent of multi-master capability across more than one region: the ability to read and write data on a local replica without having to access a central master first. That has obvious performance advantages for any application where updates may come from anywhere in the world.

Most cloud-native databases today provide some form of automated replication so that you can keep copies distributed across regions for fast local access to data. But in the vast majority of cases, that fast local access is limited to reads; writes typically have to make a round trip to a central master to be committed.

There are a handful of platforms that provide some form of multi-master capability, such as Amazon Aurora (still in preview), Percona XtraDB Cluster, and NuoDB. Even fewer guarantee to pull this off across different geographies; examples include Microsoft Azure Cosmos DB and, of course, Google Cloud Spanner. Distributed ACID is complex, which is why multi-master databases have been the exception rather than the rule.

Maintaining transaction consistency is tricky, given the limits imposed by the speed of light and the restrictions of the CAP Theorem. The theorem states, in essence, that a distributed database system can have only two of the following three properties: Consistency (the data will always be up to date), Availability (the system will always be available), and Partition Tolerance (the system will continue to operate even if network communications are disrupted). Because a database spread across a network can never rule out partitions, the practical tradeoff is between availability and consistency, and how a distributed platform manages that balance is its secret sauce.
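To make the tradeoff concrete, here is a minimal, hypothetical Python sketch; it is a toy model, not how CockroachDB or any of the databases named above work internally. Two replicas hold one register, and a `partitioned` flag stands in for a network split. The class name `ReplicatedRegister` and the "CP"/"AP" modes are invented for illustration: in "CP" mode a replica that can't reach its peer refuses the write, while in "AP" mode it accepts the write and the replicas diverge.

```python
"""Toy illustration of the CAP tradeoff during a network partition.

Hypothetical model (not any real database's design): two replicas of a
key-value register. Under a partition, a replica must either refuse writes
it cannot replicate (favor consistency) or accept them locally and let the
replicas diverge (favor availability).
"""

from dataclasses import dataclass, field


@dataclass
class Replica:
    name: str
    data: dict = field(default_factory=dict)


class ReplicatedRegister:
    def __init__(self, mode: str) -> None:
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.replicas = [Replica("us-east"), Replica("eu-west")]
        self.partitioned = False

    def write(self, replica_idx: int, key: str, value: str) -> bool:
        """Return True if the write was accepted."""
        if not self.partitioned:
            # Healthy network: apply the write to every replica.
            for r in self.replicas:
                r.data[key] = value
            return True
        if self.mode == "CP":
            # Peer unreachable: refuse rather than let replicas disagree.
            return False
        # AP: accept locally; replicas now disagree until the partition heals.
        self.replicas[replica_idx].data[key] = value
        return True

    def consistent(self) -> bool:
        return all(r.data == self.replicas[0].data for r in self.replicas)


for mode in ("CP", "AP"):
    reg = ReplicatedRegister(mode)
    reg.write(0, "balance", "100")
    reg.partitioned = True
    accepted = reg.write(1, "balance", "50")
    print(mode, "accepted:", accepted, "consistent:", reg.consistent())
# CP accepted: False consistent: True
# AP accepted: True consistent: False
```

Neither mode escapes the partition: one gives up availability for that write, the other gives up consistency until the replicas reconcile.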

Of course, the big question is why we should care.

The stock answer for most enterprises is, "We're not Google," meaning that they don't have the same need for global, online transaction databases as the global financial houses or digital online giants. There's a good rationale for that stock answer: until recently, the cost of globalized deployments was prohibitive, effectively limiting them to financial services giants or emerging digital online companies whose business (e.g., online gaming) required pushing the envelope. And if you have to shard the data manually, that can become a very complex process indeed.

Implementation typically required one of two paths: either invest in a separate, high-speed change data capture replication tool, or place the distributed transaction logic inside the application. By comparison, implementing multi-master capability inside the database is architecturally simpler and more elegant. It also makes a good technical argument that geographically distributed multi-master databases like CockroachDB take fuller advantage of cloud-native architecture than cloud databases lacking that feature.

Globally distributed multi-master databases can be a modernization strategy, typically implemented as part of a larger transition when an organization moves to the cloud. They are also a strategy for launching net-new cloud-native applications. Given that the bulk of Cockroach's early adopters are Global 2000 organizations, modernization so far has been the dominant use case.

On the horizon, compliance may also drive the decision to adopt a globally distributed database. For instance, with emerging concerns over data privacy and data localization, enterprises may need, out of necessity, to geographically partition their databases to stay compliant with laws requiring data to stay inside the country of origin. In a recent release, CockroachDB added the ability to partition your data geographically while still running the database as a single logical, transaction processing instance.
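The idea behind geo-partitioning can be sketched as a routing rule: every row is tagged with where it must live, each region's rows are stored separately, yet queries still see one logical table. The following is a minimal, hypothetical Python sketch; the `COUNTRY_TO_REGION` mapping, the region names, and the `GeoPartitionedTable` class are invented for illustration and say nothing about how CockroachDB implements the feature.

```python
"""Sketch of geo-partitioning: one logical table, physically split by region.

Hypothetical throughout: the residency rules, region names, and class are
illustrative only and are not CockroachDB's API or internals.
"""

from collections import defaultdict

# Hypothetical residency rules: which regional store may hold a country's rows.
COUNTRY_TO_REGION = {
    "DE": "eu-central",
    "FR": "eu-central",
    "US": "us-east",
    "CA": "us-east",
    "SG": "ap-southeast",
}


class GeoPartitionedTable:
    def __init__(self) -> None:
        # Each region's rows live in a separate store, a stand-in for
        # nodes physically located in that region.
        self.partitions: dict[str, list[dict]] = defaultdict(list)

    def insert(self, row: dict) -> str:
        """Route the row to its region's partition and return the region."""
        region = COUNTRY_TO_REGION.get(row["country"])
        if region is None:
            # Refuse rows with no residency rule rather than guess a location.
            raise ValueError(f"no residency rule for country {row['country']!r}")
        self.partitions[region].append(row)
        return region

    def select_all(self) -> list[dict]:
        # Queries still see a single logical table spanning all partitions.
        return [row for rows in self.partitions.values() for row in rows]


table = GeoPartitionedTable()
print(table.insert({"id": 1, "country": "DE", "name": "Anna"}))  # eu-central
print(table.insert({"id": 2, "country": "US", "name": "Bob"}))   # us-east
print(len(table.select_all()))                                   # 2
```

The design point is that the placement decision is made per row by the data layer, so the application keeps issuing ordinary inserts and queries against one table.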

CockroachDB also plays the resiliency card. Although replication across availability zones or regions is not necessarily synonymous with disaster recovery or high availability, most cloud databases that spread their footprint can promote replicas to masters in the event of an outage. In some cases, those capabilities are confined to a single region, while in others, they may span two or more regions. But in such a scenario, transactions committed on the failed master could be dropped if the replicas were not updated before the original master went down. While no database, geographically distributed or not, will ever be 100% available, a data platform that doesn't rely on a single central master should reduce downtime.
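The dropped-transaction risk comes from asynchronous replication: the master acknowledges a commit before any replica has it. The minimal, hypothetical Python sketch below models exactly that; the `Primary` and `Standby` classes and the explicit `ship_to` step are invented for illustration and do not reflect any particular database's replication protocol.

```python
"""Sketch: why promoting an asynchronous replica can drop committed transactions.

Hypothetical model: the primary acknowledges commits as soon as they are in
its local log and ships the log to a standby later. If the primary fails
before shipping, the promoted standby never sees those commits.
"""


class Standby:
    def __init__(self) -> None:
        self.log: list[str] = []


class Primary:
    def __init__(self) -> None:
        self.log: list[str] = []
        self.shipped = 0  # number of log entries already sent to the standby

    def commit(self, txn: str) -> None:
        # Acknowledged to the client as soon as it is in the local log.
        self.log.append(txn)

    def ship_to(self, standby: Standby) -> None:
        # Asynchronous replication: send whatever has not been sent yet.
        standby.log.extend(self.log[self.shipped:])
        self.shipped = len(self.log)


primary, standby = Primary(), Standby()
primary.commit("txn-1")
primary.commit("txn-2")
primary.ship_to(standby)  # standby catches up through txn-2
primary.commit("txn-3")   # acknowledged, but not yet shipped
# The primary fails here and the standby is promoted to be the new primary.
lost = [t for t in primary.log if t not in standby.log]
print("promoted standby has:", standby.log)
print("lost on failover:", lost)
# promoted standby has: ['txn-1', 'txn-2']
# lost on failover: ['txn-3']
```

With synchronous replication, or a quorum-based commit that does not hinge on a single master, txn-3 would not have been acknowledged until a surviving replica also held it, which is the resiliency argument made above.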

Compliance and resiliency are just part of the message. As a platform taking on the extreme use case of a geographically distributed ACID transaction database, Cockroach Labs needs to shine a spotlight on its use cases to raise awareness that these databases are not just for the Googles of the world. In meeting that challenge, the cloud should play a key enabling role.

Here's how. Geographically replicated databases are implicitly tied to the cloud because, in practice, few IT organizations have the budgets to build out global clusters from scratch. While CockroachDB supports on-premises deployment, the majority of its customers manage their installs in the cloud.

Overcoming the perception that geographically distributed databases serve only a narrow audience will require a managed cloud database service that developers can spin up on demand. Admittedly, mounting a managed service is a tall order: it is in effect an addition to the product portfolio, adding management and deployment automation features not found in the base product, not to mention dedicating staff to field those 3am phone calls. That can be a heavier lift for independent providers that lack the deep pockets of the AWSs, Azures, or GCPs of the world. Cockroach Labs does offer a managed service for its database, currently on AWS and Google Cloud, but we hope that its post-funding investment plans will include a self-service offering that would make it accessible to a wider audience.