Database Week: DataStax unleashes Astra managed Cassandra cloud service

A couple of weeks after AWS's release of Keyspaces, DataStax now has its answer: a managed cloud service based on DataStax Enterprise.


A few weeks after AWS released Amazon Keyspaces for Apache Cassandra, it's now DataStax's turn. After a false start last year, DataStax is going live with its long-awaited Astra Database-as-a-Service (DBaaS). Contrary to what we reported a couple of weeks back, Astra will be based on DataStax Enterprise (DSE), not the bare-bones DataStax Distribution of Apache Cassandra (DDC). It's the same offering that was formerly branded as Apollo. Most importantly, because it runs on cloud-native Kubernetes (K8s) infrastructure, the new service is designed to be independent of any single cloud vendor.

The Astra announcement comes during a week that, with inspiration from Eric David Benari, we're informally calling "Database Week." With DataStax, Redis, and Hitachi Vantara all holding online events (in lieu of conferences) this week, there is going to be a flurry of database announcements from them and others over the next few days.

Arguably, Cassandra is the last popular open source database to get a managed cloud service. Excluding SQLite, it's the last of the top dozen databases, as ranked in popularity by DB-Engines, to get one. Until a month ago, there were no managed Cassandra services at all; now there is a real choice: DataStax's offering, which stays close to the Apache Cassandra open source engine, and AWS's Keyspaces, which runs on a different storage engine but is API-compatible with Cassandra. As we noted a few weeks back, Keyspaces very much follows the pattern that AWS established with Aurora and DocumentDB.

At launch, Astra will be available on AWS and Google Cloud, though DataStax has a closer relationship with the latter that, for now, includes joint go-to-market and integration with the Google Cloud console. Initially, Astra will be a single-tenant implementation, but that will change later, along with the addition of support for other public clouds such as Microsoft Azure.

Simplifying Cassandra

The arrival of managed cloud services for Cassandra is key to making this high-performance, highly scalable distributed database accessible to a wider audience. Cassandra has long been known for its performance and scale, but never for its ease of use. Given those hurdles, it seems little short of a miracle that Cassandra ranks as high as 12th among the databases tracked by DB-Engines. But, as the popularity of AWS's DynamoDB service shows, there is strong demand for distributed databases.

Of course, managed cloud services eliminate most, if not all, of the housekeeping, especially when it comes to patches, maintenance, and upgrades. But especially critical were changes to management and deployment, many of which stem from modernization with the new K8s operator and the associated management API (which runs as a K8s sidecar). The management API wraps an abstraction layer around JMX (Java Management Extensions), which Cassandra uses for monitoring because the database is written in Java. Without the API, integrations with JMX would be far more brittle, because JMX is a low-level construct that would otherwise have to be customized for each platform. The new API is modular and works not only with K8s but also with other automation tools such as Puppet.

DataStax has also open-sourced its new Metrics Collector for Cassandra, which is designed to integrate with Prometheus, the open source monitoring and alerting tool, and with Grafana for visualization. The tie-in with Prometheus and Grafana means that DataStax no longer needs to reinvent the wheel for monitoring and alerting. With Astra, it has also developed a template that prepopulates dashboards and best practices to help customers determine what to instrument and monitor, a major stumbling block with traditional Cassandra implementations.

The cloud-native journey

As noted, Astra will be based on DSE, which is DataStax's commercial implementation of Apache Cassandra, with added features such as enhanced security, a management console, tiered storage, in-memory support, and search, plus options for analytics and graph.

The new K8s operator was a 180-degree shift from DataStax's original strategy for its planned cloud service. The original plan was to extend the platform to work with each cloud individually, but the partnership with Google Cloud, announced a year ago, prompted the change that resulted in Astra. That's where plans for the K8s operator came in, and along with it, the new management API to simplify integration with JMX.

Down the road, DataStax plans to refactor the platform into microservices that would separate compute from storage, support multitenancy, enable serverless operation, and provide far more flexibility in scaling. For instance, once DSE on Astra is refactored into microservices, customers could choose whether to scale up a compute node or scale out across multiple nodes, depending on their required service levels and budget. In the future, DataStax wants to make those optimizations easy and automatic for Astra users.

Realigning with open source

After a few years of emphasizing differentiation from Apache Cassandra, DataStax is now seeking to realign its platform with the Apache project and, in the long run, will likely follow a course similar to Cloudera's. The underlying database will be open source, but the binaries that implement features like the management console will be specific to the commercial offering.

That's the approach DataStax took for the major design shift that resulted in Astra: transitioning to a cloud-native architecture based on microservices, containers, and K8s. It reversed the original strategy, which took a more monolithic approach to adapting the platform to specific clouds. While the ultimate decision rests with the community, DataStax plans to submit the cloud-native extensions to the open source project.

A first step

DataStax is targeting those who want a purer implementation of Apache Cassandra. Like AWS, it promises a familiar developer experience that supports the Cassandra tools and APIs developers are accustomed to. But it will stick closer to Apache Cassandra in its CQL support and in keyspace and key management, along with some under-the-hood differences in functions like load balancing. Beyond the paid tier, DataStax will also offer a free community tier, capped at 10 GB, for developers seeking to learn Cassandra.

While Astra will initially be available on AWS and Google Cloud, it's on the latter where the possibilities get interesting, because DataStax is one of the vendors in Google Cloud's open source database partner program. In the short run, that means joint go-to-market and integration with the Google Cloud console; longer term, we'd like to see integration with some of Google Cloud's data flow, analytics, and machine learning offerings.

As noted above, this is just the first step in evolving DSE and Cassandra into a cloud-native database. For now, AWS's approach, which leverages its existing storage engines and experience with DynamoDB, has given it a head start by supporting serverless operation at launch. Additionally, while DataStax's initial launch accomplishes much of the goal of simplification, getting to multitenancy will ultimately make the service far more cost-competitive. We expect the initial, single-tenant Astra launch to appeal mostly to DataStax's existing customer base, with multitenancy being the key to reaching a broader audience. But, as we noted in our piece on Keyspaces, there's one more important step we want to see: better tooling for application developers to model schemas and build apps that run against Cassandra.
