Make Apache Cassandra great again: DataStax going cloud, Kubernetes, open source, and multi-model

Actions and words, code and advocacy. DataStax is changing strategy, re-engaging with the Apache Cassandra open source community, and releasing some interesting technical advancements while at it, too.

Leading with code to drive Cassandra ubiquity. This is the key message DataStax is putting forward on the occasion of releasing its open-source Kubernetes operator for Cassandra, alongside improved data streaming and graph queries. 

In these tumultuous times for both open-source software (OSS) at large, and DataStax as a database vendor built around core open source Apache Cassandra, this is something worth exploring. ZDNet connected with Patrick McFadin, DataStax VP Developer Relations, to discuss the ins and outs.

Towards cloud and open source via Kubernetes

As we've highlighted on Big on Data, data is moving to the cloud. This is happening using OSS, and Kubernetes, too. So the fact that DataStax has chosen Kubernetes to highlight its contribution to the Apache Cassandra community should come as no surprise.

With the Cassandra Kubernetes operator, DataStax claims enterprises and users will have a consistent scale-out stack for compute and data. The question is: Where exactly is this coming from? Has it been developed by DataStax, and then donated to the community as a tangible sign of a new approach?

DataStax recently recruited a number of executives to renew its leadership. Chet Kapoor and Sam Ramji, the new CEO and CSO, respectively, are both ex-Googlers. In a recent interview, Ramji highlighted some key areas: reconnecting with the OSS community, emphasizing services and support, and making life easier for developers.

"We are embracing open source again," McFadin affirmed. McFadin's role in bringing about the Kubernetes operator was instrumental, both on the technical and the social level. Kubernetes is seeing rapid uptake. According to a 2019 Cloud Native Computing Foundation survey, 78% of respondents are using Kubernetes in production, compared to 58% the year before.


DataStax is going cloud and open source via Kubernetes. (Image: DataStax)

This means that different organizations have been working on getting Kubernetes to work with Cassandra, which is among the top 10 most popular databases in the world, according to DB-Engines. This was the backdrop against which McFadin worked.

On the one hand, as he noted, having many implementations of the same thing means that people may be on the same page, as far as what is important to work on is concerned. On the other hand, integration is a balancing act, both technically and socially.

DataStax has been working with Sky, Orange, Netflix, Target, and many other teams in the Cassandra community to improve and advance the operator. McFadin, who has a long-standing involvement in OSS, pointed out the obvious: Each of those teams is focused on solving the problems that matter most to them.

The way DataStax is approaching this, as per McFadin, is not by dumping code on GitHub and expecting the community to adopt it as the singular way to work with Kubernetes. DataStax has been developing more than an operator -- there's also a Kubernetes sidecar and management API. DataStax is using this to develop its own cloud, and now it's available for all to use.

Actions and words, code and advocacy

DataStax's cloud-managed version, previously dubbed Constellation, is now rebranded as Astra. It's expected to be generally available soon. McFadin acknowledged the fact that Cassandra has a reputation for being robust, but hard to manage. McFadin also alluded to the upcoming version 4.0 of Cassandra, to which DataStax has pledged to contribute. He said it will be the best release yet, not because of sexy new features, but because of how stable it will be.

Speaking about cloud, open-source code, and community opened the discussion to a broader topic. McFadin referred to reconnecting with the Cassandra community, and the Apache Software Foundation (ASF), as a humbling experience. He said that people were eager to listen, but to gain their trust, DataStax wanted to let actions speak louder than words. In other words, DataStax backed its intent with what counts the most in OSS -- code. Or, does it?


There are many pieces in the open-source software puzzle. (Image: Photo by Hans-Peter Gauster on Unsplash)

Valuing and measuring contributions in terms of code is not the only way to think about OSS. The ASF favors community over code. Measuring contribution in terms of code is not trivial, but it is sufficiently well understood and can be done. But what about community contribution, in terms of advocacy for example?

McFadin referred to his own experience with the DataStax advocacy team. Drawing from this, he mentioned a few metrics that can be used to measure community engagement and contribution: number of workshops, topics and related attendance, question answering on public forums and so on.

We have previously pondered on the topic of whether measuring contribution and rewarding contributors could be a more equitable way to grow and sustain OSS. McFadin did not have an answer to that. He did, however, point out the fact that healthy OSS communities attract contributions from many actors, and in many ways.

In any case, DataStax is not considering a change in licensing to deter cloud vendors from offering Cassandra as a service. One permissive Apache license and one commercial license is all that's needed, McFadin said, and if Amazon wants to do this, so be it.

Towards a multi-model future via graph

Reconnecting with the community sounds like a good thing. More functionality for the open-source Cassandra -- likewise. For DataStax, however, this creates a well-known and unavoidable tension. Which features stay in DataStax Enterprise (DSE), and which ones make it to open source Cassandra?

McFadin replied by saying DataStax does not expect its product to be used in a 100% DataStax shop. He went on to add that customers do not just value features, but also a partner they can rely on, and this is what DataStax wants to be. The recent acquisition of The Last Pickle (TLP) should be seen in the light of a new emphasis on a service-based model, too.

With those important topics being in the spotlight, however, we risk overlooking something else, which is also important: DataStax's move towards becoming a multi-model database. DataStax added graph capabilities to DSE long ago. Up until now, however, it was not really possible to mix-and-match native Cassandra data and graph data.


Adding graph query capabilities to DataStax Enterprise native data via Gremlin may be the first step toward a multi-model future. (Image: Apache TinkerPop)

Starting with the newly released DSE 6.8, graph queries can now utilize native Cassandra data models. Inserting data into DSE makes it available for querying using Gremlin. This enables developers to build multi-model applications with joins, matching and traversals over distributed, large data sets.
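
To make the idea of traversing tabular data more concrete, here is a minimal sketch in plain Python. It is not Gremlin and does not use any DSE API; the `people` and `knows` rows, and the helper names `out_neighbors` and `friends_of_friends`, are hypothetical and exist only for illustration. It shows how rows that look like ordinary table records can be read as vertices and edges, and how a two-hop traversal answers a question that would otherwise call for a join.

```python
# Hypothetical rows, shaped like records in two ordinary tables.
# In a graph reading, `people` rows are vertices and `knows` rows are edges.
people = [
    {"id": 1, "name": "ada"},
    {"id": 2, "name": "bob"},
    {"id": 3, "name": "cy"},
]
knows = [
    {"src": 1, "dst": 2},
    {"src": 2, "dst": 3},
]


def out_neighbors(person_id):
    """One traversal step: follow 'knows' rows outward from a person."""
    return [row["dst"] for row in knows if row["src"] == person_id]


def friends_of_friends(person_id):
    """Two-hop traversal, returning names sorted alphabetically."""
    names_by_id = {p["id"]: p["name"] for p in people}
    two_hops = {
        second
        for first in out_neighbors(person_id)
        for second in out_neighbors(first)
    }
    two_hops.discard(person_id)  # do not report the starting person
    return sorted(names_by_id[i] for i in two_hops)


if __name__ == "__main__":
    print(friends_of_friends(1))
```

A graph query language expresses the same two hops declaratively, and a distributed database has to execute them across partitions, which this in-memory sketch does not attempt to model.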

Besides empowering graph users, this is a big win for "traditional" DSE users, too. As McFadin noted, few developers are religiously devoted to one or the other data model. Most of them just want to use the right tool for the job.

By enabling DSE users to add graph query capabilities to their arsenal, DSE gains a number of things. First, the ability to do joins. Graph excels at this, and DSE users will benefit. Perhaps more importantly, however, DataStax takes the first step toward a multi-model future. To drive Cassandra ubiquity, this could work well.