DataStax's path from Cassandra

Apache Cassandra may be one of the world’s most popular databases, but not many people know about DataStax. The company is looking to simplification, cloud-first development, and rebranding to change all that. It has a lot of moving parts to get into place.

Apache Cassandra has a long history, and DataStax's is almost as long. DataStax is the company that has specialized in delivering commercial support for Apache Cassandra. But according to an internally commissioned study, Apache Cassandra, which DB-Engines currently ranks as the 10th most popular database, has far more name recognition than DataStax.

Now, let's raise the stakes. Both DataStax and Apache Cassandra have followings that are narrow but very deep. Cassandra was conceived as a masterless database, meaning that it prioritizes scale over consistency. More to the point, it prioritizes the ability to support extremely fast writes, no matter how widely scaled or distributed the database is. It scales out by automatically partitioning data across nodes and replicating each partition, rather than relying on manually managed shards. In effect, these are the characteristics that define cloud databases, and not surprisingly, DataStax estimates that the majority of its installed base is running its platform in the cloud. Cassandra, the underlying database, was conceived for classic cloud scenarios such as omnichannel Customer 360 applications, smart power grids, and the backbone for digital banking applications, among others.
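To make the masterless idea concrete, here is a minimal sketch of how a ring of peer nodes can decide which replicas own a given key without consulting a coordinator. It is a toy illustration, not Cassandra's actual implementation: the `ToyRing` class, the node names, and the use of MD5 as the hash are all assumptions made for the example, and it assigns one token per node.

```python
import bisect
import hashlib


def token(value: str) -> int:
    """Map a string to a position on the ring (MD5 is an arbitrary choice here)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class ToyRing:
    """Hypothetical masterless ring: every node can compute replica placement itself."""

    def __init__(self, nodes, replication_factor=3):
        if replication_factor > len(nodes):
            raise ValueError("replication factor cannot exceed the number of nodes")
        self.replication_factor = replication_factor
        # Sort nodes by their token so the ring can be walked clockwise.
        self._ring = sorted((token(name), name) for name in nodes)
        self._tokens = [t for t, _ in self._ring]

    def replicas(self, partition_key: str):
        """Return the nodes that hold copies of the given partition key."""
        # Find the first node whose token is greater than the key's token, wrapping around.
        start = bisect.bisect_right(self._tokens, token(partition_key)) % len(self._ring)
        # Walk clockwise and take the next `replication_factor` distinct nodes.
        return [
            self._ring[(start + i) % len(self._ring)][1]
            for i in range(self.replication_factor)
        ]


ring = ToyRing(["node-a", "node-b", "node-c", "node-d"], replication_factor=3)
owners = ring.replicas("customer:42")
```

Because placement is a pure function of the key and the node list, any node can accept a write and forward it to the owners, which is part of why writes stay fast as the cluster grows.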

And then there were the family squabbles. DataStax chose a few years back to vacate leadership of the Apache Cassandra project as the community demanded a bigger voice. It was not the smoothest of breaks, and as we learned the hard way from our reconciliation post last fall, there remains a small but highly vocal minority that is not ready to just let go.

Nonetheless, DataStax is forging on, reconnecting with the community but ultimately seeking to widen its appeal. As a database designed for extreme scale and performance, Cassandra, along with the DataStax platform that is largely based around it, is unsurprisingly not known as an easy database for developers. Its trajectory is quite the opposite of MongoDB's, which (like MySQL and SQL Server before it) was designed for developer-friendliness before it took on scaling.

One side of the simplicity coin is delivering a managed cloud service that, if designed properly, should alleviate the ugliness of deployment. Ideally, you fill out a form that asks about the basic characteristics of your database, such as size, number of nodes (if it's not serverless), performance or service level requirements, and the data centers or regions you wish to deploy to. DataStax recently inaugurated its managed cloud, and it gives you that form-based deployment experience.

Andrew Brust reported last week on the company's plans to release DataStax Constellation, which rebrands and significantly expands the managed cloud service. At launch later this year, Constellation will include DataStax Apache Cassandra as a Service (CaaS), which adds some proprietary management features to the core open source offering. (By the way, don't confuse CaaS with DataStax Distribution of Apache Cassandra, which is the pure open source offering.)

Constellation will also include a new cloud-specific console called Insight that not only provides a single pane of glass, but also acts as a conduit for recommendations to operators and admins for dealing with routine bottleneck and configuration issues. That takes it a few more steps toward simplifying life.

Consider Constellation as a work in progress, as DataStax plans to broaden it out with more features. For starters, it means getting the full DataStax Enterprise platform, which includes graph. Today, Constellation includes just the Cassandra piece. And more to the point, DataStax is just starting to traverse the road of Microsoft, Oracle, MongoDB, and others in going cloud-first in product development with the idea that DataStax Enterprise (DSE) features will appear in Constellation first.

But that won't happen tomorrow because there are still a number of steps needed to make DSE an integrated, unitary platform with a common API. At the top of the list is fully integrating the graph engine, rather than having it operate at arm's length through a different API. Whereas today DSE Graph operates more like graph on Cosmos DB, DataStax's goal is to embed it in the same engine that powers Cassandra, so the graph can take advantage of the same performance and scaling capabilities as the mother ship. That requires a few more things to fall into place, like replacing the Solr index with a native one better suited to the more complex queries that graph requires. The next version of DSE will start checking these boxes.

Another gap is tooling. DataStax Studio, the existing toolset, is better known for its power than for its ease of use. For instance, with Studio, developers must manually lay out the schema based on the types of queries they expect to run (that should sound familiar to SQL relational database developers). Also, if they want to collaborate, they must rely on exporting their notebooks, as Studio was not designed to be multi-user.
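For readers who have not laid out a schema around queries before, the following sketch shows the general idea in plain Python. It is only an analogy under stated assumptions: the `QueryFirstStore` class, the sensor readings, and the two lookup structures are hypothetical stand-ins for tables, and nothing here reflects Studio's or DSE's actual interfaces.

```python
from collections import defaultdict


class QueryFirstStore:
    """Hypothetical store with one 'table' per query the application expects to run."""

    def __init__(self):
        # Serves the query: "all readings for a given sensor".
        self.readings_by_sensor = defaultdict(list)
        # Serves the query: "all readings taken on a given day".
        self.readings_by_day = defaultdict(list)

    def record(self, sensor_id: str, day: str, value: float) -> None:
        """Write the same reading into every structure that serves a query."""
        self.readings_by_sensor[sensor_id].append((day, value))
        self.readings_by_day[day].append((sensor_id, value))


store = QueryFirstStore()
store.record("sensor-1", "2019-05-01", 20.5)
store.record("sensor-2", "2019-05-01", 19.0)
store.record("sensor-1", "2019-05-02", 21.0)

# Each expected query becomes a single keyed lookup with no join.
per_sensor = store.readings_by_sensor["sensor-1"]
per_day = store.readings_by_day["2019-05-01"]
```

The trade-off is that data is duplicated on write so that each read is a direct lookup, which is why the developer has to know the queries up front.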

Later this year, DataStax will introduce AppStax into Constellation to make development more intuitive and collaborative; it will introduce a guided experience that will help developers lay out the database, and will be highly visual and web-based. It won't replace Studio at this point, partly because there will always be developers who prefer the power and flexibility of the command line. But in the long run, we hope there will be more convergence to the point where the toolset becomes common, but with different skins.

Apache Cassandra became popular because it was one of the first operational databases that could scale globally. Not surprisingly, Cassandra is no longer the only global game in town, given cloud offerings like Cosmos DB, Cloud Spanner, and a refashioned DynamoDB. The crowding of its space raises the urgency for DataStax to appeal beyond the traditional Cassandra fan base. Constellation could provide that route, but to get there, a number of moving parts must fall into place.