Apache Cassandra gets in-memory option with DataStax Enterprise 4.0

Along with better search and management tools, the next version of the Cassandra-based DataStax Enterprise big-data platform offers in-memory computing.
Written by Toby Wolpe, Contributor

DataStax has added an in-memory computing feature in the latest version of its Apache Cassandra-based NoSQL database, as part of a drive to increase the performance of online applications.

As well as the in-memory option, the newly released DataStax Enterprise 4.0 includes improved search, an updated version of the OpsCenter visual monitoring tool, and a certified version of Cassandra 2.0.

"The focus in this release and some of the others that are upcoming is on performance. The in-memory option is leading that charge," DataStax products VP Robin Schumacher said.

"It brings all the goodness of Cassandra — meaning the flexible data model, multi datacentre support, linear scaleout — to an in-memory database.

"The reason we're doing this is in the industry there's a big emphasis on performance and speed, especially in online for web retail and really any e-commerce business."

The DataStax Enterprise big-data platform, used by businesses such as eBay and Netflix, consists of analytics, search and management tools and support on top of a certified version of the Apache Cassandra distributed database.

With version 4.0, less frequently referenced data can be assigned to traditional spinning disk, with solid-state storage used for faster read-response times, and the in-memory option reserved for the hottest data.

"It's completely transparent to the application and to the developer. They don't need to do anything special to utilise in-memory objects," Schumacher said.

"When you create what is in Cassandra just a normal database table, you can assign it via one of the parameters in the definition of that table to be in-memory. Once you hit enter, it will look, feel, act and taste just like any other Cassandra table.

"You load data into it, massage data, query data from it, and it acts just as a normal Cassandra table. It scales out across multiple nodes, it's available for multi-datacentre support. There's nothing new that a developer or an application has to change to reference those objects."

Once the table has been created, it will be automatically distributed across any new nodes added to the cluster for capacity and scale.
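To make Schumacher's description concrete, here is a minimal sketch of what "one of the parameters in the definition of that table" could look like. The article does not name the parameter; the sketch assumes the option is set through the table's compaction settings with a class called MemoryOnlyStrategy, which should be checked against the DataStax documentation for the version in use. The table and column names are hypothetical, and the code only builds CQL text rather than connecting to a cluster.

```python
# Sketch only: builds CQL text; it does not connect to a cluster.
# Assumption: the in-memory option is set through the table's compaction
# settings (class name 'MemoryOnlyStrategy'). Verify against the DataStax
# documentation for your version before relying on it.


def create_table_cql(table, in_memory=False):
    """Return a CREATE TABLE statement for a hypothetical user-profile table."""
    cql = (
        f"CREATE TABLE {table} (\n"
        "    user_id uuid PRIMARY KEY,\n"
        "    name text,\n"
        "    email text\n"
        ")"
    )
    if in_memory:
        # The only difference from an ordinary table is this one property.
        cql += " WITH compaction = {'class': 'MemoryOnlyStrategy'}"
    return cql + ";"


def select_profile_cql(table):
    """The read query is the same whether or not the table is in-memory."""
    return f"SELECT name, email FROM {table} WHERE user_id = ?;"


if __name__ == "__main__":
    print(create_table_cql("profiles_hot", in_memory=True))
    print(select_profile_cql("profiles_hot"))
```

The point of the sketch is that only the table definition changes; the query an application sends stays the same, which matches the claim that the feature is transparent to developers.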

According to Schumacher, performance benchmarks conducted by DataStax on the in-memory feature show significant improvements in speed.

"We've seen anywhere from a general 10 to 100 times improvement in read queries when the in-memory tables are used," he said.

"In cases where you have strong memory exhaustion — the page cache on a Linux system is completely exhausted — you can see up to a 1,000 times improvement in some of the tests we've run."

Finance, e-commerce, telecoms, healthcare and online ad systems are the areas most likely to take advantage of in-memory computing.

"Whenever you need very fast read-response times, whether it's user profile lookups or product searches that are semi-static, they can lend themselves to in-memory use cases," Schumacher said.

Adobe uses DataStax Enterprise for its Marketing Cloud, which has service-level agreements stipulating 95 percent of requests must complete in less than 12 milliseconds.
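A service-level agreement of this shape can be checked against a batch of measured latencies with a few lines of code. The sketch below is illustrative only: the function name and the sample figures are invented, and it is not a description of how Adobe or DataStax measure compliance.

```python
def meets_sla(latencies_ms, limit_ms=12.0, required_pct=95):
    """Return True if at least required_pct percent of requests finished
    in strictly less than limit_ms milliseconds."""
    if not latencies_ms:
        raise ValueError("need at least one latency sample")
    fast = sum(1 for ms in latencies_ms if ms < limit_ms)
    # Integer comparison avoids floating-point rounding at the boundary.
    return fast * 100 >= required_pct * len(latencies_ms)


if __name__ == "__main__":
    # Hypothetical sample: 19 fast requests and one slow outlier.
    sample = [5.0] * 19 + [40.0]
    print(meets_sla(sample))
```

Because the target is expressed as a percentile rather than an average, a handful of slow outliers is tolerated, but a steady tail of slow reads is not.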

"This type of demand really pushes in-memory to the forefront," he said.

DataStax has also conducted quality assurance work on Cassandra 2.0, which is included in DataStax Enterprise 4.0, to certify it for production environments.

"With open-source development, the methodology is release early, release often. There's really no formal testing of the code because open source says, 'My user community is going to test this code for me and ensure it works'," Schumacher said.

"That may be fine for open source and for some open-source projects. But if you're talking about something you want to have confidence in when you put it into production, which is battle-tested and is going to stand up under production workloads, that's what our certification process delivers.

"If we find issues in open-source Cassandra, we make the fixes ourselves inside our DataStax Enterprise version and then we give those fixes back to the open-source community. But when they go in, that really depends on the open-source community."

Cassandra 2.0 brings a number of new developer features such as lightweight transactions, as well as improvements to the Cassandra Query Language, whose similarity to SQL makes it relatively easy for developers to move from the relational database world, Schumacher said.

DataStax Enterprise 4.0 also includes improvements to the enterprise search feature to provide faster communication between nodes in a cluster for quicker lookup times and search operations, even with thousands of concurrent users.

"We use Apache Solr for enterprise search on Cassandra data and we've certified a new version of Solr that brings a number of new developer features to the table," he said.

DataStax has updated its web-based OpsCenter visual management and monitoring system for Cassandra and DataStax Enterprise, which provides a dashboard for clusters in the cloud and on premises.

OpsCenter 4.1's capacity planning feature collects information about the status of servers and the database to perform trend analysis. Administrators can now see when and why systems are at their busiest and predict workloads.

"You can say things like, 'Based on the history that I'm seeing here, when is my database cluster going to hit 20TB or when is this particular server going to run out of disk space?'," Schumacher said.

Customised timeframes allow users to go back into any time period, forecasting multiple statistics at the same time.
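The kind of question Schumacher describes, when a cluster will reach a given size, can be approximated by fitting a straight line to historical usage and extrapolating. The sketch below uses ordinary least squares on made-up figures. It is an assumption about one simple way to do trend analysis, not a description of how OpsCenter computes its forecasts.

```python
def fit_line(days, used_tb):
    """Ordinary least-squares fit of used_tb = slope * day + intercept."""
    n = len(days)
    if n < 2 or n != len(used_tb):
        raise ValueError("need two or more matching samples")
    mean_x = sum(days) / n
    mean_y = sum(used_tb) / n
    sxx = sum((x - mean_x) ** 2 for x in days)
    if sxx == 0:
        raise ValueError("samples must cover more than one day")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(days, used_tb))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def day_reaching(days, used_tb, target_tb):
    """Estimate the day on which usage reaches target_tb.

    Returns None when usage is flat or shrinking, since the target
    would never be reached on the fitted trend.
    """
    slope, intercept = fit_line(days, used_tb)
    if slope <= 0:
        return None
    return (target_tb - intercept) / slope


if __name__ == "__main__":
    # Hypothetical history: cluster size sampled every 10 days.
    days = [0, 10, 20, 30]
    used_tb = [10.0, 12.0, 14.0, 16.0]
    print(day_reaching(days, used_tb, target_tb=20.0))
```

A straight-line fit is the simplest possible model; real workloads with seasonal peaks would need something richer, which is why a tool that lets administrators pick custom timeframes is useful.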

"So you can determine, for example, which table is growing the fastest and is going to grow the fastest. It really helps pro-actively plan for capacity additions whereas before it was really a guessing game," he said.

"Because we've added that in-memory option now to DataStax Enterprise 4.0, OpsCenter can monitor those in-memory tables and alert you if they're becoming too large.

"There is also better drill-down. So if a particular node is beginning to demonstrate some bad performance visually through the tool, you can drill down and find out exactly what's going on."
