Couchbase chief: Scale or fail - why databases need to smarten up their act over resources

When the NoSQL Couchbase database is updated this summer, firms will be getting a new scaling technology, designed to cut costs and network bandwidth.
Written by Toby Wolpe, Contributor
Couchbase CEO Bob Wiederhold: Better performance and fewer resources.
Image: Couchbase

Technology unveiled today by open-source NoSQL firm Couchbase, allowing primary database tasks to be handled by separate servers, is the only viable approach to scaling data stores, according to the company's CEO.

Multi-dimensional scaling, due this summer with Couchbase Server 4.0, provides a way for ops teams to allocate querying, indexing, and reads and writes to specific servers.

This approach significantly improves performance, cuts the amount of resources required and reduces network bandwidth, Couchbase CEO Bob Wiederhold said.

"To deliver high-performance index-building and high-performance queries, every vendor is going to need to provide their version of multi-dimensional scaling. That's just the way it's going to go," he said.

"You're going to have to do it this way or you deliver a solution where you simply can't deliver high-performance indexing and querying."

At the moment, the principal database functions of querying, indexing, and reading and writing are shared between all the servers in a cluster.

"We're allowing you to scale to support each of those three functions independently. So, today without multi-dimensional scaling, let's say you have a 10-node cluster, you would run all three of those functions on that 10-node cluster," Wiederhold said.

"If you had some intensive queries that were running, that may slow down your basic reads and writes. You can now set up three separate servers to do your indexing and two separate servers to do your querying. You can optimise the servers for those specific functions."

For example, servers carrying out basic reads and writes can have fewer CPUs than those running queries.

"You'll actually use fewer resources. The server cost will be less because right now what happens is you're beefing up your servers to be able to support all three functions. You'll be able to size your servers, or configure your servers specifically for either basic reads and writes or indexing or querying," he said.

"What you're going to find is enterprises using lower-powered machines - machines with more memory and fewer processors - to serve up their basic reads and writes. Then you're going to have bigger machines with many more processors to do their indexing and particularly to do their querying. Today what they're forced to do is have big machines everywhere."

Because the allocation of resources is not automated, the ops team will need to decide how the primary tasks are distributed across the cluster, depending on workloads, the number and size of indexes, and the number and intensity of queries.

"You would need to figure out how many servers to set up for each of these three functions. Obviously, we'll give you lots of tools to do that. But it won't be automatic. It's not going to automatically configure this for you," Wiederhold said.

"But one of the key things is that you can do all this configuration at runtime. It's not like you need to set this up ahead of time and then it's fixed. So if your workload changes on the fly, you can reconfigure on the fly."

Wiederhold said giving servers specialist roles will also reduce network bandwidth requirements.

"Today, for indexing and querying you need to use a scatter-gather approach. If you have a 10-node cluster, let's say you have 100GB of data, you'll have 10GB of data on each of your servers. You have to build an index on each server and that index will be based on the data that sits on that server. That's the scatter piece," he said.

"Then you have to gather the indexes from each of the 10 servers and build the index for your database. Particularly if you have a big index, then you going to be gathering often and you're going to be transferring large amounts of data."

Couchbase's new approach should typically mean fewer servers are needed to build indexes and run queries.

"Now you'll have maybe even just one index server. If you have one index server, there's no gathering that you need to do. Even if you have two or three, there's much less gathering and, as a result, much less network bandwidth" Wiederhold said.

Couchbase keeps the data on the specialised nodes in sync through the Database Change Protocol, the streaming replication protocol introduced with Couchbase Server 3.0.

"We use that Database Change Protocol to just send the mutation information that's changed to the indexing node. That's very efficient so you end up communicating less and therefore needing much less network bandwidth," he said.

