When I first encountered NoSQL databases I was a bit shocked to find out that they could only provide a fast lookup of a data record by its ID. The premise of indexing a non-key field/attribute is something taken for granted in the relational database world. But in the NoSQL world, where databases are focused on performing simple operations very quickly, the idea of creating a so-called secondary index was typically deemed an extravagance, and one that could hurt performance overall.
Secondary indexes aren't secondary
That's been changing though, and with its new 4.0 release, Couchbase joins the growing number of NoSQL databases that support and embrace secondary indexes. The interesting thing about secondary indexing in Couchbase's case, though, is that while the product team was adding the capability, it did a bunch of other work to ensure that the primary NoSQL use case of key-based lookups did not suffer a regression in performance.
One size need not fit all
Couchbase now offers a workload isolation feature that allows customers to dedicate specific nodes in a cluster to query, indexing, or data storage. So, for example, an 8-node cluster could dedicate 2 nodes to query, 2 more to indexing and the remaining 4 to data. Not only does this keep indexing from hurting query performance, or vice-versa, but it allows for tuning node-level hardware configurations to specific tasks.
A node dedicated to querying could be outfitted with a lot of memory, while data nodes could be optimized for fast and/or high-capacity storage. This means that storage and compute can be scaled independently of each other, making for much more elastic procurement of resources. With that in mind, the Multi-Dimensional Scaling (MDS) moniker that these features fall under is quite an apt description.
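To make the idea of per-service nodes concrete, here is a minimal sketch in Python. It is not Couchbase's configuration format or API; the node names, the `CLUSTER_PLAN` mapping and both helper functions are hypothetical, and the split shown is just one possible layout for an 8-node cluster.

```python
from collections import Counter

# Hypothetical 8-node layout: each node is dedicated to exactly one service.
# Node names and the split are illustrative assumptions, not Couchbase settings.
CLUSTER_PLAN = {
    "node1": "query",
    "node2": "query",
    "node3": "index",
    "node4": "index",
    "node5": "data",
    "node6": "data",
    "node7": "data",
    "node8": "data",
}


def nodes_per_service(plan):
    """Count how many nodes are dedicated to each service."""
    return Counter(plan.values())


def scale_service(plan, service, extra):
    """Return a new plan with `extra` nodes added for one service only."""
    new_plan = dict(plan)
    start = len(plan) + 1
    for i in range(start, start + extra):
        new_plan[f"node{i}"] = service
    return new_plan


counts = nodes_per_service(CLUSTER_PLAN)

# Grow the query tier by two nodes; the index and data tiers are untouched.
bigger = scale_service(CLUSTER_PLAN, "query", 2)
```

The point of the sketch is the last line: adding query capacity changes nothing about how many nodes hold data, which is what scaling each dimension independently amounts to.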
Couchbase's secondary indexes store pointers to the IDs of the rows they cover. Once an index has identified the matching IDs, each row is fetched with a standard key lookup, just as in earlier versions of Couchbase, so key-based NoSQL use cases continue to perform well.
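The pattern is easy to picture with a toy model. The Python sketch below is an assumption-laden illustration, not Couchbase's implementation: the `documents` store, the sample records and both functions are made up for the example. It shows a secondary index that maps a field value to record IDs, with the actual fetch still going through the ordinary key lookup.

```python
from collections import defaultdict

# Primary store: record ID -> record (the classic key-value path).
# The sample data is hypothetical.
documents = {
    "user::1": {"name": "Ada", "city": "London"},
    "user::2": {"name": "Grace", "city": "New York"},
    "user::3": {"name": "Alan", "city": "London"},
}


def build_secondary_index(docs, field):
    """Map each value of `field` to the IDs of the records that contain it."""
    index = defaultdict(list)
    for doc_id, doc in docs.items():
        if field in doc:
            index[doc[field]].append(doc_id)
    return index


def find_by_field(docs, index, value):
    """Resolve IDs via the index, then fetch each record by key."""
    return [docs[doc_id] for doc_id in index.get(value, [])]


city_index = build_secondary_index(documents, "city")
londoners = find_by_field(documents, city_index, "London")
```

Because the index holds only IDs, the primary key-value path is unchanged; the index is an extra structure consulted first, not a replacement for the lookup by key.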
Oh, and three more things
Couchbase 4.0 brings other additions to the product, including:
- Geospatial indexes, which permit indexing across two values (typically latitude and longitude)
- New filtering capabilities on Couchbase's Cross-Data Center Replication (XDCR), allowing specific data to be replicated to specific geographical locations
- A new database engine, called ForestDB.
On that last point, ForestDB was developed at Couchbase, and introduced as a beta last year. Couchbase 4.0 is the first version of the product to use it. ForestDB uses something called Hierarchical B+ Trie as its fundamental structure, in place of the more conventional (and, Couchbase says, more limited) B-Tree technology used in relational databases.
A nice facet of these new features is that you don't have to use them until you're ready. Customers can continue using only primary indexes if they'd like. They need not configure their cluster for workload isolation or use filtered XDCR. Once customers feel ready, though, they can take advantage of these features one at a time or take on a couple of them together.
Allowing for iterative adoption of new features is a great idea; I would even say it's critical. While it's great to see NoSQL databases mature, part of a technology's maturity is allowing customers to deploy new features in a controlled manner.