With distributed architecture considered one of the advantages of NoSQL (or nonrelational) databases, Amazon DocumentDB is addressing a key gap with a new multi-region capability that AWS is terming "Global Clusters." Until now, DocumentDB, like most AWS databases, supported multiple read replicas across different availability zones within a region. Now, AWS is extending that capability across regions. While there is still a single primary instance for writes, the Global Clusters feature will support read-only secondary instances in up to five remote regions.
AWS identified two core use cases for the new Global Clusters feature: disaster recovery, which keeps the database available in the event of a regional outage, and low-latency reads served from regions close to users around the globe.
To recap, the database, formally branded Amazon DocumentDB (with MongoDB compatibility), is a JSON document data store. It uses its own autoscaling storage engine but surfaces data to apps through APIs that AWS has written to be compatible with the MongoDB 3.6 and 4.0 interfaces; it supports most, but not all, MongoDB APIs. There are parallels with Amazon Aurora, a relational database with its own storage engine that offers compatibility with MySQL and PostgreSQL, also via APIs.
The Global Clusters feature expands on DocumentDB's existing active-passive replication capability, where change events are replicated from the primary instance to read-only secondary instances. Until now, DocumentDB supported replication to a maximum of 15 replicas across three availability zones (AZs), within the same region. With Global Clusters, you can now spread the deployment across up to five secondary regions (the home region remains the primary), with up to 16 replicas in each secondary region. Like Aurora, DocumentDB uses storage-based replication to replicate data across regions.
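To make the topology limits above concrete, here is a minimal planning sketch in Python. It is not an AWS API: the `GlobalClusterPlan` class and its field names are hypothetical, and the numeric limits are simply the figures quoted in this article, which may change.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Limits as quoted in the article; treat them as assumptions, not guarantees.
MAX_SECONDARY_REGIONS = 5
MAX_REPLICAS_PRIMARY_REGION = 15
MAX_REPLICAS_SECONDARY_REGION = 16


@dataclass
class GlobalClusterPlan:
    """Hypothetical planning model; not part of any AWS SDK."""
    primary_region: str
    primary_replicas: int = 0
    # Maps each secondary region name to its number of read-only replicas.
    secondary_replicas: Dict[str, int] = field(default_factory=dict)

    def validate(self) -> List[str]:
        """Return human-readable limit violations (empty list if none)."""
        problems: List[str] = []
        if self.primary_region in self.secondary_replicas:
            problems.append("primary region cannot also be a secondary region")
        if len(self.secondary_replicas) > MAX_SECONDARY_REGIONS:
            problems.append(
                f"{len(self.secondary_replicas)} secondary regions exceeds "
                f"the limit of {MAX_SECONDARY_REGIONS}"
            )
        if not 0 <= self.primary_replicas <= MAX_REPLICAS_PRIMARY_REGION:
            problems.append(
                f"primary region replica count {self.primary_replicas} is "
                f"outside 0..{MAX_REPLICAS_PRIMARY_REGION}"
            )
        for region, count in self.secondary_replicas.items():
            if not 0 <= count <= MAX_REPLICAS_SECONDARY_REGION:
                problems.append(
                    f"{region}: replica count {count} is outside "
                    f"0..{MAX_REPLICAS_SECONDARY_REGION}"
                )
        return problems
```

For example, `GlobalClusterPlan("us-east-1", 3, {"eu-west-1": 2}).validate()` returns an empty list, while a plan with six secondary regions reports a violation.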
As noted above, distributed databases have been considered the norm in the NoSQL/nonrelational world. For instance, Amazon DynamoDB offers a Global Tables feature that fully distributes reads and writes across multiple regions; it reconciles concurrent writes with a timestamp-based "last writer wins" approach, providing eventual consistency across regions. However, since the introduction of DocumentDB, AWS has positioned DynamoDB more as a key/value store than as a document database.
DocumentDB's architecture has a distinctive advantage when it comes to replication. The task is handled by the storage volume, which departs from the traditional practice of running it on the compute node. As a result, replication does not compete with applications for compute resources such as CPU and memory.
In the MongoDB world, distributed capabilities have varied widely. MongoDB's own Atlas cloud service started supporting read-only replication (like the new DocumentDB feature) a year after it was launched. MongoDB itself (on-premises and in the cloud) also has a limited distributed write capability that designates primaries at the shard level, meaning that each shard controls writes for the portion of the data that it maintains. This capability is useful when data sovereignty policies require that specific records be stored and/or updated only within the country of origin. By contrast, Microsoft Azure Cosmos DB, a multimodel database that has a MongoDB-compatible API (like DocumentDB), supports fully distributed read/write capabilities.
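The timestamp-based reconciliation mentioned above can be illustrated with a small Python sketch. This is a generic illustration of the technique, not DynamoDB's actual implementation: the `Write` type is hypothetical, and breaking timestamp ties by region name is an assumption made here only so the result is deterministic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Write:
    """Hypothetical record of one regional write to the same item."""
    value: str
    timestamp: float  # time the write was accepted, in seconds
    region: str       # used only as a tie-breaker (an assumption)


def resolve(a: Write, b: Write) -> Write:
    """Keep the write with the later timestamp; ties fall back to region name."""
    return max(a, b, key=lambda w: (w.timestamp, w.region))
```

Because the rule depends only on the two writes themselves, every region that sees both writes picks the same winner regardless of arrival order, which is what lets the replicas converge.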
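The shard-level primary idea described above can be pictured as a routing rule that sends each write to the region owning that record's slice of the data. The Python sketch below is purely illustrative: the country-to-region table, the `country` field, and the function name are hypothetical, and it does not use or represent MongoDB's actual sharding configuration.

```python
# Hypothetical mapping of country codes to the region whose shard accepts
# writes for records originating in that country.
HOME_REGION_BY_COUNTRY = {
    "DE": "eu-central-1",
    "FR": "eu-west-3",
    "US": "us-east-1",
}


def write_region_for(record: dict) -> str:
    """Return the region that should accept writes for this record."""
    country = record.get("country")
    try:
        return HOME_REGION_BY_COUNTRY[country]
    except KeyError:
        # Fail loudly instead of silently writing outside the home country.
        raise ValueError(
            f"no home region configured for country {country!r}"
        ) from None
```

Raising on an unmapped country, rather than falling back to a default region, reflects the data sovereignty motivation: a record should never be written somewhere by accident.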
In a blog post that just went live, AWS claims that updates from the primary to read replicas are typically executed within a second. DocumentDB Global Clusters is available now.