MongoDB stitching the cloud and edge together

Although there were few surprise announcements this week at the virtual staging of MongoDB's annual conference, the underlying theme was unifying the increasingly diverse platform and promoting its multi-cloud capability.

In reviewing this year's batch of announcements for MongoDB's online user conference, there's a lot that fills in the blanks opened last year, as reported by Stephanie Condon. But the sleeper is the unification of a platform that has expanded over the past few years with mobile and edge processing capabilities, not to mention a search engine, along with the reality that Atlas, its cloud database-as-a-service (DBaaS), now accounts for the majority of new installs.

Last year, MongoDB announced previews of Atlas Data Lake, the part of MongoDB's cloud service that lets you target data stored in Amazon S3 cloud storage; full-text search; plans to integrate the then recently acquired Realm mobile database platform with the Stitch serverless development environment; and autoscaling of MongoDB's Atlas cloud service. This year, all of those previews are going GA. Rounding it out is the announcement of the next release of MongoDB, version 4.4, which includes some modest enhancements to querying and sharding.

Most roads lead to the cloud

The cloud is clearly MongoDB's future. Four years after the service was introduced, Atlas now accounts for 42% of the overall business and, by far, the majority of new MongoDB deployments. That comports with our predictions at the turn of the year that the cloud was becoming the new default deployment option for enterprise data platforms. Admittedly, given that most MongoDB deployments tend to be greenfield use cases as opposed to migrated legacy systems (although we heard about some mainframe redeployments last year), it's not surprising to us that cloud adoption has accelerated faster than it has among the installed bases of established enterprise databases like Oracle or SQL Server.

The one bit of major news is that mLab, the third-party MongoDB DBaaS that MongoDB acquired a couple of years ago, will be formally migrated to Atlas by the end of the year. mLab customers tended to represent the long tail of the market, largely small to midsize companies; the acquisition filled a gap for MongoDB by adding the missing self-service channel for its cloud business.

Because Atlas is offered on all three major public clouds, MongoDB can claim that Atlas is now the most widely available DBaaS, with a presence in 74 regions worldwide (by comparison, AWS and Azure have about 50 to 60 regions apiece). MongoDB is beginning to promote a multi-cloud message: Atlas operates and looks the same, regardless of where it is hosted. Use cases could involve requirements for high availability or data sovereignty, where different cloud providers have a stronger local presence.

Atlas has drawn several enhancements for automating core operations. Among them is automated archiving, or data tiering, where data that reaches an aging threshold can be moved by policy from local storage to cloud object storage, with Amazon S3 being the first supported target (we expect Azure Blob or ADLS storage, and Google Cloud Storage, to follow). In conjunction with the ramp-up of Realm, which would open the floodgates to IoT data, automatic archiving will be critical to keeping storage costs under control.
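The aging-threshold policy boils down to a cutoff date. Here is a minimal sketch of that logic in Python. It is not the Atlas implementation: the `created_at` field name, the 90-day threshold, and the `split_by_age` helper are all assumptions made up for illustration.

```python
from datetime import datetime, timedelta

def split_by_age(docs, now, max_age_days, date_field="created_at"):
    """Split documents into (keep_local, move_to_archive) by an age cutoff.

    Hypothetical helper: a real tiering service would apply a rule like
    this on the server side rather than in client code.
    """
    cutoff = now - timedelta(days=max_age_days)
    keep_local = [d for d in docs if d[date_field] >= cutoff]
    move_to_archive = [d for d in docs if d[date_field] < cutoff]
    return keep_local, move_to_archive

# Example: with a 90-day threshold, only the January reading is archived.
now = datetime(2020, 6, 10)
readings = [
    {"sensor": "a", "created_at": datetime(2020, 1, 5)},
    {"sensor": "b", "created_at": datetime(2020, 6, 1)},
]
hot, cold = split_by_age(readings, now, max_age_days=90)
```

The point of the sketch is only that the rule is mechanical, which is what makes it a good candidate for automation as data volumes grow.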

Another feature that will enable more efficient use (and better performance) is a new schema "autopilot" feature that acts as an automated wizard or alert that could recommend changes to the underlying schema based on query traffic; this feature is especially aimed at DBAs new to MongoDB who are more accustomed to relational data models.

Not directly related to Atlas, but more relevant to private or hybrid cloud, is the announcement of a community version of MongoDB's Kubernetes (K8s) operator. The existing operator, which is already certified for Red Hat OpenShift, provides the means for customers to deploy MongoDB in their own homegrown private clouds. Private or hybrid cloud is an area where we expect to hear more from MongoDB.

Stitching mobile and edge into a new Realm

The unity theme came with the integration of Realm, an acquisition made just over a year ago to provide a more native experience for mobile embedded database apps. Realm remains both a separate brand and a separate database, so the integration takes the form of an auto-sync capability, plus added support for the lighter-weight GraphQL for Realm in place of the MongoDB query language.

The integration of Stitch, a platform for developing serverless JavaScript functions, with Realm is still in progress. Nonetheless, going forward, MongoDB is deprecating the Stitch brand, and Stitch is becoming the functions platform for Realm. Given the reality that mobile and edge traffic can be unpredictable, serverless is key to making MongoDB's platform cost-effective and practical. There are still more blanks for MongoDB to fill with serverless support. Watch this space.

Meanwhile, back on the mother ship

There are several features for expanding query capabilities in the new MongoDB 4.4 release. The highlight is a new "Union" operator geared for more complex queries. It can combine multiple collections and data aggregation pipelines into a single query. This serves two purposes: it allows a more modular approach to query design, and it supports more complex queries that would otherwise require a data warehouse. Our take is that this feature won't put data warehouses out of business, but it does push the boundaries of what you could query in MongoDB.
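To give a feel for what a union query looks like, here is a minimal sketch in Python that builds an aggregation pipeline as plain data. In MongoDB 4.4 the union capability is exposed as the `$unionWith` aggregation stage; the collection names (`orders_2019`, `orders_2020`), the field names, and the `build_union_pipeline` helper are hypothetical and exist only for illustration.

```python
def build_union_pipeline(other_collection, match_filter):
    """Build a pipeline that unions a second collection, then aggregates.

    Hypothetical helper: it only constructs the pipeline document and
    does not talk to a database.
    """
    return [
        # Filter the collection the pipeline is run against.
        {"$match": match_filter},
        # Append documents from the other collection, filtered the same way.
        {"$unionWith": {
            "coll": other_collection,
            "pipeline": [{"$match": match_filter}],
        }},
        # Aggregate across the combined result set.
        {"$group": {"_id": "$region", "total": {"$sum": "$amount"}}},
    ]

pipeline = build_union_pipeline("orders_2019", {"status": "shipped"})
```

With a driver such as PyMongo, a pipeline like this would typically be passed to `aggregate()` on the first collection (for example, the hypothetical `orders_2020`). Because each branch is just a sub-pipeline, the pieces can be composed and reused, which is the modularity argument made above.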

There are several refinements aimed at making distributed deployments perform more efficiently. One is targeted at queries: "Hedged Reads," where the query is submitted to multiple nodes and the client receives the result from whichever node processes it fastest. It's designed to avoid scenarios where a read takes too long to process because the primary node is tied up.
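The hedging idea itself is easy to illustrate outside the database. The following is a minimal simulation in Python using `asyncio`; it is not MongoDB's implementation, and the node names and latencies are invented. It sends the same read to every simulated node, returns the first answer, and cancels the rest.

```python
import asyncio
from typing import Dict

async def query_node(name: str, latency_s: float) -> str:
    """Simulated node: waits for its latency, then returns a result."""
    await asyncio.sleep(latency_s)
    return f"result from {name}"

async def hedged_read(latencies: Dict[str, float]) -> str:
    """Send the same read to every node and return the first response."""
    tasks = [
        asyncio.create_task(query_node(name, latency))
        for name, latency in latencies.items()
    ]
    done, pending = await asyncio.wait(
        tasks, return_when=asyncio.FIRST_COMPLETED
    )
    # The slower requests are no longer needed, so cancel them.
    for task in pending:
        task.cancel()
    await asyncio.gather(*pending, return_exceptions=True)
    return done.pop().result()

# node-a is "tied up" (slow); node-b answers quickly and wins.
winner = asyncio.run(hedged_read({"node-a": 0.2, "node-b": 0.01}))
```

The trade-off is extra work on the cluster in exchange for lower tail latency, since the duplicate request is wasted whenever the first node would have answered promptly anyway.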

Another enhancement for distributed deployments is a new refinable sharding capability, where the database can repartition itself on the fly; this is useful when query patterns shift, resulting in new "hot spots" where bottlenecks can happen.
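As we understand it, MongoDB 4.4 exposes this through the `refineCollectionShardKey` admin command, which extends an existing shard key with additional suffix fields rather than replacing it outright. The sketch below, in Python, only builds and sanity-checks such a command document; the namespace `shop.orders`, the field names, and the `build_refine_command` helper are hypothetical.

```python
def build_refine_command(namespace, current_key, refined_key):
    """Build a refineCollectionShardKey command document.

    Hypothetical helper: checks that the refined key keeps the current
    key as a prefix and adds at least one field, then returns the
    command as a plain dict without contacting a cluster.
    """
    current = list(current_key.items())
    refined = list(refined_key.items())
    if len(refined) <= len(current) or refined[:len(current)] != current:
        raise ValueError("refined key must extend the current shard key")
    return {"refineCollectionShardKey": namespace, "key": dict(refined_key)}

# Add order_id as a suffix so a single hot customer_id can be split up.
command = build_refine_command(
    "shop.orders",
    current_key={"customer_id": 1},
    refined_key={"customer_id": 1, "order_id": 1},
)
```

With a driver such as PyMongo, a document like this would typically be run as an admin command against the cluster. Adding a suffix field gives the cluster finer-grained ranges to split, which is how a hot spot on a single key value can be spread out.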

So what about hybrid cloud?

Although not there yet, MongoDB is reaching the tipping point where the bulk of its revenue will come from the cloud. Admittedly, the use cases for MongoDB traditionally didn't involve the type of sensitive data associated with back-office financial systems, but as the installed base comes up against personal privacy or data sovereignty laws, there will likely be a pull to get Atlas working in private or hybrid cloud environments.

Support of the K8s operator is a baby step toward getting there, but for now, this requires MongoDB customers to build their own clouds. Certified on OpenShift, MongoDB is now part of the third-party ecosystem for IBM Cloud Pak for Data. You get MongoDB, integrated with Cloud Pak for Data's governance services, but you still have to deploy and run the database yourself.

The brass ring will come when you can get fully managed Atlas on a hybrid or private cloud that runs inside the data center. For now, the hybrid cloud platform ecosystem has yet to mature, and aside from IBM, none of the platforms have been set up to support third-party databases or database services. We expect this situation to change in the next year; at that point, the question will be when MongoDB adapts Atlas for K8s.