At its annual MongoDB.Live event this week, MongoDB is unveiling the next major release of its eponymous database, version 5.0. To some extent, the highlights of MongoDB 5.0 are not surprising, as there's a greater focus on productivity for its core constituency of developers. But the new release also expands the umbrella of data types with new time series support, along with features that would be considered enterprise-friendly.
Underscoring all this, MongoDB and a rapidly growing cross section of its customer base are going cloud-first. As of the latest quarter, Q1 FY 2022, which was reported back in June, the Atlas managed cloud database-as-a-service (DBaaS) now accounts for 51% of overall revenues.
KEEPING THE LOVE WITH DEVELOPERS
It shouldn't be surprising, given that MongoDB's core constituency has long been application developers as opposed to DBAs or data engineers, that a major focus of the 5.0 release is developer productivity.
At the top of the list is the Versioned API, which debuts with the 5.0 release. Specifically, MongoDB commits to backward compatibility as long as the application uses only the commands defined by the Versioned API. The database can change, but your application shouldn't have to.
This type of feature has long been a standard checkbox item for established relational databases like Oracle. It requires that the vendor pretty much control the code and handle all the compatibility testing: labor-intensive, but otherwise straightforward for vendors that own the underlying code, and more of a boil-the-ocean endeavor for community-based open source platforms like PostgreSQL. MongoDB is promising, for an indefinite period, that applications written against the MongoDB 5.0 and later APIs will continue to work without code modifications, even as the underlying version of the database changes. The feature becomes all the more essential because, starting with 5.0, MongoDB is moving to a quarterly release cycle.
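To make the Versioned API concrete: an application opts in once, when it creates its driver client (in PyMongo, for instance, `MongoClient(uri, server_api=ServerApi("1", strict=True))`). The driver then stamps each command with `apiVersion` and, if requested, `apiStrict` and `apiDeprecationErrors`. The minimal sketch below builds such a command document as a plain Python dict so it runs without a server. It assumes a hypothetical helper named `with_versioned_api` and a hypothetical `orders` collection.

```python
from typing import Any, Dict


def with_versioned_api(command: Dict[str, Any], strict: bool = True,
                       deprecation_errors: bool = False) -> Dict[str, Any]:
    """Return a copy of `command` pinned to Versioned API version "1".

    Hypothetical helper for illustration: real drivers add these fields
    automatically once the client is configured with a server API version.
    """
    pinned = dict(command)  # shallow copy; the command name stays the first key
    pinned["apiVersion"] = "1"  # the API version introduced with MongoDB 5.0
    pinned["apiStrict"] = strict  # reject commands outside the versioned set
    pinned["apiDeprecationErrors"] = deprecation_errors  # fail on deprecated commands
    return pinned


# Hypothetical query against a hypothetical "orders" collection.
cmd = with_versioned_api({"find": "orders", "filter": {"status": "shipped"}})
print(list(cmd))  # ['find', 'filter', 'apiVersion', 'apiStrict', 'apiDeprecationErrors']
```

With `apiStrict` set, the server rejects any command that isn't part of the versioned set. That is how an application can check up front that it relies only on commands covered by the compatibility promise.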
Another highlight is what MongoDB terms "native" time series support, which falls into both the productivity and extensibility buckets. It's extensible in that time series is a newly supported native data type. More precisely, it's a special type of collection, optimized for the structure of time series data, that restructures the schema into a highly compact format allowing much higher storage density, clustered indexing, and more efficient use of IOPS. It can also be viewed as a productivity enhancement, because MongoDB 5.0 now handles routine tasks associated with time series data inside the database engine.
Many MongoDB customers, such as Bosch, have been using the platform for time series data; in fact, you could probably say that for most cloud-based NoSQL databases. But prior to 5.0, much of the associated grunt work, such as de-duplication, rollup bucketing of data, and creation of clustered indexes, had to be coded by developers in the application. Now it is automated as part of the database engine and/or handled with new specialized operators in the query language, including window functions and temporal operators.
Time series support is a work in progress; downsampling of aging data will come in an upcoming dot release. What about materialized views, a natural companion to time series, or to analytics in general? Today, MongoDB supports on-demand materialized views that must be refreshed manually; we would like to see them generated automatically.
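As an illustration of what "a special type of collection" means in practice, the minimal sketch below builds the `create` command that defines a time series collection in MongoDB 5.0. The `timeseries` options (`timeField`, `metaField`, `granularity`) are MongoDB's own. The helper function and the `readings`/`ts`/`sensor` names are hypothetical. In PyMongo the same options can be passed as `db.create_collection("readings", timeseries={...})`.

```python
from typing import Any, Dict, Optional

# Granularity values accepted by MongoDB 5.0 time series collections.
GRANULARITIES = ("seconds", "minutes", "hours")


def time_series_create_command(name: str, time_field: str,
                               meta_field: Optional[str] = None,
                               granularity: str = "seconds") -> Dict[str, Any]:
    """Build a `create` command for a time series collection (hypothetical helper)."""
    if granularity not in GRANULARITIES:
        raise ValueError(f"unsupported granularity: {granularity!r}")
    options: Dict[str, Any] = {
        "timeField": time_field,  # required: field holding each measurement's date
        "granularity": granularity,  # hint for how the server buckets measurements
    }
    if meta_field is not None:
        options["metaField"] = meta_field  # optional: labels a series, e.g. a sensor ID
    return {"create": name, "timeseries": options}


cmd = time_series_create_command("readings", "ts", meta_field="sensor",
                                 granularity="minutes")
```

Applications then insert measurements as ordinary documents. Behind the scenes, the server groups measurements that share a `metaField` value into compact buckets.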
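The window functions mentioned above arrive as a new aggregation stage, `$setWindowFields`, and the temporal operators include `$dateTrunc`. The minimal sketch below builds a pipeline that tags each reading with its hour and computes a trailing average per sensor. The helper names and the `sensor`/`ts`/`temp` fields are hypothetical, and with PyMongo the pipeline would be passed to `db["readings"].aggregate(pipeline)`. Because no server is available here, the sketch also includes a plain-Python reference function, `trailing_average`, that mirrors what the window computes for a single sorted partition.

```python
from typing import Any, Dict, List


def moving_average_pipeline(meta_field: str, time_field: str, value_field: str,
                            window_docs: int) -> List[Dict[str, Any]]:
    """Aggregation pipeline (hypothetical helper) using MongoDB 5.0 operators."""
    return [
        # $dateTrunc (new in 5.0) rounds each timestamp down to the hour.
        {"$set": {"hour": {"$dateTrunc": {"date": f"${time_field}", "unit": "hour"}}}},
        # $setWindowFields (new in 5.0) averages the current reading and the
        # window_docs - 1 readings before it, separately for each series.
        {"$setWindowFields": {
            "partitionBy": f"${meta_field}",
            "sortBy": {time_field: 1},
            "output": {
                "movingAvg": {
                    "$avg": f"${value_field}",
                    "window": {"documents": [-(window_docs - 1), 0]},
                },
            },
        }},
    ]


def trailing_average(values: List[float], window_docs: int) -> List[float]:
    """Pure-Python reference for the window above, for one sorted partition."""
    out = []
    for i in range(len(values)):
        window = values[max(0, i - window_docs + 1): i + 1]  # fewer docs at the start
        out.append(sum(window) / len(window))
    return out


pipeline = moving_average_pipeline("sensor", "ts", "temp", window_docs=3)
print(trailing_average([10, 20, 30, 40], 3))  # [10.0, 15.0, 20.0, 30.0]
```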
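For reference, the manual, on-demand materialized views mentioned above are built with an aggregation pipeline that ends in a `$merge` stage, and rerunning the pipeline is what refreshes the view. The minimal sketch below assumes hypothetical `ts`/`temp` fields, a hypothetical `daily_temps` target collection, and a hypothetical helper name. In PyMongo, one refresh would be `db["readings"].aggregate(pipeline)`. Deciding when to run it is left to the application today.

```python
from typing import Any, Dict, List


def daily_rollup_pipeline(time_field: str, value_field: str,
                          target: str) -> List[Dict[str, Any]]:
    """Pipeline (hypothetical helper) that refreshes a daily rollup collection."""
    return [
        # One output document per calendar day (UTC), keyed by the truncated date.
        {"$group": {
            "_id": {"$dateTrunc": {"date": f"${time_field}", "unit": "day"}},
            "avg": {"$avg": f"${value_field}"},
            "count": {"$sum": 1},
        }},
        # Upsert results into the target: existing days are replaced with fresh
        # figures, new days are inserted. Each run is one manual refresh.
        {"$merge": {"into": target, "on": "_id",
                    "whenMatched": "replace", "whenNotMatched": "insert"}},
    ]


pipeline = daily_rollup_pipeline("ts", "temp", "daily_temps")
```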
Another major time saver will be "live" resharding. Previously, courageous customers could change how a collection was sharded without taking the system down, but the process was complicated. Now you specify the new sharding policy, and the database redistributes the data automatically while the application stays online. Where this feature especially comes in handy is when data ingest and access patterns change, making once-efficient sharding patterns inefficient. A key use case comes with data that must stay local within a country. MongoDB has had global geo-partitioning for some time, but when data grows in different countries at different rates, live resharding lets you change the sharding policy and have the database redistribute the data automatically.
Rounding out the developer-focused announcements are the Realm SDK for the Unity mobile game engine, enhancements to Atlas Search, and the extension of MongoDB Charts support to Atlas Data Lake.
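Returning to live resharding: in MongoDB 5.0 it is triggered by the `reshardCollection` admin command. The command takes the collection's namespace and the new shard key, and it also accepts optional settings such as zone ranges, which are omitted here. The minimal sketch below only validates and builds that command. The `shop.customers` namespace, the `country`/`customerId` key, and the helper name are hypothetical. With PyMongo, the result would be sent via `client.admin.command(cmd)`.

```python
from typing import Any, Dict, Union

ShardKey = Dict[str, Union[int, str]]


def reshard_command(namespace: str, new_key: ShardKey) -> Dict[str, Any]:
    """Build a reshardCollection command (hypothetical helper, no server needed)."""
    db, dot, coll = namespace.partition(".")
    if not (db and dot and coll):
        raise ValueError("namespace must look like '<database>.<collection>'")
    for field, kind in new_key.items():
        # Shard key fields are either ranged (1) or hashed.
        if kind != 1 and kind != "hashed":
            raise ValueError(f"shard key field {field!r} must be 1 or 'hashed'")
    return {"reshardCollection": namespace, "key": dict(new_key)}


# Leading with country lets zone sharding keep each country's data together.
cmd = reshard_command("shop.customers", {"country": 1, "customerId": 1})
```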
KEEPING UP WITH THE JONESES
There are some announcements with a "Keeping up with the Joneses" theme. On this go-round, MongoDB is introducing a preview of a serverless option for its Atlas cloud service. MongoDB isn't first to market here; serverless has long been a mainstay in the NoSQL world. Amazon DynamoDB, Azure Cosmos DB, and Google Cloud Firestore are serverless, and serverless has become the default for DataStax's Astra DB service for Apache Cassandra.
The obvious use case is new databases whose workload levels are unpredictable, or where the development team lacks a reliable estimate of how much traffic they will draw. MongoDB is also targeting serverless at a relatively narrow set of use cases, such as "sparse" workloads that are data- but not query-intensive. An example is an IoT application that stores a lot of data overall but writes new readings to the database only a few times a day. Serverless could also be considered a developer productivity feature, as developers no longer need to worry about how many nodes to provision. Nonetheless, if you have a stable or predictably sized workload, provisioned capacity is probably the better option.
Under the hood, MongoDB 5.0 tweaks the query engine to support the company's broader mission of displacing relational as the default platform for new applications (more about that below). It is modularizing the query engine to make it more extensible, not only for operational queries but also for analytics. A good example is long-running snapshot queries, introduced in this release, which support point-in-time queries against a globally transaction-consistent snapshot. This is part of MongoDB's broadening appeal. While it will never be viewed as a pure analytic database, there are compelling reasons for integrating real-time analytics into operational platforms, and we have seen stirrings of this with other NoSQL operational platforms. As we'll note below, this could provide the springboard for running ML inference workloads in-database.
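At the command level, a point-in-time query of the kind described above is an ordinary read with a `snapshot` read concern. MongoDB 5.0 allows this outside multi-document transactions for reads such as `find` and `aggregate`. The minimal sketch below builds such a `find` command; the collection, filter, and helper name are hypothetical. The optional `atClusterTime` would be a BSON timestamp (in PyMongo, a `bson.timestamp.Timestamp`), and when it's omitted, the server picks the snapshot time.

```python
from typing import Any, Dict, Optional


def snapshot_find(collection: str, filter_doc: Dict[str, Any],
                  at_cluster_time: Optional[Any] = None) -> Dict[str, Any]:
    """Build a find command that reads from a consistent snapshot (hypothetical helper)."""
    read_concern: Dict[str, Any] = {"level": "snapshot"}
    if at_cluster_time is not None:
        # Pin the read to a specific cluster time so repeated queries see the same data.
        read_concern["atClusterTime"] = at_cluster_time
    return {"find": collection, "filter": dict(filter_doc), "readConcern": read_concern}


cmd = snapshot_find("orders", {"status": "shipped"})
```

Pinning `atClusterTime` is what lets several queries, including long-running analytic ones, read the same consistent state while operational writes continue.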
A subtheme for MongoDB is delivering a "unified data experience." When we asked the company what that meant, the response was about meeting customers where they operate, whether on-premises, in the cloud, or across any world region. One could also say it's about diverse data and workload types, with time series support being a major new thrust. It's all built on the flexibility of JSON: prior to version 5.0, nothing in MongoDB prevented customers from using it to store time series data, and the same goes for graph data. As MongoDB positions itself as a general-purpose operational database, we expect to hear more about multi-model support.
Which leads us to SQL. Until now, MongoDB has kept SQL at arm's length with a BI connector that gives Tableau, Qlik, and other BI visualization tools access to data via ODBC. Earlier on, the company was quite dismissive of the need for SQL, and even some of the latest briefing materials it provided to analysts included critical statements on SQL's shortcomings.
But if MongoDB aspires to deliver the unified data experience, it has to become more welcoming to the vast SQL community, which is not going away anytime soon. And as MongoDB broadens its message to become the next default enterprise database, we expect it to court DBAs, and maybe data scientists, more actively.
As noted, accommodating the SQL crowd would be the obvious start; the skills base is just too large to ignore. There are several ways MongoDB could get there. For instance, as with MySQL, MongoDB's underlying storage engine is pluggable. We don't know whether MongoDB would have to go as far as dropping in a SQL-compatible storage engine, so a more practical option might be adding SQL materialized views. Our guess is that the relational view would come in the query layer, projecting the view onto the underlying document data. With current CTO Mark Porter, whose back story includes Amazon Aurora and Oracle, MongoDB now has someone well-versed in serving this side of the data organization.
There are a couple of other items that we expect will be on MongoDB's agenda. With Atlas growing rapidly (Atlas revenues in Q1 FY 22 were up 73% year over year), our curiosity naturally turns to on-premises customers who want the operational simplicity of the cloud. Hybrid cloud platforms, ranging from the Kubernetes variants to offerings like AWS Outposts or IBM Cloud Satellite, could provide the hosting channel for MongoDB to deliver the full vendor-managed DBaaS experience on-premises.
Then there's the 16-ton gorilla in the room: AI and machine learning, and courting data scientists. We've not heard much from MongoDB on this yet, but it is laying the groundwork to run such workloads more efficiently. Yes, MongoDB does have connectors to Databricks and Iguazio, to which data is fed and where the models are run and managed. And, yes, there are third parties that would love to grab Mongo data and whisk it to a data warehouse where ML models could perform predictive analytics. But with the unified data experience theme, it would make sense for MongoDB to take operational AI workloads in-house.