What should MongoDB be when it grows up?

Over the past year, MongoDB has become a new company. With a new management team in place and a new architecture that uncoupled the storage engine, MongoDB can morph into whatever it wants to be. Where should it go from here?
Written by Tony Baer (dbInsight), Contributor

From its origins nearly a decade ago, MongoDB looked like the equivalent of MySQL invented for the NoSQL era. Like its predecessor, MongoDB was known as a relatively simple, developer-friendly database that was built from the ground up to be quick to implement. It took advantage of the JSON document format, which represents complex data types without the need to force-fit them into the rows and columns of relational databases. On the strength of its simplicity and flexible schema, MongoDB has become the fourth-ranked database engine in popularity, behind some pretty major household names, such as Oracle, SQL Server, and MySQL, according to DB-Engines.

But with its simplicity came limitations: write performance at scale was hampered by locks, while read performance suffered from poorly optimized sharding. Other NoSQL platforms like Couchbase, Redis, Cassandra, and HBase have in the past boasted faster writes.

Over time, those issues were addressed. But, more to the point, as MongoDB became more popular, it had to address a broader community of practitioners and end users. While internet developers familiar with JavaScript and JSON were its base constituency, it had to start playing nice with the mainstream SQL world. We were asked last week if demand for DBAs was going away. Actually, it's quite the opposite. Otherwise, why was Hive invented for Hadoop? Why are there well over a dozen interactive SQL frameworks for Hadoop, and why have most, if not all, major NoSQL platforms added SQL skins?

As use cases grew beyond user profiling to more mission-critical applications, such as connected cars or service dispatching, the little JSON database-that-could faced greater demands for enterprise-grade security, reliable SLAs, cloud readiness, and adaptability to new forms of query, such as graph.

A year ago, MongoDB became a different company. With a new management team in place and a new architecture that uncoupled the storage engine, MongoDB could morph into whatever it wanted to be. From an architectural standpoint, MongoDB took on the MySQL model. WiredTiger became the new default storage engine, providing superior write performance and lower storage costs thanks to its native compression. That's just the start. Over the past year, it has added separate in-memory and encrypted data stores.

So, we expect MongoDB and/or its partners will go gap-filling. Do you need better analytic performance so you can query data in place rather than set up a separate SQL data mart? A Spark connector has become one of the first steps toward bulking up analytics. Don't be surprised if somebody adds a columnar data store later on. What about taking on streaming data? There probably will be an engine for that.

MongoDB's business has evolved based on the well-established open core model. What was once a fairly bare-bones business of offering an open-source database and a proprietary monitoring tool has evolved to a portfolio of value-added add-ons available only via subscription. They encompass BI connectivity, a management console, and DBA tools, along with the additional in-memory and encrypted storage engines plus authentication and access-control features.

And it is now making an aggressive grab for the cloud database business. You could always mount MongoDB on cloud platforms like AWS, Azure, Google Cloud Platform, or others, but you had to manage it yourself. Or you could spring for one of the third-party MongoDB Database-as-a-Service (DBaaS) offerings that boasted simplicity from Compose (now owned by IBM), ObjectRocket (owned by Rackspace), or mLab. MongoDB at long last is now beginning to roll out its own DBaaS, Atlas, which will compete with more aggressive pricing (especially for larger installations), broader replication and disaster recovery options, and a promise to offer the most up-to-date versions.

Getting back to storage -- MongoDB is hardly alone in making its database extensible. That's no longer solely the domain of MySQL but the common motif for the entire database market. Each data platform provider is adding bits and pieces. Household brands like Oracle, IBM DB2, SQL Server, and Teradata are likewise grafting in-memory, JSON, temporal, geospatial, graph, and columnar capabilities through extending the query language or grafting on new storage engines. Hadoop, which isn't a database, is adding capabilities that make it look like one. NoSQL players like Couchbase, DataStax, and Redis are adding SQL query support. All are steadily upping security features, initially with a focus on authentication and access control. We believe MongoDB will emulate its more-established rivals with data protection features, such as field-level access control, but that will be a longer slog.

The result of all this convergence won't be identical products. Each will continue to have its own mix of strengths and weaknesses. For MongoDB, the correct, albeit by now clichéd, line is that it is about "new workloads". By that, we're talking about live, operational workloads that can range from simple counting and averaging processes to manipulating complex data. MongoDB's strength is with the latter, not the former. MongoDB, of course, is not alone there. Incumbents are also targeting these workloads. But let's get realistic here. You won't use a platform like Oracle if your application only consumes JSON data. Conversely, Oracle would make sense if you're extending your existing CRM application or process by enriching it with customer profile data in JSON format. And it could go vice versa with JSON stores like MongoDB, Couchbase, or others.

As MongoDB extends the footprint of addressable use cases, it shouldn't forget where it came from: a developer-friendly platform suited for handling applications requiring the agility of managing complex, morphing document-style data. That's its trump card. It's the same reason that -- even as open-source alternatives emerge -- Splunk retains viral loyalty among its developer base. And it was the same formula that propelled Microsoft to becoming a force with enterprise applications, a run that started way back with Visual Basic.

So, where should MongoDB go from here? The common thread is stickiness. Atlas makes it possible to hold on to clients that have passed from the development to the deployment stage -- especially as most of them are opting for the cloud. The BI connector, which lets you run popular tools like Tableau or MicroStrategy against MongoDB data, allows MongoDB to claim some analytic workloads that would otherwise go off to a data mart or data warehouse. The just-announced Spark integration is another step, retaining workloads that might otherwise run on Hadoop.

We believe the next logical step should be for MongoDB to add federated query that starts with its own aggregation framework as the front end but pushes down query processing to wherever the data resides. It's all about owning the query. IBM, Oracle, Microsoft, SAP, and Teradata are staking their claim, as are BI analytics players. The goal should not be to turn MongoDB into a data warehouse but to extend the operational applications that naturally reside on its platform by embedding analytics.
