MongoDB version 2.2 rounds out the product with a significant amount of new code and more than 600 new features. The release includes an advanced aggregation framework, new multi-data center deployment features, and more.
10gen, the company behind MongoDB, has more than 175 people worldwide and aims to reach 200 employees by the end of the year. It recently received an infusion of $74 million and is backed by In-Q-Tel, a firm in turn backed by the U.S. Central Intelligence Agency. In-Q-Tel invests in technology that the CIA wants to use, in building specific features for the agency, or in building a community. 10gen has also received funding from NEA, Sequoia, Union Square Ventures and Flybridge.
Keyhole was one of In-Q-Tel's most famous investments: "Founded in 2001, [Keyhole] was a pioneering software development company specializing in geospatial data visualization applications and was acquired by Google in 2004." Ultimately, Keyhole became Google Earth.
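MongoDB is a NoSQL database written in C++, with drivers that let you work with it from many programming languages. Because it stores schemaless documents, it does not require you to declare columns up front with a SQL statement like: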
- CREATE TABLE USERS (a Number, b Number)
So, what's new in MongoDB version 2.2?
"MongoDB 2.2 has been a huge effort to make the database even easier to use and operate," says Eliot Horowitz, 10gen co-founder and chief technology officer. "We think that moving to NoSQL should make you a more productive software engineer, and features like the aggregation framework deliver on that promise."
As massive amounts of data accumulate, you need tools that let you query that data easily and quickly, in real time. This new release also simplifies reporting and provides the foundation for real-time analytics.
According to MongoDB, release 2.2 can accelerate performance of analytics and reporting up to 80 percent compared to using MapReduce. The aggregation framework is also significantly easier to use than MapReduce, and it offers new operators, new expressions, and a pipeline-processing framework.
Some of the new operators include:
- $match -- where clause
- $project -- select clause
- $unwind -- expands an array field, emitting one document per element (for example, one document with an author and three tags unwinds to three documents, each with the author and a single tag)
- $group -- groups documents by a key and aggregates each group, like GROUP BY
- $sort -- orders the documents
- $limit -- caps the number of documents passed along
In short, these new operators allow you to simplify your statements further.
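To make the operators concrete, here is a minimal sketch in Python. The sample documents, the field names (`title`, `author`, `tags`), and the helper functions are all hypothetical, and the helpers only imitate what each stage does on an in-memory list; they are not MongoDB's implementation. With a running server and the PyMongo driver, the `pipeline` list would instead be passed to `collection.aggregate(pipeline)`.

```python
from collections import Counter

# Hypothetical sample documents; the field names are illustrative only.
articles = [
    {"title": "Intro", "author": "ann", "tags": ["nosql", "mongodb", "news"]},
    {"title": "Sharding", "author": "bob", "tags": ["mongodb", "scaling"]},
    {"title": "Draft", "author": "ann", "tags": ["nosql"]},
]

# The pipeline as it would be written for the aggregation framework.
pipeline = [
    {"$match": {"author": "ann"}},                       # like WHERE
    {"$project": {"tags": 1}},                           # like SELECT
    {"$unwind": "$tags"},                                # one document per tag
    {"$group": {"_id": "$tags", "count": {"$sum": 1}}},  # like GROUP BY
    {"$sort": {"count": -1}},
    {"$limit": 1},
]

# Pure-Python stand-ins for the stages above (assumed, simplified semantics).
def match(docs, criteria):
    return [d for d in docs if all(d.get(k) == v for k, v in criteria.items())]

def project(docs, fields):
    return [{f: d[f] for f in fields if f in d} for d in docs]

def unwind(docs, field):
    # Emit one copy of the document per element of the array field.
    return [{**d, field: item} for d in docs for item in d.get(field, [])]

def group_count(docs, key):
    counts = Counter(d[key] for d in docs)
    return [{"_id": k, "count": n} for k, n in counts.items()]

def run_sketch(docs):
    docs = match(docs, {"author": "ann"})
    docs = project(docs, ["tags"])
    docs = unwind(docs, "tags")
    docs = group_count(docs, "tags")
    docs = sorted(docs, key=lambda d: d["count"], reverse=True)
    return docs[:1]  # the $limit stage

result = run_sketch(articles)
print(result)
```

Each stage consumes the output of the one before it, which is what makes the pipeline easier to read than an equivalent map and reduce function pair.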
New multi-data center features
In addition to scaling horizontally through an auto-sharding architecture that handles load and data distribution, MongoDB can scale to as many as 1,000 machines with no downtime. The new multi-data center features, together with automatic failover, allow MongoDB to:
- Apply location-aware data storage policies for improved performance in wide-area multi-data center configurations through tag-aware sharding
- Accept writes in multiple data centers at the same time
- Ensure a quick response across data centers regardless of latency
- Use heterogeneous hardware tailored to different document types
Tagging by region is an administrative operation. It lets MongoDB place load intelligently, reshuffling data based on geography to reduce latency. The main advantage of being able to write to multiple data centers at one time, of course, is fault tolerance. Tags also need not be geographic; they can serve other purposes, such as archiving.
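The idea behind tag-aware sharding can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the shard names, the tags, the zip-code shard key, and the `eligible_shards` helper, which only imitates the placement rule and is not MongoDB's balancer. In the mongo shell, the corresponding setup is done with `sh.addShardTag` and `sh.addTagRange`.

```python
# Hypothetical shards and the tags an administrator has assigned to them.
shard_tags = {
    "shard0000": {"NYC"},
    "shard0001": {"SFO"},
    "shard0002": {"NYC", "ARCHIVE"},
}

# Each entry maps shard-key values in [low, high) to a tag.
# A zip code is used as the (hypothetical) shard key.
tag_ranges = [
    (10001, 10282, "NYC"),
    (94102, 94189, "SFO"),
]

def eligible_shards(key):
    """Return the shards allowed to hold a document with this shard key."""
    for low, high, tag in tag_ranges:
        if low <= key < high:
            return sorted(s for s, tags in shard_tags.items() if tag in tags)
    # Assumed rule: keys outside every tagged range may live on any shard.
    return sorted(shard_tags)

print(eligible_shards(10011))  # a New York key
print(eligible_shards(94110))  # a San Francisco key
print(eligible_shards(60601))  # an untagged key
```

The `ARCHIVE` tag on `shard0002` hints at the non-geographic use: a range of older data could be pinned to cheaper hardware in the same way.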
MongoDB 2.2 features a new locking architecture that improves performance for workloads that require frequent disk I/O operations. Users will see faster, more predictable performance from MongoDB deployments, particularly in deployments where disk I/O speed is a limiter.
The concurrency improvements allow for:
- Using multiple systems to achieve parallelism
- Shards as clusters
- Elimination of the global lock in favor of database-level locking
- Yield-on-page fault mechanism (“PageFaultException”) – used to get good I/O across systems
- Replication uses a local DB in the background
- Replication secondaries – secondaries can apply writes (inserts, updates and deletes) at least as fast as the primaries
New features and improvements
There are hundreds of improvements in the latest release, including:
- Time-to-live (TTL) collections
- Query optimizer improvements
- Better performance with Windows Server
- Better usage of heterogeneous hardware
- Reduced space fragmentation
- And more...
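Of these, TTL collections are the easiest to picture in code. The sketch below is a pure-Python imitation of the expiry rule, assuming a hypothetical `createdAt` date field and a one-hour lifetime; the `purge_expired` helper is illustrative and is not how the server is implemented. With the PyMongo driver, the real feature is enabled by creating an index with an `expireAfterSeconds` option, for example `collection.create_index("createdAt", expireAfterSeconds=3600)`.

```python
from datetime import datetime, timedelta

EXPIRE_AFTER_SECONDS = 3600  # hypothetical one-hour lifetime

def purge_expired(docs, field, now):
    """Keep only documents whose date field is not older than the lifetime."""
    cutoff = now - timedelta(seconds=EXPIRE_AFTER_SECONDS)
    # Documents where the field is missing or is not a date are never expired.
    return [
        d for d in docs
        if not isinstance(d.get(field), datetime) or d[field] >= cutoff
    ]

# Hypothetical session documents.
now = datetime(2012, 9, 1, 12, 0, 0)
sessions = [
    {"user": "ann", "createdAt": datetime(2012, 9, 1, 11, 30, 0)},  # 30 min old
    {"user": "bob", "createdAt": datetime(2012, 9, 1, 10, 0, 0)},   # 2 hours old
    {"user": "cy"},                                                 # no date field
]

remaining = purge_expired(sessions, "createdAt", now)
print([d["user"] for d in remaining])
```

This suits data such as sessions or logs, where old documents should disappear without an application-side cleanup job.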
Check out MongoDB for yourself at www.mongodb.org.
Is your organization using or planning to use MongoDB? Let me know.