On top of businesses' growing frustration with proprietary software, the relational database model championed by the big vendors is looking increasingly dated, according to the CEO of leading open-source NoSQL database MongoDB.
Relational databases go back to an era before the internet and are now ill-suited to the demands of the cloud and high user numbers, Max Schireson said.
"The database market is in need of a big change. The technology that people typically use, the database layer, was designed in 1970 with a very different set of requirements in mind," he said.
Datasets in those days were smaller and more uniform, and development processes were more waterfall-style, with requirements known well in advance.
"You'd spend a year or two building your application, and then you could revise it. Applications didn't need to be live all the time," Schireson said.
"Now you want to build applications in a much more agile way and include much broader and more varied sets of data. You need to be able to maintain those applications without down time and deploy them into cloud-style infrastructures and service massive numbers of users.
"It's common to have applications for all your customers, your partners, your prospects, all interacting over the internet. It just wasn't in mind when the relational database was built."
Issues with the traditional big enterprise vendors, whose relational databases continue to dominate, have fuelled the growth of open source alternatives.
"There is some frustration with the enterprise software model over the past couple of decades. People are moving more and more. Enterprise 2.0 is about cloud and open source as opposed to enterprise software," Schireson said.
"We want to remove all the reasons people would use a relational database, even though fundamentally it's not the best tool for the job. But it is more mature and they've got this tool and that tool and they've got the skillset — it's that maturity gap that we need to close," Schireson said.
In MongoDB's document-oriented model, the fundamental unit of storage is a document, not in the sense of a book but a technical document — something that is hierarchical and irregular and defines itself, as opposed to a row.
"The first rule of a relational database is that every row in a table has to have the same set of columns as every other row, which we didn't think reflected reality," he said.
"Relational databases in general — certainly all the leading products from the big vendors — don't have the capability built into the product to take a query and decide where the data is, execute the query there, bring it back and, if the data is in multiple places, aggregate it.
"If the data on one of the servers is more than it can handle, [they can't] move some of that data to another server that's less heavily loaded. The leading vendors still have to build those capabilities."
The relational model makes it fundamentally difficult to build those capabilities because the data gets split into hundreds or sometimes thousands of different tables.
"When you enter an order into a relational database, it doesn't go into something that just holds the order as is. The order header gets stored somewhere, each order line gets stored somewhere else, the address information gets stored somewhere else," Schireson said.
As an example, he cited an order in Oracle applications, which can be split into about 150 different tables.
"That makes it very difficult to partition across servers because it's one thing to say, 'I want to have all the orders from customers in the first half of the alphabet here and the second half of the alphabet there', when the order itself is in 150 different places," he said.
"Lots of PhD students have tried to figure out clever algorithms for taking data like this and being able to reassemble it quickly when it's split up. But no one has been able to figure out how to reassemble that order from 150 tables quickly when those tables aren't on the same computer.
"The change that we made is simply to say why do you have to put an order in 150 different places? Why don't you just put it in one place?"
This approach has obvious benefits in the cloud where large sets of inexpensive computers are used rather than buying one multi-million dollar Oracle Exadata box.
"What we've developed is database software that actually lets you partition data across 100 servers so that you can take advantage of the fact that 100 cheap servers that you can rent are more powerful than one expensive one that you buy," he said.
But to compete with years of investment and development in relational databases, MongoDB still needs to add features and tools.
"We've been at this now for six years with a team of 100 engineers. Some of our competitors have been at this for 30 or 40 years with thousands of engineers. So there's an enormous amount of maturity we need to add to the product and there's an enormous amount of tools that need to work with it," Schireson said.
MongoDB is working on tools to make it easier to manage large clusters of, say, 100 servers.
"If you have one server and you want to upgrade the software on it, it's pretty straightforward. If you have 100 servers, you have a bunch of interesting options. You could shut them down a few at a time but there's enough redundant data available throughout the upgrade," he said.
"But it's hard work to orchestrate, figuring out which data is where and making sure you never have both copies of the data shut down at the same time and eventually upgrading all your servers without any downtime.
"We are going to be introducing later this year some tools that enable you to perform operations like that at the push of a button."
Schireson said that over the next 18 months, the query optimiser built for MongoDB would become much more functional, faster, and more robust and resilient.
"In a cookbook index it says chicken recipes are on pages 467 to 469. The problem is in a large database it might say the chicken recipes are on pages from four million to seven million," he said.
"Now our database can combine multiple indexes to tell you very quickly that the chicken recipes with under 300 calories that can be made without milk are in these four locations exactly.
"That type of functionality is familiar to people because relational databases have done it for over a decade but no other NoSQL database has been able to do it."
Schireson said MongoDB has focused hard on two specific issues: the ability to run in the cloud, and ease of application development for large, varying datasets.
The result is a sophisticated infrastructure that MongoDB calls a continuous development framework: at the press of a button, a developer can take a small piece of code, see its impact across a large number of test cases and deployment environments, and get detailed feedback quickly.
"The process for a developer of analysing the change that they're making no longer is something they submit with the nightly build and then wait to see what happens," he said.
"Now it's not a nightly or weekly process, it's real time. That is really going to multiply productivity and the speed at which we can work."
As evidence of MongoDB's ability to enable businesses to tackle previously intractable problems, Schireson cited the example of MetLife.
The insurance company had spent years and millions of dollars trying to consolidate policy information held in 70 different systems to provide agents with a single view of customers when they called.
"By the time they could get a team to agree on a way to represent the data in those 70 systems in a common format, either one of them would change or there'd be a new one," he said.
"With MongoDB it took them 90 days to go from a mock-up of what they wanted the system to look like to a live system in their call centre. They didn't throw out their old databases but they built a new system where all their other systems fed their data into MongoDB."
MongoDB could accommodate and query data that was heterogeneous, so the company didn't have to come up with a single format for all 70 systems.
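The single-customer-view pattern can be sketched as querying differently-shaped records on whichever fields they happen to share. The source systems, field names, and data below are invented for illustration:

```python
# Records fed in from different source systems, each with its own shape.
policies = [
    {"source": "legacy_life", "customer": "Diaz", "face_value": 250000},
    {"source": "auto_2003", "customer": "Diaz", "vehicle": "sedan"},
    {"source": "home", "customer": "Okafor", "address": "9 Elm Rd"},
]

def policies_for(customer):
    """Single view of a customer across heterogeneous records."""
    return [p for p in policies if p.get("customer") == customer]
```

The query matches on the one field the records share, so new source systems with new fields can be fed in without first agreeing on a common format.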