MongoDB chief: Why the clock's ticking for relational databases

Summary: Even though it may be the wrong tool for the job, the years of development behind the relational database ensure its popularity — for the moment, says MongoDB's Max Schireson.

MongoDB's Max Schireson: The database market is in need of a big change. Image: MongoDB

On top of business's growing frustration with proprietary software, the relational database model championed by the big vendors is looking increasingly dated, according to the CEO of leading open-source NoSQL database MongoDB.

Relational databases go back to an era before the internet and are now ill suited to the demands of the cloud and high user numbers, Max Schireson said.

"The database market is in need of a big change. The technology that people typically use, the database layer, was designed in 1970 with a very different set of requirements in mind," he said.

Datasets in those days were smaller and more uniform, and development followed more of a waterfall process, with requirements well known in advance.

"You'd spend a year or two building your application, and then you could revise it. Applications didn't need to be live all the time," Schireson said.

"Now you want to build applications in a much more agile way and include much broader and more varied sets of data. You need to be able to maintain those applications without down time and deploy them into cloud-style infrastructures and service massive numbers of users.

"It's common to have applications for all your customers, your partners, your prospects, all interacting over the internet. It just wasn't in mind when the relational database was built."

Issues with the traditional big enterprise vendors, whose relational databases continue to dominate, have fuelled the growth of open source alternatives.

"There is some frustration with the enterprise software model over the past couple of decades. People are moving more and more. Enterprise 2.0 is about cloud and open source as opposed to enterprise software," Schireson said.

MongoDB, which uses a document-oriented model, ranks fifth in the DB-Engines Ranking, which orders database management systems by popularity. It is the highest-ranked non-relational system.

"We want to remove all the reasons people would use a relational database, even though fundamentally it's not the best tool for the job. But it is more mature and they've got this tool and that tool and they've got the skillset — it's that maturity gap that we need to close," Schireson said.

In MongoDB's document-oriented model, the fundamental unit of storage is a document, not in the sense of a book but a technical document — something that is hierarchical and irregular and defines itself, as opposed to a row.

"The first rule of a relational database is that every row in a table has to have the same set of columns as every other row, which we didn't think reflected reality," he said.

"Relational databases in general — certainly all the leading products from the big vendors — don't have the capability built into the product to take a query and decide where the data is, execute the query there, bring it back and, if the data is in multiple places, aggregate it.

"If the data on one of the servers is more than it can handle, [they can't] move some of that data to another server that's less heavily loaded. The leading vendors still have to build those capabilities."

The relational model makes it fundamentally difficult to build those capabilities because the data gets split into hundreds or sometimes thousands of different tables.

"When you enter an order into a relational database, it doesn't go into something that just holds the order as is. The order header gets stored somewhere, each order line gets stored somewhere else, the address information gets stored somewhere else," Schireson said.

As an example, he cited an order in Oracle applications, which can be split into about 150 different tables.

"That makes it very difficult to partition across servers because it's one thing to say, 'I want to have all the orders from customers in the first half of the alphabet here and the second half of the alphabet there', when the order itself is in 150 different places," he said.

"Lots of PhD students have tried to figure out clever algorithms for taking data like this and being able to reassemble it quickly when it's split up. But no one has been able to figure out how to reassemble that order from 150 tables quickly when those tables aren't on the same computer.

"The change that we made is simply to say why do you have to put an order in 150 different places? Why don't you just put it in one place?"

This approach has obvious benefits in the cloud, where large sets of inexpensive computers are used in place of a single multi-million-dollar Oracle Exadata box.

"What we've developed is database software that actually lets you partition data across 100 servers so that you can take advantage of the fact that 100 cheap servers that you can rent are more powerful than one expensive one that you buy," he said.

But to compete with years of investment and development in relational databases, MongoDB still needs to add features and tools.

"We've been at this now for six years with a team of 100 engineers. Some of our competitors have been at this for 30 or 40 years with thousands of engineers. So there's an enormous amount of maturity we need to add to the product and there's an enormous amount of tools that need to work with it," Schireson said.

MongoDB is working on tools to make it easier to manage large clusters of, say, 100 servers.

"If you have one server and you want to upgrade the software on it, it's pretty straightforward. If you have 100 servers, you have a bunch of interesting options. You could shut them down a few at a time but there's enough redundant data available throughout the upgrade," he said.

"But it's hard work to orchestrate, figuring out which data is where and making sure you never have both copies of the data shut down at the same time and eventually upgrading all your servers without any downtime.

"We are going to be introducing later this year some tools that enable you to perform operations like that at the push of a button."

Schireson said over the next 18 months the query organiser built for MongoDB would become much more functional, faster, more robust and resilient.

"In a cookbook index it says chicken recipes are on pages 467 to 469. The problem is in a large database it might say the chicken recipes are on pages from four million to seven million," he said.

"Now our database can combine multiple indexes to tell you very quickly that the chicken recipes with under 300 calories that can be made without milk are in these four locations exactly.

"That type of functionality is familiar to people because relational databases have done it for over a decade but no other NoSQL database has been able to do it."

Schireson said MongoDB has focused hard on two specific issues: the ability to run in the cloud, and ease of application development for large, varying datasets.

The result is a sophisticated infrastructure that MongoDB calls a continuous development framework. At the press of a button, a developer can take a small piece of code, see its impact across a large number of test cases and deployment environments, and get detailed feedback quickly.

"The process for a developer of analysing the change that they're making no longer is something they submit with the nightly build and then wait to see what happens," he said.

"Now it's not a nightly or weekly process, it's real time. That is really going to multiply productivity and the speed at which we can work."

As evidence of MongoDB's ability to enable businesses to tackle previously intractable problems, Schireson cited the example of MetLife.

The insurance company had spent years and millions of dollars trying to consolidate policy information held in 70 different systems to provide agents with a single view of customers when they called.

"By the time they could get a team to agree on a way to represent the data in those 70 systems in a common format, either one of them would change or there'd be a new one," he said.

"With MongoDB it took them 90 days to go from a mock-up of what they wanted the system to look like to a live system in their call centre. They didn't throw out their old databases but they built a new system where all their other systems fed their data into MongoDB."

MongoDB could accommodate and query data that was heterogeneous so the company didn't have to come up with a single format for all 70 systems.

Talkback

70 comments
  • A lot of marketing hot air

    Just because something is on the Internet does not mean it is terribly high volume, nor does high volume in and of itself make relational DBs bad.

    The fact is that a lot of CIOs demand incredible reporting, requiring all kinds of data cubing and analysis of the kind really only SQL Server, MySQL/MariaDB, and Oracle can provide.

    There's nothing wrong with NoSQL, big data, Mongo and all that. They serve a need. But the hubris of some of these guys needs to get knocked down a peg - these guys have AN answer, not THE answer. For many of our situations, SQL was, is, and remains the best tool for the job.
    Mac_PC_FenceSitter
    • Agreed.

      Agreed.

      And not just for databases - "cloud computing," mobile devices, etc - they all fill a need, but they are not a complete replacement for other ways of doing things.
      CobraA1
    • Agreed as well - lots of hot air

      Microsoft SQL Server amazes me at how it can manipulate data. It can do just about anything you could possibly want to do with data. This guy probably wants to be the next Bill Gates. What I like about SQL is that it is rooted in mathematics. And we all know how powerful that can be.
      There must be something going on for MongoDB to be ranked #5 in the list of most popular database software. But I think it has more to do with the fact that it's "FREE" and maybe it's easier for newbies to grasp the fundamental concepts? I don't know about that aspect since I have no familiarity with the software. So that's just a second guess.
      j4w4
      • There must be something going on for MongoDB to be ranked #5

        a) it's a new toy that b) makes it easy to spin up a bunch of servers. it would have been fine if it didn't require all those "shards" to do the job of just one relational database server. and then there is this argument that you can't store documents with variable structure in a relational db. please, most relational databases have had that ability for years. some even support indexing and querying the individual fields inside those documents
        vpupkin
    • Maybe, I don't know.

      I am no Db expert and the few times I've messed with schemas I find correctly designing them very unintuitive. His remark about it being a 70s technology really resonates. SQL has always felt clunky to me.

      Maybe it is time to rethink.
      MeMyselfAndI_z
      • The basics of SQL schema design are actually pretty intuitive to a DB expert

        "I am no Db expert...I find correctly designing them as very unintuitive."

        To the uninitiated, it can be. Designing a SQL database for a large production environment is something that really needs to be undertaken by a database expert - a lot of the hand-holding and graphical design tools that simpler user-level database systems such as Access or FileMaker Pro provide are necessarily absent from larger server technologies. This is because applications built on these systems generally require much more flexibility than these simpler tools can provide.

        For those who understand how relational databases are supposed to work, SQL databases are actually quite intuitive. Logically, everything is stored as a table (imagine a large spreadsheet), and columns within that table contain identifiers (called the Primary Key) that can be referenced by other tables in order to link information together.
        daftkey
      • SQL is not really relational

        SQL is not a good implementation of the model.

        Designing databases is much easier if you forget about SQL and concentrate on learning the relational model.
        jorwell
        • SQL does not make it "relational" or even a Database

          SQL is a "Query Language" for accessing and manipulating data. It may be used to access a database, it may be used to access a Relational Database, but may also access a CODASYL DBMS, indexed files or even clusters of files of any sort.

          It is designed so that you do not have to learn new ways every time you make a system. The structure of the language is based on the relational model, and uses terms found in Chris Date's "Relational Calculus". The alternative is to use QBE - that implements Ted Codd's "Relational Algebra" - like in QBEvision, copied by MS in their Access, but where they have distanced themselves from the database theory.
          MS "database" is the first version of Sybase and is not a proper relational database, because it assumes an order and storage method. Download a trial of QBEvision from Sysdeco - it is free, and see the difference between this and MS "Access" (that is a copy - QBE was shown to MS and then suddenly they terminated the joint venture...).
          knuthf
      • Relational Model is like Maths

        The relational model has a sound mathematical basis (relational algebra and calculus and SQL as its human readable approximation). Even though the notation may look clumsy to the inexperienced user, the model behind is universal and timeless.
        And, to stretch the maths argument a little more - nobody would think of not making use of the Pythagorean Theorem just because it's old.
        Of course there is a case for storing documents in SQL. XML was the traditional data type for storing semistructured content; now JSON will come as a standard datatype, I'm sure. Check the PostgreSQL JSON capabilities to see what's happening here. In terms of maths, it's just a small extension of the algebra.
        Jens Albrecht
    • I wouldn't be in a hurry to dismiss MongoDB...

      @Mac_PC_FenseSitter:

      Don't look now but... I am old enough to remember when relational databases were coming on the scene and were being dismissed using frighteningly similar language.

      The people behind MongoDB (and other NoSQL databases) set out to remedy real and perceived deficiencies in relational databases. They recognize that relational databases are good for some things but not for others. In particular, there are some things that we currently shoe-horn into the relational database that can be done faster and better with MongoDB (or NoSQL databases). They know that individuals and corporations are interested in what actually works -- not hype. Thus far, they are delivering!

      We've been here before...
      -- the telegraph vs horses
      -- TV vs movie houses
      -- PCs (and computer networks) vs mainframe computing
      -- etc.

      Do I need to go on?

      Here is the thing: life will not stand still; adapt or die! (Okay, you don't have to die but you can watch those who adapted make out like bandits!)
      auogoke@...
      • The trouble is NoSQL is going backwards

        The NoSQL products are largely a revival of methods that the relational model superseded (hierarchical and graph DBMSs). Those among us who remember these methods also understand why they have become obsolete.

        I am not against change but surely it should be change for the better?
        jorwell
        • It may well be true that NoSQL is not new...

          They say timing is everything!

          Sometimes...
          -- a good idea comes along but the technology to implement it is not available.
          -- a product that is intended to fulfill a critical need is badly implemented
          -- sometimes a product is implemented but its makers fail to position/market it properly
          -- sometimes a mediocre product is oversold
          -- speed/accessibility makes all the difference; a product that is accessible and easy to use spurs new uses and even industries

          Among other things, NoSQL databases like MongoDB hold the promise of something you could use for agile development/proof-of-concept. Presumably, if the application works and is found beneficial, it can then be implemented with RDBMS.
          auogoke@...
          • Mr Schireson is a remarkably bad salesman

            "We want to remove all the reasons people would use a relational database"

            I use the relational model because it is based on logic.

            Therefore to remove the reason I use a relational database Mr Schireson would have to remove logic.

            This implies that the results I will get out of MongoDB are not based on logic and therefore indistinguishable from random.

            This doesn't strike me as a great sales pitch.

            In ten years time everyone will be using RDBMSs and everyone will have forgotten about MongoDB. It's not worth learning about it.

            In the meantime you might want to revise on what problems are inherent in hierarchical DBMSs which make them much LESS flexible than RDBMSs (and therefore presumably much less agile).
            jorwell
          • Guy clearly wanted to make a statement but...

            I also suspect that something was lost in translation.

            What I see is a man on a mission to create a product that exploits gaps in RDBMS technology *and* meets the needs of today's developers and businesses. He strikes me as a tenacious, meticulous, and determined type. I would not bet against him.
            auogoke@...
  • why do you have to put an order in 150 different places?

    Why don't you just put it in one place?
    Because when you later want to change the delivery date for the order, you want to change it in one place, not in 150 places. That is all there is to it. There is a name for it - data normalization.

    There is a fundamental tradeoff between the ease of selects and updates, and it is not like Mr. Schireson just came up with it; it has been known for as long as databases have been around.

    With a little less bragging he may come across as a more credible advocate for non-normalized data storage.
    ForeverSPb
    • In fairness to Mr. Schireson...

      ..the problem that you describe where a delivery date would need to be updated in multiple places is actually a common problem in SQL databases (many, especially in the ERP world where I live, are de-normalized for various different reasons and exhibit exactly that problem).

      He doesn't say it outright, but he does imply that having that information stored in a document-based database like MongoDB would actually improve that type of problem.

      Either way, the reality is that this type of issue has more to do with the design of the database schema than it does any limitation of the database software itself.
      daftkey
      • The 150 places to hold the same..

        Is a constraint imposed by the database. Well, the schema should be hidden in a relational database, and so should the way it is stored - one place or 150 or 999 if that matters.

        But, the point here is that maybe the ERP market would be better served with the CODASYL DBMS - that can link together BLOBS, provide simple set navigation, consistent ordering, reference across computers and allow the implementor of system to control more. But, please, if you never studied math, don't educate people in arithmetic, and because we have calculators does not eliminate calculus and algebra. If you do not know the relational database theory - don't educate!
        knuthf
        • I'm not sure I follow...

          "The 150 places to hold the same.. Is a constraint imposed by the database."

          I'm not sure we're on the same page here. When you say "constraint imposed by the database", do you mean by the database software (as in, it is a limitation of using SQL/RDBMS to store the information), or are you saying it is imposed by the database as it has been implemented? I would agree with the latter more than the former, as a fully normalized SQL database would only ever have the delivery date on an order stored in one of those 150 tables.

          "But, the point here is that maybe the ERP market would be better served with the CODASYL DBMS - that can link together BLOBS, provide simple set navigation, consistent ordering, reference across computers and allow the implementor of system to control more."

          It probably would, if the information stored was only meant to serve the purpose of a single function of the ERP system. Using the example of an order (say a Purchase Order as an example, since the article wasn't clear on what kind of order is being used), the document model would work great if your use of that data was only to serve the purchasing, receiving, and invoicing function - at each step you're retrieving a single document as a whole and working with it.

          Those of us who work in these systems, however, know that this is rarely ever how the data put into an ERP is used. Often you need to aggregate PO lines in order to analyze vendor lead times of specific part numbers, or to reconcile Job Costing or Manufacturing accounting data back to the purchasing system, or to plan inventory requirements for the next 12 months.

          And this is where SQL beats out MongoDB and other document-based database systems - aggregating data that is structured into tables (such as an "Order Lines" table) with proper identifiers (such as an internal item number, in the given case), is something a SQL database engine is really built for. Hierarchical or document-based systems cannot efficiently index that kind of data.

          MongoDB's only advantage in this case would be that the database structure could be simpler, but the overall performance of that system compared to a SQL system for an ERP application would be about the same for document creation, update, and retrieval, and would be worse for aggregate reporting.

          "But, please, if you never studied math, don't educate people in arithemetic, and because we have calculators does not elminate calculus and algebra. If you do not know the relational database theory - don't educate!"

          Not sure what you mean by this, but many of us have long graduated college and while we may not be able to recite Codd's 12 rules off by heart, we still work in this area day in and day out. Don't start talking about who should be educating whom in these conversations, lest you get shot down from your ivory tower.
          daftkey
      • Order Dates

        Not sure which Oracle ERP system is being referred to here but it cannot be Oracle E-Business Suite. As a consultant who has been working on implementations for nearly 20 years I can confirm that the order date for an order (after being translated from a requisition) is stored on ONE table only: PO_HEADERS_ALL.

        For performance reasons I know there is a lot of denormalisation in the Oracle ERP database design. I know of no significant application built on a schema design normalised to 5th normal form or even 3BNF.

        Finally I'm not sure where the 150 tables come from. From memory I can think of only 15 tables representing a requisition to order (in the above ERP system). This structure comes from the need to represent complex order requirements, e.g. on an order for a given date you may have delivery schedules for individual items on the order (delivery schedules use dates, but those are not order dates). Now because of the widely varying nature of such orders I would love to see how that might be represented in a document database. I would also love to be the guy who gets to write software to parse and process such a document.
        cstewart_4@...
        • Doublespeak from the CEO along with a misunderstanding...

          "I'm not sure where the 150 tables come from. From memory I can think of only 15 tables representing a requsition, to order (in the above ERP system)."

          The 150 tables is a bit of doublespeak and exaggeration on Schireson's part here. If someone needed the level of detail that could only be represented by 150 tables, MongoDB or any other document-based system would choke.

          "I can confirm that the order date for an order (after being translated from a requsition) is storeed on ONE table only. PO_HEADERS_ALL."

          The whole "order date in 150 places" was a red herring introduced by ForeverSPb above. Neither SQL nor MongoDB would have a situation (barring some very bizarre schema design) that would require this.
          daftkey