RDBMS vs. NoSQL: How do you pick?

Summary: An earnest argument for NoSQL...from an RDBMS veteran of IBM and Oracle.

This guest post is from Jnan Dash, a tech visionary/executive consultant in Silicon Valley who spent 10 years at Oracle Corporation and 16 years at IBM in various database leadership positions. Dash serves on several boards and advisory boards, including MongoDB's.

 

By Jnan Dash

The market is abuzz with terms like NoSQL, Big Data, NewSQL, and Database Appliance. IT decision makers are often confused by all the noise and do not see why they should consider a newer, alternative database when RDBMSs have been around for 20+ years. However, many leading enterprises are already using alternative databases and, as a result, are saving money, innovating more quickly, and completing projects they could not pursue before. Let’s discuss how to determine whether NoSQL is a fit for current or future applications.

Nature of data
The first consideration when selecting a database is the nature of the data you plan to store and use. If the data has a simple tabular structure, like an accounting spreadsheet, then the relational model could be adequate.

Data such as geo-spatial, engineering parts, or molecular modeling, on the other hand, tends to be very complex. It may have multiple levels of nesting and the complete data model can be complicated. Such data has, in the past, been modeled into relational tables, but has not fit into that two-dimensional row-column structure naturally.

In similar cases today, one should consider NoSQL databases as an option.  Multi-level nesting and hierarchies are very easily represented in the JavaScript Object Notation (JSON) format used by some NoSQL products.
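To make the nesting point concrete, here is a minimal Python sketch. The parts data and the `flatten` helper are hypothetical, invented only for illustration. It shows one nested record serialized as a single JSON document, and the parent-pointer rows the same data would have to be broken into to fit a two-dimensional table.

```python
import json

# Hypothetical engineering-parts record: an assembly with nested sub-parts.
assembly = {
    "part_no": "A-100",
    "name": "gearbox",
    "components": [
        {"part_no": "B-210", "name": "housing", "components": []},
        {
            "part_no": "B-220",
            "name": "shaft assembly",
            "components": [
                {"part_no": "C-301", "name": "shaft", "components": []},
                {"part_no": "C-302", "name": "bearing", "components": []},
            ],
        },
    ],
}


def flatten(part, parent=None):
    """Yield (part_no, name, parent_part_no) rows, the shape a flat table needs."""
    yield (part["part_no"], part["name"], parent)
    for child in part["components"]:
        yield from flatten(child, part["part_no"])


# Document form: the whole hierarchy travels as one JSON value.
document = json.dumps(assembly)

# Relational form: the hierarchy is spread over rows linked by a parent key.
rows = list(flatten(assembly))
```

Reading the assembly back from the row form means reassembling the tree from the parent keys (typically with self-joins or a recursive query), whereas the document form is read back in one piece.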

The next question to ask is "what is the volatility of the data model?" Is the data model likely to change and evolve, or is it most likely to stay the same? Generally speaking, not all the facts about the data model are known at design time, so some flexibility is needed. This poses many problems for relational database management system (RDBMS) users.

During my time at IBM, we spent many hours cautioning users to design the schema right the first time, because later revisions slowed the database or stopped it from operating. For that reason, any changes made down the road had to be minimal. Schema rigidity remains an issue today and leaves little flexibility for application development and evolution.

This "get it right first" approach may have worked in the old world of static schema, but it is not suitable for the new world of dynamic schema, where changes need to be made daily, if not hourly, to fit the ever-changing data model. It is no wonder that many NoSQL users are Web-centric businesses, which require a greater amount of flexibility.
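The difference between static and dynamic schema can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a benchmark: the `users` table and its fields are hypothetical, SQLite stands in for an RDBMS, and a plain list of dictionaries stands in for a document collection.

```python
import sqlite3

# Relational side: the column set is declared up front.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users (id, name) VALUES (1, 'Ada')")

# A new attribute needs a schema change before any row can carry it.
conn.execute("ALTER TABLE users ADD COLUMN twitter_handle TEXT")
conn.execute(
    "INSERT INTO users (id, name, twitter_handle) VALUES (2, 'Lin', '@lin')"
)
relational_rows = conn.execute(
    "SELECT id, name, twitter_handle FROM users ORDER BY id"
).fetchall()

# Document side: a list of dicts stands in for a collection (an assumption,
# not a real database). New fields simply appear on the documents that have them.
users = [{"_id": 1, "name": "Ada"}]
users.append({"_id": 2, "name": "Lin", "twitter_handle": "@lin"})
handles = [u.get("twitter_handle") for u in users]
```

Adding a column is cheap in this toy SQLite case. The cost discussed above shows up on large production tables and in the application code and processes built around a fixed schema.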

Application development (high coding velocity & agility)
The key constituency of the DBMS is the application developer community. In the past, the industry delineated the database administrator (DBA) from the application developer. The new world blurs such distinctions and demands very little dependency on dedicated DBAs. The software developer becomes the most important user.

The developer requires high coding velocity and great agility in the application building process. NoSQL databases have proven to be a better choice in that regard, thanks to object-focused technologies such as JSON. Even if you are a SQL shop, the incremental time spent learning emerging database technologies will save considerable development cost over time.

JSON, for example, is quick to learn, and programmers can build a prototype in days or weeks. Since many NoSQL offerings are open source, the community provides many productivity tools, another big advantage over single-vendor proprietary products. Some organizations, such as MongoDB, even offer free online courses that train employees and interested users in the technology.

Operational issues (scale, performance, and high availability)
I know from experience that as a database grows in size or the number of users multiplies, many RDBMS-based sites suffer serious performance issues.

Next, consultants are brought in to look at the problem and provide solutions. Vertical scaling is usually recommended at high cost. As processors are added, linear scaling occurs, up to a point where other bottlenecks can appear. Many commercial RDBMS products offer horizontal scaling (clustering) as well, but these are bolted-on solutions and can be very expensive and complex.

If an organization is facing such issues, then it should consider NoSQL technologies, as many of them were designed specifically to address these scale (horizontal scaling or scale-out using commodity servers) and performance issues. Just as the Hadoop Distributed File System (HDFS), which follows the design of Google’s GFS, scales out horizontally for distributed batch processing, these newer NoSQL technologies were built to host distributed databases for online systems. Redundancy (in triplicate) is implemented here for high availability.
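The scale-out idea with triplicate redundancy can be sketched as follows. The node names, the `placement` and `readable` helpers, and the simple hash-modulo scheme are all assumptions made for illustration. Real products use their own schemes (consistent hashing, range-based sharding, and so on), so treat this as a toy model rather than any vendor's design.

```python
import hashlib

NODES = ["node-a", "node-b", "node-c", "node-d", "node-e"]  # hypothetical servers
REPLICAS = 3  # each record is stored in triplicate


def placement(key, nodes=NODES, replicas=REPLICAS):
    """Return the nodes holding copies of `key`: a primary plus the next nodes in ring order."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    start = int.from_bytes(digest[:8], "big") % len(nodes)
    return [nodes[(start + i) % len(nodes)] for i in range(replicas)]


def readable(key, down):
    """A key stays readable while at least one of its replicas is on a live node."""
    return any(node not in down for node in placement(key))
```

With three copies on three distinct nodes, any two of them can fail and `readable` still returns True for that key. Capacity grows by adding more commodity nodes to the list, although this naive modulo scheme would move most keys when the node count changes.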

A common complaint about NoSQL databases is that they forfeit consistency in favor of high availability. However, this can't be said for all NoSQL databases. In general, one should consider an RDBMS if one has multi-row transactions and complex joins. In a NoSQL database like MongoDB, for example, a document (aka complex object) can be the equivalent of rows joined across multiple tables, and consistency is guaranteed within that object.
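As a rough illustration of a document standing in for joined rows, the Python sketch below builds two hypothetical tables in SQLite (`orders` and `order_items`, names invented for this example), joins them, and then folds the result into one nested object of the kind a document store would keep as a single unit.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT);
    CREATE TABLE order_items (order_id INTEGER, sku TEXT, qty INTEGER);
    INSERT INTO orders VALUES (1, 'Acme');
    INSERT INTO order_items VALUES (1, 'bolt', 10), (1, 'nut', 20);
    """
)

# Relational view: the order is reassembled at read time by a join.
joined = conn.execute(
    """
    SELECT o.id, o.customer, i.sku, i.qty
    FROM orders AS o
    JOIN order_items AS i ON i.order_id = o.id
    ORDER BY i.sku
    """
).fetchall()

# Document view: the same facts held as one nested object.
order_doc = {
    "_id": joined[0][0],
    "customer": joined[0][1],
    "items": [{"sku": sku, "qty": qty} for _, _, sku, qty in joined],
}
```

Because the order and its items live in one object, an update to that object touches a single unit, which is the scope within which the consistency guarantee described above applies. Changes that span several documents are a separate matter.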

NoSQL databases, in general, avoid RDBMS functions like multi-table joins that can be the cause of high latency. In the new world of big data, NoSQL offers choices of strict to relaxed consistency that need to be looked at on a case-by-case basis.

Data warehousing & analytics
RDBMSs are ideally suited for complex queries and analysis. Originally, DB2 and Oracle were mostly used for query-intensive workloads. Data from production systems was extracted, transformed (via ETL processes), and loaded into an RDBMS for slicing and dicing. Even in today’s world, Hadoop data is sometimes loaded back into an RDBMS for reporting purposes. So an RDBMS is a good choice if query and reporting needs are critical.

Real-time analytics for operational data is better suited to a NoSQL setting. Further, in cases where data is brought together from many upstream systems to build an application (not just reporting), NoSQL is a must. Today, BI tool support for NoSQL is new, but growing rapidly.

Co-existence of RDBMS and NoSQL databases
IBM just announced the implementation of the MongoDB API, data representation, query language and wire protocol, thus establishing a way for mobile and other next-generation applications to connect with enterprise database systems such as IBM’s DB2 relational database and its WebSphere eXtreme Scale data grid. This could usher in a new wave of flexible applications that add significant value by spanning multiple data systems. 

Oracle also introduced its NoSQL product last year.

Data exchange and interoperability will continue to evolve as other industry leaders follow in IBM's footsteps, and the functionality of NoSQL databases will keep growing over time. Fortune 1000 companies would be well advised to look at NoSQL database solutions to meet their needs in a data-intensive business world.

The rapid adoption of these alternative databases in just a few years is a testament to their attractiveness to the new world of Big Data, where agility, performance, and scalability reign supreme.

Talkback

13 comments
  • I'm not that big on NoSQL

    Most of the alleged problems of RDBMS are neatly solved by things like Xml datatypes for columns.
    Mac_PC_FenceSitter
    • Thinking About That

      I'm thinking, if it was that simple we would not have seen NoSQL or Hadoop and Google would have built out on RDBMSes.

      Normalization carried to its full extent can lead to performance degradation as JOINs increase. Think of the state or city in an address. Where an item is static and has no affiliated attributes, it doesn't need to be spun off as a separate relation. Some things are in a gray area, and the engineering is about choosing when the mathematical models are to be ignored.

      Three words. Null, Null, Null.

      This is related to the prior point. In set theory, an attribute is either in the tuple or not. There are work-arounds, but they play havoc with the relational concepts. NoSQL has a response to this: an object has labelled edges out. No labelled edge, no value.

      XML is verbose, which is the number one selling point about JSON.

      Perhaps I misunderstand your point about Xml datatypes, but it sounds as though you are suggesting that a text sub-tuple, which may vary with regard to attribute-value pairs across the domain, fixes the dynamic, varied attribute problem. But all the data is downgraded to a text type, which means perpetual parsing on reads. It also suggests there is an auxiliary document used to validate inputs and outputs (and for which, again, all the possible attributes have to be anticipated), so one has to write redundant code to replicate what the database already provides for garden-variety columns and input integrity verification. String compares are O(n), so finding matches in the XML value costs n·i·k, where k is the number of rows and i the number of attributes in the XML document.

      I do see that finding matches on attributes in NoSQL may have a similar characteristic, but hashing can make the search for a labelled node and the test for an attribute faster than two string compares.

      But, again, I may be misunderstanding what you mean by XML datatypes, and I am further disadvantaged by having used RDBMSes a lot, and XML types and NoSQL not at all.

      Let's talk about scale. Splitting a relational database is ugly, because it necessarily voids ACID promises. The NoSQL folks start from the CAP theorem and say there are three performance points, pick two.

      RDBMSes exact a cost in the updating of indexes during writes. RDBMSes are extremely costly when the schema has to change. NoSQL is no Garden of Eden, it has its costs, but they are different. Clearly, with different cost characteristics, there must be domains where one is preferred to the other. Or, maybe a hybrid of both is the solution to a problem.
      DannyO_0x98
      • What Happened to my Name?

        I came back to see if someone corrected any mistakes and I'm seeing I'm signed as "anonymous."

        DannyO_0x98 wrote that and this.
        DannyO_0x98
  • mongo db

    Heard Mongo db on the rise !
    ThinkFairer8
  • GASP ......

    All this reminds me of is a PICK database structure.
    Perhaps Richard Pick had it right after all ....
    linux4u
  • dbSpaces gives the best of both worlds...

    dbSpaces is a virtual database that provides both SQL and NoSQL APIs to RDBMS and NoSQL databases. So if you want NoSQL access to Oracle etc., you can have it, and if you want SQL access to MongoDB etc., you can have that too. In fact, with its APIs you can mix SQL and NoSQL if you want.

    I believe it's a great product, and it has helped us a lot.
    pjc158
  • RDBMS with indexing across nonrelational types

    You can also get the best of both worlds. If your RDBMS is aware of XML and JSON types, it can effectively index them and get the same performance and flexibility across these documents, while still retaining the ability to keep traditional columns for situations where that is appropriate. Take an example of JSON documents with some supporting metadata stored in relational fields in the same table, like document owner, creation date, etc. That is likely the real future. I recently attended a presentation at OSCON in which the presenter had tested MongoDB against PostgreSQL with the intention of showing why NoSQL was so great, but, on seeing the results come out nearly identical, reexamined this idea.

    The best of both worlds!
    grant@...
  • Re: RDBMS vs. NoSQL

    “SQL” ⊂ “Relational” ⊂ “DBMS”.
    ldo17
    • “SQL” ⊂ “Relational”

      That's not really true. SQL has lots of non-relational features. You could say

      “SQL” ∩ “Relational” ≠ Ø
      lukas.eder
      • “SQL” ≠ “Relational”

        So a NoSQL DBMS could be relational, but a SQL DBMS cannot be relational.

        Unfortunately at present no NoSQL DBMS is relational.

        Until that point NoSQL can be safely ignored.
        jorwell
  • Informative Article

    Am going to tweet about it pronto.
    magnumgrp1
  • RDBMS - no question

    The NoSQL approaches generally use methods of data representation that have already been shown to be flawed in theory and cumbersome, complex, inexpressive and inflexible in practice.

    Important point, a SQL DBMS is NOT a relational DBMS. SQL tables are not relations.
    jorwell
  • Definition

    jorwell,
    There is no such thing as a SQL DBMS. A DBMS follows one of several data models, relational being one of them (besides hierarchical, network, object, etc.). Ted Codd pioneered the relational model of data back in the 1970s. SQL is a sublanguage developed at IBM based on relational algebra. UC Berkeley developed another language called QUEL, based on relational calculus, but SQL became the standard language for manipulating relational databases. Hence comments such as "SQL DBMS is NOT relational DBMS" or "SQL tables are not relations" are inaccurate.
    jnan.dash@...