This guest post is from Jnan Dash, a tech visionary/executive consultant in Silicon Valley who spent 10 years at Oracle Corporation and 16 years at IBM in various database leadership positions. Dash serves on several boards and advisory boards, including MongoDB's.
By Jnan Dash
The market is abuzz with terms like NoSQL, Big Data, NewSQL, and Database Appliance, and IT decision makers are often confused by the noise. They do not understand why they should consider a newer, alternative database when RDBMSs have been around for 20+ years. However, many leading enterprises are already using alternative databases and, as a result, are saving money, innovating more quickly, and completing projects they could not pursue before. Let’s discuss how one can determine if NoSQL is a fit for current or future applications.
Nature of data
The first consideration when selecting a database is the nature of the data you want to work with. If the data has a simple tabular structure, like an accounting spreadsheet, then the relational model could be adequate.
Data such as geo-spatial, engineering parts, or molecular modeling, on the other hand, tends to be very complex. It may have multiple levels of nesting and the complete data model can be complicated. Such data has, in the past, been modeled into relational tables, but has not fit into that two-dimensional row-column structure naturally.
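To make the mismatch concrete, here is a minimal sketch in Python. The part record, its field names, and the flatten helper are all hypothetical and only for illustration. It shows a nested engineering part and the row-per-node form it has to be broken into for a relational table with a self-referencing parent column.

```python
# Hypothetical engineering part with nested sub-assemblies (illustrative data only).
part = {
    "part_id": "P-100",
    "name": "gearbox",
    "components": [
        {"part_id": "P-110", "name": "housing", "components": []},
        {"part_id": "P-120", "name": "gear set", "components": [
            {"part_id": "P-121", "name": "pinion", "components": []},
        ]},
    ],
}


def flatten(node, parent_id=None, rows=None):
    """Emit one (part_id, name, parent_id) row per node, which is how a
    relational parts table with a parent foreign key would store the tree."""
    if rows is None:
        rows = []
    rows.append((node["part_id"], node["name"], parent_id))
    for child in node["components"]:
        flatten(child, node["part_id"], rows)
    return rows


rows = flatten(part)
for row in rows:
    print(row)
```

The nested form keeps the whole part together as the application sees it. The flat form spreads it over several rows that must be reassembled by following the parent column each time the part is read.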
The next question to ask is "what is the volatility of the data model?" Is the data model likely to change and evolve, or is it most likely going to stay the same? Generally speaking, not all the facts about the data model are known at design time, so some flexibility is needed. This presents many issues to the relational database management system (RDBMS) users of the world.
During my time at IBM, we spent many hours cautioning users to design the schema right the first time, because later revisions could slow the database down or stop it from operating altogether. For that reason, any changes made down the road had to be minimal. The issue of schema rigidity still rings true today, and it limits flexibility in application development and evolution.
This "get it right first" approach may have worked in the old world of static schema, but it will not be suitable for the new world of dynamic schema, where changes need to be made daily, if not hourly, to fit the ever changing data model. It is no wonder that many NoSQL users are Web-centric businesses which require a greater amount of flexibility.
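The difference between the two worlds can be sketched in a few lines of Python. This is only an illustration under stated assumptions: SQLite (from the standard library) stands in for a relational database, a plain list of dictionaries stands in for a document store, and the customers table and loyalty_tier field are invented for the example.

```python
import sqlite3

# Static schema: the table only accepts the columns declared up front.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO customers (id, name) VALUES (1, 'Ada')")

try:
    # A new attribute appears in the application, but the table does not know it.
    conn.execute(
        "INSERT INTO customers (id, name, loyalty_tier) VALUES (2, 'Bo', 'gold')"
    )
    rejected = False
except sqlite3.OperationalError:
    # The insert fails until someone runs a schema change (ALTER TABLE).
    rejected = True

# Dynamic schema: each record carries its own fields, so a new attribute
# can be added to new records without touching the existing ones.
customers = [{"id": 1, "name": "Ada"}]
customers.append({"id": 2, "name": "Bo", "loyalty_tier": "gold"})

# Readers have to tolerate records that lack the newer field.
tiers = [c.get("loyalty_tier", "none") for c in customers]
print(rejected, tiers)
```

Note the trade-off in the last step: the flexibility moves some responsibility to the application, which must cope with older records that do not have the new field.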
Application development (high coding velocity & agility)
The key constituency of the DBMS is the application developer community. In the past, the industry delineated the database administrator (DBA) from the application developer. The new world blurs such distinctions and demands very little dependency on dedicated DBAs. The software developer becomes the most important user.
The developer requires high coding velocity and great agility in the application building process. NoSQL databases have proven to be a better choice in that regard, using object-focused technologies such as JSON. Even if you are a SQL shop, the incremental time spent learning emerging database technologies will save a lot of development cost over time.
The learning curve for JSON, for example, is short, and programmers can build a prototype in days or weeks. Since many NoSQL offerings are open source, the community provides many productivity tools, another big advantage over single-vendor proprietary products. Some organizations, such as MongoDB, even offer free online courses that train employees and interested users in how to use the technology.
Operational issues (scale, performance, and high availability)
I know from experience that as a database grows in size or the number of users multiplies, many RDBMS-based sites suffer serious performance issues.
Next, consultants are brought in to look at the problem and provide solutions. Vertical scaling is usually recommended at high cost. As processors are added, linear scaling occurs, up to a point where other bottlenecks can appear. Many commercial RDBMS products offer horizontal scaling (clustering) as well, but these are bolted-on solutions and can be very expensive and complex.
If an organization is facing such issues, it should consider NoSQL technologies, as many of them were designed specifically to address these scale and performance issues through horizontal scaling (scale-out on commodity servers). Just as Hadoop's HDFS, which follows the design of the Google File System, scales horizontally for distributed batch processing, these newer NoSQL technologies were built to host distributed databases for online systems. Data is typically stored redundantly, often in triplicate, for high availability.
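As a rough sketch of the scale-out idea, the Python below spreads records across a pool of servers by hashing each record's key and keeps three copies of every record. The node names, the placement function, and the simple modulo scheme are assumptions made for illustration. Real products use their own partitioning and replication schemes, which differ in the details.

```python
import hashlib

# Hypothetical pool of commodity servers.
NODES = ["node-a", "node-b", "node-c", "node-d", "node-e"]
REPLICAS = 3  # keep each record in triplicate


def placement(key, nodes=NODES, replicas=REPLICAS):
    """Return the nodes that hold a record: the hash of the key picks a
    starting node, and the copies go to the next nodes in the list."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    start = int(digest, 16) % len(nodes)
    return [nodes[(start + i) % len(nodes)] for i in range(replicas)]


for key in ["customer:1", "customer:2", "customer:3"]:
    print(key, placement(key))
```

Because every record lives on three different servers, losing any one server leaves two copies available. Adding capacity means adding servers to the pool, although with this naive modulo scheme most records would move when the pool changes, which is one reason production systems use more careful schemes.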
A common complaint about NoSQL databases is that they forfeit consistency in favor of high availability. However, this can't be said for all NoSQL databases. In general, one should consider an RDBMS if one has multi-row transactions and complex joins. In a NoSQL database like MongoDB, for example, a document (aka complex object) can be the equivalent of rows joined across multiple tables, and consistency is guaranteed within that object.
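The equivalence between a document and rows joined across tables can be sketched as follows. This is an illustration only: SQLite from the Python standard library plays the relational side, a plain dictionary plays the document, and the orders and order_items tables are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer TEXT);
    CREATE TABLE order_items (order_id INTEGER, sku TEXT, qty INTEGER);
    INSERT INTO orders VALUES (1, 'Ada');
    INSERT INTO order_items VALUES (1, 'SKU-1', 2);
    INSERT INTO order_items VALUES (1, 'SKU-2', 1);
""")

# Relational form: the order is reassembled with a join at read time.
rows = conn.execute("""
    SELECT o.order_id, o.customer, i.sku, i.qty
    FROM orders AS o
    JOIN order_items AS i ON i.order_id = o.order_id
    ORDER BY i.sku
""").fetchall()

# Document form: the same information held as one nested object.
order_doc = {
    "order_id": rows[0][0],
    "customer": rows[0][1],
    "items": [{"sku": sku, "qty": qty} for _, _, sku, qty in rows],
}
print(order_doc)
```

In the relational form, changing the order and its items consistently touches several rows and needs a multi-row transaction. In the document form the whole order is a single object, which is why a guarantee that covers one document can be enough for this kind of data.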
NoSQL databases, in general, avoid RDBMS functions like multi-table joins that can be the cause of high latency. In the new world of big data, NoSQL offers choices of strict to relaxed consistency that need to be looked at on a case-by-case basis.
Data warehousing & analytics
RDBMSs are ideally suited for complex query and analysis. Originally DB2 and Oracle were mostly used for query-intensive workloads. Data from production systems was extracted, transformed (via ETL processes), and loaded into an RDBMS for slicing and dicing. Even in today’s world, Hadoop data is sometimes loaded back into an RDBMS for reporting purposes. So an RDBMS is a good choice if query and reporting needs are critical.
Real-time analytics on operational data is better suited to a NoSQL setting. Further, in cases where data is brought together from many upstream systems to build an application (not just reporting), NoSQL is a must. Today, BI tool support for NoSQL is new, but growing rapidly.
Co-existence of RDBMS and NoSQL databases
IBM just announced the implementation of the MongoDB API, data representation, query language and wire protocol, thus establishing a way for mobile and other next-generation applications to connect with enterprise database systems such as IBM’s DB2 relational database and its WebSphere eXtreme Scale data grid. This could usher in a new wave of flexible applications that add significant value by spanning multiple data systems.
Oracle also introduced its NoSQL product last year.
Data exchange and interoperability will continue to improve as other industry leaders follow in IBM's footsteps, and the functionality of NoSQL databases will continue to evolve over time. Fortune 1000 companies would be well advised to look at NoSQL database solutions to meet their needs in a data-intensive business world.
The rapid adoption of these alternative databases in just a few years is a testament to their attractiveness to the new world of Big Data, where agility, performance, and scalability reign supreme.