Is Relational Relevant?

By | June 20, 2007, 11:09am PDT

Last week, some friends of mine from Ingres, the early relational database management system, attended a retrospective on relational database systems held at the Computer History Museum in Silicon Valley with other database pioneers from Oracle, Informix, IBM and Sybase. I was an early employee at Ingres, which was the second best-selling relational database until it unwound itself and eventually got sold. Back in the 1980s, when Ingres started, relational was one of the hot topics of computer science. Today the developments covered in the retrospective are treated with the same distance as World War II and viewed as about as relevant as assembler code, despite their widespread use.

I was very early in studying relational databases and was fortunate enough to attend the University of California at Berkeley, where a lot of the research in relational theory was happening. I then joined my professors, Mike Stonebraker, Larry Rowe and Gene Wong, as one of the first engineers on the commercial version of Ingres. We competed in very intense deals with a nascent Oracle and Informix and watched in astonishment as Sybase sprang out of the ashes of an early database machine company. In those days, we closely tracked what the researchers in universities and at IBM were doing in databases, scaling, theory and data distribution. That work became the foundation for running operations in banks and manufacturing, providing the backbone for ERP and CRM, and storing everything from tiny log items to flight plans to battle orders.

(Joke: How many database theoreticians does it take to change a light bulb? Three: one to do the delete, one to do the insert and one to maintain concurrency. Told to me by Gene Wong with typical deadpan delivery.)

By the beginning of the 1990s, Oracle, IBM and Microsoft had hired most of the brains in and coming out of the universities, and from each other. A good example is the missing Jim Gray, who provided a lot of the early database material that I read when I was a student in the late 1970s. You can’t blame bright researchers for trying to find a more lucrative opportunity in a red hot software market. The result was that the big database vendors kept everyone else from getting the kind of innovative, basic research breakthroughs that had characterized the previous couple of decades. Some fundamental problems of database management were left unsolved, such as developing efficient and effective models for distributed database systems.

There was always an ongoing war between the various alternative database models like the relational, hierarchical and network models, followed by object-oriented models. In the early 1980s it seemed like the relational model had won, until Rick Cattell of Sun reignited the debate, championing the network model as more of an object-oriented model. Although object databases went nowhere, there has always been an impedance mismatch between object-oriented systems and relational databases. Relational databases smashed object databases with standardization, scalability and integrity. However, object-oriented programmers, even ones who understood the core of relational theory, chose design patterns that suited an object-oriented world.

Thus were born various Object-Relational mapping techniques. It is not well understood that enterprise systems such as SAP, Siebel and Documentum are object-relational systems at their core, with object-relational development paradigms if not languages. From this perspective the Object-Relational boom was as big as the client/server and enterprise software boom. To simplify new enterprise development, object-relational mapping tools such as TopLink and Hibernate were developed. In the meantime, the database industry created object-oriented extensions in standards like SQL:1999 that nobody uses. SQL, with its COBOL-like language, was increasingly relegated to being more of an access method for storing information or a Byzantine reconstructor of stored objects.
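To make the mismatch concrete, here is a minimal hand-rolled mapping sketch. It is not how TopLink, Hibernate or any of the enterprise systems named above work internally; the `Document` class, the two tables and the `save`/`load` helpers are hypothetical names invented for illustration, and an in-memory SQLite database stands in for a real server. It shows one object being split across two tables on the way in and reconstructed with a join on the way out.

```python
import sqlite3
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Document:
    """The object as an application programmer wants to see it (hypothetical)."""
    doc_id: int
    title: str
    tags: List[str] = field(default_factory=list)


def make_db() -> sqlite3.Connection:
    """Create the normalized storage: one row per document, one row per tag."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE document (doc_id INTEGER PRIMARY KEY, title TEXT NOT NULL)"
    )
    conn.execute(
        "CREATE TABLE document_tag ("
        " doc_id INTEGER NOT NULL REFERENCES document(doc_id),"
        " tag TEXT NOT NULL,"
        " PRIMARY KEY (doc_id, tag))"
    )
    return conn


def save(conn: sqlite3.Connection, doc: Document) -> None:
    """Decompose one object into rows in two tables."""
    conn.execute(
        "INSERT INTO document (doc_id, title) VALUES (?, ?)",
        (doc.doc_id, doc.title),
    )
    conn.executemany(
        "INSERT INTO document_tag (doc_id, tag) VALUES (?, ?)",
        [(doc.doc_id, tag) for tag in doc.tags],
    )


def load(conn: sqlite3.Connection, doc_id: int) -> Optional[Document]:
    """Reconstruct the object from a join; tags come back sorted by name."""
    rows = conn.execute(
        "SELECT d.title, t.tag"
        " FROM document d LEFT JOIN document_tag t ON t.doc_id = d.doc_id"
        " WHERE d.doc_id = ?"
        " ORDER BY t.tag",
        (doc_id,),
    ).fetchall()
    if not rows:
        return None
    title = rows[0][0]
    # A document without tags yields a single row whose tag is NULL.
    tags = [tag for _, tag in rows if tag is not None]
    return Document(doc_id, title, tags)
```

Even for an object this small, the application code has to know the table layout, the join and the NULL handling, which is the bookkeeping that mapping tools were built to hide.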

In the Web 2.0 world, populated with unstructured content and XML in huge stores of information flung across the globe, the ultimate set of distributed databases is being tested, but without the tools of the major vendors being used. Simple databases built upon MySQL are being joined and unioned in a concept known as shards. One of the precepts of relational theory is to hide the physical representation of the database from the logical representation. Shards throw that precept out the window. Huge on-line databases like Flickr, Digg and Salesforce.com are taking the shard approach and managing the queries themselves. The complexity of XML structures means that concepts like normalization go out the window. Object-oriented development drives users to construct and marshal objects out of a data store as efficiently as possible. How can relational theory be relevant when all the rules are being broken?
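A rough sketch of what "managing the queries themselves" can look like follows. It is an illustration under stated assumptions, not a description of how Flickr, Digg or Salesforce.com actually do it: in-memory SQLite databases stand in for MySQL shards, and the `photo` table, the modulo routing rule and every function name are hypothetical.

```python
import sqlite3
from typing import List, Tuple

NUM_SHARDS = 3  # hypothetical shard count


def make_shards(n: int = NUM_SHARDS) -> List[sqlite3.Connection]:
    """Create n independent databases that all share the same schema."""
    shards = []
    for _ in range(n):
        conn = sqlite3.connect(":memory:")
        conn.execute(
            "CREATE TABLE photo ("
            " photo_id INTEGER PRIMARY KEY,"
            " owner_id INTEGER NOT NULL,"
            " title TEXT NOT NULL)"
        )
        shards.append(conn)
    return shards


def shard_for(shards: List[sqlite3.Connection], owner_id: int) -> sqlite3.Connection:
    """Physical placement is decided by application code, not by the DBMS."""
    return shards[owner_id % len(shards)]


def add_photo(shards, photo_id: int, owner_id: int, title: str) -> None:
    shard_for(shards, owner_id).execute(
        "INSERT INTO photo (photo_id, owner_id, title) VALUES (?, ?, ?)",
        (photo_id, owner_id, title),
    )


def photos_by_owner(shards, owner_id: int) -> List[str]:
    """A query on the shard key touches exactly one database."""
    rows = shard_for(shards, owner_id).execute(
        "SELECT title FROM photo WHERE owner_id = ? ORDER BY photo_id",
        (owner_id,),
    ).fetchall()
    return [title for (title,) in rows]


def search_titles(shards, word: str) -> List[Tuple[int, str]]:
    """A query off the shard key is sent to every shard; the application
    performs the union and the final sort itself."""
    results: List[Tuple[int, str]] = []
    for conn in shards:
        results.extend(
            conn.execute(
                "SELECT photo_id, title FROM photo WHERE title LIKE ?",
                (f"%{word}%",),
            ).fetchall()
        )
    return sorted(results)
```

Note that `shard_for` is exactly the kind of physical detail the relational model was meant to hide: every caller now depends on how the data is laid out.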

Of course relational theory is relevant. It is just that the semantics of the data being managed and how we access it are clouding how we perceive the database. Relational theory decomposes the data management problem into the smallest possible chunks of manageability. It provides models of locking and concurrency that ensure that information is updated or retrieved with a high level of integrity. It provides the basic operations of joining, selecting, projecting and unioning that are being used in massive data stores. It provides the notion that you can describe what is being accessed declaratively, rather than procedurally. I like to think of relational theory as being to enterprise object-oriented systems what nuclear physics is to organic chemistry: the foundation on which the other is described.
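The four operations named above are small enough to sketch outside of any database engine. The version below is a toy under stated assumptions: a relation is modeled as a list of dictionaries with duplicates removed, the join is a naive nested-loop natural join, and all function names are hypothetical rather than taken from any library.

```python
from typing import Callable, Dict, Iterable, List

Row = Dict[str, object]


def _dedupe(rows: Iterable[Row]) -> List[Row]:
    """Relations are sets: keep the first copy of each distinct row."""
    seen = set()
    out: List[Row] = []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            out.append(row)
    return out


def select(rel: Iterable[Row], pred: Callable[[Row], bool]) -> List[Row]:
    """Keep only the rows that satisfy the predicate."""
    return [row for row in rel if pred(row)]


def project(rel: Iterable[Row], attrs: List[str]) -> List[Row]:
    """Keep only the named attributes, then remove duplicate rows."""
    return _dedupe({a: row[a] for a in attrs} for row in rel)


def union(a: Iterable[Row], b: Iterable[Row]) -> List[Row]:
    """Combine two relations that have the same attributes."""
    return _dedupe(list(a) + list(b))


def join(a: Iterable[Row], b: Iterable[Row]) -> List[Row]:
    """Natural join: pair rows that agree on every shared attribute."""
    b_rows = list(b)
    out: List[Row] = []
    for x in a:
        for y in b_rows:
            if all(x[k] == y[k] for k in x.keys() & y.keys()):
                out.append({**x, **y})
    return _dedupe(out)
```

A query is then a composition of operators that says what is wanted, not how to scan for it. For example, with hypothetical `employees` and `depts` relations, `project(select(join(employees, depts), lambda r: r["site"] == "Berkeley"), ["emp"])` asks for the employees whose department is located in Berkeley.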

What is becoming less relevant is the view that everything will be stored in a centralized database that somehow magically replicates itself all over the world. Or that all queries of information can be expressed in SQL. Or that a database management system is at the core of managing fragments of XML or JSON. As the patterns of local and global storage become well understood, database management systems will be replaced by distributed libraries that provide the relational operators and transaction control to apply to the data. It remains to be seen if new models like XQuery, developed by Don Chamberlin, the same guy who helped develop SQL, will fill part of the void.

The losers in this transformation are likely to be the big database vendors. The winners are likely to be the small, lightweight databases like MySQL acting as transactional stores. The big vendors still hold on to some of the talent that would otherwise be working on modernizing relational theory for today’s requirements, but hopefully new research and talent will emerge to help rationalize these changes into a coherent, new relational model.


Disclosure

John Newton

http://blogs.zdnet.com/Newton/?page_id=2

Biography

John Newton

John Newton has spent the last 25 years building information management software, including co-founding Documentum, the enterprise content management software company, with Howard Shao in 1990. John is currently Chairman and CTO of Alfresco, an open source enterprise content management system founded in 2005. John started his career in databases in 1981 as one of the original engineers at Ingres and ultimately ran the database development group. John was also one of the first entrepreneurs in residence in Europe at Benchmark Capital. For the last two years John has blogged frequently on the change in information management as it evolves with open source, Web 2.0 and the commoditization of software and hardware.

See his personal disclosure page for all John's industry affiliations.


SAP is pre-relational really
jorwell 9th Jul 2007
SAP was first developed in the 1980s and SQL-DBMS were even more backward with respect to constraint definition than today's products.

Thus SAP were compelled to put a lot of logic that should have gone into the database into the application.

Life would be a lot simpler for everyone using an ERP if you could just update the tables directly, in the full knowledge that there was no chance of you violating the integrity of the database because all the integrity rules are in the database. As it is, products like SAP and Oracle Applications have interfaces using laughably primitive methods like APIs. These remind me of the subroutines we used to wrap around ISAM files in COBOL. The thing's a joke really.

The current ERP products only make use of the relational model in the most superficial way and should rightly be seen as primitive first generation examples of this kind of software.
We're doomed...
Erik Engbrecht 21st Jun 2007
Does no one care about data integrity anymore?

Or systems based on sound mathematical principles?

Good blog, BTW.
how to manage the mismatch between the newer data/applications and the data store. The question is how to create efficient mappings between applications/data that make the lives of application developers a lot easier, and also guarantee data integrity.
The elegance, sophistication and mathematical soundness of the relational model will mean that it will be around for a very, very long time to come.

On the whole however Codd's ideas have been very badly served by implementations. SQL routinely violates relational theory (duplicate rows in results being the most glaring violation but there are many others).

I think it is unfortunate that time and time again the shortcomings of SQL are put forward as shortcomings of the relational model. The model places no restrictions on what types of data can be represented in an attribute (column in SQL speak) for example.

Once again I recommend the work of C. J. Date and Hugh Darwen in this area. Those interested but unfamiliar with their material might want to start with C. J. Date's "Database in Depth".
You don't need a "new relational model", Codd got it right the first time.

Given relational theory's basis in predicate logic and set theory, you would have to invent something that moved mathematically beyond these to supersede the relational model. Maybe in a hundred years or so something like that will happen.

Of course we can move on from SQL-DBMSs, for the simple reason that SQL isn't relational. The relational model requires that all data be represented as attributes within tuples within relations. Every tuple in a relation must be unique; as soon as you have a SQL query that generates duplicate rows you have left the relational model behind (and probably generated a whole bunch of nasty bugs in the process - I suspect this is how my life insurance once got debited twice in one month).

MySQL offers absolutely nothing new and is basically a very boring product.

As for Shards, XML and OODBMSs, well if you really want to revive 70s style hierarchical and network models under fancy new names then you have a lot of suffering ahead of you - suffering you deserve, but your customers don't.