Last week, some friends of mine from Ingres, the early relational database management system, attended a retrospective on relational database systems held at the Computer History Museum in Silicon Valley with other database pioneers from Oracle, Informix, IBM and Sybase. I was an early employee at Ingres, which was the second best-selling relational database until it unwound itself and eventually got sold. Back in the 1980s when Ingres started, relational was one of the hot topics of computer science. Today the developments covered in the retrospective are treated with the same distance as World War II and viewed with the same level of relevance as assembler code, despite their widespread use. I was very early in studying relational databases and was fortunate enough to attend the University of California at Berkeley, where a lot of the research in relational theory was happening. I then joined my professors, Mike Stonebraker, Larry Rowe and Gene Wong, as one of the first engineers on the commercial version of Ingres. We competed in very intense deals with a nascent Oracle and Informix and watched in astonishment as Sybase sprang out of the ashes of an early database machine company. In those days, we tracked closely what the researchers in universities and IBM were doing in the field of databases, scaling, theory and data distribution. This work became the foundation for running operations in banks and manufacturing, providing the backbone for ERP and CRM, and storing everything from tiny log items to flight plans to battle orders.
(Joke: How many database theoreticians does it take to change a light bulb? Three: one to do the delete, one to do the insert and one to maintain concurrency. Told to me by Gene Wong with typical deadpan delivery.)
By the beginning of the 1990s, Oracle, IBM and Microsoft had hired most of the brains in and coming out of the universities and from each other. A good example is the missing Jim Gray, who provided a lot of the early database material that I read when I was a student in the late 1970s. You can’t blame bright researchers for trying to find a more lucrative opportunity in a red-hot software market. The result was that the big database vendors kept everyone else from getting the innovative and basic research breakthroughs that had characterized the previous couple of decades. Some fundamental problems of database management were left unsolved, such as developing efficient and effective models for distributed database systems.
There was always an ongoing war between the various alternative database models like the relational, hierarchical and network models, followed by object-oriented models. In the early 1980s it seemed like the relational model had won, until Rick Cattell of Sun reignited the debate, championing the network model as more of an object-oriented model. Although object databases went nowhere, there has always been an impedance mismatch between object-oriented systems and relational databases. Relational databases smashed object databases with standardization, scalability and integrity. However, object-oriented programmers, even ones who understood the core of relational theory, chose design patterns that suited an object-oriented world.
Thus were born various Object-Relational mapping techniques. It is not a well understood fact that enterprise systems such as SAP, Siebel and Documentum are object-relational systems at their core, with object-relational development paradigms if not languages. From this perspective the Object-Relational boom was as big as the client/server and enterprise software boom. To simplify new enterprise development, object-relational mapping tools such as TopLink and Hibernate were developed. In the meantime, the database industry created object-oriented extensions in standards like SQL-1999 that nobody uses. SQL, with its COBOL-like language, was increasingly relegated to more of an access method for storing information or a Byzantine reconstructor of stored objects.
In the Web 2.0 world, populated with unstructured content and XML in huge stores of information flung across the globe, the ultimate set of distributed databases is being tested, but without the tools of the major vendors being used. Simple databases built upon MySQL are being joined and unioned in a concept known as shards. One of the precepts of relational theory is to hide the physical representation of the database from the logical representation. Shards throw that precept out the window. Huge online databases like Flickr, Digg and Salesforce.com are taking the shard approach and managing the queries themselves. The complexity of XML structures means that concepts like normalization go out the window. Object-oriented development drives users to construct and marshal objects out of a data store as efficiently as possible. How can relational theory be relevant when all the rules are being broken?
Of course relational theory is relevant. It is just that the semantics of the data being managed and how we access it are clouding how we perceive the database. Relational theory decomposes the data management problem into the smallest possible chunks of manageability. It provides models of locking and concurrency that ensure that information is updated or retrieved with a high level of integrity. It provides the basic operations of joining, selecting, projecting and unioning that are being used in massive data stores. It provides the notion that you can describe what is being accessed declaratively, rather than with a procedural description. I like to say that relational theory is to enterprise object-oriented systems what nuclear physics is to organic chemistry: the underlying layer upon which the higher one is described.
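To make the point about shards concrete, here is a minimal sketch of an application that manages the query itself while still relying on selecting, projecting and unioning. It is an illustration under loud assumptions, not how Flickr, Digg or Salesforce.com actually do it: in-memory SQLite databases stand in for the MySQL shards, and the `photos` table, the shard count and every function name are hypothetical.

```python
import sqlite3
import zlib

NUM_SHARDS = 3  # hypothetical shard count, chosen only for illustration


def make_shard():
    # Each shard is an independent database with the same schema.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE photos (id INTEGER PRIMARY KEY, owner TEXT, title TEXT)"
    )
    return conn


shards = [make_shard() for _ in range(NUM_SHARDS)]


def shard_for(owner):
    # The application, not the database, knows the physical layout.
    # crc32 is stable across runs, unlike the built-in hash() for strings.
    return shards[zlib.crc32(owner.encode("utf-8")) % NUM_SHARDS]


def insert_photo(photo_id, owner, title):
    conn = shard_for(owner)
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO photos (id, owner, title) VALUES (?, ?, ?)",
            (photo_id, owner, title),
        )


def photos_by_owner(owner):
    # Single-shard query: routing by owner means only one shard is touched.
    return shard_for(owner).execute(
        "SELECT id, title FROM photos WHERE owner = ? ORDER BY id", (owner,)
    ).fetchall()


def search_titles(word):
    # Cross-shard query: each shard does the select and project declaratively,
    # and the application performs the union by hand.
    results = []
    for conn in shards:
        results.extend(
            conn.execute(
                "SELECT id, owner, title FROM photos WHERE title LIKE ?",
                (f"%{word}%",),
            ).fetchall()
        )
    return sorted(results)


insert_photo(1, "alice", "sunset at the beach")
insert_photo(2, "bob", "beach volleyball")
insert_photo(3, "alice", "city lights")
print(photos_by_owner("alice"))
print(search_titles("beach"))
```

Notice that the routing function leaks the physical layout into application code, which is exactly the precept shards throw out the window, yet each piece of work is still a relational operation.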
What is becoming less relevant is the view that everything will be stored in a centralized database that somehow magically replicates itself all over the world. Or that all queries of information can be expressed in SQL. Or that a database management system is at the core of managing fragments of XML or JSON. As the patterns of local and global storage become well understood, database management systems will be replaced by distributed libraries that provide the relational operators and transaction control to operate on data. It remains to be seen if new models like XQuery, developed by Don Chamberlin, the same guy who helped develop SQL, will fill part of the void.
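As a rough idea of what a library of relational operators might look like outside a database management system, here is a small sketch in plain Python. Everything in it is an assumption made for illustration: rows are dictionaries with hashable values, the function names are made up, and transaction control and distribution are left out entirely.

```python
def select(rows, predicate):
    # Selection: keep only the rows that satisfy the predicate.
    return [row for row in rows if predicate(row)]


def project(rows, columns):
    # Projection: keep only the named columns and drop duplicate rows,
    # preserving first-seen order.
    seen = set()
    out = []
    for row in rows:
        key = tuple(row[c] for c in columns)
        if key not in seen:
            seen.add(key)
            out.append(dict(zip(columns, key)))
    return out


def union(left, right):
    # Union: rows from both inputs with duplicates removed.
    seen = set()
    out = []
    for row in list(left) + list(right):
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            out.append(row)
    return out


def join(left, right, column):
    # Equi-join on one shared column, implemented as a simple hash join.
    index = {}
    for r in right:
        index.setdefault(r[column], []).append(r)
    return [{**l, **r} for l in left for r in index.get(l[column], [])]


# Hypothetical data living in two separate stores plus an orders list.
users_us = [{"user_id": 1, "name": "ann"}]
users_eu = [{"user_id": 2, "name": "bo"}]
orders = [
    {"user_id": 1, "item": "disk"},
    {"user_id": 2, "item": "tape"},
    {"user_id": 1, "item": "cable"},
]

all_users = union(users_us, users_eu)
joined = join(all_users, orders, "user_id")
result = project(select(joined, lambda r: r["name"] == "ann"), ["name", "item"])
print(result)
```

The operators compose the same way they do in the theory, which is the point: the algebra survives even when the centralized database does not.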
The losers in this transformation are likely to be the big database vendors. The winners are likely to be the small, lightweight databases like MySQL acting as transactional stores. The big vendors still hold on to some of the talent that would otherwise be working on modernizing relational theory for today’s requirements, but hopefully new research and talent will emerge to help rationalize these changes into a coherent, new relational model.