Gartner to DBAs, BI vendors: Time to reinvent yourselves

As long as the reach, bandwidth, and targeting of networking technologies -- particularly the wireless kind -- continues to improve on a nearly Moore's Law like pace, relational database management systems as we know them may eventually be a thing of the past.  So said Gartner analysts Donald Feinberg and Ted Friedman at Gartner Symposium ITxpo in Orlando, FL during a session entitled "The Death of the Database."

The premise of Gartner's argument is that as improvements in networking technologies eventually lead to real-time connectivity to any data, that data is best kept closest to its natural source rather than at the intersection of a row and tuple of a database that, as it turns out, is actually little more than a remote cache.  An RFID-tag-equipped can of soup was given as an example of why inventory data needn't persist in a database in order to facilitate the business processes of a grocery store.

Instead of walking the aisles, taking inventory of everything on the shelves, and then storing that inventory data in a database, the Gartner analysts said to just leave the "data" with the can of soup. Then, in the business process of restocking, a scan of all the RFID tags in the store (nightly, hourly, or however often it runs) bypasses the step of storing the inventory data in a database and goes directly to placing an order for more of that can of soup.  Not only does the resulting business process come closer to real time, but a step is eliminated from the process.  Said the analysts, "If I only have a millisecond need for persistence, processor and memory can handle that.  The data ends up existing for less time than it takes to store the data."
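The restocking flow described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `restock_orders`, the tag and product identifiers, and the reorder levels and order quantities are all hypothetical, and the session did not say where such thresholds would live.

```python
from collections import Counter


def restock_orders(scanned_tags, reorder_levels, order_quantity):
    """Turn one RFID scan pass directly into purchase orders.

    scanned_tags: iterable of (tag_id, product_id) pairs read off the shelves.
    reorder_levels: {product_id: minimum shelf count before reordering}.
    order_quantity: {product_id: units to order when below the minimum}.
    All three inputs are hypothetical shapes chosen for this sketch.
    """
    # The shelf count exists only in memory for the duration of this call.
    on_shelf = Counter(product_id for _tag_id, product_id in scanned_tags)
    orders = {}
    for product_id, minimum in reorder_levels.items():
        if on_shelf[product_id] < minimum:
            orders[product_id] = order_quantity[product_id]
    return orders


scan = [
    ("tag-001", "tomato-soup"),
    ("tag-002", "tomato-soup"),
    ("tag-003", "chicken-soup"),
]
orders = restock_orders(
    scan,
    reorder_levels={"tomato-soup": 5, "chicken-soup": 1},
    order_quantity={"tomato-soup": 24, "chicken-soup": 24},
)
print(orders)
```

Note that even in this sketch the reorder levels and order quantities have to be supplied from somewhere other than the cans themselves.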

At one point, Feinberg picked a more morbid example, but it drove home the question of how, when, and where data should persist.  Feinberg rhetorically asked where his health records are better off being stored: in a database in California, on a credit card in his wallet, or on a chip that's embedded in the back of his hand.  The answer, as you can imagine, was on a chip in the back of his hand.  It's there that the health record of Donald Feinberg stands the best chance of always being as up-to-date as possible; at least more so than in a database across the country that a local hospital in Orlando, FL (should Mr. Feinberg require emergency care) may not be able to access (or update) in real time.

Feinberg said that we could store our health records on a credit card that gets stored in our wallets, but that in that location, the data is already further away from its source than it needs to be.  Feinberg talked about how people can become separated from their wallets in the course of an emergency and then jokingly talked about how, if we become separated from our hands, we may have a problem that's too serious for our health records to be of much help.

The point made by the Gartner analysts is that there's a bit of urban myth to the idea that data must always be stored -- or cached -- in a database.  Sometimes, when you really think about the business processes that the data must support and the degree to which the data must persist to support them, you may realize that you don't need a database after all.  As data is moved closer to its source and kept in only one place, not only is the quality better but, according to Friedman, "the data is where you need it, when you need it and only lasts for as long as you need it."

To prove their points, the analysts talked about how data is becoming more and more distributed and how the need for databases to house that data is shrinking.  They also said that, in the future, only 20 percent of the data that's stored will be structured data anyway -- the kind of data that is stored in a database and can be accessed with the Structured Query Language (SQL) used for querying relational database management systems (RDBMS).  The result is that structured data and SQL will take a back seat to XML and XQuery.  "Searching [unstructured data] will be important," warned the analysts.  "Hence the high value of Google."

The analysts also warned that structured data and SQL won't be the only things that take the hit as a result of highly distributed and often unstructured data. Database administrators (DBAs) as we know them today could be an endangered species as well.  "I don't need a DBA to manage the data on a can of soup," said one of the analysts (I can't remember which one).

Feinberg and Friedman advised the DBAs in attendance that they should be thinking about which of two new roles they'd prefer: that of a repository administrator or that of a data service administrator.  Whereas the former's job is to know all there is to know about the data (where it's located or should be located, how it's structured, how it's modeled or needs to be modeled, etc.), the latter's job is to manage the data services that are consumed by an organization's various applications and business processes.  The latter's job also implies that many of the former functions of the database as they relate to the data (i.e., reliability, security, policy, etc.) become the job of the services layer of the software infrastructure rather than the RDBMS layer (in other words, service-oriented architectures, or SOAs, play a big role in this somewhat database-less future).

Not only will RDBMSes decline in relevance due to the distribution of data and heavier reliance on the services layer, but Feinberg also predicted that the increasingly real-time nature of the entire software infrastructure means that business intelligence (BI) -- normally a function reserved for a discrete class of software -- will be woven directly into the line-of-business application layer, the result being a decline in relevance of BI software as well.  "The press can quote us," said Feinberg.  "We're debunking BI.  It's not an application anymore.  It's a service that's accessed when and where it's needed and [as said earlier] the data persists only as long as we need it."

The analysts also identified the one caveat where databases will continue to play an important role: where, for reporting purposes, retention of historical data is a requirement for exercises such as long term analytics.  Some data will need to be kept.  But not all.   In terms of recommendations, the Gartner analysts suggested that user organizations take the following five steps:

  • Develop new applications for DBMS independence
  • Develop new policies for persistence based on other mechanisms besides DBMS
  • Create clear service levels for persistence of information when designing systems
  • Foster overlap between middleware and DBA skills
  • Identify vendors concentrating on the services and policy vision

  • ?? I'm speechless ??

    I don't know what they are feeding these Gartner analysts... I think it's too much stupid sauce.
    • There are other issues to consider....

      I think the Gartner analyst has made a valid point. But he must also look at the cost to users of acquiring a piece of info that is not kept in the system, as and when they need it. Would it be more economical to store it in a database than to read it off RFID and wireless on an ad hoc or JIT basis? What are the risk and probability of not being able to access the data when you need it?
  • In theory,,, yes... In practicum.... No

    Think about it. Gartner is only covering the OLTP aspect of these databases. What about OLAP? Historical data? You in turn would have to manage a relational database on those items if you wanted to generate a history of what's going on. Not real time, of course. But I do agree.... DBAs should worry not only about the system but also about the data and how best to extract it!
    • agreed

      What happens if your warehouse burns to the ground? How are you going to know what your loss in inventory is if it doesn't exist anymore?

      I hope these people at Gartner don't get paid for coming up with this stuff.

      Database companies do need to come up with tools that use these "real-time" data sources though. But they won't replace relational databases.
  • Only 20%...?

    Gartner predicts eventually 20% (was that what I saw in the article) of data will be in a database, eventually, and most of it historical?

    Perhaps, but consider that any information you want to "tag" onto a person to carry around with them is going to be modifiable by that person or other persons nearby. I suspect few companies/entities will be willing to risk losing ownership of that data.

    Medical information? Okay, maybe. Changing my own medical information could be dangerous so I have an incentive not to change it. But if it's in my hand, how easy will it be for another person to change it? How close will they have to get? How quickly could they change it? What equipment would they need? Would I even notice?

    Also, what about backups? Is there a nightly save of the stuff stored in my hand? Or, perhaps the technology in my hand will just never, ever, fail.
  • ... more ....

    The inventory of soup cans is neat though. I wonder how far the can has to get away from the inventory monitoring system to not be counted. If a can falls off the shelf and rolls under a freezer, does it still get counted?

    If an employee steals two cans and takes them home, I'm guessing they're removed from the inventory because they can no longer be counted. While you still have an accurate inventory count, how will you detect the loss? I suppose sales won't match up with orders, but you lose what would normally be an inventory discrepancy.
  • Why this won't work

    Let's take the supermarket example.

    All of the information for a can of soup is stored in the RFID tag in the soup can.

    This information presumably includes the price.

    Say, marketing of our supermarket chain decides to change the price of this particular brand of soup making a special offer of 1.99 Euros.

    We would need to change the RFID tag on every can of soup in stock in every store in the chain remotely. I would seriously question whether this can be achieved reliably.

    Now what happens if failures occur? We have two options: either we update none of the cans of soup (roll back the transaction) or we update only the ones we can (the pallet that fell off the back of the lorry probably has some defunct RFID tags, and so these don't get updated with the new price).

    Imagine going to the CEO and saying we can't have the special offer on soup because one can in a store in Stuttgart has a faulty RFID tag. I think this means you are forced into option two.

    So the customer goes to the check out having selected one can of soup that has been successfully updated and one that hasn't. Therefore he gets two different prices. He then complains about not getting the special offer.

    The next point is that the cashier has now no way of knowing from the computer system what the correct price of the soup is (remember we don't have a database with prices for products, only prices on individual cans of soup).

    The correct way of course is to only hold the product number in the RFID and define the relation between the product and the price in a relation variable (table in SQL speak) in a database. This way we avoid redundancy and the consequent risk of inconsistency that will inevitably cause all the problems described above.
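A minimal sketch of the normalised design described in this paragraph, using Python's built-in `sqlite3` module. The table names `product` and `can`, the column names, and the prices in cents are assumptions made up for illustration; the point is only that the tag carries a product number and the price lives in one place.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- One row per product: the price is stated exactly once.
    CREATE TABLE product (
        product_id  TEXT PRIMARY KEY,
        name        TEXT NOT NULL,
        price_cents INTEGER NOT NULL
    );
    -- One row per physical can: the tag holds only the product number.
    CREATE TABLE can (
        tag_id     TEXT PRIMARY KEY,
        product_id TEXT NOT NULL REFERENCES product(product_id)
    );
    INSERT INTO product VALUES ('P1', 'Tomato soup', 249);
    INSERT INTO can VALUES ('tag-001', 'P1'), ('tag-002', 'P1');
    """
)


def price_for_tag(tag_id):
    """Look up the current price for a scanned tag, or None if unknown."""
    row = conn.execute(
        "SELECT p.price_cents FROM can c "
        "JOIN product p ON p.product_id = c.product_id "
        "WHERE c.tag_id = ?",
        (tag_id,),
    ).fetchone()
    return row[0] if row else None


# The special offer is a single update, not one write per can.
conn.execute("UPDATE product SET price_cents = 199 WHERE product_id = 'P1'")
prices = [price_for_tag("tag-001"), price_for_tag("tag-002")]
print(prices)
```

Because no can stores its own price, a faulty tag cannot leave one can on the old price while its neighbour shows the new one.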

    By this means we show that a method proposed by people who dismiss the relational model can be easily shown as unworkable by applying the methods of the relational model (in this case normalisation). Thus we demonstrate the enduring value of relational methods in the face of all alternatives proposed so far.

    For all the introduction of words like tuples into their presentation, it is abundantly clear that the authors of this proposal have no proper understanding of the relational model. All the talk about persistence makes this very clear.

    The relational model is a way of logically representing data to prevent redundancy, preserve consistency and integrity and to allow the data to be manipulated to allow new facts to be derived from existing facts by logical inference. The model has nothing to say about how or even if the data may be stored permanently.
    • Nice comeback

      Here's an example for you - a "Dollar" Store. This is a store where everything is priced at $1. In this case, you eliminate the pricing relation in the can of soup example. Is there any other data on that can of soup (RFID) that is variable? I can't think of any.
      Roger Ramjet
      • In a Dollar store

        They wouldn't bother with an implementation of RFID. It's too costly, and that goes entirely against the whole concept of the dollar store. That can of soup for a dollar suddenly costs 8 cents more for RFID. Is the owner of the store going to eat that, or does he change the name to the dollar 8 store?

        Also what store in their right mind would do this? I can see an RFID tag for a case lot of soup cans coming which get scanned and put into the database before going on the shelves. That makes no sense. Putting RFID on every can is just a waste of money and good way to put yourself out of business.
        • Inventory Control

          You want to know when you are getting low on soup and need to order more. THAT is why you "chip" them.
          Roger Ramjet
          • But you already know that

            A case of soup comes in, it's inventoried. The POS records what is sold and subtracts it from the total in the database. When the threshold is passed, a report is generated for your next order.

            How does RFID on every single can simplify this? If anything it sounds like it adds cost to every can. What does that expense gain you?
          • RFID on every can gains you CASH!

            In the grocery store example the future works like this:

            The RFID tag is on every can.
            Individuals are also carrying around some kind of tag that can be id'ed (whether on a card in their wallet or in their hand is hard to say, hah).
            People walk in, take the can, and walk out with it.
            A scanner placed in the exit doors will scan for all the RFID tags in their groceries and bill the individual directly to their bank or credit card account.

            You can now eliminate a bunch of your costs in the checkout area, especially people and space.

            Yes, there are still some 'issues' to work out in the above example (like how to ensure you're scanning person A's groceries and not person B's, who's walking next to them, etc), but then this is the perfect future ;)
          • Will never happen

            I put a $45 leg of lamb in my bag and rip the RFID tag out. Then cheerfully wave to the cashier on my way out with bags full of 25-cent cheese doodles.

            That's why.
    • It's just unbelievably silly...

      to expect that core back-office business functions like accounting or compliance can be done on a 100% distributed model. Or to expect to do historical or trend analysis for marketing.

      You might be able to query real-time to get back a count of all your RFID soup cans on the shelf, or reset prices simultaneously, but where is the history of all of this being kept? You can't query the can after someone buys it and takes it home.

      The power of relational databases comes from being able to do ad hoc joins on bodies of data from different sources with reasonable performance. Unless you have network speeds to your original source data that allow you to access that as fast as a disk access, you're going to need to pull the data and stage it somewhere to do your join.

      This is a problem with all distributed database models ... as long as you are pulling the expected data sets already sorted in the right way to do a merge join with other sources, you can make it work (although it's slow). But if you need to do a join that isn't pre-defined and built into the system, then you end up either with a (very slow) double-loop iterative join over the network, or else you need to pull the data back to staging tables on a server to either sort it or do some sort of hash join.
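The two fallback strategies named in this paragraph can be sketched side by side. This is a simplified in-memory illustration, not a model of any particular product; the row shapes and the names `nested_loop_join` and `hash_join` are assumptions for the example. In the distributed case, every pass over `right` in the first function would be network traffic, which is why the second function first pulls one side into a local lookup table.

```python
def nested_loop_join(left, right, key):
    """Double-loop join: compares every left row with every right row."""
    return [(l, r) for l in left for r in right if l[key] == r[key]]


def hash_join(left, right, key):
    """Hash join: stage one side in a local table, then probe it once per row."""
    buckets = {}
    for r in right:  # build phase: one pass over the staged side
        buckets.setdefault(r[key], []).append(r)
    # probe phase: one lookup per left row instead of a full scan
    return [(l, r) for l in left for r in buckets.get(l[key], [])]


# Hypothetical data: cans read from tags, products held centrally.
cans = [
    {"tag": "t1", "product_id": "P1"},
    {"tag": "t2", "product_id": "P2"},
    {"tag": "t3", "product_id": "P1"},
]
products = [
    {"product_id": "P1", "name": "Tomato soup"},
    {"product_id": "P2", "name": "Chicken soup"},
]

joined = hash_join(cans, products, "product_id")
print(len(joined))
```

Both functions return the same pairs; the difference is that the nested loop does len(left) × len(right) comparisons while the hash join does one build pass and one probe pass.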

      The next generation of virtual data warehouses, called EII or Enterprise Information Integration tools, are trying to do just that, but I don't think they will replace traditional RDBMS's for quite some time. Once the network is as fast as a hard disk (and the day will likely come), then maybe we can have this extreme sort of data distribution.

      And, as jorwell points out in a round-about way, the more distributed your database is, the less control you have over enforcing data consistency checks. How do you implement a foreign-key constraint over the network? Do you need a transaction manager to do a 2-phase handshake to intermediate between the individual RFID sources? Well, gee, that kind of sounds like a database again.
      • It isn't just that it is distributed

        ... it's that it isn't normalised. The can of soup doesn't have a price, the product does. You only need to update the price for the product in one place (in the central database) and not on every can (with the inevitable introduction of redundancy and inconsistencies that this implies).

        Also attributes like reorder level and lead time belong to the product for a particular supermarket so you need the database to represent these relations too.
      • Yes a good point

        The RFID tag is part of the database (a relation with the attributes of a unique identifier for the can of soup and a product id). If we want to count our stock we need to do a distributed query over thousands of distributed databases.

        If someone buys (or steals) the can of soup it has effectively been deleted from the database and as far as we are concerned no longer exists (and has never existed).

        When we sell a can of soup we have to record the transaction (oh yes in that database again). If we do an inventory we need to be able to make a comparison of the stock we actually have with what we expect to have. Therefore we need to record the existence of the can of soup separately from the can of soup (which having been stolen is now deleted without any record of its disappearance) again in a database.
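The comparison described here, expected stock versus what a scan actually finds, can be sketched as follows. The function name `shrinkage` and the dictionary shapes are hypothetical; the sketch assumes receipts and sales were recorded somewhere durable, which is the commenter's point.

```python
def shrinkage(received, sold, scanned):
    """Return {product_id: units unaccounted for}.

    received, sold: recorded counts per product (the persistent record).
    scanned: counts per product from a live RFID scan of the shelves.
    """
    result = {}
    for product_id in set(received) | set(sold) | set(scanned):
        expected = received.get(product_id, 0) - sold.get(product_id, 0)
        missing = expected - scanned.get(product_id, 0)
        if missing != 0:
            result[product_id] = missing
    return result


losses = shrinkage(
    received={"tomato-soup": 48},
    sold={"tomato-soup": 30},
    scanned={"tomato-soup": 16},
)
print(losses)
```

Without the `received` and `sold` records, the scan alone would report 16 cans and give no hint that two are missing.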
  • Nothing remarkable

    Actually, if you ignore the sensationalist headline, Gartner didn't say anything remarkable. Analysts only give away free advice when it makes them money.

    Here is what they really said:

    1) Distributed heterogeneous and federated databases are the future. (With all the major vendors going grid, that's more a statement than a prediction. The implications are that less DBA'ing is needed for the individual databases and more coordinating of services and data.)

    2) Unstructured databases are growing in importance (probably responding to IBM's UIMA initiative and the purchase of SRD)

    3) RFID Materials Management doesn't require a database (UWB/WiMax has been saying that for half a decade.)

    4) Real Time analytics are the future (Pretty much a mantra in the BI world)
    • RFID does need a database

      "3)RFID Materials Management doesn't require a database (UWB/WiMax has been saying that for half a decade.)"

      The reason that UWB/WiMax have been saying this for five years and nothing has happened is that they are totally mistaken in this belief (see my post above for one of many examples that disprove their position).

      I think UWB/WiMax and these two Gartner consultants are logically (and logistically) naive.