Big Data forces IT to press reset button on architecture

Summary: From data warehouse to 'data lake': immersing your company in new depths of information awareness.

After decades of struggling, organizations finally thought they had it all figured out: critical data could be maintained through a constellation of relational database management systems, abstracted and made available to enterprise users and BI/analytics applications through data warehouses and data marts.

Now, Big Data -- hundreds of terabytes' worth of information -- barges onto the scene and starts to mess things up. Time for a rethink on data architecture?

Yes, says Dan Woods, who explains what is needed in his latest Forbes post: "Big Data Needs a Big New Architecture." Woods makes a strong case for a new approach to data architecture:

"To take maximum advantage of big data, IT is going to have to press the re-start button on its architecture for acquiring and understanding information. IT will need to construct a new way of capturing, organizing and analyzing data, because big data stands no chance of being useful if people attempt to process it using the traditional mechanisms of business intelligence, such as a data warehouses and traditional data-analysis techniques."

Perhaps it's time to stop thinking in terms of "data warehouses," which evoke images of an industrial-era, highly structured framework, and think of something more fluid and expansive, such as a "data lake." (Perhaps the term "data cloud" applies here?) Woods credits James Dixon, CTO of Pentaho, with coining the term "data lake."

The issue, Dixon says, is that data warehouse architecture pre-categorizes data at the point of entry. Big Data is too unpredictable for such structure. Users simply won't know how the data needs to be interpreted and leveraged later on. Woods describes some new forms of repositories that have "data lake" flexibility, such as Pentaho's use of Apache Hadoop, Pervasive's Data Rush, and complex event processing tools. Enterprise search technologies will also play a key role in the new data architecture. I even speculated on the possibility a couple of years ago, after hearing about ING's database-free deployment. Plus, the ability to analyze search data may open new possibilities for understanding future directions for the business.
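
The difference between categorizing data at the point of entry and interpreting it later can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the event records, the `WAREHOUSE_COLUMNS` tuple, and both function names are hypothetical, and nothing here reflects how Pentaho, Hadoop, or any other product mentioned above actually works.

```python
import json

# Hypothetical raw events, kept exactly as they arrived (the "lake").
RAW_EVENTS = [
    '{"type": "order", "customer": "c1", "amount": 40.0}',
    '{"type": "search", "customer": "c1", "query": "hiking boots"}',
    '{"type": "order", "customer": "c2", "amount": 15.5, "coupon": "SPRING"}',
]

# Warehouse style: the columns are decided before any data is loaded.
WAREHOUSE_COLUMNS = ("customer", "amount")


def load_into_warehouse(raw_lines):
    """Keep only records that fit the predefined columns; the rest is lost."""
    rows = []
    for line in raw_lines:
        record = json.loads(line)
        if all(col in record for col in WAREHOUSE_COLUMNS):
            rows.append(tuple(record[col] for col in WAREHOUSE_COLUMNS))
    return rows


def read_from_lake(raw_lines, wanted_fields):
    """Lake style: pick the fields of interest only when a question is asked."""
    for line in raw_lines:
        record = json.loads(line)
        if all(field in record for field in wanted_fields):
            yield {field: record[field] for field in wanted_fields}


warehouse_rows = load_into_warehouse(RAW_EVENTS)
# Questions nobody anticipated at load time can still be answered from the lake.
searches = list(read_from_lake(RAW_EVENTS, ("customer", "query")))
coupons = list(read_from_lake(RAW_EVENTS, ("customer", "coupon")))
```

In this toy version the warehouse load silently discards the search event and the coupon field, while the raw store can still answer both questions later, at the cost of parsing and filtering on every read.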

Topics: Big Data, Enterprise Software, Software


Talkback

15 comments
  • Extremely doubtful

    The relational model (on which RDBMSs are based) is based on predicate logic and is thus founded on the work of some of the greatest minds in western civilization.

    Could it be that a Forbes analyst is cleverer than these people? When will he be publishing the papers that lie behind his thinking for peer review?

    Pure and utter hokum.
    jorwell
    • RE: Big Data forces IT to press hit button on architecture

      @jorwell This is one way of looking at it. I do think that the author does not suggest moving away from relational. I do believe that the author suggests that methods other than current data warehouse strategies need to be looked at. When you consider things like BigTable or Cassandra, it is clear that relational concepts are here to stay, but the underpinnings of products like Oracle, DB2, SQL Server, MySQL and others alike are not the answer to the needs of the future. There is a reason why entities doing business in this world are looking at offerings such as Greenplum, Netezza, Cloudera etc.
      mikies
      • RE: Big Data forces IT to press hit button on architecture

        @mikies The future, like the past, has multiple needs and different types of data. To say "the future needs" makes a generalization. Relational databases are here to stay, and so are NoSQL, BigTable, etc. I really don't think I need to offer examples of why relational databases are here to stay; if they aren't self-evident to you, it's probably not worth explaining.
        snoop0x7b
      • RE: Big Data forces IT to press hit button on architecture

        @snoop0x7b Actually, I think you misread my post. It happens, so no worries... Nobody, including me, is questioning the longevity of the RDBMS concept. I think these products are here to stay. However, as we look forward into the future, the growth and usefulness of RDBMS is not going to resemble the past. The problems associated with the data volumes in the future cannot be handled by traditional RDBMS solutions. This reality of RDBMS not really being able to support large data sets has been known to individuals supporting large installations like 200TB data warehouses. Even at volumes like 20TB, software solutions based on, say, Oracle are just not doing it. This data volume growth is actually one of the reasons why Oracle is becoming a marginal player in the technology world.
        mikies
      • Relational is the future

        @mikies

        The scaleability is an implementation issue - which can be solved. It makes no sense at all to abandon the enormous advantages of the relational model.

        There is nothing about the relational model that makes it inherently unscaleable.

        If you don't like SQL use an alternative relational language that is simpler and more faithful to the model - there are already concrete proposals for this. This should in itself have significant performance advantages as the optimizer would no longer have to deal with SQL's complexity.

        I would say the relational model is the only approach that will meet the needs of the long term future.
        jorwell
  • Relational is excellent.

    There is no sensible, viable alternative.

    The RDBMS won't be going away any time soon.

    "Big data" needs relational even more than small data does.

    Do I put my data in a "lake" or somewhere where I can perform logical inference on them to reach new conclusions from existing facts? I go for the latter, which is relational.
    jorwell
  • RE: Big Data forces IT to press hit button on architecture

    Great - now I'm going to have to fish for my data.
    gwd3@...
  • RE: Big Data forces IT to press hit button on architecture

    Anyone who calls this 'hokum' hasn't dealt with big data. True, RDBMSs won't go away ... probably ever. But they will take their place alongside a host of new technologies that solve a new class and scale of problems that RDBMSs never will.
    stewart.allen
    • What's the logical difference between data and big data?

      @stewart.allen

      What advantage do these other tools give? I don't understand. If it is just a question of scaleability then clearly you could make RDBMSs more scaleable than current implementations, which are far from optimal.

      So why switch back to obsolete and inflexible methods like key value pairs? RDBMS is way, way ahead of this stuff.

      RDBMS is the modern way. We've only just scratched the surface of its potential.

      Every "post-relational" technology I have seen so far has proved on closer inspection to actually be pre-relational: hierarchical, pointer based and lacking a coherent logical model.
      jorwell
      • RE: Big Data forces IT to press hit button on architecture

        @jorwell

        The smart thing to do is to use some combination of the two. Key-value is compelling when you have several keys that refer to several blobs of data (a many-to-many relationship between keys and data).

        I'll go through the Facebook use case for you: they use Cassandra for their inboxes. Those keys could be something like sender user_id, recipient user_id, keywords in the text/subject, and a message ID. In Facebook's case, the keys are recipient ID and sender ID, which are primary keys in their relational database. Facebook uses MySQL to manage relationships between people (friends), interests, and likes.

        My experience has been that there's usually something relational about most applications and it's important to work with that in the appropriate manner, but that sometimes you end up with data that doesn't make as much sense in that model. The traditional data warehouse has a fact table with many rows and a dimension table (of indeterminate size, but generally not that big) and a table relating the two.

        If you know that you have a limited number of dimension types, and a large number of dimensions (user_ids in Facebook's case) but a limited or even constant number of dimensions associated with each piece of data, it could make sense to represent your data using something like a key-value pair for each dimension type. Where it's compelling is when you can expect to have few keys of a given type into a set of data, and expect to have a lot of data.

        My opinion is that a smart engineer goes for relational first, then, if key/value works better, uses that.
        snoop0x7b
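
The inbox layout described in the comment above, one key-value mapping per dimension type, can be sketched roughly as follows. This is a toy in-memory illustration only: the `MessageStore` class, its method names, and the sample messages are all hypothetical, and it does not represent Cassandra's API or Facebook's actual schema.

```python
from collections import defaultdict


class MessageStore:
    """Toy key-value store: one index per dimension type, key -> message IDs."""

    def __init__(self):
        self.messages = {}  # message_id -> message body (the "blob")
        self.indexes = {
            "recipient": defaultdict(set),
            "sender": defaultdict(set),
        }

    def put(self, message_id, sender, recipient, body):
        """Store the blob once and register it under each dimension's key."""
        self.messages[message_id] = body
        self.indexes["sender"][sender].add(message_id)
        self.indexes["recipient"][recipient].add(message_id)

    def get(self, dimension, key):
        """Return message bodies for one key of one dimension, by message ID."""
        ids = self.indexes[dimension].get(key, set())
        return [self.messages[i] for i in sorted(ids)]


store = MessageStore()
store.put(1, sender="alice", recipient="bob", body="lunch?")
store.put(2, sender="carol", recipient="bob", body="report attached")
store.put(3, sender="bob", recipient="alice", body="sure")

inbox_bob = store.get("recipient", "bob")  # everything addressed to bob
sent_bob = store.get("sender", "bob")      # everything bob sent
```

Each lookup touches exactly one key in one index, which is the access pattern the comment argues key-value stores are good at. Anything that needs to combine dimensions in ways not planned for, such as friends-of-friends, is left to the relational side.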
  • RE: Big Data forces IT to press hit button on architecture

    Not sure I get it either. Yes, there may be challenges to 'big data', but like the other poster said, why wouldn't relational be able to scale?

    In the case of Microsoft's Azure, I'm sure the inner workings are likely more scalable than a standard SQL Server engine. Though applications interacting w/ Azure and SQL Server could be essentially the same, the 'engine' of these databases may be processing the requests to account for the different environment (cloud vs internal company db).
    jaypatel1
    • RE: Big Data forces IT to press hit button on architecture

      @jaypatel1 Why relational would not scale is tough to speak to, as the relational products are mostly sold as compiled binaries. The truth is that data warehouses running Oracle with, say, 20TB of data are behaving really badly performance-wise. For companies like MasterCard, where the data warehouse is about 2PB (1PB = 1,000TB = 1,000,000GB), the relational approach with Oracle or DB2 was abandoned over a decade ago. This does not mean that RDBMS is no longer relevant. It very much is. However, RDBMS is like traveling by train. Many people did it in the 1940s; percentage-wise, not as many travel by train today. However, when you look at the sheer number of people traveling by train today, it is probably still larger than in the 1940s; it is just that the percentage of the population is way, way, way down. Same with RDBMS. Usage will continue, but it is a technology that is mature and it is not going much further.
      mikies
      • But relational is a purely logical model

        @mikies

        And one that has a very well defined logical foundation.

        The alternatives lack this.

        There are considerable opportunities to change the physical representation used in an RDBMS without abandoning the very considerable advantages of the logical relational model.

        Part of the point of the relational model is that it is purely logical and says nothing about physical implementation. This gives the RDBMS implementer the advantage that they can use any kind of physical implementation they like (including key value pairs if that is appropriate) but the database designer and user does not need to know anything at all about the physical representation and works purely at the logical level.

        My principal objection to the so-called "Big Data" tools is that they force the designer down to the physical level. This is why I do not believe that these "new" technologies have a long term future.

        Moore's law, faster and more reliable solid state storage, better physical implementations and cleverer optimization algorithms will ensure that RDBMSs will soon be every bit as scaleable as the alternatives (and more fundamental knowledge among designers about relational principles will help too). So I see no need to abandon the enormous advantages of the relational model to ensure that my data is not just quickly accessible but also consistent and correct.

        In actual fact the majority of companies use relational and a tiny minority use "big data" techniques. This will continue to be the case for the foreseeable future (probably the next 100 years or so).

        In fact relational will eliminate "big data" techniques as it becomes more scaleable in the future.
        jorwell
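
The separation of logical model and physical representation argued for in the comment above can be illustrated with a small sketch. Everything here is hypothetical (the two classes, the `select` method, the sample rows). It is not how any real RDBMS is built, only a toy showing one logical query running unchanged over two physical layouts, one of them key-value pairs.

```python
class ListBackedRelation:
    """Physical layout 1: rows kept in a plain list and scanned on every query."""

    def __init__(self, rows):
        self._rows = list(rows)

    def select(self, **conditions):
        matches = (
            row for row in self._rows
            if all(row[attr] == value for attr, value in conditions.items())
        )
        return sorted(matches, key=lambda row: row["id"])


class KeyValueBackedRelation:
    """Physical layout 2: key-value pairs mapping (attribute, value) -> row IDs."""

    def __init__(self, rows):
        self._by_id = {}
        self._index = {}
        for row in rows:
            self._by_id[row["id"]] = row
            for attr, value in row.items():
                self._index.setdefault((attr, value), set()).add(row["id"])

    def select(self, **conditions):
        ids = set(self._by_id)
        for attr, value in conditions.items():
            ids &= self._index.get((attr, value), set())
        return [self._by_id[i] for i in sorted(ids)]


ROWS = [
    {"id": 1, "city": "Oslo", "status": "active"},
    {"id": 2, "city": "Lima", "status": "active"},
    {"id": 3, "city": "Oslo", "status": "closed"},
]


def active_in_oslo(relation):
    """A query written purely at the logical level; storage is not mentioned."""
    return relation.select(city="Oslo", status="active")


result_list = active_in_oslo(ListBackedRelation(ROWS))
result_kv = active_in_oslo(KeyValueBackedRelation(ROWS))
```

The query function never learns which layout it is talking to, and both layouts return the same rows. The physical choice affects only how the answer is found, not what the answer is.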
    • Agreed

      @jaypatel1

      It doesn't make sense to say relational can't scale. Relational is a mathematical model of how to represent data logically.

      To say relational doesn't scale would be a little like saying that square root doesn't scale because on pressing the square root key on my (somewhat ancient) HP calculator there is a noticeable delay before getting the answer.

      The reason for the so-called "big data" solutions is that with current hardware and RDBMS implementations you possibly need to revive some obsolete (and inflexible and error-prone) methods to deal with very large data volumes.

      As, inevitably, RDBMSs become more scaleable, these half-baked stop-gap "big data" solutions will disappear and never be heard of again.
      jorwell
  • RE: Big Data forces IT to press hit button on architecture

    Big Data is the volume, VARIETY, and velocity of data. The VARIETY is what challenges relational systems. Relational requires advance knowledge of the data to model it, it requires clean data, etc. -- when a large variety is arriving quickly you don't have the time for all of that, and we need solutions that embrace the heterogeneity rather than forcing it to be carefully normalized/conformed in advance.
    bigdata2011