Big Data: More than just analytics

Summary: Some say that Big Data is just analytics repackaged, but there is a difference. It is especially important to understand the distinction because data-driven companies have been found to outperform those that are not.

Analytics provides an approach to decision making through the application of statistics, programming and research to discern patterns and quantify performance. The goal is to make decisions based on data rather than intuition. Simply put, evidence-based or data-driven decisions are better decisions.

Analytics replaced the HiPPO effect ("highest paid person's opinion") as a basis for making critical decisions.

So, what is the difference between Big Data and what we have traditionally called "analytics"? According to Andrew McAfee and Erik Brynjolfsson, the difference lies in the massive volumes of data we now have access to, the speed at which those data accumulate, and the variety of data points involved.

According to the authors, “Each of us is now a walking data generator. The data available is often unstructured – not organized in a database – and unwieldy, but there is a huge amount of signal in the noise simply waiting to be released”.

Volume of data: As of this year, roughly 2.5 exabytes of data are created each day, and that figure doubles about every 40 months.

An exabyte is 1,000 petabytes, or a billion gigabytes. According to the authors, more data crosses the Internet today than was stored in the entire Internet twenty years ago. So the amount of data available is staggering.
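
To make that growth rate concrete, here is a quick back-of-the-envelope sketch in Python. The 2.5-exabyte-per-day starting point and the 40-month doubling period come from the figures above; the projection horizons are arbitrary.

```python
# Back-of-the-envelope projection of daily data creation, assuming
# ~2.5 exabytes per day today and a doubling period of 40 months.
EXABYTE_IN_GB = 1_000_000_000   # 1 EB = 1,000 PB = 1,000,000,000 GB

daily_eb_today = 2.5
doubling_period_months = 40

for years_out in (0, 5, 10):
    months = years_out * 12
    daily_eb = daily_eb_today * 2 ** (months / doubling_period_months)
    print(f"{years_out:>2} years out: ~{daily_eb:,.1f} EB/day "
          f"(~{daily_eb * EXABYTE_IN_GB:,.0f} GB/day)")
```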

Managing that amount of data has spawned tools like Hadoop and NoSQL databases like MongoDB.
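
For a flavor of the document-oriented approach, here is a minimal sketch using Python's pymongo driver. The connection string, database, collection and fields are all hypothetical, and the example assumes a MongoDB instance is running locally.

```python
from pymongo import MongoClient

# Connect to a local MongoDB instance (hypothetical setup and names).
client = MongoClient("mongodb://localhost:27017")
events = client["retail_demo"]["events"]

# Documents need no fixed schema; each record can carry different fields.
events.insert_one({"type": "tweet", "user": "shopper42",
                   "text": "Doors just opened #BlackFriday"})
events.insert_one({"type": "checkin", "device": "abc123",
                   "lat": 41.033, "lon": -73.765})

# Query on whatever fields a given kind of document happens to have.
for doc in events.find({"type": "checkin"}):
    print(doc["device"], doc["lat"], doc["lon"])
```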

Velocity of data: The speed at which data is created is sometimes more significant than the amount. The ability to react to large amounts of data in real or near-real time equates with agility today. The example cited is that of the MIT Media Lab using location data from mobile phones to gauge the volume of shoppers in a Macy's parking lot on Black Friday. The goal was to estimate the retailer's sales before Macy's had actually recorded them. Analysts would kill for this sort of predictive edge.
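
A toy sketch of that idea, in Python, with made-up location pings: count the unique devices seen near the store and scale by a dollars-per-visitor factor. The coordinates, radius and sales factor are all hypothetical.

```python
from math import radians, sin, cos, asin, sqrt

# Hypothetical store location and anonymized phone pings (device_id, lat, lon).
STORE_LAT, STORE_LON = 41.033, -73.765
PINGS = [
    ("d1", 41.0332, -73.7648), ("d2", 41.0329, -73.7655),
    ("d1", 41.0331, -73.7651), ("d3", 40.9000, -73.9000),
]

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two lat/lon points."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6_371_000 * asin(sqrt(a))

# Count unique devices seen within 200 m of the store.
nearby = {dev for dev, lat, lon in PINGS
          if haversine_m(lat, lon, STORE_LAT, STORE_LON) < 200}

# Turn foot traffic into a sales estimate with a (hypothetical) $/visitor factor.
print(f"Estimated visitors: {len(nearby)}, estimated sales: ${len(nearby) * 85:,}")
```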

Variety of data: The advent of social media changed the data landscape significantly. Today we have many new sources of data. When we think of traditional data points, or those found in relational databases, we don't tend to consider photos, tweets, status updates, or GPS coordinates. These are all relatively new.
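
A small illustration of what that variety looks like in practice: the records below are made up, and the point is simply that each source arrives in a different shape, yet a common signal (here, a timestamp) can still be pulled out of each.

```python
from datetime import datetime, timezone

# Hypothetical records from three very different sources.
records = [
    {"source": "twitter", "text": "Doors just opened! #BlackFriday", "ts": 1669975200},
    {"source": "photo",   "exif": {"gps": (41.03, -73.76)}, "taken": "2022-12-02T09:05:00"},
    {"source": "gps",     "device": "abc123", "lat": 41.033, "lon": -73.765, "ts": 1669975260},
]

def timestamp_of(rec):
    """Extract a UTC timestamp from records with very different shapes."""
    if "ts" in rec:
        return datetime.fromtimestamp(rec["ts"], tz=timezone.utc)
    return datetime.fromisoformat(rec["taken"]).replace(tzinfo=timezone.utc)

for rec in records:
    print(rec["source"], timestamp_of(rec).isoformat())
```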

So, the challenge for the new tools is to help us unlock this 'signal'. While analytics brings us techniques to help us make better decisions, big data gives us something more powerful, because it greatly expands the traditional data set.

Topics: Big Data, Data Management

Gery Menegaz

About Gery Menegaz

Gery Menegaz is a Chief Architect for IBM with more than 20 years supporting technologies in the financial, medical, pharmaceutical, insurance, legal and education sectors. My full-time employer is IBM. I write as a freelancer for ZDNet.

Talkback

12 comments
  • The data type red herring again

    " When we think of traditional data points, or those that are found in relational databases, we don't tend to consider photos, tweets, status updates, location or GPS coordinates. "

    A relational DBMS can represent any type of data; it's just a question of extending the set of types in the DBMS. All the things mentioned here can be represented in a relational DBMS.

    The idea that relational DBMSs can only represent characters and numbers and not pictures, videos, short messages or sound is a complete myth.
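
    (A minimal illustration of this point, added here using Python's built-in sqlite3: one plain relational table holding tweet text, raw photo bytes and GPS coordinates side by side. The schema and values are made up.)

    ```python
    import sqlite3

    # One relational table holding "non-traditional" values: tweet text,
    # a photo stored as a BLOB, and a GPS fix. Schema and data are hypothetical.
    con = sqlite3.connect(":memory:")
    con.execute("""CREATE TABLE observation (
                       id INTEGER PRIMARY KEY,
                       kind TEXT,
                       tweet_text TEXT,
                       photo BLOB,
                       lat REAL,
                       lon REAL)""")

    con.execute("INSERT INTO observation (kind, tweet_text) VALUES (?, ?)",
                ("tweet", "Long lines at the mall"))
    con.execute("INSERT INTO observation (kind, photo) VALUES (?, ?)",
                ("photo", b"\x89PNG...fake image bytes"))
    con.execute("INSERT INTO observation (kind, lat, lon) VALUES (?, ?, ?)",
                ("gps", 41.033, -73.765))

    for row in con.execute("SELECT kind, lat, lon FROM observation WHERE kind = 'gps'"):
        print(row)
    ```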
    jorwell
    • Can you point me in the direction of those doing such?

      You are right that these things "can" be represented in a typical RDBMS, but the question is whether they are the "right" tool for the job and, more importantly, at the "right" price point.

      The thing that turns the typical DB/BI/DW model on its head is the concept of never having to throw away or pare down data. You don't have to use it all, but it is now cheap enough to store (S3/Azure/etc.) and process (Hadoop/Drill/BigQuery) that the typical funnel of data over time is going away.

      Even better, that data can be accessed way up the stack; this is exactly how "People You May Know" on LinkedIn or Amazon's product recommendations work.
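
      (A toy sketch of the "People You May Know" idea, added for illustration: rank non-connections by how many mutual connections they share. The graph below is made up, and real systems use many more signals.)

      ```python
      from collections import Counter

      # Hypothetical friendship graph: person -> set of direct connections.
      graph = {
          "ana":   {"bob", "carol"},
          "bob":   {"ana", "carol", "dave"},
          "carol": {"ana", "bob", "eve"},
          "dave":  {"bob"},
          "eve":   {"carol"},
      }

      def people_you_may_know(person, graph, top_n=3):
          """Rank non-connections by the number of mutual connections."""
          direct = graph[person]
          counts = Counter()
          for friend in direct:
              for fof in graph.get(friend, set()):
                  if fof != person and fof not in direct:
                      counts[fof] += 1
          return counts.most_common(top_n)

      print(people_you_may_know("ana", graph))  # e.g. [('dave', 1), ('eve', 1)]
      ```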
      mobile_manny
      • Absolutely the right tool

        To do logical inference on data you need to express the data as propositions.

        The relational model is directly based on predicate logic and the only tool that can do this.

        The so-called "big data" tools have no such underlying model. Therefore the methods to access them are ad-hoc and you cannot rely on them to get consistent results.

        Why do you have to throw away data with relational? It's a purely mathematical model for data representation. The data volume is an implementation issue. The idea that the relational model isn't scalable is a myth.
        jorwell
        • Absolutely wrong!

          That is just not the case. The tools today are very sophisticated and produce predictable results. If you like, I can put you in touch with any number of companies that would be happy to run you through a demo.

          Thanks for the comment.
          gery.menegaz
          • I don't want a demo

            I want a clear explanation about the theory behind these supposedly "new" methods.

            If they cannot supply that then I must assume that the products have been built on an ad-hoc basis and cannot be trusted to produce reliable results.

            If any of these companies are so far advanced that they have developed something that replaces predicate logic, set theory and statistics then surely the newspaper headlines would be full of this astounding step forward in mathematics?

            I haven't seen that.
            jorwell
  • thoughts

    "The goal is to make decisions based on data rather than intuition. "

    Well, that's been the goal ever since numbers were invented. Some of the earliest uses of mathematics were to quantify business practices.

    This has also been the goal of recording various statistics.

    "So, what is the difference between Big Data and what we have traditionally called 'analytics'? "

    It's being done on a larger scale. Other than that, nothing.

    "Volume of data"
    "Velocity of data"
    "Variety of data"

    Basically different ways of saying "we're doing it on a larger scale."

    Gathering statistics and number crunching is nothing new. Been doing that for a lot longer than computers have been around. It just so happens that computers can do it on a larger scale and on vastly larger data sets.

    Which frankly doesn't impress me all that much. And honestly, it's inventing a new jargon for the sake of marketing. "Large scale statistical analysis" doesn't impress much, but "big data" is new, and therefore sounds hip and exciting and different.
    CobraA1
    • Though

      Agree to disagree.

      It is a larger scale, but the type of data that we are looking at is different, as is the speed at which it is produced and required. How are you using data?

      Thanks for the comment.
      gery.menegaz
      • The type of data is different

        So you have some new data types - I don't see why you need anything new for this.

        If you want to perform inference on the data, you have to transform it into a representation that is amenable to that. As the only data representation that does this is one based on logic, the only possible answer is relational.
        jorwell
        • Connections...

          I understand that I am not representing this in a way that makes sense to you. So, again, if you would like, I can put you in touch with guys that are doing this who may be able to package it in a way that is more palatable for you.

          Thanks, again, for your comment.
          gery.menegaz
          • Well IBM have a product

            that makes a lot of sense to me.

            It's called DB2.

            Otherwise it seems that IBM have gone off course a lot in the data management field since Edgar Codd's days with the company.

            Shame about that.
            jorwell
  • Re:

    Gery, very insightful article. I think it’s worth mentioning HPCC Systems to tackle the three Vs of big data, as well as Value. Designed by data scientists, HPCC Systems is an open source data-intensive supercomputing platform to process and solve Big Data analytical problems. It is a mature platform and provides for a data delivery engine together with a data transformation and linking system. The main advantages over other alternatives are the real-time delivery of data queries and the extremely powerful ECL language programming model. More info at http://hpccsystems.com
    H-M
    • HPCC

      Thanks for your comment. I will make sure to check out your product.
      gery.menegaz