Reality check: Big Data BS

Summary: Big Data is the latest fashionista buzz expression hitting enterprise buyers. But is it new and does it matter? I sense we're focusing on the wrong things.

TOPICS: Big Data

When I first heard the expression Big Data my immediate reaction was similar to when I heard about Enterprise 2.0. Here we go, another buzz phrase to get the marketers, anal-ysts and other hangers on to go nuts over. Only this time, rather than taking years to die, I believe the 'big data' crapolathon will be squished in the next year. Here's why. 

In a post teasingly titled The Big Data Laggards, Vinnie Mirchandani catalogs a number of 'big data' use cases he has reviewed. Pertinent to this discussion, and putting to one side the snark aimed at the usual suspects (IBM, SAP, Deloitte, Infosys etc.), is the observation that:

The long and short is these projects have been on-going for years, if not decades, and are spread across industries and processes. They may not call their projects “Big Data” but their results are surely impressive.

In other words this is nothing new. At least conceptually. 

As I think about this topic I cannot help but go back to the Harrah's case study from ten plus years ago, where CEO Gary Loveman applied his understanding of customer loyalty, developed during his tenure at MIT Sloan, to the rejuvenation of a tired gaming hall and turned it into a gaming powerhouse. He understood the value of blended real time data coming from multiple sources. More important, he understood that data has to be actionable, business models have to be refined and processes need recasting to reflect the new information and what it is revealing. In Vinnie's example of Union Pacific we see a company that is no longer just a transportation business but a software development house. 

If there is anything new here it is merely in the exploding growth of data from many new sources. The industry vendors like to position this as a scary monster that needs harnessing.  Our own Zack Whittaker sucked up the Oracle warning that: 

...that the world was "drowning" in vast amounts of data

No, it's not. That's a less than subtle pitch for Oracle X-boxes. Vinnie and others have no problems coming up with stories about businesses and organisations turning data into breakthrough value. I see a very different problem. 

Take one of the examples that Tom Raftery quotes in a related post titled Sustainability, social media and big data:

Another fascinating use case I came across is using social media as an early predictor of faults in automobiles. A social media monitoring tool developed by Virginia Tech’s Pamplin College of Business can provide car makers with an efficient way to discover and classify vehicle defects. Again, although at early stages of development yet, it shows promising results, and anything which can improve the safety of automobiles can have a very large impact (no pun!).

Now cross reference to Vinnie's stories. What do you see?

  • Each use case is describing a unique problem that is industry specific
  • Each case study talks about in-house development or partnering with relatively unknown developer or academic organisations. (There are exceptions, but they are just that, exceptions.)
  • Most cases involve a level of predictive analysis, not looking through the rear view mirror. 

In other words, what we are most likely to see from 'big data' examples in the short term are not solutions that are readily productised except in limited industry specific terms, but unique differentiators. These can only be developed by those with detailed and industry specific knowledge. 

I have a couple of examples of my own. One I really like is the energy utility example (video) coming from Basis Technology that predicts when customers have the propensity to switch. It has been tweaked to predict when customers will likely need service. What does this mean for the big SIs? Vinnie pre-empts that question:

Against this backdrop, I see so much marketing of the concept from IBM, SAP, Deloitte, Infosys and others. You would think the pioneers above have been using their tools and their talent for years now. Have they? I hope prospects look at a broader list of success stories, not just the references the vendors show them.

Even more, I hope customers do not accept the excuse “the concept is so new, nobody has references” or the proposition that customers should sign up for their MDM, change management and a variety of other products/services they push as “essential” for Big Data success. 

Vinnie and I have had a more robust discussion on this topic in back channels. I think there is a genuine business case for bringing change management experts (and I don't mean form fillers of the BPR variety) into the equation. It is where I think their puck is going.

From what I have seen, the real problem is not drowning in data, as Oracle (and others) would like you to believe, but drowning in possibilities. During a recent conversation with Vishal Sikka, executive board member at SAP, about what might happen as a result of being able to cost-effectively mash up data from many sources and then expose it in real time, he correctly pointed out that in finding solutions, 'We will only be limited by our imaginations.'

I go further. I believe we will need people who can work out how to take the many potential answers to innumerable questions and turn that into strategies that drive exponential growth. The software companies cannot do it and I doubt the current crop of tenured SIs would know how. It will absolutely demand process changes on an unprecedented scale. Curiously though, I think this will be a lot easier than it might seem. Why? When people can see the answers to long held or vexing questions, the process changes they imply are often self evident.  

Couple this with the need to build hundreds if not thousands of applications for every industry and you start to see the emergence of a completely different landscape. It is one that initially looks horribly confusing to the well ordered ERP mind-set but which is essential if 'big data' is to deliver on its promise. 

That is why I am more than happy to see the powerhouse vendors push the 'big data' message. But only as long as they recognise they are delivering weapons of mass creativity and not the specific solutions themselves. They don't have the depth or breadth of industry understanding. That's why I am equally happy to be associated with (and compensated for) the surfacing of startups in the SAP HANA space. These are companies you've never heard of - in one case a youngster in grade school - doing things that were unimaginable in the recent past. 

Bonus points: James Governor on the new kingmakers. Plus see video above where we talk about abundance and the future of development.  

Dennis Howlett

About Dennis Howlett

Dennis Howlett is a 40 year veteran in enterprise IT, working with companies large and small across many industries. He endeavors to inform buyers in a no-nonsense manner and spares no vendor that comes under his microscope.

  • thoughts

    "Here we go, another buzz phrase to get the marketers, anal-ysts and other hangers on to go nuts over. Only this time, rather than taking years to die . . ."

    In part due to your fellow ZDNet bloggers, who refuse to let them die . . .

    "The cloud" is basically another name for "the internet."

    "BYOD" is a phrase that came into being purely because some bloggers noted that people were using their own smart phones at work, which has been the case long before the phrase was invented.

    "Big Data" is something that my college called "data mining" well over 10 years ago. It's the exact same principle: Take huge amounts of data, and crunch it looking for patterns.
    • yay and verily so...

      ....only this time we can actually get things done.
      • How many different payroll programs do we really need...

        I agree with everything you wrote, but as for how many APPLICATIONS are really needed to run an enterprise, and how much actual data needs to be glanced over before new action is either taken or not, I have a different outlook from you and the HANA approach.

        Let's say that software APP developers could be drawn from the ranks of industry in every industry and say IBM owned these developers. Like SAP they could sit there for the next year and design from the bottom up the next 10-20-30 and maybe 40 if really needed Enterprise Applications, that would serve all the needs of 99.5 % of all business enterprise outfits. That's where we are headed. You are consulting to that business.

        The .5 percent of special needs is Wall Street, where all of this started, to get that edge in the High Volume Trading game. Now all those who got their start on Broad Street are making their way over to SAP, IBM, etc. BIG DATA might be a bad name, but speed and performance have made their way back into computing; right up there with cost, they are now a big part of the overall equation.

        Foundational software will have as much impact on the future of industry as application software: parallel processing (use of multiple cores), in-memory compute (optimization of RAM), etc. It is going to get fun.
  • big data

    I don't totally gel with the article. Sure, buzzwords are annoying - but I agree with dahowlett. Yes, data mining has been around - well, since there was digital data. What is different is the sheer volume of consumer/people data.

    And this data is set to increase exponentially as sensors start recording everything - like a sea of eyes and ears embedded in electronics/consumer goods/cars/public transport - everywhere - all cleverly recording and identifying.

    In a way, the buzzword "big data" is attempting to encapsulate this oncoming data tsunami.

    However, there are some realities. It is actually very expensive to store and analyse all this stuff. And that is the phase where lots of companies are at now - what's the business case?
    David Wright 122
    • Expensive?

      Check announcements coming this week...
    • Big Data - & Buzz Words in general

      We do have an unfortunate habit of using buzz words, not an uncommon behavior in industries that deal with complex concepts, and this is surely one! That said, there are many good points in this thread. Some of the ones that jumped out at me:

      1 - Big data is REALLY big - So true; it's also very diverse in type and format
      2 - CLOUD is really the Internet - I like that; it's very close to the mark

      A couple on the flip side that I found a bit trite:

      1 - Big data is just Data mining - Not in my opinion; these are two quite different technologies
      2 - Big companies don't get Big Data - Oh yes they do; they just have vested interests to protect

      All up, this is shaping up as a big shift, a lot of which is being fueled by the convergence of Internet computing and Enterprise computing. Both have plenty to contribute, but as the article says, "what will come out at the end will be a totally different technology landscape", and a good thing too in my humble opinion.
      Rob Whiter
      • much more mining only gave us so much - what I see coming down the pipe is an order of magnitude different.
      • Same principle, from what I can tell

        "1- Big data is just Data mining - Not in my opinion these are two quite different technologies"

        Other than scale, how?

        Sure, it's on an "order of magnitude" bigger, but that's not really in my book buzzword worthy, or really all that much different.

        It's the same thing, really: Take large amounts of data, and look for patterns. And to be honest, data mining was certainly not done on small scale - Terabytes to nearly Exabytes was not unusual for that kind of stuff.

        I suppose today we can do it faster on larger data sets (perhaps multiple Exabytes), and in some cases closer to real time, but it's still the same idea at work.
      • Not sure why my prev post did not go thru moderation. So, here it is again

        We can look at this via two dimensions: Old/New technology and Old/New terminology. With a bit of generality:
        "Predictive Analytics" is the new trending name instead of old "Data Mining"
        "Machine Learning" is the trending name to substitute "Artificial Intelligence"
        "Big Data" doesn't sound like a new name (even I can remember "1TB Club" rewards just a few years ago), but in fact it addresses new requirements coming from the relatively recent explosion of machine- and crowd-generated data in a variety of forms. "Map-Reduce", on the other hand, sounds like something new, but in fact is an old concept, now popularized thanks to the application of massively parallel processing. "Parallel Processing" became more widely available thanks to technology that did not exist a mere 15 years ago, when I studied the topic at university.
        Agreed, "Big Data" became a buzzword, especially in the area of Analytics. Good news: I noticed that the trend of demystifying it started some 3 months ago. "Big Data" has not turned out to be the cure for all business needs. But "Big Data" Hadoop-based technologies became an important step forward for some businesses to sustain their operations at a reasonable cost. Yes, "sustain" and not that much "innovate" yet.
        Big Data technologies and products mainly enable storage and processing of bigger data volumes, but all the known issues of moving data up in the pyramid of Data->Information->Knowledge remain the same and still require different tools and lots of human intelligence.
  • There will be no important change in technology

    RDBMSs are directly based on predicate logic and implement methods for ensuring the logical integrity of the data.

    The data in an RDBMS is represented as logical propositions which can be manipulated by standard logical or set operators to infer new facts from existing ones.

    In order to do any meaningful analysis on data you need to represent your data in a manner that is amenable to logical inference.

    If you want to perform logical inference on your data then you have to go for relational as relational is based directly on logic.

    The so-called "big data" technologies like Hadoop, MongoDB and Cassandra (etc, etc) are short term fads that will be out of fashion shortly.

    Every couple of years this "relational is dead" story comes around. How many companies are running mission critical systems on object-DBMSs? XML-DBMSs?. None that I know of.

    The future is relational. The so-called "big data" technologies are revivals of approaches that have long ago been shown to be flawed in theory and error-prone and inflexible in practice (hierarchies, graphs, key-value pairs)
    • sigh

      "If you want to perform logical inference on your data then you have to go for relational as relational is based directly on logic."

      Set logic, that is. Turing machines and such are also based directly on logic (lambda calculus, propositional logic, etc).

      So that's like saying that astronomy is science and implying that chemistry isn't.

      You're like an astronomer who complains about chemistry because the astronomer understands general relativity but doesn't understand electron shells.

      "are revivals of approaches that have long ago been shown to be flawed in theory and error-prone and inflexible in practice (hierarchies, graphs, key-value pairs)"

      I'd love to see the proof texts.

      "How many companies are running mission critical systems on object-DBMSs? XML-DBMSs?."

      'How many astronomers are using S-shells and P-shells? None that I know of.'

      Maybe if you stepped outside of backend development every once in a while . . .

      Last I checked, app and application development was a thing, too. Not every piece of software is a pure database.
      • Every application is about logic

        You have a choice, do you do your logic in logic or in machine level constructs like sequence and iteration?

        It's highly misleading to think of an RDBMS as just a database server. It's actually a business rules engine with direct support for standard logical constructs like quantification.

        I use old-fashioned procedural languages like C# and Javascript* for user interface work (depending on whether the front end is Windows or Web) but I would never use them anymore for business logic - they simply lack the expressive power for that.

        User interfaces are about logic too, though some people create some very inconsistent user interfaces by insisting that the UI is something touchy-feely and intuitive.

        I mention object-DBMSs and XML-DBMSs because various people have proposed them as alternatives to RDBMSs. There are obvious reasons why they are wholly inadequate to the task. Object-DBMSs are Codasyl revisited and XML-DBMSs are IMS revisited - and therefore of archeological interest, but little else.

        * Yes, yes I know Javascript is a functional programming language really, but I still seem to have to write loops in it.
        • sigh . . .

          "they simply lack the expressive power for that. "

          Might want to check out the Church-Turing thesis. Anything that can be expressed in a relational language can be expressed in any Turing complete language, in a different way.

          If that weren't the case, in fact, then databases wouldn't run on existing hardware, as the silicon is certainly not using a relational language.

          So keep in mind that most database software is in fact written in a lower level language like C++, and everything is eventually compiled all the way down to machine code.

          You can never get rid of the lower level stuff. It's required for the higher level stuff to function.

          "Yes, yes I know Javascript is a functional programming language really, but I still seem to have to write loops in it."

          Nothing wrong with a loop. The machine simply converts it to a compare and a jump at the machine code level - which can be very fast.

          However, a join can be slow, especially if you're not careful about how you're joining stuff.

          "do you do your logic in logic or in machine level constructs like sequence and iteration?"

          You're missing the point here. There is not a singular "logic." It simply doesn't exist. There are set theory, propositional logic, predicate logic, mathematics, and all sorts of others.

          At the lowest level, transistors are arranged to form "AND," "OR," "NOT," "XOR," logic gates. And somebody who works with them would likely claim there's more logic in them than your relational stuff. And it is in fact a type of logic.

          You are mostly talking about relational stuff - which has a place, if the boss wants to know about all people who bought a red truck. It's great for CEO level stuff.

          But it's really shallow minded to think that set theory and relational logic is the only type of logic that exists.
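A rough illustration of the equivalence argued in this exchange, not something either commenter posted: the "all people who bought a red truck" query can be written with explicit loops or in a more declarative, set-oriented style, and both give the same answer. This is a minimal Python sketch; the sample data and the function names are invented for the example.

```python
# Hypothetical sample data, invented purely for illustration.
purchases = [("alice", "v1"), ("bob", "v2"), ("carol", "v3")]
vehicles = [("v1", "truck", "red"), ("v2", "car", "red"), ("v3", "truck", "red")]


def red_truck_buyers_loop(purchases, vehicles):
    """Join expressed with explicit sequence and iteration (nested loops)."""
    result = set()
    for buyer, vehicle_id in purchases:
        for vid, kind, colour in vehicles:
            if vid == vehicle_id and kind == "truck" and colour == "red":
                result.add(buyer)
    return sorted(result)


def red_truck_buyers_declarative(purchases, vehicles):
    """The same join as a set comprehension, closer in spirit to a relational query."""
    return sorted({
        buyer
        for buyer, vehicle_id in purchases
        for vid, kind, colour in vehicles
        if vid == vehicle_id and kind == "truck" and colour == "red"
    })
```

Both functions compute the same set of buyers; which form is easier to read, maintain or optimise is the part the two commenters actually disagree about.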
          • I don't want to know about the low level stuff anymore

            "You can never get rid of the lower level stuff. It's required for the higher level stuff to function."

            But if you want to program efficiently and accurately then the less you know about the lower level stuff the better.

            Predicate logic is the formalization of natural language so you can perform logical reasoning on it.

            In a business people make informal statements about business rules and it is the job of IT to formalize these rules so they can be automated.

            An RDBMS is ideally suited to this job because it is directly based on predicate logic. The table and view definitions are predicates and the rows are propositions.

            RDBMSs go far, far beyond writing reports for managers in terms of what they can represent and express logically. Current implementations have barely scratched the surface.
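As a hedged sketch of what "tables and views are predicates, rows are propositions" might look like in practice, the following uses Python's built-in sqlite3 module. The schema, the sample rows and the business rule ("a customer is fully onboarded when they hold every required account type") are assumptions made up for illustration, not taken from the comment.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Predicate: "customer NAME holds an account of kind KIND"
    CREATE TABLE account (name TEXT, kind TEXT);
    -- Predicate: "KIND is a required account kind"
    CREATE TABLE required_kind (kind TEXT);

    -- Each row is a proposition asserted to be true.
    INSERT INTO account VALUES
        ('alice', 'current'), ('alice', 'savings'), ('bob', 'current');
    INSERT INTO required_kind VALUES ('current'), ('savings');

    -- Derived predicate: "NAME holds every required kind".
    -- "For all" is written as "there is no required kind that NAME lacks".
    CREATE VIEW fully_onboarded AS
    SELECT DISTINCT a.name
    FROM account AS a
    WHERE NOT EXISTS (
        SELECT 1 FROM required_kind AS r
        WHERE NOT EXISTS (
            SELECT 1 FROM account AS a2
            WHERE a2.name = a.name AND a2.kind = r.kind
        )
    );
""")


def fully_onboarded(connection):
    """Return the names for which the derived predicate holds."""
    rows = connection.execute(
        "SELECT name FROM fully_onboarded ORDER BY name"
    ).fetchall()
    return [name for (name,) in rows]
```

The double NOT EXISTS is the usual way to spell universal quantification in SQL, which is the kind of "direct support for standard logical constructs" claimed earlier in this thread.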
          • A small correction

            if you want to program efficiently and accurately then the less you NEED to know about the lower level stuff the better.

            Of course knowing about the low-level stuff won't necessarily hurt you, though it might lead you to use some highly inappropriate approaches in some situations.
    • Agree...

      The future is Relational DBs run in RAM, which will take advantage of new hardware (massive amounts of RAM and multiple core CPUs), with software running a new parallel processing platform, designed from the ground up, to run closely coupled, first on the HD, now in RAM, because RAM grew too us. RAM grew so quickly that the SSD may become unnecessary. Why pay extra for secondary storage? The higher capacity, slower HDs will do just fine.

      The platform is here, and will be brought out in short order. Strangely, DARPA just this last Sunday promoted the need to rebuild from the ground up, we thought the same thing 15 years ago, but on the software side.
  • The good and the alternate view

    Good article with a lot of valid and interesting points and observations.

    But misses the key point: For the first time in IT's history one can store all the data you want (without going bankrupt) AND analyze that data afterwards on questions you DID NOT know before. This is what Big Data is all about. (Yes - governments used to do and do that - but I am talking about a start-up with a single digit million investment being able to pull that off.) And there is data explosion - the famous Las Vegas loyalty model would be negligent, even obsolete to run today. With the addition of social, mobile and 100s of relevant start-ups with key data for loyalty I estimate the basic data to be 5-1000 times larger than back 10 years ago.

    With that we come to Dennis' great point: It won't be the previous vendors like the ERP vendors or the SIs to make the difference here. The ERP vendors don't understand businesses well enough in detail as they try to provide generalized functionality and the SI vendors don't understand enough customers at the same time to create replication / re-use.

    The holy grail in the Big Data game (and Analytics) will be who can provide affordable and easy to use tools that can run on Big Data and make a positive difference in the decision making of the business user using them.

    Look forward to comments!
    • .errr.

      ....thanks for clarifying what I thought I was saying only in longer form
    • Foundational Software.....

      Those tools are foundational software revamps.

      Application new-creates or re-creates are really subjective to one's shop, or maybe not. How those applications perform in real time does matter. May matter most. Application design may be considered someone's guess or desire; it may work or maybe not so well. And Tools and APPS will be as hyped up by the marketing departments as the term "BIG DATA" was.
  • The Problem With Big Data...

    Is more always better? Or is quality over quantity? Here lies the problem with so called "Big Data".

    Big Data today refers to the massive amount of "data" being produced by Social Networks. The issue is the majority of that data is crap, being produced in many cases by pre-puberty teenagers. There is this "herd" mentality when it comes to anything new. We're afraid of being left behind or looking stupid, so we always jump on the "next big thing".

    I'm not saying there isn't good information in social networks. The problem is that with storage being so cheap today, we keep everything. Including the unimportant drivel of conversations that have no real value.

    Big Data is a chance for people to sound smart and become "experts" in yet another useless tool. It is the chance for DB companies to sell us a new version and BI companies to sell us the holy grail of customer understanding. Sounds good on the slides until the sales rep leaves the building.

    Yes there are pockets of success we can point to. But they are very specialized as the author states. The only ones making money today in Big Data are the ones selling their wares.