How is big data faring in the enterprise?

Summary: With the hype sometimes seeming to reach a fever pitch, we take a look at how enterprises are really using big data along with the overall maturity of the new industry itself.


It's certainly one of the hottest new buzzwords in technology, yet the meaning of big data typically depends on whom you ask. It's also clear that big data, an important reformulation of how we store and process our digital information, continues to make a big splash as a major IT trend of this half-decade. Certainly the market estimates are optimistic, with Deloitte recently pegging the size of the market at between $1.3 billion and $1.5 billion this year, while IDC forecasts the industry will be a whopping $16.9 billion by 2015.

But these large numbers tend to obscure the fundamental changes that currently seem to be taking place under the rubric of big data.

The first of these is the data-first ethos: tapping into and processing ground truth (by seeking out the best raw data) and then deriving insight from what is uncovered (domain-specific business intelligence), rather than trying to find data to support one's already-completed strategic decision making.

One of the better known examples of data-first thinking is the famous "Moneyball" story, as told in the 2003 book by Michael Lewis, relating how the Oakland A's bucked tradition and switched to heavy data analysis to identify their highest performers, with considerable success. Though only one data point, this story, along with a growing list of others, is leading many to believe that data-first thinking may be the solution to many long-standing problems, helping to combat everything from crime and disease to pollution and poverty. It's also perhaps the key to resolving somewhat more mundane challenges in our businesses.

The second major change is the shift away from the relational data model as the definitive standard for how to process information, for the first time in over a generation. To be sure, the growing adoption in customer-facing technology of emerging platforms such as Hadoop and NoSQL-style databases is still most prevalent in Web startups and consumer services. Yet the petabytes and exabytes of today's data volumes in many business contexts practically demand technologies that scale well in the face of exponentially growing datasets and shrinking time scales.

For a variety of reasons too long to enumerate here, the relational model has at long last encountered both a serious challenge to its hegemony and real challengers that can frequently do better at handling today's data volumes and types. And though many organizations will continue to use relational technology to create some of their big data solutions, it's no longer the only option, particularly as unstructured data is now growing much faster than classical structured data.

Enterprise Use of Big Data: Hadoop, NoSQL, IaaS, Data Scientists, Core Domains, Analytics, and more

The third change is the move towards making big data a more operational component of the way organizations work and how externally-facing products function. While data scientists are often required to get the best outcomes, the results of their work are often applications or data appliances that are usable by just about anyone. Just as Google enabled the layperson to query the entire contents of the Web with a few keywords, the next generation of enterprise big data seems to be about connecting workers with the data landscape of their organizations in a way that doesn't typically require IT wizards in white robes. Thus business solutions based on big data technology must be readily approachable by the average line worker in order to have a sustained and meaningful business impact.

The data points on enterprise adoption of big data

Let's take a look at what organizations are actually reporting when it comes to big data implementation and usage today. Looking at a broad cross section of companies both large and small, the O'Reilly Strata Conference survey published a useful breakdown this year of what its attendees were doing with the technology:

  • 18% already had a big data solution
  • 28% had no plans at this time
  • 22% planned to have a big data solution in six months
  • 17% planned to have a big data solution in 12 months
  • 15% planned to have a big data solution in two years

Admittedly, attendees of this particular conference were more likely than average to be adopters of big data, so these numbers are a little optimistic, even given that big data is a big tent for a great many technologies that handle large data volumes and analytics.

However, the story becomes even more interesting when we look at specific sectors. For example, the insurance industry recently reported that 15-20% of insurers are actively preparing for a big data solution. Government, one of the larger potential beneficiaries of big data according to the seminal McKinsey report on the subject, is itself experiencing relatively slow adoption, with a recent survey of public sector CIOs and IT managers reporting it will take three years to start processing their data this way. If we look at function instead of industry, we can see that sales processes are likely poised to be revolutionized by big data. A recent analysis by CSO Insights reveals that 71% of companies expect big data to have a significant impact on sales, despite only 16% currently doing so, a gap that many organizations will clearly want to close.

However, acting on the large set of changes that big data entails will clearly happen incrementally, yet broadly, in most companies. Technology, process, infrastructure, and management all have to be put into place, along with the hiring of data scientists who understand your business (or learn to). So do still-esoteric concepts such as DevOps, which will marry the operational aspects with the development aspects of big data to quickly solve business problems by applying data-first analysis combined with just-in-time R&D and deployment.

In addition, companies will also have to deliver on a big data "stack" in the enterprise. This stack will invariably consist of the following components, designed out of a conglomerate of open source software, commercial applications, on-premises and cloud infrastructure, combined with data from just about everywhere. The visual above also depicts this notional big data stack and you can see another, more technology and product-specific view I pulled together last year.

Breaking down enterprise big data

  • Technology. In general, these seem to be breaking down into three major families, two of which are new and one of which is legacy. There are Hadoop and its variants, the NoSQL family, and relational databases that have added big data features.
  • Infrastructure & Development. This includes Infrastructure-as-a-Service (IaaS), Software-as-a-Service (SaaS), Data-as-a-Service (DaaS), Open APIs, DevOps, and data scientists, the last of whom craft solutions from an array of internal and external components from this palette.
  • Big Data Applications. This list of popular application models for big data includes business intelligence, social analytics, decision support, visualization and modeling, behavioral prediction, and business process optimization (BPO), but there are many others.
  • Domain-Specific Solutions. Once the big data tech, infrastructure, and apps are in place, businesses must focus their efforts on extracting industry-specific value. Top industries and/or functions for big data (the ones most likely to benefit) include marketing, R&D, scientific/technical/engineering/mathematics (STEM), health care, financial services, retail, and insurance.
  • Big Data-Powered Business Processes. To be useful, big data solutions must then be incorporated into an organization's business processes including operations, line of business, and support functions. In particular, the high-value and common business processes will provide the largest ROI.

To summarize all this, it's still early days for this era's growing data deluge. In fact, one of the best quotes of the year about big data is from Ben Werther, who recently observed that we're still "in the pre-industrial age of big data." Most organizations aren't yet doing it at scale, but the writing is on the wall that significant competitive advantage can be had for those that want it. As I predicted earlier this year, social analytics will be one particularly bright spot in big data this year, and organizations already have a good array of tools and vendors to pick from.

Ultimately, the biggest challenge will be in integrating big data effectively into updated and revised business processes. Thus, again, change itself will be the largest overall obstacle as technology outpaces the ability of most organizations to absorb it. This will likely push big data into the cloud as most organizations look for strategies to speed adoption, further hastening the cloud-related migration of so much of IT. This may not be a bad thing.

Topics: Big Data, Emerging Tech, Software

  • Pretty good but there's four major Technology families ( not 3 )

    The 4th family....Ever hear of semantics and RDF ? The Google Knowledge Graph uses this and BBC is using Dynamic Semantic Publishing to Empower BBC Sports Site & 2012 Olympics.

    • Thanks, and there are no doubt more database families

      I listed the major big data tech families that I think most organizations will likely be considering. Semantic Web, RDF, as well as Linked Data are very important, but still not likely to break through in the near term. I do think REST and the Web of data are already happening but you could say most NoSQL databases are already document-oriented with URI support.
  • The future is relational

    Hadoop and (most of) the other NoSQL DBMSs are a throwback to cumbersome, programming intensive, inflexible pre-relational methods.

    Of course a DBMS can be NoSQL and still be relational (in fact more relational than SQL). SQL != Relational!

    The big data thing is a fad and will soon be replaced by some other fad.

    The relational DBMS will still be around in 20 years time.

    1996 - Relational dead - Object DBMS
    2002 - Relational dead - XML
    2012 - Relational dead - key value pairs, a technology last seen on 1970s vintage mini-computers that didn't have enough memory to run ISAM. This is a joke, right?

    If you want to make yourself look utterly ridiculous then I suggest becoming a big data advocate is a good place to start.
    • It depends

      Declarative programming models and other techniques will no doubt make Hadoop more efficient. And don't forget Hive, which lets those with SQL skills query Hadoop datastores. I don't think Hadoop is going anywhere, but I would like to see it get easier and I feel that's likely to happen.

      But I agree, relational isn't going away, probably ever. Instead, we are getting more choice and this gives us the right tool to do the job. This is why I listed relational as one of the core big data technology families.
      • Why do we need more choice?

        The relational model does everything with 8 operators.

        The more "choice" we have the greater the complexity and the greater the possibility of confusion and error.

        Show me a system that can do everything with less than 8 operators then I will be impressed.

        The important thing about relational is that it is based on sound, proven mathematical principles that greatly simplify the tasks of data representation and manipulation.

        The "big data" techniques are not new, they are a throwback to approaches already shown to be flawed in theory and cumbersome, intellectually unscaleable and inflexible in practice.
        • I can reduce it to one.

          "Show me a system that can do everything with less than 8 operators then I will be impressed."

          Actually, you can *theoretically* do everything with a single operator, using a one instruction set computer.

          From the wikipedia:

          "Common choices for the single instruction are:
          Subtract and branch if less than or equal to zero
          Subtract and branch if negative
          Reverse subtract and skip if borrow
          Move (used as part of a transport triggered architecture)"

          "The important thing about relational is that it is based on sound, proven mathematical principles that greatly simplify the tasks of data representation and manipulation. "

          So is a Turing machine. So is pretty much everything in computing, in fact. Without sound, proven mathematical principles, we would not be using computers at all, regardless of whether you're using a relational model or not.
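To make the single-operator point concrete, here is a minimal sketch of a one instruction set computer built on "subtract and branch if less than or equal to zero" (subleq). The memory layout, the `run_subleq` name, and the convention that a negative jump target halts the machine are illustrative assumptions, not a standard. The sample program adds two numbers using only that one instruction.

```python
def run_subleq(memory, pc=0, max_steps=10_000):
    """Interpret a subleq program (hypothetical helper for illustration).

    Each instruction is three cells a, b, c:
        memory[b] -= memory[a]
        if memory[b] <= 0: jump to c, else continue with the next instruction.
    Assumption: a negative program counter halts the machine.
    """
    mem = list(memory)  # work on a copy so the caller's list is untouched
    steps = 0
    while pc >= 0:
        if steps >= max_steps:
            raise RuntimeError("step limit reached; program may not halt")
        a, b, c = mem[pc], mem[pc + 1], mem[pc + 2]
        mem[b] -= mem[a]
        pc = c if mem[b] <= 0 else pc + 3
        steps += 1
    return mem


# Data cells: X at address 9, Y at address 10, scratch Z at address 11.
X, Y, Z = 9, 10, 11
program = [
    X, Z, 3,    # Z = Z - X  (Z becomes -X); next instruction either way
    Z, Y, 6,    # Y = Y - Z  (Y becomes Y + X); next instruction either way
    Z, Z, -1,   # Z = 0, which is <= 0, so jump to -1 and halt
    7, 5, 0,    # initial values of X, Y, Z
]

result = run_subleq(program)
print(result[Y])  # sum of the initial X and Y
```

Addition falls out of two subtractions through a scratch cell. Everything else has to be composed the same way, which is why this is a theoretical point rather than a practical one.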
          • I agree to some extent

            But I am talking about working purely at the level of predicate logic and set theory, not at the machine level.

            Actually you don't need all 8 operators, join for example can be defined by product and restriction. In fact many SQL implementations didn't have a join operator at all until relatively recently.

            The important thing about the relational model is that it represents data in exactly one way, so you only need a small set of operators. The more ways of representing data you have the more operators you need. If your data representation method lacks a sound theoretical background then you end up needing a huge number of operators.
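The claim that join can be defined by product and restriction can be sketched in a few lines. This is only an illustration under stated assumptions: relations are modeled as lists of dicts, the two relations have disjoint attribute names, and the sample data and function names (`cartesian_product`, `restrict`, `join`) are hypothetical.

```python
from itertools import product

# Hypothetical sample relations; attribute names are disjoint on purpose.
employees = [
    {"emp": "Ann", "dept_id": 1},
    {"emp": "Bob", "dept_id": 2},
    {"emp": "Cy", "dept_id": 1},
]
departments = [
    {"id": 1, "dept": "Sales"},
    {"id": 2, "dept": "R&D"},
]


def cartesian_product(r, s):
    """Product: pair every row of r with every row of s."""
    return [{**a, **b} for a, b in product(r, s)]


def restrict(r, predicate):
    """Restriction: keep only the rows that satisfy the predicate."""
    return [row for row in r if predicate(row)]


def join(r, s, predicate):
    """Join defined purely as a restriction over the product."""
    return restrict(cartesian_product(r, s), predicate)


joined = join(employees, departments, lambda row: row["dept_id"] == row["id"])
for row in joined:
    print(row["emp"], row["dept"])
```

A real DBMS would not materialize the full product first. The sketch only shows that the join operator is derivable from the other two, not how it is executed.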
  • These are some of the most significant elements of big data.

    While Big Data has become a buzz word, the actual utilization of Big Data at industrial strength in large enterprises is just getting realized and discussed. I very much agree with Dion’s points and would add highly distributed and diverse to the list, and add ‘integration’ to standard data sources (SQL and NoSQL) as a consideration. For example, if I have massive data bases in Teradata, how do I integrate with Big Data? How does that change my analytics landscape? Let’s start defining data science and big data. Delighted that Dion is moving the dialog forward...let’s collaborate and enhance our thinking re big data.