Big data: What to trust – data science or the boss's sixth sense?

Summary: While the technology to run big-data projects may be opening up to more firms, progress has been hampered by a lack of skills and a corporate preference for gut instinct.


Even if businesses had the developer skills to exploit big data — and most haven't — they invariably prefer to make decisions using untested opinion rather than data science.

And where firms are using data, many are choosing it selectively to back up currently held views, which is just opinion-based decision-making in another guise, according to Guy Cuthbert, managing director at visual analytics firm Atheon Analytics.

Data science involves producing and testing hypotheses, an approach shunned by most of the businesses he encounters in the retail goods sector, Cuthbert told a recent Actian big-data round-table event in London.

"I could reel off a horrifying list of stories of retailers that believe that customers behave in a particular way because that's what they were told when they joined the business," he said.

"They've never really questioned whether that's what happens in that category, or in that particular part of the country, or with that dress size. There are hundreds of examples of people doing what they were told, rather than thinking afresh about what else is going on.

"A lot of our work is trying to move organisations from this opinion-operated world into a data-driven world and start to use facts, hypotheses — science — as a method."
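As an illustration of what testing a hypothesis against data can look like in practice, here is a minimal sketch of a permutation test in Python. It is not drawn from Cuthbert's work: the sales figures, the region names and the `permutation_test` helper are all hypothetical. The hypothesis under test is the inherited belief that a product sells equally well in two regions.

```python
import random


def permutation_test(group_a, group_b, n_permutations=2000, seed=0):
    """Two-sided permutation test on the difference in means.

    Returns (observed_difference, p_value). A small p_value suggests the
    observed difference would be unlikely if the two groups behaved alike.
    """
    rng = random.Random(seed)  # seeded so the sketch is repeatable
    n_a = len(group_a)
    n_b = len(group_b)
    observed = sum(group_a) / n_a - sum(group_b) / n_b

    pooled = list(group_a) + list(group_b)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        # Reassign observations to the two groups at random.
        rng.shuffle(pooled)
        diff = sum(pooled[:n_a]) / n_a - sum(pooled[n_a:]) / n_b
        if abs(diff) >= abs(observed):
            at_least_as_extreme += 1
    return observed, at_least_as_extreme / n_permutations


# Hypothetical weekly unit sales of one product in two regions (made-up data).
north = [12, 15, 14, 10, 13, 16, 11, 14]
south = [18, 21, 17, 20, 19, 22, 18, 20]

observed, p_value = permutation_test(north, south)
print(f"difference in mean weekly sales: {observed:.2f}, p-value: {p_value:.4f}")
```

With figures like these, the assumption that both regions behave the same way would be hard to defend, which is the kind of question Cuthbert says most retailers never ask.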

Of the companies Cuthbert has tried to help understand how their products perform, very few could be called analytical. By his reckoning, probably only the top one percent or even the top 0.1 percent of the world's businesses are truly data-driven.

"I see a huge number of opinion-operated businesses that don't get why decisions could be made on data. I've listened to executives spout all sorts of opinions with no fabric or no substance behind them at all," Cuthbert said.

"So if data animators and data scientists can do anything, it's to try and teach the rest of our peers in businesses that there are a fascinating number of facts located in their organisation if they just choose to look at them."

However, getting businesses to debunk corporate myths and accept factual, data-based conclusions is not easy.

"We have some really hostile presentations where we are showing people stuff that they'll flatly dispute and tell us we're completely wrong," Cuthbert said.

Another problem is companies adopting too narrow a focus, even if they are trying to use data scientifically.

"Most businesses we work with focus on known knowns. They look at, 'We expect to operate with an increase in revenues of six percent in the next year, let's make sure we make six percent'," Cuthbert said.

"They don't go looking for the 30 percent or the 120 percent opportunity. A lot of our work is surfacing that kind of thing and showing patterns that they simply don't know."

Unfortunately, despite advances that make it easier to process billions of rows of data, analysis is still an area where human skills remain essential.

"The gap with machine learning and all the rest of the computer sciences at the moment is that as yet there is no machine inspiration," Cuthbert said.

"The inspiration comes from humans understanding how to interpret signals in the data."

Steve Shine, CEO of big-data analytics company Actian, formerly Ingres, said that until relatively recently, the development skills required for big data have made such projects the preserve of a certain set of customers with big budgets.

"If you've been anywhere close to a Hadoop project over the past three or four years, you'll realise that it's a fairly rarefied set of skills that can write an efficient MapReduce program to get anything efficient out of Hadoop," Shine said.

"It was passionately protected by the community for a long period of time. That has changed radically in the past 12 months. There's a broad acceptance that things need to get much easier in terms of how you use these new technologies.

"We took people back to the 1980s in terms of how productive you were in being able to generate code to get at all that data and discover insights."
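For readers unfamiliar with the programming model Shine is describing, the following is a minimal in-memory sketch of the map, shuffle and reduce steps in plain Python. It does not use Hadoop or any of its APIs, and the record layout and function names are assumptions made for illustration. It counts churned customers per segment.

```python
from collections import defaultdict


def map_phase(record):
    """Emit a (segment, 1) pair for every customer who has churned."""
    segment, churned = record
    if churned:
        yield segment, 1


def shuffle(pairs):
    """Group emitted values by key, as the framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Collapse all values for one key into a single count."""
    return key, sum(values)


def run_job(records):
    pairs = (pair for record in records for pair in map_phase(record))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


# Hypothetical records: (customer segment, has the customer churned?)
records = [
    ("retail", True),
    ("retail", False),
    ("payroll", True),
    ("retail", True),
]

churn_counts = run_job(records)
print(churn_counts)
```

Even a simple count has to be split into separate map and reduce functions, and on a real cluster the developer also has to consider partitioning, serialisation and job configuration, which is where the scarce skills come in.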

An issue now is the proliferation of big-data technologies, with various versions of Hadoop, NoSQL, and new ways of preparing and integrating the data.

"No CIO gets rewarded for gluing all that together. The business doesn't care how well you glued all that together or how quickly. It says, 'How fast can you help me get to my customer churn data'," he said.

However, the technology does now allow businesses to discover unexpected commercial potential in the data they routinely produce.

Shine cited a customer in the benefits and payroll business, which has realised that the data it processes on salary changes, leavers and joiners probably provides a more accurate view of the macroeconomy than its own government possesses.

"There are organisations that traditionally look like they have one business that are realising that if they are data centric and can take that data — possibly combining it with other data that's out there — they can deliver insights that are fundamentally more intrinsically valuable than what they have today," he said.

  • There are two things data won't tell you.

    1) Whether a decision is ethical or moral.

    Sure, the data would likely say that more people bought a product at gunpoint, under the threat of death if they didn't buy the product. But that does not mean that you should go around selling products by threatening the lives of people.

    It's a bit of an extreme example, but it does go to show that one should not be replacing one's own sense of morality and ethics with "data science." We've already got really bad issues with businesses blundering around morality and ethical issues.

    Indeed, even USING "big data" has ethical issues: Should "big data" be used on information that is considered to be PII? What are the privacy implications of using "big data?" Do customers even want businesses to be concerned with this type of hyper-optimization, or should the business be concerned with other things first?

    2) Cause and effect. I remember a logic teacher telling me a story about aliens observing rainy days and windshield wipers. The aliens observed that windshield wipers were on while it was raining. Therefore, the aliens concluded, the windshield wipers caused the rain.

    "Big data" could tell you that there's a correlation between two things. But as far as I know, it can't tell you *why* there's a correlation between two things. You still need to use your brain to figure that part out. The data is true and factual, but it still needs to be interpreted.
    • The why...

      Machine learning is closing the gap on the "why" and drawing correlation. Give it another year or 2.
      • I gave it a year or two 15 years ago.
  • In the end...

    ...it's the boss' job to make a decision. Statistical modeling (assuming it's done right) can help him make better decisions, but the responsibility is still his, bearing in mind that models, like all theories, are approximations of reality, not reality itself.
    John L. Ries
  • CEOs

    Often get to that position by being able to successfully navigate the internal politics of a company. They have to build alliances, keep competing factions at bay, and placate stockholders and BoDs. Business decisions are not just about data. They require organizational support, which means restructuring organizational priorities. That means there will be winners and losers. Managing that fight is as important as the data.
  • How to deal with HiPPOs

    I've been working with big data and A/B testing for 14 years and have seen this phenomenon repeatedly. A few of us dubbed it “the HiPPO syndrome,” because organizations often make suboptimal decisions based on the Highest Paid Person’s Opinion. Of course, we refer to the suboptimal decision makers as HiPPOs. I wrote a popular blog post a while back on how data driven folks can deal with HiPPOs:
    • Agreed

      But in the end, the responsibility for making decisions and policies still rests with the boss. The computer is an advisor, not the one in charge.
      John L. Ries
    • That and...

      ...the vast majority of decisions anyone makes are suboptimal, as it's usually too much work and/or there is insufficient data to determine what the optimal decision is.
      John L. Ries
  • Another Big Data Problem

    A company may have massive amounts of well organized data, but it still needs to ask the correct questions before crunching that data. Some of the questions are industry specific (obvious to anyone in that industry), others are relatively generic, and others will require intuition to ask. GIGO is a real issue: if the wrong questions are asked, the answers, while technically correct, are garbage.

    To me the issue is not how much data is available, but whether the correct questions are being asked of whatever data is available.

    Another flaw in this analysis is a tendency to enforce excessive product conservatism. If you had asked people in a marketing focus group 15 years ago whether they would even consider a smartphone or tablet, I doubt they would have answered as positively as real market sales show. Sometimes it takes a well executed, visionary product to excite people.
  • Sure, companies selling Big Data Solutions want to make sure you buy...

    Big Data Solutions. The trick would be getting the salespeople to ensure you an ROI for their solution. Very few will, because sometimes you can slice the data MANY different ways and it won't help the bottom line. It's usually a decent idea to have some smart people asking smart questions and then see what the answers are, instead of expecting the data to "surface" too much.