"What if the data tells you something you don't like?" Three potential big data pitfalls

Summary: Big data is on the verge of hitting the big time, but there are traps that lay in wait for the unwary.


Big data is likely to quickly become big business. The ability to isolate the nuggets of insight inside the huge volumes of structured and unstructured data hoarded by most businesses could improve customer service, make processes more efficient and cut costs.

According to analyst firm Gartner, adoption of big data is still at a very early stage: just eight percent of companies have initiatives up and running, 20 percent are piloting and experimenting, 18 percent are 'developing a strategy' and 19 percent are 'knowledge gathering', while the remaining 35 percent have no plans or don't know. But that could change rapidly: Gartner is predicting 4.4 million people will be working on such projects within two years, while a recent survey by Harvey Nash found that four out of ten CIOs planned to increase their investments in the next year.

However, because big data relies on untested technologies and on skills that are in short supply inside most organisations, there are a number of hurdles for organisations seeking to exploit it:

1. Letting politics derail your big data project before it gets moving

Getting a big data initiative up and running might be one of the hardest parts of the project, because the tech team and the rest of the business may have different ideas about what the goals should be, warn tech chiefs consulted by ZDNet. A big data project run solely by IT may fail because it's unconnected to the needs of the business, for example, while a badly articulated request from the marketing department may leave IT confused about what to deliver.

As Rohit Killam, CTO at Masan Group, points out: "The real bottleneck is conceptualising a value-driven big data programme with [the] right stakeholders," while Duncan James, infrastructure manager at Clarion Solicitors, notes: "Understanding what the business requires is the hardest part, especially if the business can't articulate what it wants in the first place."

In many organisations, whenever you want to do any project there has to be a business case before there can be any budget, says Frank Buytendijk, research vice president at Gartner.

"That is how organisations work and think, which is great for anything established — but for anything innovative that is really hard because the whole point of playing around with the technology is trying to figure out what it does for you. This is not unique to big data, but big data suffers from it as well."

According to Buytendijk, big data projects don't have to cost a lot, thanks to the availability of open-source tools. As a result, these projects can be used as a low-risk way to explore an organisation's big data strategy. "The business case should not be the starting point; the business case should be the outcome, and it's realising this that creates the right conversation within businesses," he told ZDNet.

2. The big data skills crisis

According to the Harvey Nash CIO Survey carried out earlier this year, one in four CIOs reported difficulty in finding staff for big data projects. This is compounded by the complex array of skills needed for these projects, which are often outside of the standard skillset offered by the in-house tech team, according to tech chiefs canvassed by ZDNet.

"A shortage of big data skills doesn't hold back big data projects, but it does have implications for the success factors and execution of the projects. There is certainly growth in demand for this area of skillset," says Clarion's Duncan James. Brian Wells, associate VP health technology and academic computing at Penn Medicine, adds that this is an issue in areas related to interpreting results and developing analytical hypotheses.

"Skills has been an issue from the beginning, and this will remain an issue for the foreseeable future," says Gartner's Buytendijk. "How do you find people who have a background in econometrics, statistics and mathematics, and who know how to programme in modern environments and have business sense, because big data analytics is all about interpreting context, why something is happening in a certain context. This skillset is really, really hard to find."

One problem is that big data requires inductive rather than deductive thinking, and most IT organisations are better at the latter. Inductive thinking (using data to create likely connections) is a little outside their usual way of working.

Another problem is that big data technologies are very programming-intensive. While the typical ratio between software and implementation on a project is one to five, in big data it has leapt to one to 25, because these tools are not very user-friendly, don't integrate with other tools, and won't for a number of years.

Not all tech chiefs agree on this, though: "I think the complexity of big data is way overrated," maintains John Gracyalny, VP IT at SafeAmerica Credit Union. "We just kicked off a project to build a data warehouse/analytics tool internally. We only have a four-person IT department. I'm providing the 'vision thing' and database design, my newest guy is writing the code to handle external data extracts and imports, and my right hand will integrate an off-the-shelf reporting tool."

3. The looming governance headache

When organisations start dredging through their digital detritus, they risk discovering information they might wish had remained buried. Consequently, they need to have some governance in place before they start delving into the huge piles of customer transactions and other data they've been storing.

For example, last year a New York Times story revealed how a retailer could use shopping patterns to spot when a customer was pregnant and offer them money-off vouchers, and how to do it without making them feel they were being watched. Organisations must therefore beware of combining their own data with third-party data in ways that lead them to discover information about customers that those customers might not wish to have known.

As Gartner's Buytendijk puts it: "If you start to work inductively, you let the data talk: what if the data tells you something you don't like?"

"Big data answers questions that weren't even asked, and that can be quite embarrassing — so how do you create a governance situation with a sandbox with big walls where you contain things you don't want the organisation to know?"

According to Buytendijk, organisations need some kind of governance that shields them from over-using (and oversharing) the fruits of big data: "In lots of countries there have been reputational issues around big data being too clever for its own good. With great power comes great responsibility," he warns.


  • "What if the data tells you something you don't like?"

    It's obviously wrong, change it. :-)
    • More like "change yourself."

      If you don't like it, but the source is accurate and looked at in proper context, then that should mean you don't like what's causing the data.
      Jacob VanWagoner
  • Keep in mind . . .

    "What if the data tells you something you don't like?"

    Keep in mind, it just tells you where things are, and it may not be perfect. A business is free to take note of the data, but do something different because of factors other than the data presented. Personally, I think the importance of "big data" is a bit overhyped.

    And it looks like ZDNet, the king of hype, is hyping up big data again. Several articles just came out. *yawn*, tell me when you have some actual news to talk about.
  • difference

    There's a difference between "things you don't like" and "things you don't want known". Finding things you don't like can lead to improvements in performance; finding things you don't want known otoh....
  • "What if the data tells you something you don't like"

    Let's ask economist John Maynard Keynes:

    "When the facts change, I change my mind. What do you do, sir?"
    Jacob VanWagoner
    • Except he didn't actually say that

      instead having said "When my information changes, I alter my conclusions. What do you do, sir?".
      This is of course as opposed to the simpletons of the Austrian School, like Hayek, who felt Economics was inscrutable and not subject to empirical confirmation. They might have replied, "Facts? They didn't inform my opinion in the first place."
  • Disappointing...

    I expected this article would address the question in the title - "what if the data tells you something you don't like?" The question is out-of-context with the article. The question addressed by #3 in the article is "what if the data tells you something you did not expect?" or possibly "what does it mean if the data answers a question you did not ask?". They are all valuable questions to answer, but I was looking for the answer to the first question.
  • Lie

    Not Lay. The phrase is "Lie in wait". The word "lay" doesn't mean that.
  • Things can go horribly, horribly wrong

    Suppose that your datamining system identifies when one of your customers gets pregnant and starts sending her relevant ads.

    She miscarries, and you then keep sending her more adverts for products relating to the supposedly impending birth for the next five months, as she gets more and more distraught, until she finally cracks and puts a bullet through the brain of your CEO, and the court finds her not guilty on the grounds of unreasonable provocation.

    Or .. she kills herself and her family announces on Twitter that they believe that her death was caused by your company hounding her about her failed pregnancy.

    Or ... she's trying to keep quiet about the pregnancy because it's the result of an assault, and she's still trying to decide what to do, but her significant other sees the adverts and thinks that she's been unfaithful and leaves her (or shoots her).

    Or ... her significant other sees the ads and thinks that she's trying to keep the pregnancy secret because she's planning on terminating his child, and other bad stuff ensues.

    Or ... a range of other possible nightmare scenarios with attendant nightmarish PR consequences. Medical records are kept confidential for very good reasons, and if datamining starts straying into that territory without being invited, then the company responsible will automatically be blamed for anything bad that results.

    Getting into grey areas regarding people's personal medical data is like driving drunk on an apparently empty road. It's high-risk even if you drive slowly and carefully, because if an accident happens that involves your presence, then even if the accident was mostly due to other factors, you're still going to be blamed for it because you shouldn't have been there.

    If your company misuses personal information, even if it's with the best intentions, and something goes wrong, then you will automatically get the blame for whatever disaster or tragedy results.
    Eric Baird