"What if the data tells you something you don't like?" Three potential big data pitfalls

Big data is on the verge of hitting the big time, but there are traps that lie in wait for the unwary.
Written by Steve Ranger, Global News Director

Big data is likely to quickly become big business. The ability to isolate the nuggets of insight inside the huge volumes of structured and unstructured data hoarded by most businesses could improve customer service, make processes more efficient and cut costs.

According to analysts Gartner, adoption of big data is still at a very early stage: just eight percent of companies have initiatives up and running, 20 percent are piloting and experimenting, 18 percent are 'developing a strategy', 19 percent are 'knowledge gathering', while the remainder have no plans or don't know. But that could change rapidly: the analyst firm is predicting 4.4 million people will be working on such projects within two years, while a recent survey by Harvey Nash found that four out of ten CIOs planned to increase their investments in the next year.

However, because big data uses untested technologies and skills that are in short supply inside most organisations, there are a number of hurdles for organisations seeking to exploit it:

1. Letting politics derail your big data project before it gets moving

Getting a big data initiative up and running might be one of the hardest parts of the project, because the tech team and the rest of the business may have different ideas about what the goals should be, warn tech chiefs consulted by ZDNet. A big data project run solely by IT may fail because it's unconnected to the needs of the business, for example, while a badly articulated request from the marketing department may leave IT confused about what to deliver.

As Rohit Killam, CTO at Masan Group, points out: "The real bottleneck is conceptualising a value-driven big data programme with [the] right stakeholders," while Duncan James, infrastructure manager at Clarion Solicitors, notes: "Understanding what the business requires is the hardest part, especially if the business can't articulate what it wants in the first place."

In many organisations, whenever you want to do any project there has to be a business case before there can be any budget, says Frank Buytendijk, research vice president at Gartner.

"That is how organisations work and think, which is great for anything established — but for anything innovative that is really hard because the whole point of playing around with the technology is trying to figure out what it does for you. This is not unique to big data, but big data suffers from it as well."

According to Buytendijk, big data projects don't have to cost a lot, thanks to the availability of open-source tools. As a result, these projects can be used as a low-risk way to explore an organisation's big data strategy. "The business case should not be the starting point; the business case should be the outcome, and it's realising this that creates the right conversation within businesses," he told ZDNet.

2. The big data skills crisis

According to the Harvey Nash CIO Survey carried out earlier this year, one in four CIOs reported difficulty in finding staff for big data projects. This is compounded by the complex array of skills needed for these projects, which are often outside of the standard skillset offered by the in-house tech team, according to tech chiefs canvassed by ZDNet.

"A shortage of big data skills doesn't hold back big data projects, but it does have implications for the success factors and execution of the projects. There is certainly growth in demand for this area of skillset," says Clarion's Duncan James. Brian Wells, associate VP health technology and academic computing at Penn Medicine, adds that this is an issue in areas related to interpreting results and developing analytical hypotheses.

"Skills has been an issue from the beginning, and this will remain an issue for the foreseeable future," says Gartner's Buytendijk. "How do you find people who have a background in econometrics, statistics and mathematics, and who know how to programme in modern environments and have business sense, because big data analytics is all about interpreting context, why something is happening in a certain context. This skillset is really, really hard to find."

One problem is that big data requires inductive rather than deductive thinking. Most IT organisations are good at deductive thinking, whereas inductive thinking (using data to create likely connections) is a little outside their usual way of working.

Another problem is that big data technologies are very programming-intensive: while the typical ratio between software and implementation on a project is one to five, in big data that has leapt to one to 25, as these tools are not very user friendly, don't integrate with other tools, and won't for a number of years.

Not all tech chiefs agree on this, though: "I think the complexity of big data is way overrated," maintains John Gracyalny, VP IT at SafeAmerica Credit Union. "We just kicked off a project to build a data warehouse/analytics tool internally. We only have a four-person IT department. I'm providing the 'vision thing' and database design, my newest guy is writing the code to handle external data extracts and imports, and my right hand will integrate an off-the-shelf reporting tool."

3. The looming governance headache

When organisations start dredging through their digital detritus, they risk discovering information they might wish had remained buried. Consequently, they need to have some governance in place before they start delving into the huge piles of customer transactions and other data they've been storing.

For example, last year a New York Times story revealed how a retailer could use shopping patterns to spot when a customer was pregnant and offer them money-off vouchers — and how to do it without making them feel they were being watched. Thus organisations must beware of using their own data and other third-party data that together may lead them to discover information about customers that customers might not wish to have known.

As Gartner's Buytendijk puts it: "If you start to work inductively, you let the data talk: what if the data tells you something you don't like?"

"Big data answers questions that weren't even asked, and that can be quite embarrassing — so how do you create a governance situation with a sandbox with big walls where you contain things you don't want the organisation to know?"

According to Buytendijk, organisations need some kind of governance that shields them from over-using (and oversharing) the fruits of big data: "In lots of countries there have been reputational issues around big data being too clever for its own good. With great power comes great responsibility," he warns.
