Big data: What to trust – data science or the boss's sixth sense?

While the technology to run big-data projects may be opening up to more firms, progress has been hampered by a lack of skills and a corporate preference for gut instinct.
Written by Toby Wolpe, Contributor

Even if businesses had the developer skills to exploit big data, and most do not, they invariably prefer to base decisions on untested opinion rather than data science.

And where firms are using data, many are choosing it selectively to back up currently held views, which is just opinion-based decision-making in another guise, according to Guy Cuthbert, managing director at visual analytics firm Atheon Analytics.

Data science involves producing and testing hypotheses, an approach shunned by most of the businesses he encounters in the retail goods sector, Cuthbert told a recent Actian big-data round-table event in London.

"I could reel off a horrifying list of stories of retailers that believe that customers behave in a particular way because that's what they were told when they joined the business," he said.

"They've never really questioned whether that's what happens in that category, or in that particular part of the country, or with that dress size. There are hundreds of examples of people doing what they were told, rather than thinking afresh about what else is going on.

"A lot of our work is trying to move organisations from this opinion-operated world into a data-driven world and start to use facts, hypotheses — science — as a method."

Of the companies Cuthbert has tried to help understand how their products perform, very few could be called analytical. By his reckoning, probably only the top one percent or even the top 0.1 percent of the world's businesses are truly data-driven.

"I see a huge number of opinion-operated businesses that don't get why decisions could be made on data. I've listened to executives spout all sorts of opinions with no fabric or no substance behind them at all," Cuthbert said.

"So if data animators and data scientists can do anything, it's to try and teach the rest of our peers in businesses that there are a fascinating number of facts located in their organisation if they just choose to look at them."

However, getting businesses to debunk corporate myths and accept factual, data-based conclusions is not easy.

"We have some really hostile presentations where we are showing people stuff that they'll flatly dispute and tell us we're completely wrong," Cuthbert said.

Another problem is companies adopting too narrow a focus, even if they are trying to use data scientifically.

"Most businesses we work with focus on known knowns. They look at, 'We expect to operate with an increase in revenues of six percent in the next year, let's make sure we make six percent'," Cuthbert said.

"They don't go looking for the 30 percent or the 120 percent opportunity. A lot of our work is surfacing that kind of thing and showing patterns that they simply don't know."

Unfortunately, despite advances that make it easier to process billions of rows of data, analysis is still an area where human skills remain essential.

"The gap with machine learning and all the rest of the computer sciences at the moment is that as yet there is no machine inspiration," Cuthbert said.

"The inspiration comes from humans understanding how to interpret signals in the data."

Steve Shine, CEO of big-data analytics company Actian, formerly Ingres, said that until relatively recently the development skills required for big data made such projects the preserve of customers with big budgets.

"If you've been anywhere close to a Hadoop project over the past three or four years, you'll realise that it's a fairly rarefied set of skills that can write an efficient MapReduce program to get anything efficient out of Hadoop," Shine said.

"It was passionately protected by the community for a long period of time. That has changed radically in the past 12 months. There's a broad acceptance that things need to get much easier in terms of how you use these new technologies.

"We took people back to the 1980s in terms of how productive you were in being able to generate code to get at all that data and discover insights."

An issue now is the proliferation of big-data technologies, with various versions of Hadoop, NoSQL, and new ways of preparing and integrating the data.

"No CIO gets rewarded for gluing all that together. The business doesn't care how well you glued all that together or how quickly. It says, 'How fast can you help me get to my customer churn data'," he said.

However, the technology does now allow businesses to discover unexpected commercial potential in the data they routinely produce.

Shine cited a customer in the benefits and payroll business that has realised the data it processes on salary changes, leavers and joiners probably provides a more accurate macroeconomic view than its own government possesses.

"There are organisations that traditionally look like they have one business that are realising that if they are data centric and can take that data — possibly combining it with other data that's out there — they can deliver insights that are fundamentally more intrinsically valuable than what they have today," he said.
