Getting past big data's 'known unknowns': Google marketing evangelist tells all

In a new and entertaining talk, analytics expert Avinash Kaushik says all the big data in the world doesn't mean a thing if decision makers can't act on it.
Written by Joe McKendrick, Contributing Writer

Avinash Kaushik, digital marketing evangelist for Google and author of books on web analytics, says all the big data in the world doesn't mean a thing if decision makers can't act on it. And, more often than not, data is monopolized by analysts. Companies won't see a payoff from big data until there is "data democracy" in organizations, meaning data is available to, and actionable by, employees at all levels.

That's the gist of Kaushik's latest talk at Strata 2012, in which he declares that most companies' thinking about big data is either wrong or misguided. The opportunity lies in the ability to act on information gleaned from the data, not in its sheer volume, he says.

Kaushik has two favorite quotes that inspire him, he relates. The first is:

"Information is powerful. It is how we use it that will define us."

The problem is "if you listen to all the vendors, 99% of speakers at conferences, there’s a lot of emphasis on this first part, and very little on the second part," he says. "The models that we’re used to -- take data and internalize it -- has been broken a very long time," he adds. "The model is we take a lot of data and we'll Hadoopify it and have a lot of fun with it. Then we hire these gods and princelings to go and take this data and convert it into reports and hopefully, pray that it's useful."

This approach no longer scales, he continues. Before, when organizations managed smaller quantities of data, it could neatly be tucked into data warehouses for analysis. "As the users multiply, we'll be running around trying to find people that don’t exist whose job it is to produce ever more data, and start hitting people with it every single day. Ironically, what happens is the company becomes more inefficient."

Kaushik advocates movement toward data democracy. "Your employees will figure out how to use data better, and get them making love to the data directly, so that they can make decisions and improve their lives every day." He adds that "actionable and useful decisions need to be made by people closest to the data. And this scales because it's part of every person's job, rather than having these people out in the blue. This drives a lot of innovation."

The second quote that inspires Kaushik comes from former US Secretary of Defense Donald Rumsfeld:

"There are known knowns; there are things we know we know. We also know there are known unknowns; that is to say we know there are some things we do not know. But there are also unknown unknowns – there are things we do not know we don't know."

"I just love the idea that the entire data world can be boiled down into these key problems," he relates.

How does an organization get past the "known unknowns" to start exploring the "unknown unknowns"? In a related post, Kaushik shares some data "life lessons" for making the most of big data:

1. Don't buy the hype of big data and throw millions of dollars away. But don't stand still. "Take 15% of your decision making budget and give it to one really, really smart person and give that person the freedom to experiment in the cloud with big data possibilities for your companies. It is cheap. You can do dirty data warehousing pretty darn fast. You can find all the ugly warts and problems. Don't build the biggest, baddest big data environment over 32 months, only to realize it was your biggest, baddest mistake."

2. Big thinking about what big data should be solving for is supremely important. "I've championed the need to leverage frameworks like the Digital Marketing & Measurement Model, in the web context, to ensure that the analysis we do is deeply and powerfully grounded in what's important to the business. All it takes is a little business analysis. When you have access to all this data, the answers you find will be surprising, the insights you deliver will be brilliant, and your impact on the business will be huge. But that can only happen if there is a model that defines the purpose of your sweet big data adventures."

3. The 10/90 rule for magnificent data success still holds true. "For every $100 you have available to invest in making smart decisions, invest $10 in tools and vendor services, and invest $90 in big brains (aka people, aka analysis ninjas). Let the 10/90 rule be an inspiration to simply over-invest (way over-invest) in people, because without that investment big data will absolutely, positively, be a big disappointment for your company. Computers and artificial intelligence are simply not there yet. Hence your BFF is natural intelligence."

4. Shoot for right time data, not real time data. "Real time data is almost insane to shoot for because even for the smallest decisions, you'll have to do a lot of analysis first (5 hours), then present it to your superior (1 hour), who will add two bullet items and send it to a team of people (20 hours), who will in turn argue about priorities and how much the data is wrong (16 days), but ultimately come to an agreement because the deadline to make the decision passed 7 days ago (20 seconds), and send the data to the big boss who'll read just the first part of the executive summary (3 days), and decide that the data is telling her something counter to what she has always known works, and she'll make a decision based on her gut feel (5 seconds), and some action will be taken (14 days). Total up those numbers. If you can't react in real time, why do you need real time data?"

5. Data quality sucks, just get over it. "Data on the web will never get to 95% clean and it will have big holes and it will be sparse in some areas. We should aim to collect, process and store data as cleanly as humanly possible, but after that we should move on to using the data."

6. Eliminating noise is even more important than finding a signal. "Thus far in the history of data analysis the objective for our queries has been trying to find the signal amongst all the noise in the data. That has worked very well. We had clean business questions. The data size was smaller and the data set was more complete and we often knew what we were looking for. Known knowns and known unknowns. With big data, it is so much more important to be magnificent at knowing what to ignore. You must know how to separate out all the noise in the disparate huge datasets to even have a fighting chance to start to look for the signal."
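
For readers who want to take Kaushik up on "total up those numbers" in lesson 4, here is a rough sketch of the arithmetic. The durations are the ones quoted in that lesson; the step labels, the `STEPS` list and the `total_latency` helper are made up for illustration, and the sum assumes each figure is elapsed time and that the steps happen one after another.

```python
from datetime import timedelta

# Durations quoted in lesson 4, in the order Kaushik lists them.
# Assumption: each figure is elapsed time and the steps run back to back.
# The labels are illustrative shorthand, not Kaushik's wording.
STEPS = [
    ("do the analysis", timedelta(hours=5)),
    ("present to superior", timedelta(hours=1)),
    ("send to a team of people", timedelta(hours=20)),
    ("argue about priorities", timedelta(days=16)),
    ("come to an agreement", timedelta(seconds=20)),
    ("big boss reads the summary", timedelta(days=3)),
    ("gut-feel decision", timedelta(seconds=5)),
    ("some action is taken", timedelta(days=14)),
]


def total_latency(steps):
    """Add up the duration of every step in the decision chain."""
    return sum((duration for _, duration in steps), timedelta())


total = total_latency(STEPS)
print(total)  # 34 days, 2:00:25
```

Under those assumptions the chain takes a little over 34 days from first analysis to action, which is the gap Kaushik is pointing at when he asks why anyone would need data in real time.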

This post was originally published on Smartplanet.com
