Data science: 'Machines do analytics. Humans do analysis'

Two leaders of Booz Allen's data science team talk talent, building a data science team and the machine-human link in analytics.
Written by Larry Dignan, Contributor

Booz Allen has been pushing data science, developing tools and publishing field guides to advance analytics. Internally, Booz Allen also has a Pinterest for data science. The goal: Advance data science so it just happens in the background.

However, it's still early days, and data science and analytics are a long way from simply happening in the background. Josh Sullivan, who leads the data science and analytics practice at Booz Allen, likens analytics to where computing was in the 1950s.

I caught up with Sullivan along with Angela Zutavern, vice president in the strategic innovation group, to talk about the issues surrounding data science. Here are the highlights of our chat:

Talent. Booz Allen has a team of 500 devoted to data science projects and 50 of them are "national treasures," says Sullivan. Those elite 50 data scientists have worked on multiple projects in many industries and have all the traits required for asking the right questions needed to transform business.

"These people are curious and relentless in the face of failure," Sullivan said. "They keep pushing, and they think they can contribute no matter how big the problem is." For instance, a team may fail 340 times before finding the pattern that means something. You can't get disappointed easily.

The machine/human link. Sullivan isn't big on analytics technology that is pitched as a magic bullet for data science. No machine can be a miracle cure. Humans have to find the patterns, ask the right questions and make the connections in the data. "Machines do analytics," explained Sullivan. "Humans do analysis." Computers are good at detail and examining the past, but real data science requires imagination and cognitive ability.

"I can take 10 tools, U.S. Census data and agriculture data and determine that [the number of] people who were strangled by their bed sheets tracks cheese consumption," Sullivan said. "A human knows that makes no sense. You can't commoditize reasoning by a human."
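Sullivan's point about nonsense correlations can be illustrated with a minimal sketch. This is not Booz Allen's analysis, and the numbers are not real bed sheet or cheese figures: `series_a` and `series_b` are made-up stand-ins that share nothing except an upward trend. The sketch assumes plain Pearson correlation as the "analytics" step, and first differencing as one simple check an analyst might apply.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def first_differences(values):
    """Period-over-period changes, which removes a shared linear trend."""
    return [later - earlier for earlier, later in zip(values, values[1:])]


# Synthetic, hand-picked wiggles (assumption: invented purely for illustration).
noise_a = [0.3, -0.2, 0.1, -0.4, 0.2, -0.1, 0.4, -0.3, 0.0, 0.2]
noise_b = [-0.5, 0.4, 0.6, -0.6, -0.2, 0.5, -0.4, 0.3, 0.1, -0.3]

# Two unrelated quantities that both happen to rise over ten periods.
series_a = [1.0 * t + noise_a[t] for t in range(10)]
series_b = [0.5 * t + noise_b[t] for t in range(10)]

# The machine's view: the raw series look tightly linked.
raw_r = pearson(series_a, series_b)

# The analyst's check: compare the changes instead of the levels.
diff_r = pearson(first_differences(series_a), first_differences(series_b))

print(f"correlation of raw series:   {raw_r:.2f}")
print(f"correlation after detrending: {diff_r:.2f}")
```

The raw series correlate strongly only because both trend upward. Once the trend is removed, the apparent relationship largely disappears, and deciding that such a check is needed is the human judgment Sullivan describes.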

Another way to put it is that machines are used as "data janitors" to clean data and crunch numbers, but that work is a small part of the overall process.


Growing talent. Sullivan said Booz Allen has largely chosen to grow its own data science base of talent. "At first we started throwing out brains on a stick," Sullivan said, referring to a move to hire a bunch of PhDs. The problem: Booz Allen needed people with soft skills and knowledge of the industry and its processes.

Bottom line: Data science is a team sport and you need a diverse team to explore multiple angles.

"We'll aggressively hire, but we want to grow our own so we know what we're getting," Sullivan said. "There's also a benefit to that for our people." Booz Allen's data science program includes rotations to different industries so employees can be well rounded when it comes to projects. Booz Allen, like IBM and others, is involved in crafting data science curricula at universities. Booz Allen also puts aside time to apply data science to big social projects. Rotating workers onto social projects refreshes them and encourages big thinking.

Industries where data science shines. Sullivan said health care and all the data thrown off by the "quantified self" movement will be promising. Wearables are one part of the equation, but the data from classified medical equipment will be just as important if not more so. Transportation is another promising area for data science. Booz Allen also has a large sports practice.

Sullivan said his firm has worked on a predictive model that forecasts a Major League pitcher's next pitch with 95 percent accuracy. The prediction is based on historical data, the pitcher's history, weather, situation in the game and other variables. Fraud is another key category, as are machine-to-machine applications. One caveat: Predictive models won't be perfect when analyzing unpredictable humans. "Human behavior is hard to predict," Sullivan explained. "For instance, in fraud it's hard to analyze a human who is doing everything to defeat you and avoid detection."

Where operations research ends and data science begins. Sullivan's transportation examples revolved around things like route planning and networks, which are the traditional turf of operations research.

Sullivan said data science and operations research (OR) are complementary. "OR is about simulation. Data science is about putting the right parts in the right places. The OR team brings data science into the real world," Sullivan said. "We have a big OR team and have knitted it together with our data science practice."

Return on investment. Sullivan said it's difficult to show ROI on data science capabilities in the first year. Companies should think of that first data science year as a bootstrapped effort where you're discovering the unknown unknowns, curating data and tagging it so it can later be linked to a business outcome. "It's about small wins at first," Sullivan said. Another key point from Zutavern: No company has perfect data ontology and categorization so you shouldn't put off an analytics effort in hopes of perfection. Every company has data gaps and the information is likely to be messy.
