Do we really need data scientists?

Summary: They are powerfully intelligent, expensive and rare. You could argue that they don't even exist. So why does every business executive think they need one?

TOPICS: Big Data

A few weeks ago, I enjoyed a dinner at which the topic of conversation was simple: big data.

At one point, the subject of data scientists came up.

"Is there really a shortage?" one person asked.

"It's even worse than people think," another dining companion responded.

"But really, what is data science? It's not like people went to school to major in it," a third chimed in.

"Exactly," the second replied. "Anyone with 'data scientist' on their resume is suspect."

Later on, that same fellow admitted that he had just hired a "superstar" engineer with this very reason in mind.

What gives?

If you read ZDNet, you know that big data is this year's big buzzword. (OK, it's technically two words, but come on.) Every executive wants to deploy it in their business, even if they're not really sure what it is, how it functions or what it can really do for them. In the enterprise, "big data" is the new social -- meaning it's the word you attach to everything to seem like you're on the cutting edge.

Still, reports of a shortage of data scientists -- the wizards who make it all work -- continue. If you thought it was hard to hire away the industry's top software developers, competing for these oracles of insight is exponentially more difficult.

Is it all hype? Are data scientists really necessary (and worth all that trouble)? And for whom are they appropriate, really?

I wanted to dig into these questions. So I rang up Stephen Purpura, CEO/CTO of Context Relevant, a Seattle-based company known for its machine learning platform. Purpura recently announced the hiring of three more members of his company's "brain trust," even as he develops a resource to make such people unnecessary for certain kinds of businesses.

In other words, he's a guy who sits on both sides of the hype spectrum. I spoke with him.

ZDNet: Give us a quick introduction to your company before we dive in.

SP: The company was founded in March 2012. We've been working on the business idea that is the company today for nearly a decade.

Back in 2001, during the dot-com bust, I was working for Madrona Venture Group, which had [intellectual property] in big data. But we shelved it.

I've spent the last decade trying to bring it back. We formed the company after creating the proof-of-concept and we hit the ground running.

What most vendors do is sell an Erector set, dump it on the table, and expect you to hire a bunch of experts to put it together and solve the problem. These are people who go from project to project, company to company and make a lot of money. This happens in advertising, or on Wall Street. It's highly manual work, and a lot of that knowledge is stored in their heads, not encapsulated in tools. Most projects that are successful these days start from scratch.

It turns out that the problems that Wall Street faces are not that different from those that companies in the Fortune 2000 face. The only difference is that on Wall Street, there's a need for speed because speed is money. So they're the front-runner. The needs of high-traffic websites come after that.

Many companies are realizing that a lot of their business depends on understanding the needs of their customers. These types of things are the difference between the haves and the have-nots today. Demand and expectations for results are growing.

ZD: Let's get down to it, then. Data scientists! They're in demand. They're rare. They're expensive. Business leaders think they need them, even if they're not sure what they do. What gives?

SP: The industry believes that a person is a data scientist when they have the ability to visualize projects that make the company money by looking at the data that exists and turning that into a statistical process that predicts whether or not to invest in a customer, or upsell them, or whatever.

These people have statistical skills, business savvy, and elite programming skills. They know how to do things that regular developers don't know how to do. That makes these people very hard to find. How many people do you know who have the technical skills of the best developers, but can also visualize projects that make the company money?

That's why our company exists -- we are trying to bridge the gap. Ruby on Rails allowed people who didn't know how to build the structure of a website to build something that basically worked; it wouldn't fall over when 1,000 people visited it. We've done the same thing by providing the same set of guide rails and example applications so customers can get up and running quickly and find value. Contrast that with a world in which companies try to hire a person who can do all this, or a manager who cobbles together a large team across various skill sets.

ZD: Do we really need data scientists? Living, breathing ones, anyway?

SP: Well, when you have the opportunity to buy LeBron James, you buy. (laughs) But not everybody has that.

Companies hire superstars with track records because the systems haven't been rolled out. You can't hire Accenture to do something like this. Superstars are what get it done. That's what Google does -- they have a substantial bench of people who are really, really smart. Companies that already have established businesses, where a company like Google can't go after them, can hire people who aren't necessarily LeBron James and still stay ahead of their competition.

ZD: Indeed, you just hired some. So where does that leave you? Sir, you're no Google.

SP: It's really hard as a young company to hire them. We don't expect them to be fantastic at writing great code. We have a team that's good at that, and they make it easy for our data scientists to do less work. That's step one: reduce the requirements for our data scientists to write production code.

Step two is hiring people that are really creative and can visualize how to solve problems. Their backgrounds aren't necessarily computer science. One guy on our staff is a very well-known computational sociologist. You never see those things together. He's never going to be able to write production code for a Fortune 2000 company; in fact, he'll be too bored by it. So he gets to see lots of customer problems and tweak problems to make them more general.

So we paired these people who were really good at statistics and analysis with people that are experts at building distributed systems.

We struggled through the last 10 years when the tools were so bad that it took days, weeks, months to do a single iteration of a problem -- to look at what was actually happening in the data. It took seven years to become an experienced data scientist. It took that long to log the estimated 10,000 hours. You have to have experience to solve data problems. Now we've sped that up so that you can fail in seconds. So you can iterate rapidly. A better user interface was a major experience change that allows you to create better people faster.

ZD: Given your point about building experience, can someone really call themselves a data scientist today?

SP: If they work at Facebook, yes. If they work at LinkedIn, probably. If they work at Google, maybe. There are people who are good at this. I've adopted this term because the people who adopted it early were the people who were using science and data to actually develop products. When you interview them, it's easy to tell -- we can screen someone and tell within a half hour whether they know what they're doing or not.

ZD: Data science is a relatively new term, but the concept isn't all that new. What changed in the last decade?

SP: I was a software engineer and program engineer at Microsoft before I went back to school. Take my cohort at Microsoft who have moved on to senior management positions -- I talk to them about what I'm doing now, and they're flabbergasted. It's completely foreign to them, because most of these things have been invented in the last five years. And these are rockstars! People I think are among the smartest people in the world. I'm not one of those people, but I'm now armed with methods that use this data better. The faster they realize that, the faster they'll be the people in their company that can actually make a difference.

Companies like mine are going to bring the Fortune 2000 into this world. We are dedicated to helping them in every way possible. That's our job -- to understand how to help them be successful with data without having their whole team retool.

You know, I have every incentive to tell you that data is going to drive everything in this industry. But I take my salary in this. My money is where my mouth is.

Andrew Nusca

About Andrew Nusca

Andrew Nusca is a former writer-editor for ZDNet and contributor to CNET. During his tenure, he was the editor of SmartPlanet, ZDNet's sister site about innovation.

Talkback

5 comments
  • And if you read ZDNet . . .

    "If you read ZDNet, you know that big data is this year's big buzzword."

    And if you read ZDNet, you know they worship pretty much every buzzword as if it were a new fact or trend, and as if everybody, without regard for their situation, could benefit. The buzzword of the day is also the panacea of the day, solving all problems everywhere.

    Want to solve world hunger? Use a buzzword. Want to fix your kitchen sink? Use a buzzword. Want to get rid of a pimple? Use a buzzword. They fix anything, anytime, anywhere.
    CobraA1
    • You do realize that "they" can hear you, right?

      And by "they," I mean "me." And I don't appreciate it, CobraA1!

      A term like "big data" exists to quickly (and memorably) summarize a family of related concepts. "Big data" means a lot of different things to a lot of different industries, but using that term helps us make it clear to others which topical ballpark we're playing in for any given post.

      So what I'm saying is: we may use the terms a lot, but we're not blind to the pros and cons of the tech we're talking about. We aren't good at everything here at ZDNet, but endlessly scrutinizing and debating hype is definitely one reason to read us.

      I think we do a pretty good job trying to tease purpose out of airless corporate proclamations. That's the very purpose of the article above, isn't it?
      andrew.nusca
      • Re: A term like "big data" exists to quickly (and memorably) summarize a fa

        Isn't it what used to be called "data mining"?
        ldo17
  • HOW do you use data scientists?

    I think everybody sees how DSes can add value in sales or marketing. But how do they add value in proposing a new business venture? Or a new product? It seems like a DS would need a deep understanding of their business and its market in order to perceive or reveal untapped opportunities. How often does that really happen?

    Or is the DS moniker just another form of industrial optimization suited only to lightweight business subunits like advertising or sales? If so, do they really add significant value beyond the old stale trend econometrics that have been used to guide ads/sales since Adam & Eve?
    randcraw
  • Well, some organizations do ...

    "Jobs in Data Mining and Analytics
    http://www.kdnuggets.com/jobs/

    Plenty of listings here for data scientists. And, for the record, statisticians, data miners, machine learning specialists, expert system specialists, visualization specialists, reporting specialists, OLAP specialists and ETL specialists, are all data scientists too. And, more often than not, a data scientist is a member of a multidisciplinary team that includes developers, DBAs, data modelers, data quality specialists, domain experts and managers.

    A high-profile example is IBM's Watson. Lots of people behind this, even for the Jeopardy appearance that introduced Watson to popular culture (I saw every one of the shows). Watson has already been unleashed in medicine and will, eventually, find a place in many domains.

    Data science is for real. Just don't get too caught up with the name as it's been around for a long while.
    Rabid Howler Monkey