Data scientists don't scale


Summary: In last week's ZDNet "Great Debate," Robin Harris and I faced off on the question of whether "we need data scientists to make sense of this tidal wave of information." I think data scientists are important, but they're not the solution. What follows is my argument, in essay form.

TOPICS: Big Data

"Data scientist" is a title designed to be exclusive, standoffish and protective of a lucrative guild.  To be clear, people who have the skills to qualify for this moniker are very valuable, but the title itself isn't.  The blocker to broad adoption of Big Data analytics isn't a shortage of data scientists; it's our current dependency on them.

Big Data and analytics are powerful, and the technologies around them are exciting. But if they can only be harnessed by highly paid specialists, then they haven't fully evolved. We need data and analytics technologies, but we shouldn't need expensive, scarce, shaman-like practitioners to use them. More than data scientists, we need tools that empower knowledge workers to do Big Data analytics on their own.

Is crossing over possible?
People can certainly obtain the literacy necessary to carry out analytics on Big Data.  Business people can be made capable of working with the data, and developers who are not currently analytics-focused can be made capable of collecting the data and performing analytics in their code.

Of course, certain people can be trained to become very highly skilled specialists, but that would be the exception more than the rule, and that's OK.  We don't need people to retool en masse into scientists; we need them to obtain a competency.

Beyond the hype
The term "data scientist" is over-hyped.  But in fairness, so are the terms "Big Data" and "analytics," and yet these are still quite valid areas of specialization.  The problem with the term "data scientist" goes beyond the hype; there's an attitude and adversarial tone to the term. This tone discourages people from obtaining analytics competency, as it transmits an implicit message that the work must be outsourced to highly-trained individuals.  Aside from the hype, it's pretension and snobbery that make "data scientist" an unhelpful term.

Dilution of the term
There is a risk that many technologists will become "data scientists" in the name of finding a better gig, exactly as happened with other lofty titles in technology ("architect," for example).  Title inflation happens in any field, but in the tech field, terms and titles are in any case viewed as metaphors more than literal descriptions.  Tech folks tend to take poetic license with titles, and those who don't do so find themselves at a disadvantage compared to those who do.

It's the tooling, stupid
Analytics in general, and Big Data specifically, have terribly immature tooling compared to mainstream relational database and BI products.  That being the case, it's no wonder that only "scientists" can get real work done.  These tools were built for laboratory use, not business use.

Just as self-service BI is in vogue (and is legitimately quite powerful) today, so too should self-service Big Data and predictive analytics be a market phenomenon.  Once it is, people with the skills that we classify under data science today will still have an important role, but it won't be nearly so central as it is now.

Data literacy, and what it could look like
It won't come as a surprise that I believe a scenario where we have more data literacy -- and business and tech people who are "bilingual" -- to be the one that will most successfully solve the labor issues we face.  Data science is about having a command of both data technology and business domain expertise.  If the technology becomes simple, and business people become more adept with it, then business users can be bona fide analytics professionals.

If I had my druthers, a perfect business analytics wonk would be a sales, marketing or planning professional who was also a tech power user, had a command of statistics, knew Excel very well and could do some light programming.  But that's an ideal…and in order for analytics technology to take off, we shouldn't need people to fit this ideal in order to be productive Big Data analysts.

Data science algorithms?
Implicit in the definition of data scientist is possession of business intuition and instinct that mere algorithms can't replace.  If you accept that the term is legitimate, then you accept that a combination of human intelligence and technology expertise is what makes someone an authentic data scientist.  While I'm not a huge fan of the term "data scientist," I do feel the experience of the business user and her non-algorithmic intimacy with the semantics of the data is very important.

What we will need, and what we won't
Expertise in data exploration and visualization tools, programming/developer skills, an understanding of statistics, and high-level database design skills will remain important, regardless of whether the data scientist role remains in vogue.  Equally important will be a deep understanding of the business, and the data sources that measure its activity and outcomes.

The term "data scientist" will subside and may well sound dated five years from now.  The skills will become more commonplace and commoditized.  When that happens, the real boom will begin, because the technology will become widely adopted and thus more useful.  But for the relatively small club of people clinging to a data scientist identity and pay scale, it may seem like a bust.

Summing up
Big Data technology is powerful, and it keeps getting better. But the technology does, right now, require niche specialists to derive the greatest business value from it. These specialists have to be Renaissance people, possessing a combination of technology, mathematics and business skills and knowledge. It's not clear that being so clever and versatile makes these specialists into "scientists," but it does make them rarefied.

Nonetheless, for Big Data and analytics implementations to grow and become truly mainstream, having such diverse skill set requirements for them is not a sustainable situation. Market need is going to drive evolution in the technology such that the barrier to entry will not be nearly so high as it is now. If for some reason that didn't happen, then adept use of Big Data would continue to be an option open only to a relatively small group of customers.

The solution to our problem isn't legions of new data scientists. Instead, we need self-service tools that empower smart and tenacious business people to perform Big Data analysis themselves. The specialists will still have an important role, but they won't be the linchpin to scaling Big Data across industries.


Andrew Brust

About Andrew Brust

Andrew J. Brust has worked in the software industry for 25 years as a developer, consultant, entrepreneur and CTO, specializing in application development, databases and business intelligence technology.


Talkback

10 comments
  • what Data Scientists really do is uncover, connect, and automate

    Most of the power here is the ability to scale well beyond the human capacity.

    That you need a human creating a report means you are probably doing it wrong. Real time intelligence systems that leverage historic behavior and outcomes to shape the future, that is where we are headed. Why report on something when you can act?

    I do think your premise is right, most of this isn't rocket science. But it is not analysts we will need, but developers who can build on top of Big Data.

    You can call in the Data Scientists for iteration 3 or 4, after you've got the easy wins.
    mobile_manny
  • Spreadsheets are the best tool we have for manipulating data

    Talk to "data scientists", statisticians, quants or actuaries and data consumers and many will confirm that a spreadsheet is one of the most important tools in their arsenal. Business intelligence tools and data analysis tools create abstractions of the data and impose someone else’s view of data and how it can be manipulated. Spreadsheets provide WYSIWYG. Transparency. Perhaps that is the best we can do as far as tools are concerned.

    What data, and the professions surrounding data manipulation, are lacking is transparency. Tools that show what assumptions were made, which variables were included and excluded and how the algorithms manipulate the data are needed. And finally, if these same tools could expose the biases and subjectivity that went into the data manipulation, then we would have really useful tools that anyone could use, practitioner and consumer alike. In the meantime, spreadsheets are the closest we have to this.
    richord@...
  • Spreadsheets are BI tools

    Unlike many, who tend to trash-talk Excel, I see it, and its add-ins like PowerPivot, Power View and Geo Flow, as important BI tools. So I would agree with richord@..., even if it seems otherwise.
    andrewbrust
  • Big data is nothing new??? What's the hype about?

    Quick comment
    http://www.smartcubeit.co.za/?p=830

    More..on big data
    http://www.smartcubeit.co.za/?p=1&preview=true&preview_id=1&preview_nonce=7aac70529d
    Julian Rigotti
    • Woops wrong URL - 2nd ONE

      http://www.smartcubeit.co.za/?p=1
      Julian Rigotti
  • Data "science" is NOT science

    Hate to break it to all you software engineers, but neither is "computer science", "information science", etc. Science is a very particular thing. It is not studying things, or using technology, or building gadgets. Science is the systematic study of a hypothesis using the scientific method. No scientific method, no science.
    Most of these "sciences" are actually engineering. Just like many people who call themselves artists are actually craftsmen. This is not a bad thing. There is nothing wrong with engineering, just as there is nothing wrong with being a craftsman. There is, however, something wrong with claiming one is something that one is not.
    .DeusExMachina.
    • For once, I agree with your comments.

      ;)

      I argued against the term "data scientist" a few months ago, right here in this forum, because what is being done is not "science". What is being done with the data is analysis, and proposals for the efficient use and presentation of that data.

      I liken a "data analyst" to a pharmacist. A pharmacist is not a scientist, nor a research scientist. A pharmacist uses the knowledge already created by the real scientists in the field. Scientific research for creating new drugs is science. The use of that "settled" science does not make one a scientist. A pharmacist does not have to do any scientific research in order to prescribe a medicine; the practices for that field are already established. It's the same with what a supposed "data scientist" would be doing. There is no "real science" in what a data analyst would do.
      adornoe
    • Decision Science not Data Science

      Science has always been about setting a hypothesis and proving it right or wrong. We're just getting started with Big Data. The proper and most beneficial use of Big Data will be when everyone becomes a decision scientist. The data, the analytics, dashboarding and interfaces will simply be tools that will help a business/enterprise/organizational decision maker verify the viability of their hypothesis. That's the real promise of Big Data. If it isn't, it should be about being the toolkit for Decision Science.
      Sharda Parthasarathy
  • Science has outstripped this entire conversation

    What most people are not yet aware of, but soon will be: science has outdated this entire conversation, and all its vaunted complexity. Associative DBs are here, and will replace everything and anything that uses COLUMN/ROWS as a base structure.
    jean letennier
  • It's not just skills that are needed

    I'm by no means advertising, but the company I work for makes a totally code-free Hadoop platform. That said, even a platform that requires no programming experience at all will not remove the one true requirement, and that is the ability to think about Data, Architecture, BI, etc. the way a human can. We can automate whatever we want, but there is no substitute for the computer that we have in our head.

    It's kind of like baseball. You can be like Billy Beane and use sabermetrics all you want, but without the gut instincts of people like Tony LaRussa, Jim Leyland, Bruce Bochy and the like, your teams won't win.
    Josh Xplenty