Data scientists don't scale

In last week's ZDNet "Great Debate," Robin Harris and I faced off on the question of whether "we need data scientists to make sense of this tidal wave of information." I think data scientists are important, but they're not the solution. What follows is my argument, in essay form.
Written by Andrew Brust, Contributor

"Data scientist" is a title designed to be exclusive, standoffish and protective of a lucrative guild.  To be clear, people who have the skills to qualify for this moniker are very valuable, but the title itself isn't.  The blocker to broad adoption of Big Data analytics isn't a shortage of data scientists; it's our current dependency on them.

Big Data and analytics are powerful, and the technologies around them are exciting. But if they can only be harnessed by highly paid specialists, then they haven't fully evolved. We need data and analytics technologies, but we shouldn't need expensive, scarce, shaman-like practitioners to use them. More than data scientists, we need tools that empower knowledge workers to do Big Data analytics on their own.

Is crossing over possible?
People can certainly obtain the literacy necessary to carry out analytics on Big Data.  Business people can be made capable of working with the data, and developers who are not currently analytics-focused can be made capable of collecting the data and performing analytics in their code.

Of course, certain people can be trained to become very highly skilled specialists, but that would be the exception more than the rule, and that's OK.  We don't need people to retool en masse into scientists; we need them to obtain a competency.

Beyond the hype
The term "data scientist" is over-hyped.  But in fairness, so are the terms "Big Data" and "analytics," and yet these are still quite valid areas of specialization.  The problem with the term "data scientist" goes beyond the hype; there's an attitude and adversarial tone to the term. This tone discourages people from obtaining analytics competency, as it transmits an implicit message that the work must be outsourced to highly-trained individuals.  Aside from the hype, it's pretension and snobbery that make "data scientist" an unhelpful term.

Dilution of the term
There is a risk that many technologists will become "data scientists" in the name of finding a better gig, exactly as happened with other lofty titles in technology ("architect," for example).  Title inflation happens in any field, but in the tech field, terms and titles are in any case viewed as metaphors, more than literal descriptions.  Tech folks tend to take poetic license with titles, and those who don't do so find themselves at a disadvantage compared to those who do.

It's the tooling, stupid
Analytics in general, and Big Data specifically, have terribly immature tooling compared to mainstream relational database and BI products.  That being the case, it's no wonder that only "scientists" can get real work done.  These tools were built for laboratory use, not business use.

Just as self-service BI is in vogue (and is legitimately quite powerful) today, so too should self-service Big Data and predictive analytics be a market phenomenon.  Once it is, people with the skills that we classify under data science today will still have an important role, but it won't be nearly so central as it is now.

Data literacy, and what it could look like
It won't come as a surprise that I believe a scenario where we have more data literacy -- and business and tech people who are "bilingual" -- to be the one that will most successfully solve the labor issues we face.  Data science is about having a command of both data technology and business domain expertise.  If the technology becomes simple, and business people become more adept with it, then business users can be bona fide analytics professionals.

If I had my druthers, a perfect business analytics wonk would be a sales, marketing or planning professional who was also a tech power user, had a command of statistics, knew Excel very well and could do some light programming.  But that's an ideal…and in order for analytics technology to take off, we shouldn't need people to fit this ideal in order to be productive Big Data analysts.

Data science algorithms?
Implicit in the definition of data scientist is possession of business intuition and instinct that mere algorithms can't replace.  If you accept that the term is legitimate, then you accept that a combination of human intelligence and technology expertise is what makes someone an authentic data scientist.  While I'm not a huge fan of the term "data scientist," I do feel the experience of the business user and her non-algorithmic intimacy with the semantics of the data is very important.

What we will need, and what we won't
Expertise in data exploration and visualization tools, programming/developer skills, an understanding of statistics, and high-level database design skills will remain important, regardless of whether the data scientist role remains in vogue.  Equally important will be a deep understanding of the business, and the data sources that measure its activity and outcomes.

The term "data scientist" will subside and may well sound dated five years from now.  The skills will become more commonplace and commoditized.  When that happens, the real boom will begin, because the technology will become widely adopted and thus more useful.  But for the relatively small club of people clinging to a data scientist identity and pay scale, it may seem like a bust.

Summing up
Big Data technology is powerful, and it keeps getting better. But the technology does, right now, require niche specialists to derive the greatest business value from it. These specialists have to be renaissance people – possessing a combination of technology, mathematics and business skills and knowledge. It's not clear that being so clever and versatile makes these specialists into "scientists," but it does make them rarefied.

Nonetheless, for Big Data and analytics implementations to grow and become truly mainstream, having such diverse skill set requirements for them is not a sustainable situation. Market need is going to drive evolution in the technology such that the barrier to entry will not be nearly so high as it is now. If for some reason that didn't happen, then adept use of Big Data would continue to be an option open only to a relatively small group of customers.

The solution to our problem isn't legions of new data scientists. Instead, we need self-service tools that empower smart and tenacious business people to perform Big Data analysis themselves. The specialists will still have an important role, but they won't be the linchpin to scaling Big Data across industries.
