Is machine learning icing on the cake for data scientists?

While machine learning is growing more ubiquitous in the smart products and services that consumers and businesses use, should it be the first step for data scientists? According to the head of a major data science operation for a leading European mobile carrier, focus on the science first.
Written by Tony Baer (dbInsight), Contributor


It is getting more and more difficult to avoid being touched by machine learning (ML). If you buy products from Amazon, make bids on eBay, or stream the latest episodes of Narcos on Netflix, chances are your experience has been shaped by an ML algorithm that makes informed assumptions about what your preferences are, or predicts what they will be. If you allow Facebook to tag friends in your photographs, you are taking advantage of deep-learning algorithms trained to recognize human faces, and if you take a series of photographs on your Android device, Google will piece together a storyboard of the highlights of your weekend.

It's the same for businesses, even if they don't necessarily realize it. A study recently published by natural language analytics provider Narrative Science revealed an interesting dichotomy: only 38 percent of respondents reported using ML, yet 88 percent stated that they used analytic tools that incorporate many of the fruits of ML, such as automated predictive analytics, automated written reporting and communications, and voice recognition and response.

Here is a quick sampling of the capabilities that you'll likely find in analytics and data integration tools. Last week, we wrote about DataRobot, an example of a new wave of tools to automate ML for data scientists. Look further, and if you use cloud-based analytics services like Amazon QuickSight or IBM Watson Analytics, you are taking advantage of capabilities that figure out what data to look at, what queries to ask, and what stories to tell.

At Ovum, we expect that ML will increasingly become a default capability under the hood for analytics tools. And if you use data preparation tools for big data, chances are you are using one where ML helps you match, lightly cleanse, and reconcile data sets.

Consider this all just the tip of the iceberg.

So our ears perked up when we attended a presentation by Jan Romportl, the chief data scientist for O2's Czech Republic business, given at Teradata's annual user conference this week. Romportl's contention is that the "key word" in data science isn't data. It's science. It's all spelled out in this definition. And then he ventured that ML was not synonymous with data science.

Say what?

Romportl's base case was about data science supporting the business -- something that's hardly debatable. For O2, the goal for data science is helping the company build new products based on what it discovers from the data. His team has a good track record of building products that are either consumed internally or form the basis of offerings for external customers. They range from profiling customers through their web, mobility, and television habits to generating "lookalike" targeting models, as well as measuring the effectiveness of outdoor advertising, scoring customer credit, conducting cross-media targeting, and generating next-best-offer initiatives.

Romportl wants data scientists on his team to stick to the basics. They must be proficient in R and/or Python, not to mention the grunt work of testing and retesting hypotheses. As science, the goal is repeatability. As business, the goal is repeatable models, codified and documented, that can be used to bolster the top line or bottom line. And, most importantly, it is about automating the running of these models, because once a model is written, there's the need to move on to the next problem.

So, is ML icing on the cake? That's probably too harsh a judgment. For applications that embed analytics, such as predictive maintenance, supply chain optimization, or fraud detection, ML is essential for making people more productive and effective at spotting meaningful phenomena that might otherwise elude the human mind. And we believe that ML is essential to scaling the processes for populating data lakes.

But for aspiring data scientists being recruited by enterprises, Romportl's walk-before-you-run philosophy has clear merit. Maybe you can build that clever decision tree ML model for charting customer journeys, but your decision tree will only be as valid as the underlying data science hypothesis for identifying the meaningful decision points in the customer's journey.
