Data scientists are essential employees, in the sense that every industry needs them now.
Asked if we're at a point where every business will employ or need someone in data science — if they don't already — Michelle McSweeney, Codecademy's data science domain manager, answered simply and emphatically.
"Yes. Full stop. I don't think I can emphasize that enough — that almost every job is a data job."
And there are a lot of those jobs available right now. In fact, Glassdoor had more than 10,000 data scientist job postings as of publication. The site also said data scientist is third on its list of 50 best jobs in America for 2022. Enterprise architect and full-stack engineer were in first and second place.
"Data is so ingrained into our life at this point, I don't think there's a single industry that's not touched by data. There's not a single company that's not touched by data."
"I take a very broad view of that," she continued. "If you are a restaurant and you're collecting paper receipts, and that's how you do your taxes, that's still data. It might not be digitized, you might be more efficient by digitizing that, but it's still data. So I don't think that there's a single thing that we do that doesn't have data involved in it on some level."
As with every other professional skill, data science has specializations, and they matter to employers and job seekers. That shift toward specialization, she said, is an inflection point for the data science industry.
That's one of the reasons Codecademy introduced four new data science career learning paths in June: machine learning, data analytics, inference, and natural language processing. Codecademy says it analyzed real-world job criteria and roles within major tech companies to shape the curricula.
The online learning platform said it had 2.3 million people enrolled in its data science career path in the last two years.
In a recent conversation with ZDNet, McSweeney talked about universal skills for data scientists, the content and focus of the new specializations, and the value of asynchronous learning.
Here's our interview. It's been condensed and edited.
Data science reaches 'a point of maturity'
Michelle McSweeney: I think data science has gotten to this point of maturity.
In 2018, Elena Grewal, who was the data science director at Airbnb, put out this blog post on LinkedIn saying how they organized their data science team. And there was a lot of discussion around it afterward and refinements, and iterations on it, but ultimately it came down to this idea that there's three general flavors of data science.
There's inference-based data science, there's analytics-based data science, and there's machine learning.
Those terms don't mean a ton of things to people outside of data science. And even within data science, it's hard to tell the flavor of data science that you do because it's what you do. [If] a company says 'I need a data scientist. I'm going to put out a call for a data scientist,' they don't even realize that they're looking at like one slice of that pie.
A closer look: analytics, inference, and machine learning
McSweeney: Analytics focuses a lot on SQL; it focuses a lot on answering direct questions with data. And I like to call that answering questions about what — what happened, what's going on, what's that?
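As an editorial illustration (not from the interview), a "what happened" question of the kind described above can be sketched with SQL run through Python's built-in sqlite3 module. The orders table, its columns, and the values are invented for the example.

```python
import sqlite3

# In-memory database with a hypothetical "orders" table (invented data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_day TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("2022-06-01", 20.0), ("2022-06-01", 35.5), ("2022-06-02", 12.0)],
)

# "What happened?": number of orders and revenue per day.
rows = conn.execute(
    "SELECT order_day, COUNT(*), SUM(amount) "
    "FROM orders GROUP BY order_day ORDER BY order_day"
).fetchall()

for order_day, n_orders, revenue in rows:
    print(order_day, n_orders, revenue)

conn.close()
```

The query only describes what is already in the data; it makes no claim about why the numbers look the way they do.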
Then there are inference-based data scientists. And those are the type of people that are answering the questions about why. These are people who are doing a lot of stats (they might work in R, they might work in SQL, they might work in Python), but ultimately their focus is on statistical analysis of data in a variety of ways. They're the ones that are doing A/B tests; they're the ones that are doing hypothesis tests.
When we talk about [how] data science is just a fancy word for statistics, we're talking about inference-based data scientists. It's just one flavor that they're talking about. They're answering the questions about why.
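For readers unfamiliar with A/B tests, here is a minimal sketch of one common way to analyze them, a two-proportion z-test, using only Python's standard library. It is an editorial illustration, not a method described in the interview; the function name and the visitor counts are assumptions made up for the example.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rates."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both variants convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / std_err
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Invented numbers: 1,000 visitors per variant, 100 vs. 130 conversions.
z, p_value = two_proportion_z_test(conv_a=100, n_a=1000, conv_b=130, n_b=1000)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

A small p-value suggests the difference between the two variants is unlikely to be chance alone, which is the "why" style of question attributed to inference work above.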
And then we have those machine learning data scientists … There was that 2012 Harvard Business Review article [about how] data science is going to be the sexiest career [of the 21st century].
They were talking about machine learning-based data scientists. ... They're the ones that are getting deep into the edges of what computers can do today. We couldn't do machine learning 20 years ago because our computers weren't powerful enough.
And that's where machine learning comes into play. It's that the power of computers and the power of data have finally met up. They're answering questions about the future. They're answering what's going to happen. So they're answering those predictive questions.
Programming and statistics: data science's universal skills
McSweeney: [All Codecademy learning paths include] a foundation in programming. No matter what. And that is both Python and SQL. SQL is for databases, and Python allows you to write programs freeform. So that's the foundation everyone has. Everyone has a foundation in data visualization and a little bit of a specific Python package called Pandas.
Pandas is an invaluable tool for data science because it allows you to work with what's basically a table of data programmatically. So we don't go deep, but we go far enough to give everyone a solid foundation.
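To make "a table of data, programmatically" concrete, here is a small editorial sketch using pandas. The receipts table, its column names, and its values are invented for illustration and do not come from Codecademy's curriculum.

```python
import pandas as pd

# Hypothetical restaurant receipts; column names and values are invented.
receipts = pd.DataFrame(
    {
        "day": ["Mon", "Mon", "Tue", "Tue", "Wed"],
        "item": ["soup", "pasta", "soup", "salad", "pasta"],
        "total": [8.50, 14.00, 8.50, 11.25, 14.00],
    }
)

# Filter rows, as you would in a spreadsheet, but in code.
over_ten = receipts[receipts["total"] > 10]

# Group and aggregate: total revenue per day.
revenue_by_day = receipts.groupby("day")["total"].sum()

print(over_ten)
print(revenue_by_day)
```

Because the steps are code rather than manual spreadsheet edits, the same filter and summary can be rerun unchanged when new receipts arrive.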
[And] even if you're the most technical data scientist, you're still going to have to communicate the results of those experiments with stakeholders. So it's important that everyone has that foundation in communications.
There's also basic statistics for everyone because every data scientist should understand how to run summary stats, like to get the mean of something. [There's also a focus on] standard deviations and distributions because when we start to think about data, once you're thinking about it technically, you have to think about it statistically. The two are deeply tied together.
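The summary statistics mentioned above can be sketched with Python's standard statistics module. This is an editorial illustration; the order counts are an invented sample.

```python
from statistics import mean, median, stdev

# Invented sample: daily order counts for one week.
orders = [42, 38, 51, 47, 40, 95, 44]

avg = mean(orders)
mid = median(orders)
spread = stdev(orders)  # sample standard deviation (n - 1 denominator)

print(f"mean = {avg:.1f}, median = {mid}, stdev = {spread:.1f}")
```

In this sample, one unusually busy day pulls the mean (51) above the median (44), a small example of why thinking about the distribution matters and not just the average.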
Even if you're not going into heavy statistics, that statistical foundation is what differentiates a data scientist from someone who can manipulate data in a table. It's kind of a thing that takes you from being a programmer to being a data scientist.
Data science isn't one size fits all
McSweeney: Not every organization does all three of those types of things, right? Not every organization needs to use its data to make predictions or to make recommendations.
Most organizations need an analytics-based data scientist. Most organizations need to be able to look at their data and say, 'What's going on?' That's analytics. Not everyone is running A/B tests.
Not everyone needs an inference person. Not everyone needs a machine learning person.
So being able to narrow in on the type of data science an organization is doing, and on the skills that an individual needs to contribute in that organization, lets people focus, I think, and get to a career much faster because they're not trying to learn everything.
McSweeney: The majority of our learners are beginner job seekers. So they're somebody who is starting from zero, and they're like, 'I want to be connected to a job as soon as possible.' And that's who these career paths are really geared toward.
We have other products that are in development right now to more specifically target those upskillers. … It's a little more than a third of our learners [who] are upskillers. They're not typically taking the career paths from start to finish.
When I look at user progress data and I see cohorts jumping into the middle, or jumping into the end, I'm like, 'Those are my upskillers.' And one of the cool things about these career paths is that you can start from the beginning and go to the end, have all the job-ready skills, and get a certificate through that, or you can pick and choose what's actually relevant to you and fill in the gaps in your knowledge.
The future of learning is asynchronous
McSweeney: I really think that asynchronous [learning] is just as valid as synchronous, probably more so.
I taught in classrooms for a very, very long time. And I've taught across STEM. And teaching programming skills synchronously is awful because it excludes so many people. Invariably, there's going to be someone who's fast and someone who's slow. And that creates a lot of conflict in the synchronous experience.
But learning these skills asynchronously — if you understand the programming super quick? Awesome, fantastic. You can move faster through the content. If it takes you longer, that's fine too. You can sit back and spend some time thinking through those ideas.
I stand behind asynchronous learning so enthusiastically because I think it allows more people to engage with the content and especially with data science, which is a very good, very stable job. It allows more people from different backgrounds access than a synchronous classroom experience would.
That's not to say that I think we should close all the schools — that's not what I'm saying.
But I am saying that I think particularly [for] data science skills where there's so many different aspects of it, and people think about those things and acquire that skill at such different rates, [asynchronous learning] allows different people at different life points to have access to a data science career.