NBA analytics: Going data pro

For the NBA, like every other sports league, awards are important. They can generate attention, spur debate, make money, and involve fans, players, and experts, among others. Is there data science and analytics behind them -- can there or should there be? We picked the NBA Most Improved Player award as an example to analyze some aspects of data-driven culture.
Written by George Anadiotis, Contributor

Forget Moneyball. How about defining metrics for the NBA Awards? How hard could that be? (Image: NBA)

The NBA is announcing its yearly awards today. This is a much-anticipated event that has been talked about and analyzed extensively in sports media and beyond. Predictions and arguments about who should be nominated and who should win each award have been going on almost since the beginning of the season.

Keeping fans engaged is good, but there is more to awards like these: They give the media something to talk about, boost the status of players and teams, and give anyone the chance to bet on the results.

Being part of pop culture, and having the potential to make or break careers and fortunes, means there's more to the NBA Awards than meets the eye. Let's try to peek behind the curtain and use data science and analytics to answer a question on many NBA fans' minds: Who was the most improved player (MIP) in the NBA this season?

Define 'improved'

To begin with, who gets to define improved, and how? As one NBA writer once put it: "There are few things more frustrating than trying to determine what it means to be the MIP". On the other hand, that makes it interesting and open to interpretation. Since the NBA does not say much about its criteria and evaluation method, others have tried to come up with their own.

The traditional way NBA writers do this is by assembling a panel of experts and getting them to weigh in. Averaging expert opinions may be more objective than relying on a single opinion, but it still does not count as data-driven research in most data analysts' books.

Adam Fromal from Bleacher Report argued that the MIP is "some years handed to a player who maintained his level of play (or even regressed slightly) while filling a much bigger role. Other times, the league rewards a contributor who made noticeable strides on both ends of the floor and did legitimately improve on a per-minute basis. Stars can win for reaching a new level, though the award often goes to a low- or mid-level rotation member who made the jump to legitimacy. Here, we're accounting for everything by remaining entirely objective."

That's a strong claim. Here's what Fromal did, and what we can learn from it.

Fromal's methodology was based on weeding out players who improved for no reason other than newfound opportunity, and grading players by how much they improved in two different overarching statistics. Fromal wanted to reward both players who get better on a per-minute basis and those who stagnate while filling bigger roles.
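The distinction between getting better per minute and simply getting more minutes can be made concrete. The sketch below is a hypothetical illustration, not Fromal's actual formula (the article does not say which two statistics he used): it scales a season total such as points to a per-36-minute rate, then splits the year-over-year change into an "opportunity" part and a "per-minute" part. The function names and sample numbers are made up for illustration.

```python
def per36(total: float, minutes: float) -> float:
    """Scale a season total (e.g. points) to a per-36-minute rate."""
    return total / minutes * 36.0


def decompose_change(total_before, min_before, total_after, min_after):
    """Split the change in a season total into two parts (hypothetical helper).

    - opportunity: what the old per-36 rate would produce in the extra minutes
    - per_minute: what the change in per-36 rate contributes over the new minutes

    The two parts always sum to total_after - total_before.
    """
    rate_before = per36(total_before, min_before)
    rate_after = per36(total_after, min_after)
    opportunity = rate_before * (min_after - min_before) / 36.0
    per_minute = (rate_after - rate_before) * min_after / 36.0
    return {"opportunity": opportunity, "per_minute": per_minute}


# Hypothetical player A: same rate, twice the minutes -> all opportunity
print(decompose_change(500, 1000, 1000, 2000))  # {'opportunity': 500.0, 'per_minute': 0.0}
# Hypothetical player B: modest extra minutes, much better per minute
print(decompose_change(500, 1000, 900, 1200))   # {'opportunity': 100.0, 'per_minute': 300.0}
```

Player A's gain is entirely opportunity, while player B's is mostly per-minute improvement. An award that wants to reward both kinds of player still has to decide how much each part counts, and that is where the judgment calls come in.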

Fromal presented his analysis in what he called "a countdown that intentionally eschews subjectivity." It has not always been well received. Fromal has been on the receiving end of everything from profanity to accusations of bias, and he has also been hilariously plagiarized by a copycat who misinterpreted his results. But how well did Fromal do?

Fromal's top three includes two of the three players nominated by the NBA for the MIP: Giannis Antetokounmpo at No. 2 and Rudy Gobert at No. 3. His No. 1 was Myles Turner, a player overlooked by pretty much everyone else. Fromal missed Denver's Nikola Jokic, who for most analysts and fans was an obvious contender.

This may earn Fromal some credit for objectivity, as he is a Denver resident, but it raises the question of where the NBA and data-driven analysis parted ways. The answer perhaps lies in something Fromal himself notes: Sophomores (like Turner) are typically expected to improve. In other analyses, sophomores are excluded from the MIP discussion altogether.

Still, how can Jokic not be in that list? Is it Fromal that's missing something obvious, or the NBA that has its own way of thinking? Perhaps, more importantly, should it? Does the NBA see something Fromal's analysis does not, or do people there make their choices not entirely based on data-driven criteria and methods?

Data, meet eyes

Fromal is a professional NBA writer, and although he does not have a formal background in data analysis, he seems to do a lot of it in his work. Jay Spanbauer, on the other hand, is not a pro by any means: just a Bucks fan who began looking at the game differently with the influx of math and data into the NBA. But Spanbauer's data-driven analysis succeeded where Fromal's did not.

Both analyses were done before the NBA announced its MIP nominees, but Spanbauer's preceded Fromal's by a month and narrowed the MIP battle down to Jokic and Antetokounmpo. Not only that, but he also pointed to a difference between them that may lead the NBA to give Antetokounmpo the award in what looks mostly like a two-man race: Defense.

Spanbauer used a metric called Defensive Win Shares to show the biggest difference between the two. He pointed out that although defensive ability is difficult to measure accurately, Jokic sits below the league average on this metric, while Antetokounmpo's figure is more than two-and-a-half times higher. Maybe it's obvious now, but nobody else seems to have used data to nail this when Spanbauer did.


It may seem obvious now, but not many people thought about comparing MIP candidates based on their data and visualizing this for others to see. (Image: Jay Spanbauer)

That is a clearly defined metric and difference, but then why focus on these two players in the first place? Unlike Fromal, Spanbauer went with a combination of instinct and data:

"I think ultimately data should be used to 'check' what our eyes see. Anyone who watched the NBA this year saw the amazing leap Giannis Antetokounmpo made. A closer look at his numbers confirm this.

With nonstop coverage, blogs, Twitter, etc, there is enough information and enough discourse for a group of nominees to be fairly decided. I still have confidence in the traditional way nominees are selected, especially for an award as 'open-ended' or 'fluid' as the MIP. Case-in-point: The clear nominees for MIP in 2017 are Antetokounmpo and Jokic. Looking at numbers and crunching data would likely bring you to the same result."

Except it did not, at least not using Fromal's metrics and data. Which brings us to a key point: Even when something is based on data and has clear definitions, that does not make it a God-given truth. Data makes supporting a point of view more credible, and it can also reveal patterns that would otherwise be hard to spot. But data-driven does not necessarily mean indisputable.

Antetokounmpo has a back story worthy of Hollywood, is infinitely ambitious and yet keeps his feet on the ground (when not flying above the rim), is a fan and media darling, has been improving dramatically, and has reached superstar status. You could say he was perhaps the most obvious choice possible, but going by Fromal's numbers alone, the MIP would have been Myles Turner.

The problem with data-driven decision making

We already mentioned the "no sophomores for MIP" rule used by some NBA analysts. Had the NBA applied it, leaving Turner off the nominee list would have been justified, but Jokic, also a sophomore, would have been left off too. So if Turner's numbers are better than Jokic's, what is the NBA's reasoning?

That, or going with Myles Turner for MIP, is the kind of thing that can heat up debates. It can also serve to point out a couple of facts about data-driven decision making.

Coming up with the "right" criteria is hard and ad-hoc. So maybe the criteria for MIP should come down to what Fromal used. And maybe sophomores should be excluded, except in some cases. But then what would those cases be? What about players who make a comeback after a bad year? Or giving a nod to a player who could use the encouragement, or to a market the league wants to grow?

Whether any of the above are legitimate criteria -- or if and how they are considered by the NBA -- is open to interpretation. Sometimes such overall organizational goals and drivers are clear, sometimes they are not. But let us not forget: Organization executives have a huge influence on these, regardless of whether data is used to capture and evaluate them.

Going from criteria to metrics is hard and ad-hoc. Let's suppose that someone has somehow narrowed down the MIP criteria and written them in stone. What is the metric that best expresses each one? And how should they be combined with each other to derive an overall score?
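To see why the combination step matters, here is a minimal sketch of one common approach, assuming each criterion already has a metric: standardize each metric across the candidates (z-scores) and take a weighted sum. The candidates, the metric names `rate_gain` and `role_gain`, and the weights are all hypothetical. The point is that the same data can produce opposite rankings depending on the weights someone picks.

```python
from statistics import mean, pstdev


def zscores(values):
    """Standardize numbers to mean 0 and population standard deviation 1.

    Assumes the values are not all identical (otherwise the std is zero).
    """
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def rank(candidates, weights):
    """Rank candidates by a weighted sum of their z-scored metrics."""
    names = list(candidates)
    scores = {name: 0.0 for name in names}
    for metric, weight in weights.items():
        column = [candidates[name][metric] for name in names]
        for name, z in zip(names, zscores(column)):
            scores[name] += weight * z
    return sorted(names, key=scores.get, reverse=True)


# Hypothetical candidates and metrics, made up for illustration only
candidates = {
    "A": {"rate_gain": 6.0, "role_gain": 2.0},
    "B": {"rate_gain": 2.0, "role_gain": 10.0},
    "C": {"rate_gain": 4.0, "role_gain": 6.0},
}

print(rank(candidates, {"rate_gain": 0.7, "role_gain": 0.3}))  # ['A', 'C', 'B']
print(rank(candidates, {"rate_gain": 0.3, "role_gain": 0.7}))  # ['B', 'C', 'A']
```

Nothing in the data says whether 0.7/0.3 or 0.3/0.7 is the right split; that choice is exactly the ad-hoc part, and it flips the winner.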

Even the most widely used metrics were devised by someone at some point and carry their creator's biases and shortcomings, perceived or otherwise. In basketball, probably the best-known metric is the Player Efficiency Rating (PER). Whether it is the best overall metric for capturing a player's ability and influence on the game is still debated.

There are more metrics, too, which are constantly evolving, and most of them require some degree of expertise in both the domain (basketball) and the techniques (data science) to be able to fully comprehend and evaluate.


DataOps is the culture and practice of using data and analytics to drive decision making in organizations. But it is not infallible. (Image: Qubole)

Having the right data for the job is hard and not a given. Some of the data used today to gauge NBA players' defensive ability, such as steals and blocks, were not recorded until the 1970s. This reflects not just the rising importance of data everywhere, but also the evolution of the domain itself.

When the importance of defense in the game of basketball got more recognition, that data found its place. Gradually, more and more data is being added to the NBA arsenal, including visual and spatio-temporal data, hustle statistics, and social media content.

The process there is two-way. Sometimes someone will come up with an idea to quantify something for which there is no data, and sometimes the option to have some data available can be used in unforeseen ways.

Working with five-year-olds is hard, period. Perhaps unsurprisingly, not everyone who cares about the NBA understands or cares about data and analytics. MIP nominees have not expressed any views on such analyses, and not many fans seem to be out there doing what Spanbauer did.

Some might say fans and players are like five-year-olds anyway, but the truth is that unless things are simple enough for a five-year-old to get, NBA analytics will remain where all other analytics are right now: Something a few experts and some enthusiasts can use, some others have heard of and can maybe follow, and most people see as mumbo-jumbo.

Like all analytics applications, to apply NBA analytics, the right data sources need to be found, data has to be processed and integrated, domain knowledge applied, analysis done, and results visualized and explained.
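Those steps can be sketched end to end in a few lines. This is a toy illustration, assuming two small, made-up CSV extracts in place of real data sources, a hypothetical 500-minute floor standing in for the "domain knowledge" step, and the change in points per 36 minutes as the analysis.

```python
import csv
import io

# Step 1: data sources. Tiny hypothetical CSV extracts standing in for real feeds.
SEASON_1 = """player,points,minutes
A,500,1000
B,300,900
C,400,1000
"""
SEASON_2 = """player,points,minutes
A,900,1200
B,320,400
C,480,1000
"""


def load(text):
    """Step 2a: parse a CSV extract into {player: {"points": ..., "minutes": ...}}."""
    return {
        row["player"]: {"points": float(row["points"]), "minutes": float(row["minutes"])}
        for row in csv.DictReader(io.StringIO(text))
    }


def integrate(before, after):
    """Step 2b: join the two seasons, keeping only players present in both."""
    return {p: (before[p], after[p]) for p in before.keys() & after.keys()}


def analyze(pairs, min_minutes=500):
    """Steps 3-4: apply a domain rule (hypothetical minutes floor), then compute per-36 change."""
    changes = {}
    for player, (b, a) in pairs.items():
        if a["minutes"] < min_minutes:
            continue  # too small a sample to judge improvement
        rate_before = b["points"] / b["minutes"] * 36
        rate_after = a["points"] / a["minutes"] * 36
        changes[player] = rate_after - rate_before
    return changes


def report(changes):
    """Step 5: present results, biggest per-36 gain first."""
    for player, delta in sorted(changes.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{player}: {delta:+.1f} points per 36 minutes")


report(analyze(integrate(load(SEASON_1), load(SEASON_2))))
```

Hypothetical player B has the biggest per-36 jump in the raw data, but the minutes floor drops him as too small a sample: a single line of domain judgment changes who comes out on top.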

So, should the NBA be more transparent about the criteria for its awards? And what would be the result of doing this? Could it make everything deterministic, taking the fun -- and money -- out of it?

Going data pro

Spanbauer is not the first non-pro to engage with NBA data analysis. There is an array of NBA analytics enthusiasts, and a number of people who work professionally in the area. And the borders between the two are not always clear, as the Seth Partnow story shows. Partnow is an ex-blogger turned analyst who now works for the Bucks. John Hollinger, who introduced PER, now works for the Grizzlies.

But what are people using NBA data and analytics for? It depends on who they are, what they are after, and what tools they have available. Bedroom analyses will only take you so far. For some things, high school math + spreadsheet/internet + casual fan knowledge + a few hours will do. For others, it's probably more like a PhD + IBM Watson + basketball guru status + a few months.

We all know the film Moneyball, and many basketball fans are familiar with Kawhi Leonard's progress through analytics. We also know that top teams in all sports are gradually becoming data-driven, and we've seen IBM's Watson touted as the tool to help NBA teams. Some of us have even heard that the hot hand fallacy may itself be a fallacy.
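The "hot hand fallacy" label comes from research arguing that streaky shooting is an illusion. Later work by economists Joshua Miller and Adam Sanjurjo pointed out a subtle bias in how streaks had been measured: in short sequences of shots, the average share of hits that follow a hit comes out below the shooter's true hit rate, even when every shot is an independent coin flip. The sketch below is a minimal exact enumeration, assuming a 50% shooter and independent shots, for sequences of four shots.

```python
from fractions import Fraction
from itertools import product


def mean_hit_rate_after_hit(n: int) -> Fraction:
    """Average, over all equally likely sequences of n independent 50% shots,
    of the share of hits (among the first n-1 shots) followed by another hit.

    Sequences with no hit in the first n-1 shots are skipped, because the
    share is undefined for them.
    """
    shares = []
    for seq in product((0, 1), repeat=n):  # 1 = hit, 0 = miss
        after_hit = [seq[i + 1] for i in range(n - 1) if seq[i] == 1]
        if after_hit:
            shares.append(Fraction(sum(after_hit), len(after_hit)))
    return sum(shares) / len(shares)


print(mean_hit_rate_after_hit(4))  # 17/42
```

17/42 is roughly 40%: a perfectly ordinary 50% shooter appears to hit only about 40% of the time after a hit, so a test that compares shooting after hits against 50% is tilted against finding a hot hand.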

For teams, the first priority is to analyze the play of their own players and of their opponents, in order to improve the former and counter the latter. Scouting new recruits is also important, and at the end of the day it all comes down to winning more games, which also translates into more profit.

NBA teams seem to be using analytics applications across the spectrum: From understanding what has happened and why, to predicting what will happen, and to making it happen -- descriptive, diagnostic, predictive, and prescriptive analytics.


Data in all shapes and sizes is used everywhere, and the NBA is no exception.

For betting enthusiasts, it's not so much about the game itself as about making the right prediction that will turn them into winners. For fans like Spanbauer, it's mostly about getting more insight into the game. As a representative of a data-driven culture trickling down, his views are interesting:

"It's difficult to ignore the role and affect advanced metrics and statistics have had in basketball - as well as other sports. While analytics isn't the only way to analyze the game, I like to think of it as another lens through which to look.

I wouldn't say that analytics is all about predictions. Or even results, to be honest. It's about altering the traditional mindset of organizations, and the evolution of the front office. You see more money being spent on research, and more jobs opening up in the field of analytics.

At the end of the day, numbers are just numbers. A lot of attention is paid to them - sometimes more than necessary. The human element of the game cannot and should not be ignored. There are still intangibles that we have not been able to measure, and perhaps will not be able to measure.

That said, we should still continue to seek answers by utilizing as much data as we can. The more numbers and information fed into any model will provide more accurate results.

I don't necessarily believe awards should have some sort of criteria. Awards are for the fans, and part of the fun of the awards is debating amongst other fans your opinion of who should or should not win. However, with award selections dictating salaries and benefits like the designated-player exception, some consideration into criteria should certainly be given.

There is enough information out there to justify spending money and using analytics to measure many different areas. Owners and general managers are on their own to decide whether or not they want to trust themselves, or the numbers."
