How to Analyze Basketball Player Performance and Predict MVP Awards
Tutorial

March 23, 2025

Using advanced formulas

Basketball analytics has evolved significantly in recent years, offering deeper insights into player performance beyond traditional statistics. In this post, I'll walk through how I reproduced a research paper that predicts NBA Most Valuable Player (MVP) awards, using the AI data science tool PlotsALot.

The Challenge

Predicting MVP winners has traditionally been challenging, with analysts relying on subjective criteria and basic statistics. But what if we could create a more objective approach using advanced metrics? That's what Sarlis and Tjortjis attempted in their paper "Sports Analytics — Evaluation of Basketball Players and Team Performance," and what I've set out to reproduce.

Data and Methodology

I used data from three NBA seasons (2017-18, 2018-19, and 2019-20), sourced from NBA.com, basketball-reference.com, and ESPN. The analysis focuses on 20 NBA players who:

  • Participated in at least 30 games per season
  • Averaged at least 15 minutes per game
  • Received nominations in different statistical categories

Following the original research, I split each season into quarters and analyzed cumulative windows (Q1, Q1-Q2, Q1-Q3, and the full season, Q1-Q4) to track how predictions evolve as the season progresses.
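
Here is a minimal Python sketch of the eligibility filter and the cumulative windows. The record layout (`games`, `minutes_per_game`) and the choice to cut a game log into four equal slices are assumptions for illustration; the original analysis may draw the quarter boundaries differently.

```python
def is_eligible(player, min_games=30, min_minutes=15.0):
    """Apply the two numeric eligibility rules: games played and minutes per game."""
    return (
        player["games"] >= min_games
        and player["minutes_per_game"] >= min_minutes
    )


def cumulative_windows(game_log, n_quarters=4):
    """Split a chronological game log into cumulative windows.

    Each window holds every game up to the end of that quarter.
    Assumption: quarters are equal-sized slices of the games played.
    """
    n = len(game_log)
    windows = {}
    for q in range(1, n_quarters + 1):
        end = round(n * q / n_quarters)
        label = "Q1" if q == 1 else f"Q1-Q{q}"
        windows[label] = game_log[:end]
    return windows


# Hypothetical players, for illustration only.
players = [
    {"name": "Player A", "games": 72, "minutes_per_game": 34.1},
    {"name": "Player B", "games": 25, "minutes_per_game": 30.0},  # too few games
    {"name": "Player C", "games": 60, "minutes_per_game": 12.5},  # too few minutes
]
eligible = [p["name"] for p in players if is_eligible(p)]

# A made-up 8-game log of points scored.
windows = cumulative_windows([20, 25, 30, 18, 22, 27, 31, 19])
```

Because the windows are cumulative, any metric computed on `windows["Q1-Q3"]` uses every game from the first three quarters, not just the third.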

The Formula Approach

Instead of using complex machine learning models, the research developed two straightforward formulas:

Aggregated Performance Indicator (API)

This formula combines 30 different performance metrics to predict MVP winners:

API = \frac{RPM + PER + PIE + 4Factors + NETRTG + EFF + PIR + Tendex + BPM 
            + PIPM + GmSc + FP + WS/48 + TeamELO + EFG\% + TS\% + VORP + WinsRPM 
            + WAR + EWA + Deflections + PACE + USG\% + AST/TO + ScreenAssistsPTS 
            + PRA + REB\% + LooseBallsRecovered + PPP + ASTRatio}{30}

Each metric is first normalized to a 0-100 scale to allow for fair comparison between different statistical measures.
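
To make the normalize-then-average idea concrete, here is a small Python sketch. It assumes min-max scaling across the player pool, which is one common way to reach a 0-100 scale; the exact normalization used in the paper is not stated here. The three metrics and all values are made up.

```python
def min_max_scale(values, lo=0.0, hi=100.0):
    """Rescale a list of numbers linearly so the minimum maps to lo and the maximum to hi."""
    vmin, vmax = min(values), max(values)
    if vmax == vmin:
        # Every player has the same value: place them all at the midpoint.
        return [(lo + hi) / 2 for _ in values]
    return [lo + (v - vmin) * (hi - lo) / (vmax - vmin) for v in values]


def api_scores(raw):
    """Average the normalized metrics per player.

    raw maps metric name -> {player: raw value}. Every metric must cover
    the same players. Returns {player: API score on a 0-100 scale}.
    """
    players = list(next(iter(raw.values())))
    scaled = {}
    for metric, by_player in raw.items():
        column = min_max_scale([by_player[p] for p in players])
        scaled[metric] = dict(zip(players, column))
    return {p: sum(scaled[m][p] for m in raw) / len(raw) for p in players}


# Made-up values for three hypothetical players and three of the 30 metrics.
raw = {
    "PER": {"A": 30.0, "B": 20.0, "C": 25.0},
    "TS%": {"A": 60.0, "B": 64.0, "C": 56.0},
    "VORP": {"A": 8.0, "B": 2.0, "C": 5.0},
}
scores = api_scores(raw)
predicted = max(scores, key=scores.get)
```

With all 30 metrics in `raw`, the division by `len(raw)` becomes the division by 30 in the formula, and the result stays on the same 0-100 scale as each normalized input.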

Defensive Performance Indicator (DPI)

For predicting Defensive Player of the Year:

DPI = BLK - BLKA + PFD - PF + STL + Deflections 
      + LooseBallsRec - TOV + ScreenAssistsPTS + AST/TO
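
The DPI is a signed sum, so it can be sketched in Python as a table of signs. The key names mirror the formula; whether the inputs are raw per-game values or normalized first is not specified here, so the sketch simply takes whatever numbers it is given. The sample stat line is made up.

```python
# +1 for terms the formula adds, -1 for terms it subtracts.
DPI_SIGNS = {
    "BLK": +1,
    "BLKA": -1,
    "PFD": +1,
    "PF": -1,
    "STL": +1,
    "Deflections": +1,
    "LooseBallsRec": +1,
    "TOV": -1,
    "ScreenAssistsPTS": +1,
    "AST/TO": +1,
}


def dpi(stats):
    """Compute the Defensive Performance Indicator for one player's stat line."""
    return sum(sign * stats[name] for name, sign in DPI_SIGNS.items())


# Hypothetical stat line, for illustration only.
example = {
    "BLK": 1.5, "BLKA": 0.5, "PFD": 4.0, "PF": 3.0, "STL": 1.0,
    "Deflections": 2.0, "LooseBallsRec": 1.0, "TOV": 2.0,
    "ScreenAssistsPTS": 3.0, "AST/TO": 1.5,
}
example_dpi = dpi(example)
```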

Data Analysis

Before applying the formulas, I explored the dataset to understand the distribution and relationships between different metrics.

Which players have the highest average values for each advanced metric?

Highest Average values

This analysis reveals that certain players consistently rank at the top across multiple metrics, which aligns with our intuition about who the best players are.
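
The question above boils down to a group-by-player average followed by a per-metric maximum. Here is a minimal sketch using only the Python standard library; the row layout and the sample values are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def leaders_by_metric(rows, metrics):
    """For each metric, return (player, average) for the highest per-player average."""
    by_player = defaultdict(list)
    for row in rows:
        by_player[row["player"]].append(row)

    leaders = {}
    for metric in metrics:
        averages = {
            player: mean(r[metric] for r in player_rows)
            for player, player_rows in by_player.items()
        }
        best = max(averages, key=averages.get)
        leaders[metric] = (best, averages[best])
    return leaders


# Made-up season rows for two hypothetical players.
rows = [
    {"player": "Player A", "season": "2017-18", "PER": 28.0, "PIE": 18.0},
    {"player": "Player A", "season": "2018-19", "PER": 30.0, "PIE": 16.0},
    {"player": "Player B", "season": "2017-18", "PER": 26.0, "PIE": 19.0},
    {"player": "Player B", "season": "2018-19", "PER": 27.0, "PIE": 20.0},
]
leaders = leaders_by_metric(rows, ["PER", "PIE"])
```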

Correlation between metrics

With this many metrics, a correlation heatmap is hard to read. Instead, I reduced the dimensionality of the data using Principal Component Analysis (PCA), which projects the data onto a small number of directions that capture most of its variance.

Dimensionality Reduction with PCA

PCA visualization

Principal Component Analysis helps visualize how players cluster based on their statistical profiles. The plot shows a clear separation: MVP candidates sit apart from the other players in the reduced-dimension space.
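
For readers who want to see what PCA does under the hood, here is a dependency-free Python sketch. It standardizes each metric, builds the correlation matrix, and extracts the top two components with power iteration and deflation. That is a simple stand-in chosen for readability, not necessarily what PlotsALot runs internally, and the input values are made up.

```python
import math
import random


def standardize(X):
    """Z-score each column. Assumes no column is constant (zero standard deviation)."""
    n, d = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(d)]
    stds = [
        math.sqrt(sum((row[j] - means[j]) ** 2 for row in X) / (n - 1))
        for j in range(d)
    ]
    return [[(row[j] - means[j]) / stds[j] for j in range(d)] for row in X]


def correlation_matrix(Z):
    """Covariance of standardized data, which is the correlation matrix."""
    n, d = len(Z), len(Z[0])
    return [
        [sum(row[i] * row[j] for row in Z) / (n - 1) for j in range(d)]
        for i in range(d)
    ]


def mat_vec(C, v):
    """Multiply matrix C by vector v."""
    return [sum(c * x for c, x in zip(row, v)) for row in C]


def top_eigenpair(C, iters=500, seed=0):
    """Power iteration: largest eigenvalue of C and its unit eigenvector."""
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in C]
    for _ in range(iters):
        w = mat_vec(C, v)
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(a * b for a, b in zip(v, mat_vec(C, v)))
    return eigenvalue, v


def pca(X, n_components=2):
    """Return (projected coordinates, components, explained-variance ratios)."""
    Z = standardize(X)
    C = correlation_matrix(Z)
    d = len(C)
    components, ratios = [], []
    for _ in range(n_components):
        eigenvalue, v = top_eigenpair(C)
        components.append(v)
        ratios.append(eigenvalue / d)  # the trace of a correlation matrix is d
        # Deflation: subtract the component just found before the next pass.
        C = [
            [C[i][j] - eigenvalue * v[i] * v[j] for j in range(d)]
            for i in range(d)
        ]
    coords = [
        [sum(z * c for z, c in zip(row, comp)) for comp in components]
        for row in Z
    ]
    return coords, components, ratios


# Made-up values for five hypothetical players and three metrics.
X = [
    [10.0, 20.0, 5.0],
    [12.0, 24.0, 3.0],
    [14.0, 28.0, 6.0],
    [16.0, 32.0, 2.0],
    [18.0, 36.0, 4.0],
]
coords, components, ratios = pca(X)
```

Each row of `coords` is one player's position in the two-dimensional plot, and `ratios` reports how much of the total variance each axis retains. The sign of a component is arbitrary, so a plot may appear mirrored between runs or tools without changing the clusters.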

Player Clustering with K-means

K-means clustering visualization

K-means clustering reveals natural groupings of players based on their statistical profiles. Interestingly, MVP candidates tend to form their own distinct clusters.
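
The clustering step can be sketched in plain Python as Lloyd's algorithm. The farthest-first initialization below is an assumption made so the example is deterministic; it is not a detail taken from the paper or from PlotsALot. The points are hypothetical 2-D coordinates, such as the output of a PCA projection.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, max_iters=100):
    """Lloyd's algorithm with a deterministic farthest-first initialization."""
    # Start from the first point, then repeatedly add the point that is
    # farthest from all centroids chosen so far.
    centroids = [list(points[0])]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(farthest))

    labels = None
    for _ in range(max_iters):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the centroids are stable
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Two well-separated groups of made-up points.
points = [
    (1.0, 1.0), (1.2, 0.8), (0.9, 1.1),
    (8.0, 8.0), (8.2, 7.9), (7.8, 8.1),
]
labels, centroids = kmeans(points, k=2)
```

Cluster numbers are arbitrary labels, so "cluster 0" in one run has no fixed meaning across different initializations; what matters is which players end up grouped together.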

For example, Giannis belongs to cluster 0, which is characterized by high VA, FP, and PIR values.

Giannis cluster visualization

MVP Prediction Results

API Scores Across Seasons and Quarters

Using the API formula, I calculated scores for each player across different seasons and quarters.

API score

The API formula correctly identified James Harden as the 2017-18 MVP, showing the highest API score throughout the season.

Giannis Antetokounmpo was correctly predicted as the 2018-19 MVP with a 76.5% API score.

For the 2019-20 season, using data from Q1 to Q3, the formula predicted Giannis Antetokounmpo to win with a 77.8% API score, significantly ahead of James Harden at 67.4%.
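
Turning scores into a prediction is just a ranking. The short Python sketch below uses only the two 2019-20 scores quoted above (the remaining players' scores are not listed in this post) and computes the lead in API points.

```python
# API scores reported above for 2019-20, computed on Q1-Q3 data.
api_2019_20_q1_q3 = {
    "Giannis Antetokounmpo": 77.8,
    "James Harden": 67.4,
}

# Rank players from highest to lowest API score.
ranked = sorted(api_2019_20_q1_q3.items(), key=lambda kv: kv[1], reverse=True)
predicted_mvp, top_score = ranked[0]
runner_up, runner_up_score = ranked[1]

# Lead of the predicted MVP over the runner-up, in API points.
margin = round(top_score - runner_up_score, 1)
```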

Defensive Player Analysis

DPI Scores Across Seasons

DPI score

Calculating the DPI formula across seasons reveals which players consistently demonstrate elite defensive impact.

Which defensive metrics contribute most to the DPI scores of the winners?

DPI score contribution

My analysis of the DPI components revealed that Personal Fouls Drawn (PFD), Screen Assists, and Deflections are the most influential metrics in determining the DPI scores of winners.
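
One simple way to measure contribution is to look at each signed term of the DPI sum on its own and rank the terms. The Python sketch below takes that approach as an assumption; the numbers are made up to exercise the code and are not the data behind the chart.

```python
# Sign of each term in the DPI formula (+1 added, -1 subtracted).
DPI_SIGNS = {
    "BLK": +1, "BLKA": -1, "PFD": +1, "PF": -1, "STL": +1,
    "Deflections": +1, "LooseBallsRec": +1, "TOV": -1,
    "ScreenAssistsPTS": +1, "AST/TO": +1,
}


def dpi_contributions(stats):
    """Return (metric, signed contribution) pairs, largest contribution first."""
    terms = {name: sign * stats[name] for name, sign in DPI_SIGNS.items()}
    return sorted(terms.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical stat line, for illustration only.
stat_line = {
    "BLK": 2.5, "BLKA": 0.5, "PFD": 3.0, "PF": 3.5, "STL": 1.5,
    "Deflections": 2.0, "LooseBallsRec": 1.0, "TOV": 2.0,
    "ScreenAssistsPTS": 4.0, "AST/TO": 1.2,
}
ranked_terms = dpi_contributions(stat_line)
total_dpi = sum(value for _, value in ranked_terms)
```

Because the terms are not rescaled, metrics with naturally larger values (such as points from screen assists) can dominate the sum, which is worth keeping in mind when reading the contribution chart.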

This finding is particularly interesting as these "hustle stats" are often overlooked in traditional defensive evaluations but clearly play a crucial role in identifying elite defenders.

Key Insights

My reproduction of the original research confirms several important findings:

  1. The API formula successfully predicted the MVPs for the seasons analyzed
  2. Performance consistency throughout the season is crucial
  3. Defensive contributions matter more than traditionally recognized
  4. The formula needs only current-season data, making it practical for in-season analysis

Conclusion

By reproducing Sarlis and Tjortjis's research, I've confirmed that combining advanced metrics into composite formulas like API and DPI can predict major NBA awards; the API formula identified the MVP in each season analyzed. This approach offers fans, analysts, and teams a more objective way to evaluate player performance and identify truly exceptional seasons.

Want to explore NBA player metrics yourself and predict the next MVP?

Access the data I used to reproduce this research and make your own predictions!
