Every NBA season, 30 teams compete for something only one will achieve: the legacy of a championship. From power rankings to trade deadline chaos and injuries, fans and analysts alike speculate endlessly about who will lift the Larry O’Brien Trophy.
But what if we could go beyond the hot takes and predictions, and use data and Machine Learning to forecast the NBA Champion at the end of the regular season?
In this article, I’ll walk through this process, from gathering and preparing the data, to training and evaluating the model, and finally using it to make predictions for the upcoming 2024–25 Playoffs. Along the way, I’ll highlight some of the most surprising insights that emerged from the analysis.
All of the code and data used are available on GitHub.
Understanding the problem
Before diving into model training, the most important step in any machine learning project is understanding the problem:
What question are we trying to answer, and what data (and model) can help us get there?
In this case, the question is simple: Who’s going to be the NBA Champion?
A natural first idea is to frame this as a classification problem: each team in each season is labeled as either Champion or Not Champion.
But there’s a catch. There’s only one champion per year (obviously).
So if we pull data from the last 40 seasons, we’d have 40 positive examples… and hundreds of negative ones. Winning an NBA title is such a rare event that we simply don’t have enough historical data (we’re not working with 20,000 seasons), and that scarcity makes it extremely difficult for any classification model to learn what separates champions from the rest.
We need a smarter way to frame the problem.
To help the model understand what makes a champion, it’s useful to also teach it what makes an almost champion, and how that differs from a team that was knocked out in the first round. In other words, we want the model to learn degrees of success in the playoffs, rather than a simple yes/no outcome.
This led me to the concept of Champion Share: the proportion of playoff wins a team achieved out of the total needed to win the title.
From 2003 onward, it takes 16 wins to become an NBA Champion. However, between 1984 and 2002, the first round was a best-of-five series, so during that period the total required was 15 wins.
A team that loses in the first round might have 0 or 1 win (Champion Share = 0 or 1/16), while a team that makes the Finals but loses might have 14 wins (Champion Share = 14/16). The Champion has a full share of 1.0.
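As a minimal sketch of the definition above, the Champion Share can be computed from a team's playoff wins and the season's ending year. The function names `wins_needed` and `champion_share` are illustrative, not taken from the project's code, and the sketch only covers the 1984 onward formats described here.

```python
def wins_needed(season_end_year: int) -> int:
    """Playoff wins required to win the title in a given season.

    Assumes the two formats described in the text:
    1984-2002 -> best-of-five first round (3 wins) + three best-of-seven rounds (12 wins)
    2003 on   -> four best-of-seven rounds (16 wins)
    """
    if season_end_year < 1984:
        raise ValueError("formats before 1984 are not covered by this sketch")
    return 15 if season_end_year <= 2002 else 16


def champion_share(playoff_wins: int, season_end_year: int) -> float:
    """Fraction of the title-clinching win total that a team reached."""
    total = wins_needed(season_end_year)
    if not 0 <= playoff_wins <= total:
        raise ValueError(f"playoff_wins must be between 0 and {total}")
    return playoff_wins / total


# A Finals loser with 14 wins under the 16-win format
print(champion_share(14, 2010))  # 0.875
# A champion under the 15-win format
print(champion_share(15, 1995))  # 1.0
```

Teams that miss the playoffs simply get a share of 0, which keeps them in the dataset as the bottom of the scale.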
This reframes the task as a regression problem, where the model predicts a continuous value between 0 and 1, representing how close each team came to winning it all.
In this setup, the team with the highest predicted value is our model’s pick for the NBA Champion.
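Turning the regression output into a pick is just an argmax within each season. The sketch below assumes predictions arrive as `(season, team, predicted_share)` tuples. That layout, the function name `predicted_champions`, and the placeholder team names and values are all hypothetical.

```python
def predicted_champions(predictions):
    """Return {season: team} for the team with the highest predicted share.

    `predictions` is an iterable of (season, team, predicted_share) tuples.
    If two teams tie, the one that appears first is kept.
    """
    best = {}
    for season, team, share in predictions:
        if season not in best or share > best[season][1]:
            best[season] = (team, share)
    return {season: team for season, (team, _) in best.items()}


# Placeholder values, not real model output
example = [
    (2023, "Team A", 0.61),
    (2023, "Team B", 0.48),
    (2024, "Team A", 0.35),
    (2024, "Team C", 0.72),
]
print(predicted_champions(example))  # {2023: 'Team A', 2024: 'Team C'}
```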
This is a similar approach to the MVP prediction from my previous article.
Data
Basketball, and the NBA in particular, is one of the most exciting sports to work with in data science, thanks to the volume of freely available statistics. For this project, I gathered data from Basketball Reference using my Python package BRScraper, which allows easy access to players’ and teams’ data. All data collection was done in accordance with the website’s guidelines and rate limits.
The data used includes team-level statistics, final regular season standings (e.g., win percentage, seeding), as well as player-level statistics for each team (limited to players who appeared in at least 30 games) and historical playoff performance indicators.
However, it’s important to be careful when working with raw, absolute values. For example, the average points per game (PPG) in the 2023–24 season was 114.2, while in 2000–01 it was 94.8, an increase of roughly 20%.
This is due to a series of factors, but the fact is that the game has changed significantly over the years, and so have the metrics derived from it.

To account for this shift, the approach here avoids using absolute statistics directly, opting instead for normalized, relative metrics. For example:
- Instead of a team’s PPG, you can use their ranking in that season.
- Instead of counting how many players average 20+ PPG, you can consider how many are in the top 10 in scoring, and so on.
This enables the model to capture relative dominance within each era, making comparisons across decades more meaningful and thus allowing the inclusion of older seasons to enrich the dataset.
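The idea of swapping a raw stat for its within-season rank can be sketched in plain Python, so it runs without dependencies. The row layout, the `add_season_rank` helper, and the sample numbers are illustrative assumptions, not the project's actual feature pipeline. With pandas, a `groupby("season")` followed by `rank` would achieve the same thing.

```python
from collections import defaultdict


def add_season_rank(rows, stat, rank_key):
    """Add `rank_key` to each row: the team's rank in `stat` within its season.

    Rank 1 is the highest value. Ties keep their order of appearance.
    Rows are dicts that must contain "season" and `stat`; they are updated in place.
    """
    by_season = defaultdict(list)
    for row in rows:
        by_season[row["season"]].append(row)
    for season_rows in by_season.values():
        ordered = sorted(season_rows, key=lambda r: r[stat], reverse=True)
        for position, row in enumerate(ordered, start=1):
            row[rank_key] = position
    return rows


# Made-up numbers: very different scoring levels, comparable ranks
teams = [
    {"season": 2001, "team": "A", "ppg": 100.1},
    {"season": 2001, "team": "B", "ppg": 94.0},
    {"season": 2001, "team": "C", "ppg": 97.2},
    {"season": 2024, "team": "D", "ppg": 120.0},
    {"season": 2024, "team": "E", "ppg": 113.5},
]
add_season_rank(teams, stat="ppg", rank_key="ppg_rank")
for t in teams:
    print(t["season"], t["team"], t["ppg_rank"])
```

Here team A (100.1 PPG in 2001) and team D (120.0 PPG in 2024) both get rank 1, which is the era-independent signal the model is meant to see.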
Data from the 1984 to 2024 seasons were used to train and test the model, totaling 40 seasons and 70 variables.
Before diving into the model itself, some interesting patterns emerge from an exploratory analysis when comparing championship teams to all playoff teams as a whole:

Unsurprisingly, champions tend to come from the top seeds and have higher winning percentages. The team with the worst regular season record to win it all in this period was the 1994–95 Houston Rockets, led by Hakeem Olajuwon, finishing 47–35 (.573) and entering the playoffs as only the tenth best overall team (sixth in the West).
Another notable trend is that champions tend to have a slightly higher average age, suggesting that experience plays a crucial role once the playoffs begin. The youngest championship team in the database, with an average of 26.6 years, is the 1990–91 Chicago Bulls, and the oldest is the 1997–98 Chicago Bulls, with 31.2 years: the first and last titles of the Michael Jordan dynasty.
Similarly, teams with coaches who have been with the franchise longer also tend to find more success in the postseason.
Modeling
The model used was LightGBM, a tree-based algorithm widely recognized as one of the most effective methods for tabular data, alongside others like XGBoost. A grid search was done to identify the best hyperparameters for this specific problem.
The model performance was evaluated using the root mean squared error (RMSE) and the coefficient of determination (R²).
You can find the formulas and explanation of each metric in my previous MVP article.
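For quick reference, both metrics can be written in a few lines of dependency-free Python. This is a sketch of the standard definitions, not the project's evaluation code, and the sample values are made up.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error: typical size of the prediction error."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot.

    1.0 is a perfect fit; 0.0 matches always predicting the mean of y_true.
    """
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Made-up Champion Share values for three teams
actual = [0.0, 0.5, 1.0]
print(rmse(actual, actual), r2_score(actual, actual))    # perfect predictions
baseline = [0.5, 0.5, 0.5]                               # always predict the mean
print(rmse(actual, baseline), r2_score(actual, baseline))
```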
The seasons used for training and testing were randomly selected, with the constraint of reserving the last three seasons for the test set in order to better assess the model’s performance on more recent data. Importantly, all teams were included in the dataset, not just those that qualified for the playoffs, allowing the model to learn patterns without relying on prior knowledge of postseason qualification.
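A season-level split along these lines might look like the sketch below. The article does not state what fraction of the older seasons goes to the test set, so `extra_test_fraction`, the seed, and the function name `split_seasons` are assumptions chosen only for illustration.

```python
import random


def split_seasons(seasons, n_recent=3, extra_test_fraction=0.2, seed=42):
    """Split season labels into (train, test) lists.

    The `n_recent` latest seasons always go to the test set; the remaining
    seasons are shuffled and a fraction of them is added to the test set.
    Splitting by season keeps all teams of one season on the same side.
    """
    ordered = sorted(set(seasons))
    cut = len(ordered) - n_recent
    older, recent = ordered[:cut], ordered[cut:]

    rng = random.Random(seed)  # local RNG so the split is reproducible
    rng.shuffle(older)
    n_extra = round(len(older) * extra_test_fraction)

    test = sorted(older[:n_extra] + recent)
    train = sorted(older[n_extra:])
    return train, test


# Hypothetical season labels (ending years)
train_seasons, test_seasons = split_seasons(range(2010, 2025))
print("train:", train_seasons)
print("test: ", test_seasons)
```

Rows are then assigned to train or test according to their season label, so no season is ever split across both sets.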
Results
Here we can see a comparison between the “distributions” of both the predictions and the real values. While it’s technically a histogram, since we’re dealing with a regression problem, it still works as a visual distribution because the target values range from 0 to 1. Additionally, we also display the distribution of the residual error for each prediction.

As we can see, the predictions and the real values follow a similar pattern, both concentrated near zero, as most teams don’t achieve high playoff success. This is further supported by the distribution of the residual errors, which is centered around zero and resembles a normal distribution. This suggests that the model is able to capture and reproduce the underlying patterns present in the data.
In terms of performance metrics, the best model achieved an RMSE of 0.184 and an R² score of 0.537 on the test dataset.
An effective approach for visualizing the key variables influencing the model’s predictions is through SHAP values, a technique that provides a reasonable explanation of how each feature impacts the model’s predictions.
Again, a deeper explanation of SHAP and how to interpret its chart can be found in Predicting the NBA MVP with Machine Learning.

From the SHAP chart, several important insights emerge:
- Seed and W/L% rank among the top three most impactful features, highlighting the importance of team performance in the regular season.
- Team-level stats such as Net Rating (NRtg), Opponent Points Per Game (PA/G), Margin of Victory (MOV) and Adjusted Offensive Rating (ORtg/A) also play a significant role in shaping playoff success.
- On the player side, advanced metrics stand out: the number of players in the top 30 for Box Plus/Minus (BPM) and top 3 for Win Shares per 48 Minutes (WS/48) are among the most influential.
Interestingly, the model also captures broader trends: teams with a higher average age tend to perform better in the playoffs, and a strong showing in the previous postseason often correlates with future success. Both patterns point again to experience as a valuable asset in the pursuit of a championship.
Let’s now take a closer look at how the model performed in predicting the last three NBA champions:

The model correctly predicted two of the last three NBA champions. The only miss was in 2023, when it favored the Milwaukee Bucks. That season, Milwaukee had the best regular-season record at 58–24 (.707), but an injury to Giannis Antetokounmpo hurt their playoff run. The Bucks were eliminated 4–1 in the first round by the Miami Heat, who went on to reach the Finals. It was a surprising and disappointing postseason exit for Milwaukee, who had claimed the championship just two years earlier.
2025 Playoffs Predictions
For the upcoming 2025 playoffs, the model is predicting the Boston Celtics to go back-to-back, with OKC and Cleveland close behind.
Given their strong regular season (61–21, 2nd seed in the East) and the fact that they’re the reigning champions, I tend to agree. They combine current performance with recent playoff success.
However, as we all know, anything can happen in sports, and we’ll only get the real answer by the end of June.

Conclusions
This project demonstrates how machine learning can be applied to complex, dynamic environments like sports. Using a dataset spanning four decades of basketball history, the model was able to uncover meaningful patterns in what drives playoff success. Beyond prediction, tools like SHAP allowed us to interpret the model’s decisions and better understand the factors that contribute to postseason success.
One of the biggest challenges in this problem is accounting for injuries. They can completely reshape the playoff landscape, particularly when they affect star players during the playoffs or late in the regular season. Ideally, we could incorporate injury histories and availability data to better account for this. Unfortunately, consistent and structured open data on this matter, especially at the granularity needed for modeling, is hard to come by. As a result, this remains one of the model’s blind spots: it treats all teams as being at full strength, which is often not the case.
While no model can perfectly predict the chaos and unpredictability of sports, this analysis shows that data-driven approaches can get close. As the 2025 playoffs unfold, it will be exciting to see how the predictions hold up, and what surprises the game still has in store.

I’m always available on my channels (LinkedIn and GitHub).
Thanks for your attention!👏
Gabriel Speranza Pastorello