Predicting NFL Team Records: A Data-Driven Approach
By Armaan Sahni, Carnegie Mellon Sports Analytics Club
As a lifelong Chicago Bears fan, the excitement of every offseason has often turned into disappointment once the season begins. This common cycle got me wondering: Is the best time to be a Bears fan during the offseason? I decided to explore whether a data-driven approach could help predict my team's performance, along with other NFL teams, for the 2024 season.
Predicting NFL team performance is no small task. Numerous factors—from historical records to unpredictable in-game events—affect the outcomes. In this project, I developed a model to forecast the regular-season wins for each team using both historical data and current season trends. The model prioritizes 2024 data, reflecting the most up-to-date team dynamics.
Data Collection and Feature Engineering
I gathered team data from Pro Football Focus (PFF) spanning the 2020–2024 seasons. The dataset includes:
Wins and Losses (Team Record)
Strength of Schedule
Points Scored and Allowed
Turnover Differential (Takeaways vs. Giveaways)
Total Penalty Yardage
Passing (Completions, Attempts, Yards, TD, INT)
Rushing (Attempts, Yards, TD)
Offense and Defense Metrics (e.g., 3rd Down Conversion %, Redzone TD Conversion %, Avg. Passer Rating Allowed, Sacks, Missed Tackles)
Special Teams Metrics (Field Goal %, Punt Inside 20 %)
Division Standings
Certain features were combined for simplicity, such as creating ratios (e.g., Passing TD:INT ratio) and differentials (e.g., 3rd Down Conversion % Differential). Additionally, I inverted some metrics, like Turnover Differential, to ensure that higher values corresponded to better performance.
Feature Importance and Model Development
After analyzing the data, several metrics emerged as highly correlated with wins. For example, 3rd Down Conversion % Differential and Point Differential were top contributors to winning records, while Total Penalty Yardage and Defensive INTs/Sacks were less significant.
I pre-processed the data using imputation to handle missing values and standardized the features. The key to the model was assigning greater weight to 2024, ensuring it captured the most current performance while balancing historical trends. Using a Random Forest Regressor, I trained the model with data from 2020–2022, tested it on 2023 data, and evaluated it with the Mean Absolute Error (MAE) metric. The model achieved an MAE of 1.198, meaning the predicted win total for each team could vary by about one game.
Results and Insights
Though I’m still refining the model, I’ve already generated predictions for 12 teams. The model’s error margin suggests reasonable accuracy, but I plan to expand the model to all 32 NFL teams soon.
Challenges and Future Work
One of the main challenges was balancing early 2024 season data with historical trends. Early-season performance can indicate team potential but also be misleading. To address this, I weighted the current season’s data more heavily, though future iterations will further refine this balance.
Another potential improvement is considering the impact of divisional games. Since teams play three divisional opponents twice each season, it’s unrealistic for every team in the same division to win a high number of games. I plan to enhance the model by incorporating more granular, team-specific factors such as:
Injury Reports
Coaching Changes
Trades and Free Agent Signings
Game-by-Game Forecasts
Divisional Matchups and Playoff Predictors
Incorporating these elements will likely improve the model’s predictions as the season unfolds.
Conclusion
After all, it seems like the Bears could be in line for a good season! I am excited for this year, and unless you are a Browns or Panthers fan you should have something to look forward to as well. The NFL season always brings surprises and great games on a weekly basis.
The NFL Record Prediction model shows the power of leveraging data analytics to forecast team performance with accuracy and insight. By combining the most recent season’s data with historical performance metrics, the model creates a dynamic, multi-layered perspective on team performance. This project highlights the promise of predictive modeling in sports, not only in terms of forecasting outcomes but also in understanding the underlying dynamics of team performance. As the NFL season continues, the model’s capacity to refine its predictions in realtime will make it increasingly adept at navigating the unpredictable nature of professional football, offering valuable insights into team trajectories and performance potential.
How do you ensure you aren’t double counting? For example, turnover diff., TD/Int ratio, and Def Ints share a lot of the same inputs.