Members: Eddie, Chase, Adam, Neel, Timothy, Jack, Harrison
Using all of the models we covered this semester, we analyze NBA player data from the 2024–2025 seasons to answer a set of research questions. We clean and transform the data, explore it with descriptive statistics and visualizations, and build predictive models suited to each task. Finally, we deploy a Streamlit app to present our findings interactively.
- Can we accurately predict player salary, all-star nominations, and other accomplishment features?
- Can we classify whether a player will be an all-star using season statistics?
- Can we cluster players based on performance metrics and valuation to identify archetypes or undervalued players?
- Can we classify players into different salary tiers using per-game performance metrics?
- Can we build a trade analysis model based on projected salary valuations and other evaluative metrics?
- Can we predict next season's win/loss record from the current roster and player statistics?
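The salary-tier question depends on first turning continuous salaries into discrete classes. The README does not say how tiers are defined, so the following is only a minimal sketch that assumes quantile-based tiers; the function name `salary_tiers` and the sample salaries are hypothetical and not taken from the project code.

```python
from statistics import quantiles


def salary_tiers(salaries, n_tiers=4):
    """Assign each salary a tier index from 0 (lowest) to n_tiers - 1 (highest).

    Assumption: tiers are equal-sized quantile bins. The project may define
    tiers differently.
    """
    # quantiles() returns the n_tiers - 1 cut points that split the data
    cuts = quantiles(salaries, n=n_tiers, method="inclusive")
    # A salary's tier is the number of cut points strictly below it
    return [sum(s > c for c in cuts) for s in salaries]


# Hypothetical salaries in millions of dollars, for illustration only
example_salaries = [2.0, 3.5, 5.0, 8.0, 12.0, 20.0, 30.0, 45.0]
example_tiers = salary_tiers(example_salaries)
```

The resulting tier labels could then serve as the target for a classifier such as logistic regression or KNN, with per-game statistics as features.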
- Multiple Linear Regression (Polynomial extensions optional)
- Logistic Regression
- K-Nearest Neighbors (KNN)
- K-Means Clustering
  - Clustering players into performance archetypes
  - Clustering by valuation to identify overvalued/undervalued players
- Principal Component Analysis (PCA)
- MLP Neural Network — Trade Analysis
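To illustrate the archetype-clustering idea from the list above, here is a minimal, dependency-free sketch of K-Means (Lloyd's algorithm) on standardized per-game statistics. It is not the project's implementation: the function names, the farthest-first initialization (chosen here only to make the result deterministic), and the six sample stat lines are all assumptions made for illustration.

```python
import math


def standardize(rows):
    """Scale each column to zero mean and unit variance (z-scores)."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # Fall back to 1.0 when a column is constant to avoid dividing by zero
    stds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]


def kmeans(rows, k, n_iter=50):
    """Lloyd's algorithm; returns one cluster label per row."""
    # Farthest-first initialization (an assumption, used for determinism):
    # start from the first row, then repeatedly add the row that is
    # farthest from its nearest already-chosen centroid.
    centroids = [rows[0]]
    while len(centroids) < k:
        centroids.append(
            max(rows, key=lambda r: min(math.dist(r, c) for c in centroids))
        )
    labels = [0] * len(rows)
    for _ in range(n_iter):
        # Assignment step: nearest centroid by Euclidean distance
        labels = [min(range(k), key=lambda j: math.dist(row, centroids[j]))
                  for row in rows]
        # Update step: move each centroid to the mean of its members
        new_centroids = []
        for j in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == j]
            if members:
                new_centroids.append([sum(c) / len(c) for c in zip(*members)])
            else:
                new_centroids.append(centroids[j])  # leave empty clusters in place
        if new_centroids == centroids:
            break  # converged: centroids no longer move
        centroids = new_centroids
    return labels


# Hypothetical per-game lines: [points, rebounds, assists]
players = [
    [28.0, 5.0, 7.5], [26.5, 4.5, 8.0], [27.0, 6.0, 6.5],    # scoring guards
    [9.0, 11.5, 1.5], [8.0, 12.0, 1.0], [10.5, 10.5, 2.0],   # rebounding bigs
]
labels = kmeans(standardize(players), k=2)
```

Standardizing first matters because K-Means relies on Euclidean distance, so a high-variance column such as points would otherwise dominate the clustering.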
- Page 1: README
- Page 2: Interactive data table
- Page 3: Exploratory Data Analysis (EDA)
- Page 4: Statistical model pages
- Create the Conda environment:
  conda env create -f environment.yml
- Activate the environment:
  conda activate nba_ml_project
- Run the data processing scripts:
  python scrape_salaries.py
  python get_clean_data.py
- Run the Streamlit app:
  streamlit run nba_model_app.py
- NBA API: https://github.com/swar/nba_api
- ESPN Salary Data: https://www.espn.com/nba/salaries
- 2012–2023 NBA Stats.csv
Link to app dashboard: https://ml1project-nba.streamlit.app/
Link to presentation slides: https://docs.google.com/presentation/d/19ICyufQWbHA0C849dZXp_vIfJSb9GIW_mdxjlFuhz7A/edit?usp=sharing