📚 MSc in Data Science and Advanced Analytics — Nova IMS
This project aims to build a machine learning model that predicts the Claim Injury Type of each case filed with the New York Workers’ Compensation Board (WCB), using real data from 2020–2022.
- Automate the WCB’s decision on injury compensation type.
- Benchmark multiple supervised classification algorithms.
- Select the most generalizable and explainable model.
- Deploy a Streamlit web app for real-time predictions.
- Exploratory Data Analysis (EDA) – outlier and missing value treatment, variable inspection.
- Feature Engineering – creation of temporal, categorical, and combined variables.
- Model Benchmarking – Decision Tree, Random Forest, Gradient Boosting, XGBoost, and CatBoost.
- Model Selection – GridSearchCV with Stratified Cross Validation (F1-macro).
- Deployment – Streamlit app (`app.py`) for interactive use.
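
The model selection step above can be illustrated with a minimal sketch. It assumes scikit-learn is installed; the synthetic dataset and the parameter grid are placeholders chosen for illustration, not the project's actual features or search space.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold

# Synthetic stand-in for the WCB data (assumption: the real features are not shown here).
X, y = make_classification(
    n_samples=300, n_features=10, n_informative=5,
    n_classes=3, random_state=0,
)

# Hypothetical grid, kept small so the search runs quickly.
param_grid = {"n_estimators": [25, 50], "max_depth": [3, None]}

# Stratified folds keep the class proportions similar in every split.
cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)

search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid=param_grid,
    scoring="f1_macro",  # macro-averaged F1, as named in the list above
    cv=cv,
)
search.fit(X, y)

print(search.best_params_)
print(round(search.best_score_, 3))
```

After fitting, `search.best_estimator_` holds the model refit on all the data passed to `fit`, which is the object one would typically evaluate on a held-out test set.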
The Random Forest model was selected for its robustness, its interpretability, and the best F1 score on the test set.
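
For readers unfamiliar with the metric, here is a small pure-Python sketch of macro-averaged F1, the variant used during cross-validation. The labels `"A"`, `"B"`, and `"C"` are hypothetical and do not correspond to real Claim Injury Type values.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical imbalanced example: class "A" dominates, class "B" is always missed.
y_true = ["A", "A", "A", "A", "B", "C"]
y_pred = ["A", "A", "A", "A", "C", "C"]

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(round(accuracy, 3))                  # 0.833
print(round(macro_f1(y_true, y_pred), 3))  # 0.556
```

Because every class contributes equally to the average, a model that ignores a rare class is penalized even when its overall accuracy looks high, which is why the macro variant suits imbalanced targets.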
pip install -r requirements.txt
streamlit run app.py