"TO GRANT OR NOT TO GRANT: DECIDING ON COMPENSATION BENEFITS", a Machine Learning Project repository
This repository contains the machine learning project "TO GRANT OR NOT TO GRANT: DECIDING ON COMPENSATION BENEFITS", developed as part of a machine learning course in the Master’s program in Data Science and Advanced Analytics at Nova IMS in December 2024. The project uses data from the New York Workers’ Compensation Board (WCB), provided by the course instructor and covering claims made between 2020 and 2022. It addresses the challenges the WCB faces in processing and categorizing claims efficiently. The main objective is to build and optimize machine learning models that automate the classification of injury types from claims.
- Multiclass Classification Benchmarking: Develop a classification model to predict the Claim Injury Type for workers' compensation claims assembled between 2020 and 2022. This involves implementing a model evaluation strategy and identifying the model with the best generalization performance.
- Model Optimization: Refine the selected models to improve their predictive performance through hyperparameter tuning and adjustments to preprocessing and feature selection.
- Additional insights: Creative exploration of the data.
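
When benchmarking multiclass models, the choice of metric matters if some classes are much rarer than others. The sketch below is illustrative only and is not taken from the project notebook: it assumes a macro-averaged F1 score as the comparison metric (this README does not state which metric the project uses), and the labels `class_a` and `class_b` are hypothetical placeholders rather than real Claim Injury Type values.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1, over the classes present in y_true."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # true class gains a false negative
    scores = []
    for c in sorted(set(y_true)):
        denom = 2 * tp[c] + fp[c] + fn[c]
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class is never hit
        scores.append(2 * tp[c] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical, imbalanced toy labels: 8 of one class, 2 of another.
y_true = ["class_a"] * 8 + ["class_b"] * 2

# Model 1 always predicts the majority class.
pred_majority = ["class_a"] * 10
# Model 2 misses one majority sample but recovers the minority class.
pred_balanced = ["class_a"] * 7 + ["class_b"] * 3

for name, pred in [("majority", pred_majority), ("balanced", pred_balanced)]:
    print(f"{name}: accuracy={accuracy(y_true, pred):.3f} "
          f"macro_f1={macro_f1(y_true, pred):.3f}")
# majority: accuracy=0.800 macro_f1=0.444
# balanced: accuracy=0.900 macro_f1=0.867
```

On this toy data the always-majority model still reaches 0.800 accuracy, while macro F1 drops to 0.444 because the minority class is never predicted. That is the kind of gap a per-class metric is meant to expose during model comparison.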
- final_submission.csv contains the preprocessed dataset used for model training.
- Group_20_notebook.ipynb contains the full project analysis notebook, including data exploration, preprocessing, model comparison, feature importance, and results.
- Group_20_Report.pdf is the report detailing the methodology, steps taken, and conclusions on the project notebook.
- Clone the repository to your local machine.
- Install the required dependencies by running `pip install -r requirements.txt`.
- Open the Jupyter notebook `WCB_Predictions_Notebook.ipynb` to explore the analysis and results.
- Optionally, you can run the Python scripts in `src/` to see the individual steps.
This project was developed by Group_20 as part of a machine learning course at Nova IMS. Team members:
- Duarte Nunes 20240564
- Mariana Gomes 20211689
- Pedro Gaspar 20240112
- Rodrigo Nascimento 20240565
- Yasmine Boubezari 20230775
https://www.kaggle.com/competitions/to-grant-or-not-to-grant/