This project aims to predict employee attrition within a specified time frame using machine learning techniques. The model is designed to analyze historical employee data to foresee potential departures, allowing companies to proactively implement retention strategies. The project includes data preprocessing, model development, evaluation, and deployment stages, offering insights to reduce operational costs and enhance workforce stability.
End-to-end machine learning project for attrition prediction, built as a Python package with an API endpoint for prediction. The project is broken down into Components and Pipelines. A classification model is trained on IBM's employee attrition data, and predictions are served through a FastAPI app.
Components are responsible for model training. They are as follows:
Data Ingestion -> Data Transformation -> Data Preprocessing -> Model Training -> Model Evaluation
Data Ingestion: Data is downloaded from GitHub as a zip file, and the CSV file is extracted
Data Transformation: Data is split into train and test sets, and each set is written to a new file
Data Preprocessing: A preprocessor pipeline is defined, fit on the train data, and saved
Model Training: A RandomForestClassifier model is trained using GridSearchCV, and the best estimator is saved
Model Evaluation: The best model is evaluated on the test set, and the results are saved
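
The transformation, preprocessing, training, and evaluation steps above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's actual code: the data is synthetic, the column names (Age, MonthlyIncome, OverTime, Attrition) are only a small illustrative subset, and the parameter grid and metrics are hypothetical choices.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in for the extracted CSV (assumption: illustrative columns only).
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "Age": rng.integers(20, 60, size=n),
    "MonthlyIncome": rng.integers(2000, 20000, size=n),
    "OverTime": rng.choice(["Yes", "No"], size=n),
    "Attrition": rng.choice(["Yes", "No"], size=n, p=[0.3, 0.7]),
})

# Data Transformation: split into train and test sets.
X = df.drop(columns="Attrition")
y = (df["Attrition"] == "Yes").astype(int)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Data Preprocessing: define the preprocessor and fit it on the train data only.
preprocessor = ColumnTransformer([
    ("num", StandardScaler(), ["Age", "MonthlyIncome"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["OverTime"]),
])
X_train_t = preprocessor.fit_transform(X_train)
X_test_t = preprocessor.transform(X_test)

# Model Training: grid search over a small, hypothetical parameter grid.
search = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid={"n_estimators": [50, 100], "max_depth": [3, None]},
    cv=3,
    scoring="accuracy",
)
search.fit(X_train_t, y_train)
best_model = search.best_estimator_

# Model Evaluation: score the best estimator on the held-out test set.
y_pred = best_model.predict(X_test_t)
results = {
    "accuracy": accuracy_score(y_test, y_pred),
    "f1": f1_score(y_test, y_pred, zero_division=0),
}
```

Fitting the preprocessor on the train split only, then reusing it on the test split, keeps test-set statistics out of training.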
- Training Pipeline: All components for model training are executed here
- Prediction Pipeline: The saved preprocessor object and model are loaded here, new data is preprocessed, and predictions are returned
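
A prediction pipeline of this shape might look like the sketch below. It assumes the preprocessor and model were serialized with joblib, which the text does not confirm; the file names, the `predict` helper, and the tiny training set are all hypothetical and exist only to make the example self-contained.

```python
import tempfile
from pathlib import Path

import joblib
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import StandardScaler


def predict(new_data: pd.DataFrame, preprocessor_path: Path, model_path: Path):
    """Load the saved objects, preprocess new data, return classes and probabilities."""
    preprocessor = joblib.load(preprocessor_path)
    model = joblib.load(model_path)
    features = preprocessor.transform(new_data)
    classes = model.predict(features)
    # Probability assigned to the predicted class of each row.
    probabilities = model.predict_proba(features).max(axis=1)
    return classes, probabilities


# Stand-in for the artifacts a training run would have saved (toy data).
train = pd.DataFrame({
    "Age": [25, 32, 41, 50, 29, 45],
    "MonthlyIncome": [3000, 5200, 8000, 12000, 3500, 9500],
})
labels = [1, 0, 0, 0, 1, 0]
scaler = StandardScaler().fit(train)
clf = RandomForestClassifier(n_estimators=10, random_state=0)
clf.fit(scaler.transform(train), labels)

with tempfile.TemporaryDirectory() as tmp:
    pre_path = Path(tmp) / "preprocessor.joblib"  # hypothetical file name
    model_path = Path(tmp) / "model.joblib"       # hypothetical file name
    joblib.dump(scaler, pre_path)
    joblib.dump(clf, model_path)

    new_rows = pd.DataFrame({"Age": [27, 48], "MonthlyIncome": [3200, 11000]})
    classes, probabilities = predict(new_rows, pre_path, model_path)
```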
An API endpoint is defined for prediction.
infer/ takes new data and returns the predicted class and its probability
- Constants: Config file paths (yaml) are defined in this file
- Utils: Utility Functions
- Exception: A custom exception is set up to pinpoint errors within the project structure
- Logger: Custom Logging for project logs
- MLflow: Experiment and model tracking, versioning
- Artifacts Store: Cloud storage integrated with MLflow
- DVC: Data Version Control for maintaining separate versions of Dataset
- Scheduling: The training pipeline is automated with Prefect to support model retraining
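
To make the Exception and Logger entries in the list above more concrete, here is one way a custom exception could pinpoint where an error came from and pass it to a logger. It uses only the standard library; the class name `ProjectException`, the logger name, and the `risky_step` function are hypothetical and not taken from the project.

```python
import logging

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
logger = logging.getLogger("attrition")  # hypothetical logger name


class ProjectException(Exception):
    """Wraps an error with the file name and line number where it was raised."""

    def __init__(self, error: Exception):
        # Walk to the innermost traceback frame, where the error originated.
        tb = error.__traceback__
        while tb is not None and tb.tb_next is not None:
            tb = tb.tb_next
        self.file_name = tb.tb_frame.f_code.co_filename if tb else "<unknown>"
        self.line_number = tb.tb_lineno if tb else -1
        super().__init__(
            f"{type(error).__name__} in {self.file_name}, "
            f"line {self.line_number}: {error}"
        )


def risky_step():
    # Stand-in for a pipeline step that fails.
    return 1 / 0


try:
    risky_step()
except Exception as exc:
    wrapped = ProjectException(exc)
    logger.error(wrapped)
```

Walking to the innermost frame reports the line that actually failed, not the line of the `try` block that caught it.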