Attrition Prediction End-to-End Project

Problem Statement

This project aims to predict employee attrition within a specified time frame using machine learning techniques. The model is designed to analyze historical employee data to foresee potential departures, allowing companies to proactively implement retention strategies. The project includes data preprocessing, model development, evaluation, and deployment stages, offering insights to reduce operational costs and enhance workforce stability.

The Project

An end-to-end machine learning project for attrition prediction, built as a Python package with an API endpoint for prediction. The project is organized into components and pipelines. A classification model is trained on IBM's employee attrition data, and predictions are served through a FastAPI app.

Components

The components are responsible for model training and run in the following order:

Data Ingestion -> Data Transformation -> Data Preprocessing -> Model Training -> Model Evaluation

Data Ingestion: Data is downloaded from GitHub as a zip file, and the CSV file is extracted
Data Transformation: Data is split into train and test sets, which are saved as new files
Data Preprocessing: A preprocessor pipeline is defined, fit on the training data, and saved
Model Training: A RandomForestClassifier is tuned with GridSearchCV, and the best estimator is saved
Model Evaluation: The best model is evaluated on the test set, and the results are saved
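The Data Ingestion and Data Transformation steps can be sketched with the standard library alone. This is a minimal illustration, not the project's code: the function names, the `test_size` and `seed` defaults, and the `train.csv`/`test.csv` file names are assumptions, and the real download URL is not given in this README.

```python
import csv
import io
import random
import zipfile
from pathlib import Path
from urllib.request import urlopen


def download_zip(url: str) -> bytes:
    """Fetch the raw bytes of a zip archive (hypothetical helper; the source URL is not given here)."""
    with urlopen(url) as response:
        return response.read()


def extract_csv(zip_bytes: bytes, out_dir: Path) -> Path:
    """Extract the first .csv member of the archive into out_dir and return its path."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        csv_name = next((n for n in archive.namelist() if n.endswith(".csv")), None)
        if csv_name is None:
            raise ValueError("archive contains no CSV file")
        return Path(archive.extract(csv_name, path=out_dir))


def split_csv(csv_path: Path, out_dir: Path, test_size: float = 0.2, seed: int = 42) -> tuple[Path, Path]:
    """Shuffle the rows, write train.csv and test.csv (each with the header), and return both paths."""
    with open(csv_path, newline="") as f:
        reader = csv.reader(f)
        header = next(reader)
        rows = list(reader)
    random.Random(seed).shuffle(rows)  # fixed seed keeps the split reproducible
    n_test = int(len(rows) * test_size)
    out_dir.mkdir(parents=True, exist_ok=True)
    train_path, test_path = out_dir / "train.csv", out_dir / "test.csv"
    for path, subset in ((train_path, rows[n_test:]), (test_path, rows[:n_test])):
        with open(path, "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(header)
            writer.writerows(subset)
    return train_path, test_path
```

If the target classes are imbalanced, a stratified split such as scikit-learn's `train_test_split(..., stratify=y)` keeps the class proportions equal in both files.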

Pipelines

Training Pipeline: Runs all model-training components in sequence
Prediction Pipeline: Loads the saved preprocessor object and model, preprocesses new data, and returns predictions
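The Data Preprocessing, Model Training and Model Evaluation components, together with the load-and-predict step of the Prediction Pipeline, might look roughly like this with scikit-learn. It is a sketch under stated assumptions: pandas DataFrames as input, a target already encoded as 0/1 (1 = attrition), `pickle` for persistence, and hypothetical function names, artifact file names and parameter grid.

```python
import pickle
from pathlib import Path

import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import GridSearchCV
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical file names for the saved artifacts.
PREPROCESSOR_FILE = "preprocessor.pkl"
MODEL_FILE = "model.pkl"


def build_preprocessor(numeric: list[str], categorical: list[str]) -> ColumnTransformer:
    """Scale numeric columns and one-hot encode categorical columns."""
    return ColumnTransformer([
        ("num", StandardScaler(), numeric),
        ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),
    ])


def run_training(train_df: pd.DataFrame, target: str, numeric: list[str],
                 categorical: list[str], artifact_dir: Path) -> dict:
    """Fit and save the preprocessor, tune a random forest, and save the best estimator."""
    X, y = train_df.drop(columns=[target]), train_df[target]
    preprocessor = build_preprocessor(numeric, categorical).fit(X)
    search = GridSearchCV(
        RandomForestClassifier(random_state=42),
        param_grid={"n_estimators": [50, 100], "max_depth": [None, 5]},  # illustrative grid
        scoring="f1",  # assumes the target is encoded as 0/1 with 1 = attrition
        cv=3,
    )
    search.fit(preprocessor.transform(X), y)
    artifact_dir.mkdir(parents=True, exist_ok=True)
    with open(artifact_dir / PREPROCESSOR_FILE, "wb") as f:
        pickle.dump(preprocessor, f)
    with open(artifact_dir / MODEL_FILE, "wb") as f:
        pickle.dump(search.best_estimator_, f)
    return search.best_params_


def load_artifacts(artifact_dir: Path):
    """Load the saved preprocessor and model."""
    with open(artifact_dir / PREPROCESSOR_FILE, "rb") as f:
        preprocessor = pickle.load(f)
    with open(artifact_dir / MODEL_FILE, "rb") as f:
        model = pickle.load(f)
    return preprocessor, model


def evaluate(test_df: pd.DataFrame, target: str, artifact_dir: Path) -> dict:
    """Score the saved model on the held-out test set."""
    preprocessor, model = load_artifacts(artifact_dir)
    X, y = test_df.drop(columns=[target]), test_df[target]
    preds = model.predict(preprocessor.transform(X))
    return {"accuracy": accuracy_score(y, preds), "f1": f1_score(y, preds)}


def predict(new_df: pd.DataFrame, artifact_dir: Path) -> list[tuple[int, float]]:
    """Return (predicted class, probability of that class) for each row of new data."""
    preprocessor, model = load_artifacts(artifact_dir)
    X = preprocessor.transform(new_df)
    classes = model.predict(X)
    probs = model.predict_proba(X).max(axis=1)
    return list(zip(classes.tolist(), probs.tolist()))
```

Fitting the preprocessor once on the full training set mirrors the separate preprocessor and model artifacts described above, but it lets the scaler see every cross-validation fold. Wrapping both steps in a single `sklearn.pipeline.Pipeline` and passing that to `GridSearchCV` avoids this leakage, at the cost of saving one combined artifact.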

Web API - FastAPI

An API endpoint is defined for prediction:
infer/ takes new data and returns the predicted class and its probability

Extras

Constants: Paths to the YAML config files are defined here
Utils: Utility functions
Exception: A custom exception that pinpoints where errors occur within the project structure
Logger: Custom Logging for project logs
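The Exception and Logger helpers could be sketched as follows with the standard library. The class name `ProjectException`, the log format and the `logs/project.log` location are hypothetical; the sketch only shows one common way to record the file and line where an error occurred.

```python
import logging
import sys
from pathlib import Path


def get_logger(name: str = "attrition", log_dir: Path = Path("logs")) -> logging.Logger:
    """Return a logger that writes timestamped records to logs/project.log and to stderr."""
    logger = logging.getLogger(name)
    if not logger.handlers:  # avoid adding duplicate handlers on repeated calls
        log_dir.mkdir(parents=True, exist_ok=True)
        fmt = logging.Formatter("[%(asctime)s] %(levelname)s %(module)s:%(lineno)d - %(message)s")
        for handler in (logging.FileHandler(log_dir / "project.log"), logging.StreamHandler()):
            handler.setFormatter(fmt)
            logger.addHandler(handler)
        logger.setLevel(logging.INFO)
    return logger


class ProjectException(Exception):
    """Wraps an error with the file name and line number where it was raised.

    Must be constructed inside an ``except`` block so sys.exc_info() sees the error.
    """

    def __init__(self, error: Exception):
        _, _, tb = sys.exc_info()
        # Walk to the innermost traceback frame, where the error actually occurred.
        while tb is not None and tb.tb_next is not None:
            tb = tb.tb_next
        self.file_name = tb.tb_frame.f_code.co_filename if tb else "<unknown>"
        self.line_no = tb.tb_lineno if tb else -1
        super().__init__(f"Error in {self.file_name} at line {self.line_no}: {error}")
```

Inside a component, wrapping risky calls in `try`/`except` and raising `ProjectException(e) from e` keeps the original traceback chained while adding the file and line to the message.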

Future Work

MLflow: Experiment and model tracking, and model versioning
Artifact Store: Cloud storage integrated with MLflow
DVC: Data Version Control for maintaining separate versions of the dataset
Scheduling: Automating the training pipeline with Prefect to support model retraining
