This repository predicts employee attrition within an organization using supervised classification models (Decision Trees, Random Forest, and Logistic Regression). By analyzing employee data, the project aims to provide actionable insights for improving retention strategies and workplace satisfaction.
Employee attrition is a critical challenge for organizations, leading to increased costs and disruptions. Understanding the factors driving attrition enables businesses to design strategies to retain talent effectively.
- Predict whether an employee is likely to leave the organization.
- Identify the key factors contributing to employee attrition.
- Provide actionable recommendations to improve retention.
Data Analysis:
- Explored trends in employee demographics, job satisfaction, and performance metrics.
- Visualized correlations between features such as salary hikes, job level, and attrition rates.
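
The exploration described above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the small DataFrame is made up, and only the column names are taken from the dataset description in this README.

```python
import pandas as pd

# Tiny made-up sample; the values are illustrative only, not real results.
df = pd.DataFrame({
    "OverTime": ["Yes", "Yes", "No", "No", "Yes", "No"],
    "JobSatisfaction": [1, 2, 4, 3, 1, 4],
    "MonthlyIncome": [2500, 3100, 6200, 5400, 2800, 7000],
    "Attrition": ["Yes", "Yes", "No", "No", "No", "No"],
})

# Encode the target as 0/1 so that a group mean equals an attrition rate.
df["AttritionFlag"] = (df["Attrition"] == "Yes").astype(int)

# Attrition rate per overtime group.
rate_by_overtime = df.groupby("OverTime")["AttritionFlag"].mean()

# Pearson correlation of each numeric feature with the attrition flag.
corr_with_attrition = (
    df[["JobSatisfaction", "MonthlyIncome", "AttritionFlag"]]
    .corr()["AttritionFlag"]
    .drop("AttritionFlag")
)

print(rate_by_overtime)
print(corr_with_attrition)
```

On the real data, the same two summaries could feed a bar chart and a heatmap respectively.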
Preprocessing Pipeline:
- Handled missing values and outliers in the dataset.
- Encoded categorical variables such as department and job role.
- Scaled numerical features for better model performance.
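
One possible way to express these steps with scikit-learn is shown below. It is a sketch under assumptions: the column names `Department` and `JobRole`, the median and most-frequent imputation strategies, and the sample rows are illustrative choices, and outlier handling is left out because the README does not say how it was done.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Assumed column split; adjust to the actual dataset.
numeric_cols = ["Age", "MonthlyIncome"]
categorical_cols = ["Department", "JobRole"]

# Numeric features: fill gaps with the median, then standardize.
numeric_steps = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])

# Categorical features: fill gaps with the most frequent value, then one-hot encode.
categorical_steps = Pipeline([
    ("impute", SimpleImputer(strategy="most_frequent")),
    ("encode", OneHotEncoder(handle_unknown="ignore")),
])

preprocessor = ColumnTransformer([
    ("num", numeric_steps, numeric_cols),
    ("cat", categorical_steps, categorical_cols),
])

# Made-up rows with two missing numeric values.
sample = pd.DataFrame({
    "Age": [29, 41, np.nan, 35],
    "MonthlyIncome": [3200, 7100, 4500, np.nan],
    "Department": ["Sales", "R&D", "Sales", "Sales"],
    "JobRole": ["Sales Executive", "Research Scientist", "Sales Executive", "Manager"],
})

features = preprocessor.fit_transform(sample)
print(features.shape)
```

Wrapping the steps in a `ColumnTransformer` means the same fitted transformations can be reused on the test split, which avoids leaking test statistics into training.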
Modeling:
- Implemented classification models like Decision Trees, Random Forest, and Logistic Regression.
- Fine-tuned hyperparameters to maximize prediction accuracy.
- Achieved an accuracy of 90% using Random Forest, outperforming other models.
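
A hyperparameter search of this kind might look like the sketch below. The synthetic data from `make_classification`, the parameter grid, and the 3-fold setting are assumptions made for illustration; the notebook's actual search space is not documented here, and the scores this sketch prints say nothing about the 90% figure above.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic, imbalanced stand-in for the preprocessed HR features.
X, y = make_classification(
    n_samples=400, n_features=10, n_informative=5,
    weights=[0.84, 0.16], random_state=42,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

# Small illustrative grid; a real search would likely cover more values.
param_grid = {
    "n_estimators": [50, 100],
    "max_depth": [None, 5],
}
search = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid,
    scoring="accuracy",
    cv=3,
)
search.fit(X_train, y_train)

# best_estimator_ is refit on the full training split with the best parameters.
model = search.best_estimator_
print("Best parameters:", search.best_params_)
print("Held-out accuracy:", model.score(X_test, y_test))
```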
Evaluation:
- Used metrics like precision, recall, F1-score, and ROC-AUC to assess model effectiveness.
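
To make these metrics concrete, here is a small worked example on hand-made labels and scores (not project results). The 0.5 decision threshold is an arbitrary choice for the illustration.

```python
from sklearn.metrics import f1_score, precision_score, recall_score, roc_auc_score

# Hand-made values for illustration; 1 = left, 0 = stayed.
y_true = [0, 0, 0, 0, 1, 1, 0, 1]
# Predicted probability of leaving for each employee.
y_score = [0.1, 0.4, 0.2, 0.6, 0.8, 0.3, 0.1, 0.9]
# Turn probabilities into hard predictions at a 0.5 threshold.
y_pred = [int(s >= 0.5) for s in y_score]

precision = precision_score(y_true, y_pred)  # share of predicted leavers who left
recall = recall_score(y_true, y_pred)        # share of actual leavers who were caught
f1 = f1_score(y_true, y_pred)                # harmonic mean of precision and recall
auc = roc_auc_score(y_true, y_score)         # threshold-free ranking quality

print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f} roc_auc={auc:.3f}")
```

Precision, recall, and F1 depend on the chosen threshold, while ROC-AUC is computed from the raw scores, which is why it takes `y_score` and not `y_pred`.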
What did I learn?
- Employees with low job satisfaction and those who work overtime are more likely to leave.
- Significant predictors include job role, work-life balance, and salary hikes.
- Gained experience in applying machine learning to workforce analytics.
What did I try out?
- Explored multiple algorithms, including Logistic Regression and ensemble methods.
- Conducted feature importance analysis to identify key drivers of attrition.
- Experimented with SMOTE to address class imbalance.
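
For readers unfamiliar with SMOTE, the core idea is to create synthetic minority-class rows by interpolating between a minority sample and one of its nearest minority neighbours. The function below is a from-scratch NumPy sketch of that idea, not the code used in the notebook; `smote_sketch`, its parameters, and the sample points are hypothetical. In practice, a maintained implementation such as the `SMOTE` class in imbalanced-learn is the safer choice.

```python
import numpy as np


def smote_sketch(X_min, n_new, k=3, seed=0):
    """Create n_new synthetic minority samples by interpolating between
    a chosen sample and one of its k nearest minority neighbours."""
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)

    # Pairwise Euclidean distances within the minority class.
    diffs = X_min[:, None, :] - X_min[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=2))
    np.fill_diagonal(dists, np.inf)  # a point is not its own neighbour

    # Indices of the k nearest neighbours for every minority sample.
    neighbours = np.argsort(dists, axis=1)[:, :k]

    # Pick a base sample and one of its neighbours for each new row.
    base = rng.integers(0, len(X_min), size=n_new)
    picked = neighbours[base, rng.integers(0, k, size=n_new)]

    # Interpolate at a random position on the segment between the two points.
    gap = rng.random((n_new, 1))
    return X_min[base] + gap * (X_min[picked] - X_min[base])


# Made-up minority-class points (e.g. scaled features of employees who left).
X_minority = np.array([[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.5]])
synthetic = smote_sketch(X_minority, n_new=6)
print(synthetic.shape)
```

Oversampling should be applied to the training split only, so that the test set keeps the original class balance.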
What worked and why?
- Random Forest provided the best results due to its ability to handle complex feature interactions.
- Preprocessing steps like scaling and encoding ensured consistent data inputs.
- Hyperparameter tuning improved model performance and generalization.
Recommendations for business stakeholders:
- Focus on improving work-life balance and addressing high overtime for at-risk employees.
- Provide targeted career development opportunities to employees in critical roles.
- Regularly monitor workforce data and update predictive models to adapt to changing trends.
The dataset contains the following key attributes:
- Age: Employee age.
- JobSatisfaction: Level of job satisfaction.
- WorkLifeBalance: Work-life balance rating.
- MonthlyIncome: Monthly salary.
- OverTime: Whether the employee works overtime (Yes/No).
- YearsAtCompany: Number of years with the company.
- Attrition: Target variable (Yes = Left, No = Stayed).
Clone the repository:
git clone https://github.com/username/hr-attrition-prediction.git
Navigate to the project directory:
cd hr-attrition-prediction
Install the required dependencies:
pip install -r requirements.txt
Run the Jupyter Notebook to explore the data and train models:
jupyter notebook HR_Attrition_Prediction.ipynb
Evaluate the trained model:
from sklearn.metrics import classification_report
# Assumes model, X_test, and y_test are already defined in the notebook session.
y_pred = model.predict(X_test)
print(classification_report(y_test, y_pred))
- Accuracy: 90% (Random Forest)
- Key Predictors: Job Role, Work-Life Balance, Overtime
- Impact: Enhanced ability to identify at-risk employees and design targeted retention strategies.
- Focus on improving work-life balance and reducing overtime for at-risk employees.
- Offer tailored career development plans for employees in critical job roles.
- Regularly update workforce data and predictive models to maintain accuracy.
Contributions are welcome! Please follow these steps:
- Fork the repository.
- Create a new branch:
git checkout -b feature-branch
- Commit your changes:
git commit -m "Add feature"
- Push to the branch:
git push origin feature-branch
- Open a pull request.
This project is licensed under the MIT License.
- The dataset is sourced from the IBM HR Analytics Employee Attrition & Performance dataset.
- Thanks to the data science community for valuable resources and inspiration.
Thank you for exploring this project! If you have any questions or suggestions, feel free to reach out.