Data Preprocessing and EDA, and Machine Learning & Deep Learning Models, of the Employee Attrition & Performance project:
This project analyzes employee data to uncover patterns behind attrition and predict performance scores using machine learning (Random Forest & Linear Regression) and deep learning (Neural Network). It includes data preprocessing, exploratory analysis, statistical testing, classification, regression, and model evaluation.
File: employee_data.csv
Shape: 100 rows × 8 columns
Columns:
- `EmployeeID`: Unique identifier
- `Name`: Employee name
- `Age`: Employee age
- `Department`: Department name (e.g., HR, Sales)
- `Salary`: Annual salary (numeric)
- `YearsAtCompany`: Total years spent at the company
- `PerformanceScore`: Annual performance score (out of 100)
- `Attrition`: Whether the employee left the company (Yes/No)
- Missing Values: Filled numeric columns using column mean
- Duplicates: Removed duplicate records
- Inconsistent Entries: Cleaned and standardized `Department` values
- Descriptive Stats: Summary generated using `df.describe()`
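
The cleaning steps above could look roughly like the following in pandas. This is a minimal sketch, not the project's actual code: the helper name `clean_employee_data`, the specific standardization rules (trim whitespace, title case, `Hr` mapped to `HR`), the order of the steps, and the sample rows are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd


def clean_employee_data(df: pd.DataFrame) -> pd.DataFrame:
    """Standardize Department, drop duplicates, mean-fill numeric columns."""
    out = df.copy()
    # Standardize Department labels first so near-duplicate rows collapse.
    # The exact rules (strip + title case) are an assumption.
    out["Department"] = out["Department"].str.strip().str.title()
    # Hypothetical mapping: restore the acronym after title-casing.
    out["Department"] = out["Department"].replace({"Hr": "HR"})
    # Remove exact duplicate records.
    out = out.drop_duplicates().reset_index(drop=True)
    # Fill missing values in numeric columns with the column mean.
    numeric_cols = ["Age", "Salary", "YearsAtCompany", "PerformanceScore"]
    out[numeric_cols] = out[numeric_cols].fillna(out[numeric_cols].mean())
    return out


# Tiny made-up frame, for illustration only (not the project's data).
raw = pd.DataFrame({
    "EmployeeID": [1, 2, 2, 3],
    "Name": ["Ann", "Bob", "Bob", "Cy"],
    "Age": [30.0, np.nan, np.nan, 50.0],
    "Department": [" hr", "Sales", "sales ", "SALES"],
    "Salary": [50000.0, 60000.0, 60000.0, np.nan],
    "YearsAtCompany": [2.0, 5.0, 5.0, 10.0],
    "PerformanceScore": [70.0, 80.0, 80.0, 90.0],
    "Attrition": ["No", "Yes", "Yes", "No"],
})
clean = clean_employee_data(raw)
print(clean.describe())
```

Standardizing labels before dropping duplicates is a design choice in this sketch: rows that differ only in the spelling of `Department` would otherwise survive deduplication.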
- Pairplot: To observe attrition patterns across features
- Correlation Heatmap: To assess feature relationships
- Boxplot: To detect outliers in `Age`, `Salary`, `YearsAtCompany`
- Attrition Probability by Department (the values sum to 100%, so they read as each department's share of all attrition cases):
  - Sales: 35.89%
  - Engineering: 30.77%
  - HR: 23.08%
  - Marketing: 10.26%
- Bayesian Inference:
  - P(Attrition | PerformanceScore) ≈ 0.395
- ANOVA (F-test):
  - Significant difference in performance scores across departments (p-value ≈ 2.56e-12)
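
The statistical checks above can be sketched with pandas and SciPy. The block below is illustrative only: the sample rows are made up, and because the README does not say which performance-score event the Bayesian estimate conditions on, the event "PerformanceScore below 75" is a hypothetical stand-in.

```python
import pandas as pd
from scipy.stats import f_oneway

# Small made-up sample, for illustration only (not the project's data).
df = pd.DataFrame({
    "Department": ["Sales"] * 4 + ["HR"] * 6 + ["Engineering"] * 2,
    "PerformanceScore": [60, 62, 64, 66, 70, 71, 72, 73, 74, 76, 88, 90],
    "Attrition": ["Yes", "Yes", "No", "No",
                  "Yes", "No", "No", "No", "No", "No",
                  "Yes", "No"],
})
left = df["Attrition"].eq("Yes")

# Share of all leavers that each department accounts for; sums to 1.
share_of_leavers = df.loc[left, "Department"].value_counts(normalize=True)

# Attrition rate within each department, P(Attrition | Department).
rate_by_department = left.groupby(df["Department"]).mean()

# Bayes' rule for the hypothetical event B = "PerformanceScore < 75":
# P(A | B) = P(B | A) * P(A) / P(B)
low_score = df["PerformanceScore"] < 75
p_a = left.mean()
p_b = low_score.mean()
p_b_given_a = low_score[left].mean()
p_a_given_b = p_b_given_a * p_a / p_b

# One-way ANOVA: do mean performance scores differ across departments?
groups = [g["PerformanceScore"].to_numpy() for _, g in df.groupby("Department")]
f_stat, p_value = f_oneway(*groups)

print(share_of_leavers, rate_by_department, sep="\n")
print(f"P(Attrition | low score) = {p_a_given_b:.3f}, ANOVA p = {p_value:.3g}")
```

Computing both `share_of_leavers` and `rate_by_department` makes the distinction explicit: the first sums to 100% across departments, while the second is a separate rate per department.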
- Target: `Attrition`
- Accuracy: 70%
- Precision/Recall (Class 1): 57%
- Confusion Matrix: Visualized using Seaborn heatmap
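
A classification run of this kind might be set up as follows with scikit-learn. This is a sketch under stated assumptions rather than the project's code: the data is synthetic, and the one-hot encoding, the 80/20 stratified split, `n_estimators=200`, and the random seeds are illustrative choices, so the printed metrics will not match the figures above.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import (accuracy_score, confusion_matrix,
                             precision_score, recall_score)
from sklearn.model_selection import train_test_split

# Synthetic stand-in for employee_data.csv (all values are made up).
rng = np.random.default_rng(0)
n = 100
df = pd.DataFrame({
    "Age": rng.integers(22, 60, n),
    "Department": rng.choice(["HR", "Sales", "Engineering", "Marketing"], n),
    "Salary": rng.normal(60000, 15000, n).round(),
    "YearsAtCompany": rng.integers(0, 20, n),
    "PerformanceScore": rng.uniform(40, 100, n).round(1),
    "Attrition": rng.choice(["Yes", "No"], n, p=[0.4, 0.6]),
})

# One-hot encode the categorical column; encode the target as 0/1.
X = pd.get_dummies(df.drop(columns="Attrition"), columns=["Department"])
y = (df["Attrition"] == "Yes").astype(int)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

clf = RandomForestClassifier(n_estimators=200, random_state=42)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)

acc = accuracy_score(y_test, y_pred)
prec = precision_score(y_test, y_pred, zero_division=0)  # class 1 = left
rec = recall_score(y_test, y_pred, zero_division=0)
cm = confusion_matrix(y_test, y_pred, labels=[0, 1])
# cm can be drawn with seaborn, e.g. sns.heatmap(cm, annot=True, fmt="d").
print(f"accuracy={acc:.2f} precision={prec:.2f} recall={rec:.2f}")
print(cm)
```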
- Target: `PerformanceScore`
- R² Score: 0.74
- Cross-Validated R² (5-fold): 0.73
- MAE: 0.36
- Visuals:
  - Predicted vs Actual
  - Residuals Plot
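
The regression metrics above (hold-out R², 5-fold cross-validated R², MAE) can be computed along these lines with scikit-learn. The features, coefficients, and noise level below are invented for the example, so the numbers it prints are unrelated to the reported ones.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import cross_val_score, train_test_split

# Synthetic data with a made-up linear relationship plus noise.
rng = np.random.default_rng(0)
n = 100
years = rng.uniform(0, 20, n)        # stand-in for YearsAtCompany
salary_k = rng.normal(60, 15, n)     # stand-in for Salary, in thousands
score = 60 + 1.5 * years + 0.1 * salary_k + rng.normal(0, 3, n)
X = np.column_stack([years, salary_k])

X_train, X_test, y_train, y_test = train_test_split(
    X, score, test_size=0.2, random_state=42)
reg = LinearRegression().fit(X_train, y_train)
y_pred = reg.predict(X_test)

r2 = r2_score(y_test, y_pred)
mae = mean_absolute_error(y_test, y_pred)
# 5-fold cross-validated R² on the full data set.
cv_r2 = cross_val_score(LinearRegression(), X, score, cv=5, scoring="r2")
# Residuals feed the residuals plot; (y_test, y_pred) feed predicted vs actual.
residuals = y_test - y_pred

print(f"R2={r2:.2f} MAE={mae:.2f} CV R2={cv_r2.mean():.2f}")
```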
- Architecture:
  - Dense(64, ReLU)
  - Dense(32, ReLU)
  - Dense(1, Linear)
- Loss: MSE
- Final Validation MAE: ~0.33
- Test MAE: ~0.49
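
The architecture listed above maps onto a Keras `Sequential` model roughly as follows. Only the layer sizes, activations, and the MSE loss come from this README; the Adam optimizer, the number of input features, the epoch count, the batch size, and the synthetic data are assumptions for illustration, and `build_model` is a hypothetical helper name.

```python
import numpy as np
from tensorflow import keras


def build_model(n_features: int) -> keras.Model:
    """Dense(64, ReLU) -> Dense(32, ReLU) -> Dense(1, linear), MSE loss."""
    model = keras.Sequential([
        keras.Input(shape=(n_features,)),
        keras.layers.Dense(64, activation="relu"),
        keras.layers.Dense(32, activation="relu"),
        keras.layers.Dense(1, activation="linear"),
    ])
    # The optimizer is an assumption; the README only specifies the loss.
    model.compile(optimizer="adam", loss="mse", metrics=["mae"])
    return model


# Synthetic regression data (made up); 4 input features is arbitrary.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4)).astype("float32")
y = X @ np.array([1.0, -2.0, 0.5, 0.0], dtype="float32")

model = build_model(X.shape[1])
history = model.fit(X[:80], y[:80], validation_split=0.2,
                    epochs=5, batch_size=16, verbose=0)
test_loss, test_mae = model.evaluate(X[80:], y[80:], verbose=0)
print(f"val MAE={history.history['val_mae'][-1]:.3f} test MAE={test_mae:.3f}")
```

Tracking `mae` as a metric alongside the MSE loss is what yields validation and test MAE values of the kind reported above.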
- Performance vs Tenure: Performance generally increases with `YearsAtCompany`
- Performance by Department & Attrition: Insights into department-specific attrition-performance patterns
- Python Libraries: `pandas`, `numpy`, `matplotlib`, `seaborn`, `scikit-learn`, `tensorflow`, `keras`
- ML Models: Random Forest, Linear Regression
- DL Model: Keras Sequential Neural Network
- Hyperparameter tuning (Random Forest & Neural Net)
- Feature engineering with interaction terms
- Deployment using Streamlit or Flask
- SHAP/LIME for model explainability