- Use the Stack Overflow Annual Developer Survey to train the model. Download the dataset here.
- For salary prediction, use 9 features (`RemoteWork`, `EdLevel`, `YearsCodePro`, `DevType`, `LanguageHaveWorkedWith`, `PlatformHaveWorkedWith`, `ToolsTechHaveWorkedWith`, `Country`, `Age`) to predict `ConvertedCompYearly` as the salary.
- Data preprocessing: `preprocess.py`
- Data visualization: `visual.ipynb`
- Utilize ensemble methods to train the model.
- Experiment with Decision Tree, Random Forest, Bagging, AdaBoost, and Gradient Boosting.
- The training and testing process is detailed in `main.ipynb`.
- Apply `GridSearchCV` to find the best hyperparameters for Gradient Boosting.

| Metrics | Values |
|---|---|
| RMSE | 37068.786 |
| MAE | 25121.021 |
| R2-score | 0.619 |
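
The model comparison, grid search, and metrics above can be sketched roughly as follows. This is a minimal illustration, not the code in `main.ipynb`: it uses synthetic data from `make_regression` in place of the preprocessed survey features, and the hyperparameter grid, the 80/20 split, the estimator settings, and the `evaluate` helper are all assumptions made for the example.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import (
    AdaBoostRegressor,
    BaggingRegressor,
    GradientBoostingRegressor,
    RandomForestRegressor,
)
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeRegressor


def evaluate(model, X_test, y_test):
    """Return RMSE, MAE and R2 of a fitted regressor (hypothetical helper)."""
    pred = model.predict(X_test)
    return {
        "RMSE": float(np.sqrt(mean_squared_error(y_test, pred))),
        "MAE": float(mean_absolute_error(y_test, pred)),
        "R2": float(r2_score(y_test, pred)),
    }


# Assumption: synthetic stand-in for the 9 preprocessed survey features.
X, y = make_regression(n_samples=300, n_features=9, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# The five model families named above, with illustrative settings.
models = {
    "DecisionTree": DecisionTreeRegressor(random_state=0),
    "RandomForest": RandomForestRegressor(n_estimators=50, random_state=0),
    "Bagging": BaggingRegressor(n_estimators=20, random_state=0),
    "AdaBoost": AdaBoostRegressor(n_estimators=50, random_state=0),
    "GradientBoosting": GradientBoostingRegressor(random_state=0),
}
results = {
    name: evaluate(model.fit(X_train, y_train), X_test, y_test)
    for name, model in models.items()
}

# Assumption: a small illustrative grid, not the one used in main.ipynb.
param_grid = {
    "n_estimators": [50, 100],
    "max_depth": [2, 3],
    "learning_rate": [0.05, 0.1],
}
search = GridSearchCV(
    GradientBoostingRegressor(random_state=0),
    param_grid,
    cv=3,
    scoring="neg_root_mean_squared_error",
)
search.fit(X_train, y_train)

# Score the best estimator (refit on the full training split) on held-out data.
tuned = evaluate(search.best_estimator_, X_test, y_test)
print(search.best_params_)
print(tuned)
```

On the real data, `X` and `y` would come from the output of `preprocess.py`, and the scores would differ from the synthetic ones printed here.
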
- Install dependencies

      pip install -r requirements.txt

- Download the dataset and extract the zip file

      wget <link-to-data>
      unzip stack-overflow-developer-survey-2022.zip -d data

- Run the Streamlit web app

  I built a Streamlit app for viewing the data and predicting salaries. Run it with:

      streamlit run web.py

