This is a Flask web application that predicts developer salaries. It is containerized using Docker and deployed on AWS Elastic Beanstalk.
The code requires Python 3.10 and the packages listed in requirements.txt. To install them, run:
pip install -r requirements.txt
In my previous job as a human capital analyst at a recruiting firm, I often met candidates who were unsure of their salary expectations when considering new roles. I wondered whether it was possible to build a model that predicts what a developer would earn based on some personal information and their tech stack.
That question motivated this project. The model was built on data from the 2021 Stack Overflow Developer Survey. A deep learning model was built with TensorFlow Keras and served through a Flask REST API, alongside a frontend built with HTML/CSS and Bootstrap. The application was containerized with Docker as a microservice that runs the same way regardless of environment, and then deployed on AWS Elastic Beanstalk, AWS's PaaS offering for easy code deployment.
The data comes from Stack Overflow's annual developer survey for 2021 and is accompanied by a schema. Both are available in the data folder of the project directory.
The project contains:
- a data folder that holds the data files used for exploratory data analysis and model building
- a model_building folder that contains three files: an EDA.ipynb notebook where EDA was carried out, a model.ipynb notebook where feature selection, feature engineering and model building were carried out, and a variables.py file where the feature options are stored
- an app.py file where the Flask app is built
- a templates folder that holds the HTML files
- a static folder that holds the CSS files
- a dockerfile that defines the Docker image used to create the Docker container
- a model.h5 file, which is the saved Keras model
- a preprocessing joblib file, which is the saved preprocessing pipeline
- a requirements.txt file that lists the requirements used to build the Docker image.
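The variables.py file is described only as the place where feature options are stored. As a rough illustration of how such a module could be used to check form input before it reaches the model, here is a minimal, stdlib-only sketch. The country list is taken from the countries considered in this project; FEATURE_OPTIONS, validate_form and the ed_level placeholder values are hypothetical and are not taken from the repository.

```python
# Hypothetical sketch of a variables.py-style options module; not the repo's actual code.

# Countries kept for modelling (from this README).
COUNTRIES = [
    "United States of America",
    "India",
    "Germany",
    "United Kingdom of Great Britain and Northern Ireland",
    "Canada",
    "France",
    "Brazil",
    "Poland",
    "Netherlands",
    "Italy",
    "Spain",
    "Russian Federation",
    "Australia",
]

# Placeholder option list; the real field names and values may differ.
ED_LEVELS = ["Bachelor's degree", "Master's degree", "Other"]

# Maps each form field to the values the model was trained on.
FEATURE_OPTIONS = {
    "country": COUNTRIES,
    "ed_level": ED_LEVELS,
}


def validate_form(form: dict) -> list[str]:
    """Return error messages for fields that are missing or not an allowed option."""
    errors = []
    for field, options in FEATURE_OPTIONS.items():
        value = form.get(field)
        if value is None:
            errors.append(f"missing field: {field}")
        elif value not in options:
            errors.append(f"invalid value for {field}: {value!r}")
    return errors
```

For example, validate_form({"country": "India", "ed_level": "Master's degree"}) returns an empty list, while a country outside the supported list produces an error message instead of an unreliable prediction.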
- The greatest limitation of the project is that not all countries could be represented: for many countries, the survey responses collected were not a good enough representation to be used for model building. Ultimately, only the United States of America, India, Germany, the United Kingdom of Great Britain and Northern Ireland, Canada, France, Brazil, Poland, the Netherlands, Italy, Spain, the Russian Federation and Australia were considered.
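One way to express this country restriction is to keep only countries whose number of survey responses clears a minimum threshold. The sketch below is a minimal, stdlib-only illustration that assumes each survey row is a dict with a "Country" key and uses a hypothetical min_count threshold; the project's actual cutoff and filtering code are not documented here. With pandas, the same idea is usually written with value_counts().

```python
from collections import Counter


def countries_with_enough_samples(countries: list[str], min_count: int) -> list[str]:
    """Return countries with at least min_count responses, most frequent first."""
    counts = Counter(countries)
    # most_common() sorts by count, descending; ties keep first-seen order.
    return [country for country, n in counts.most_common() if n >= min_count]


def filter_rows(rows: list[dict], min_count: int) -> list[dict]:
    """Drop survey rows whose (assumed) "Country" value is under-represented."""
    keep = set(countries_with_enough_samples([r["Country"] for r in rows], min_count))
    return [r for r in rows if r["Country"] in keep]
```

A higher threshold trades country coverage for more reliable per-country salary estimates, which matches the limitation described above.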