The goal is to predict a student's math score from the other given values.
There are 7 different input variables:
- Gender
- Race/Ethnicity
- Parental Education
- Lunch
- Test Preparation Course
- Reading Score
- Writing Score
Target Variable: Math Score
Data Ingestion :
In the Data Ingestion phase, the data is first read from a CSV file. It is then split into training and testing sets, and each set is saved as a CSV file.
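
The ingestion step could look roughly like the sketch below. The function name `ingest`, the file names `train.csv` and `test.csv`, the 80/20 split, and the tiny made-up dataset are all assumptions for illustration; the project's actual paths and split ratio are not stated here.

```python
import os
import tempfile

import pandas as pd
from sklearn.model_selection import train_test_split


def ingest(raw_csv_path, out_dir, test_size=0.2, random_state=42):
    """Read the raw CSV, split it, and save train/test CSVs (hypothetical helper)."""
    df = pd.read_csv(raw_csv_path)
    train_df, test_df = train_test_split(
        df, test_size=test_size, random_state=random_state
    )
    os.makedirs(out_dir, exist_ok=True)
    train_path = os.path.join(out_dir, "train.csv")
    test_path = os.path.join(out_dir, "test.csv")
    train_df.to_csv(train_path, index=False)
    test_df.to_csv(test_path, index=False)
    return train_path, test_path


# Made-up rows, only so the sketch runs end to end (not real project data).
workdir = tempfile.mkdtemp()
raw_path = os.path.join(workdir, "raw.csv")
pd.DataFrame(
    {
        "reading_score": [72, 90, 95, 57, 78, 83, 95, 43, 64, 60],
        "writing_score": [74, 88, 93, 44, 75, 78, 92, 39, 67, 50],
        "math_score": [72, 69, 90, 47, 76, 71, 88, 40, 64, 38],
    }
).to_csv(raw_path, index=False)

train_path, test_path = ingest(raw_path, os.path.join(workdir, "artifacts"))
```

Fixing `random_state` keeps the split reproducible between runs, which makes later model comparisons easier to interpret.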
Data Transformation :
In this phase a ColumnTransformer pipeline is created. For numeric variables, SimpleImputer is applied first with the median strategy, then the data is scaled with StandardScaler. For categorical variables, SimpleImputer is applied with the most-frequent strategy, then ordinal encoding is performed, and finally the encoded data is scaled with StandardScaler.
This preprocessor is saved as a pickle file.
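
A minimal sketch of such a preprocessor is shown below. The snake_case column names and the four sample rows are assumptions made for illustration; the step order (impute, encode, scale) follows the description above. The pickle round trip is done in memory here instead of writing a file.

```python
import pickle

import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OrdinalEncoder, StandardScaler

# Assumed column names; the real dataset may spell them differently.
numeric_cols = ["reading_score", "writing_score"]
categorical_cols = [
    "gender",
    "race_ethnicity",
    "parental_education",
    "lunch",
    "test_preparation_course",
]

# Numeric: median imputation, then standard scaling.
num_pipeline = Pipeline(
    [
        ("imputer", SimpleImputer(strategy="median")),
        ("scaler", StandardScaler()),
    ]
)

# Categorical: most-frequent imputation, ordinal encoding, then scaling.
cat_pipeline = Pipeline(
    [
        ("imputer", SimpleImputer(strategy="most_frequent")),
        ("encoder", OrdinalEncoder()),
        ("scaler", StandardScaler()),
    ]
)

preprocessor = ColumnTransformer(
    [
        ("num", num_pipeline, numeric_cols),
        ("cat", cat_pipeline, categorical_cols),
    ]
)

# Made-up rows with two missing values, to exercise both imputers.
sample = pd.DataFrame(
    {
        "gender": ["female", "male", np.nan, "female"],
        "race_ethnicity": ["group A", "group B", "group C", "group B"],
        "parental_education": ["some college", "high school", "bachelor's degree", "high school"],
        "lunch": ["standard", "free/reduced", "standard", "standard"],
        "test_preparation_course": ["none", "completed", "none", "none"],
        "reading_score": [72.0, np.nan, 90.0, 65.0],
        "writing_score": [74.0, 60.0, 88.0, 70.0],
    }
)

X = preprocessor.fit_transform(sample)

# Pickle round trip: the restored object should transform data identically.
restored = pickle.loads(pickle.dumps(preprocessor))
```

Pickling the fitted preprocessor, not just its configuration, matters because the medians, category mappings, and scaling statistics learned on the training data must be reused unchanged at prediction time.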
Model Training :
In this phase, several base models are tested. After comparing algorithms and tuning hyperparameters, the best model found was linear regression.
This model is saved as a pickle file.
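
One way to run such a comparison is sketched below. The candidate list, the parameter grids, the R² metric, and the synthetic data are assumptions chosen for illustration; they are not the project's actual configuration or results.

```python
import pickle

import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.metrics import r2_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic, roughly linear data standing in for the transformed features.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([3.0, -2.0, 1.0]) + rng.normal(scale=0.1, size=200)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Each candidate pairs an estimator with a (possibly empty) hyperparameter grid.
candidates = {
    "LinearRegression": (LinearRegression(), {}),
    "Ridge": (Ridge(), {"alpha": [0.1, 1.0, 10.0]}),
    "RandomForest": (RandomForestRegressor(random_state=0), {"n_estimators": [20, 50]}),
}

scores = {}
fitted = {}
for name, (model, grid) in candidates.items():
    # Tune on the training split with 3-fold cross-validation.
    search = GridSearchCV(model, grid, cv=3, scoring="r2")
    search.fit(X_train, y_train)
    fitted[name] = search.best_estimator_
    # Score the tuned model on the held-out test split.
    scores[name] = r2_score(y_test, search.predict(X_test))

# Keep the model with the highest test R^2 and serialize it.
best_name = max(scores, key=scores.get)
best_model = fitted[best_name]
model_bytes = pickle.dumps(best_model)
```

On data with a mostly linear relationship between features and target, a plain linear model often scores on par with heavier models, which is consistent with linear regression being selected in this project.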
Prediction Pipeline :
This pipeline converts the given input data into a DataFrame and provides functions that load the pickle files and predict the final result in Python.
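
The sketch below shows one possible shape for this pipeline. `StudentRecord`, `load_object`, `predict_math_score`, the file names, and the field names are hypothetical, and the stand-in preprocessor and model are fitted on four made-up rows only so that the example runs on its own.

```python
import os
import pickle
import tempfile
from dataclasses import asdict, dataclass

import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import OrdinalEncoder, StandardScaler


@dataclass
class StudentRecord:
    """One student's inputs (field names are assumptions)."""

    gender: str
    race_ethnicity: str
    parental_education: str
    lunch: str
    test_preparation_course: str
    reading_score: float
    writing_score: float

    def to_dataframe(self):
        # A single-row DataFrame with columns in field order.
        return pd.DataFrame([asdict(self)])


def load_object(path):
    """Load a pickled object from disk."""
    with open(path, "rb") as f:
        return pickle.load(f)


def predict_math_score(record, preprocessor_path, model_path):
    """Transform one record with the saved preprocessor and predict with the saved model."""
    preprocessor = load_object(preprocessor_path)
    model = load_object(model_path)
    features = preprocessor.transform(record.to_dataframe())
    return float(model.predict(features)[0])


# --- Stand-in artifacts, fitted on made-up rows so the sketch is runnable ---
rows = [
    StudentRecord("female", "group B", "some college", "standard", "none", 72, 74),
    StudentRecord("male", "group C", "high school", "free/reduced", "none", 57, 44),
    StudentRecord("female", "group B", "high school", "standard", "completed", 90, 88),
    StudentRecord("male", "group C", "some college", "standard", "completed", 65, 70),
]
train_df = pd.concat([r.to_dataframe() for r in rows], ignore_index=True)
math_scores = [70.0, 47.0, 88.0, 62.0]

cat_cols = ["gender", "race_ethnicity", "parental_education", "lunch", "test_preparation_course"]
num_cols = ["reading_score", "writing_score"]
stand_in_preprocessor = ColumnTransformer(
    [("cat", OrdinalEncoder(), cat_cols), ("num", StandardScaler(), num_cols)]
)
stand_in_model = LinearRegression().fit(
    stand_in_preprocessor.fit_transform(train_df), math_scores
)

artifact_dir = tempfile.mkdtemp()
preprocessor_path = os.path.join(artifact_dir, "preprocessor.pkl")
model_path = os.path.join(artifact_dir, "model.pkl")
for path, obj in [(preprocessor_path, stand_in_preprocessor), (model_path, stand_in_model)]:
    with open(path, "wb") as f:
        pickle.dump(obj, f)

score = predict_math_score(rows[0], preprocessor_path, model_path)
```

Keeping the record-to-DataFrame conversion separate from the loading and prediction logic makes it easy to reuse the same prediction function from a script, a test, or a web handler.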
Flask App creation :
A Flask app with a user interface is created to predict a student's math score inside a web application.
app.py was renamed to application.py for AWS deployment.
http://studentexamperformance-env.eba-gyfszm4z.ap-northeast-2.elasticbeanstalk.com/predictdata
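
A stripped-down sketch of what `application.py` might contain is shown below. Only the `/predictdata` route comes from the URL above; the handler name, the form field names, the JSON response (the real app renders a user interface), and the averaging stand-in used in place of the trained model are assumptions for illustration.

```python
from flask import Flask, jsonify, request

# The variable is named `application` to match the application.py file name
# used for the AWS deployment.
application = Flask(__name__)


def predict_math_score(form):
    """Stand-in for the real prediction pipeline (NOT the trained model)."""
    reading = float(form["reading_score"])
    writing = float(form["writing_score"])
    return (reading + writing) / 2


@application.route("/predictdata", methods=["GET", "POST"])
def predict_datapoint():
    if request.method == "GET":
        # The real app would render an HTML form here.
        return "POST the input fields to this endpoint."
    return jsonify({"math_score": predict_math_score(request.form)})


# To serve locally, one could call: application.run(host="0.0.0.0")
```

Flask's built-in test client can exercise the route without starting a server, for example `application.test_client().post("/predictdata", data={"reading_score": "70", "writing_score": "80"})`.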

