In this project I selected a complex regression dataset and tried to increase accuracy as much as possible. I started with an ANN, tuned it with KerasTuner, and then used supervised ML through AutoML.
In total, 17 models were tried across the ANN and AutoML approaches. The best model came from supervised ML using
H2O AutoML: a StackedEnsemble. The ANN was close behind.
-
After importing the data I observed that there were many categorical columns, so I decided to keep only the columns with a small number of categories. As it turned out, almost all of the columns had a reasonable number of categories.
-
Next, I applied label encoding to all the categorical columns
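As a rough illustration of what label encoding does, here is a minimal library-free sketch. The `colour` column and the helper names `fit_label_encoder` and `transform` are hypothetical and are not taken from the project's notebook; the sorted-order mapping is an assumption about how codes are assigned.

```python
def fit_label_encoder(values):
    """Map each distinct category to an integer code, in sorted order."""
    return {category: code for code, category in enumerate(sorted(set(values)))}


def transform(values, mapping):
    """Replace every category with its integer code."""
    return [mapping[v] for v in values]


# Hypothetical categorical column, not taken from the project's dataset.
colour = ["red", "green", "blue", "green", "red"]

mapping = fit_label_encoder(colour)
encoded = transform(colour, mapping)
print(mapping)
print(encoded)
```

Note that the integer codes impose an arbitrary order on the categories, which is one reason to follow encoding with a feature selection step.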
-
Then I performed some feature selection
- I used `VarianceThreshold` and `Pearson Correlation` to remove similar columns, but the results were not very promising, so I dropped these techniques
- I used `mutual_info_selection` and the `chi square test` to identify the most important features, then took the intersection of the columns chosen by both techniques; the 24 columns they had in common were selected
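The intersection step can be sketched as follows. This is only an illustration under stated assumptions: the feature names, the scores, the cut-off `k`, and the helpers `top_k` and `select_common` are all hypothetical, and in practice the scores would come from the mutual information and chi-square computations themselves.

```python
def top_k(scores, k):
    """Return the names of the k highest-scoring features."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:k])


def select_common(scores_a, scores_b, k):
    """Keep only features ranked in the top k by both scoring methods."""
    return sorted(top_k(scores_a, k) & top_k(scores_b, k))


# Hypothetical scores; real ones would come from the two tests.
mutual_info = {"f1": 0.42, "f2": 0.05, "f3": 0.31, "f4": 0.27}
chi_square = {"f1": 18.0, "f2": 12.5, "f3": 1.2, "f4": 9.8}

selected = select_common(mutual_info, chi_square, k=3)
print(selected)
```

Taking the intersection is a conservative choice: a column survives only if both methods agree that it is informative.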
-
I performed normalization on the dataset
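The README does not say which normalization was used, so the sketch below assumes min-max scaling purely for illustration. The values and the helper names `fit_min_max` and `min_max_scale` are hypothetical. The point it shows is that the scaling parameters are learned from the training split only and then reused on the test split.

```python
def fit_min_max(column):
    """Learn the scaling range from the training data only."""
    return min(column), max(column)


def min_max_scale(column, lo, hi):
    """Scale values so the training range maps onto [0, 1]."""
    span = hi - lo
    if span == 0:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in column]
    return [(x - lo) / span for x in column]


# Hypothetical numeric feature split into train and test parts.
train = [10.0, 20.0, 30.0, 50.0]
test = [20.0, 60.0]

lo, hi = fit_min_max(train)
train_scaled = min_max_scale(train, lo, hi)
test_scaled = min_max_scale(test, lo, hi)
print(train_scaled)
print(test_scaled)
```

Test values can fall outside [0, 1] when they lie outside the training range, which is expected.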
-
Created an ANN architecture and manually tuned its hyperparameters (Model 0)
-
Used KerasTuner to tune the ANN architecture by
- Selecting the number of layers
- Selecting the number of neurons per layer
- Selecting the optimizer
- Combining all of the above in one tuner
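To make the idea of an all-in-one tuner concrete, here is a library-free sketch of a random search over the three choices listed above. It does not use the KerasTuner API. The search space values, the function names, and `placeholder_loss` (a stand-in for training the ANN and returning its validation loss) are all assumptions made for illustration.

```python
import random

# Hypothetical search space mirroring the three choices listed above.
SEARCH_SPACE = {
    "num_layers": [1, 2, 3, 4],
    "units": [32, 64, 128, 256],
    "optimizer": ["adam", "rmsprop", "sgd"],
}


def sample_config(rng):
    """Draw one architecture: layer count, units per layer, optimizer."""
    num_layers = rng.choice(SEARCH_SPACE["num_layers"])
    return {
        "units_per_layer": [
            rng.choice(SEARCH_SPACE["units"]) for _ in range(num_layers)
        ],
        "optimizer": rng.choice(SEARCH_SPACE["optimizer"]),
    }


def random_search(evaluate, trials, seed=0):
    """Try `trials` random configurations and keep the lowest score."""
    rng = random.Random(seed)
    best_config, best_score = None, float("inf")
    for _ in range(trials):
        config = sample_config(rng)
        score = evaluate(config)
        if score < best_score:
            best_config, best_score = config, score
    return best_config, best_score


def placeholder_loss(config):
    """Stand-in for 'train the ANN and return its validation loss'."""
    return abs(sum(config["units_per_layer"]) - 200)


best_config, best_score = random_search(placeholder_loss, trials=20)
print(best_config, best_score)
```

A real tuner replaces `placeholder_loss` with an actual training run, which is why the number of trials is usually the main cost to budget for.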
-
Used further techniques to reduce overfitting
- Early Stopping
- Data scaling
- Regularization
- Weight Initialization
- Batch Normalization
- Dropout
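Of the techniques above, early stopping is the easiest to show without a deep learning framework. The sketch below assumes a patience-based rule on validation loss. The loss values, the `patience` setting, and the function name `early_stopping_epoch` are hypothetical and do not reflect the project's actual configuration.

```python
def early_stopping_epoch(val_losses, patience=3, min_delta=0.0):
    """Return (stop_epoch, best_epoch) for per-epoch validation losses.

    Training stops once the loss has failed to improve by more than
    `min_delta` for `patience` consecutive epochs.
    """
    best_loss = float("inf")
    best_epoch = -1
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            best_loss, best_epoch = loss, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch, best_epoch
    # Patience never ran out: training used every epoch.
    return len(val_losses) - 1, best_epoch


# Hypothetical validation losses, one per epoch (0-indexed).
val_losses = [0.90, 0.70, 0.60, 0.62, 0.61, 0.65, 0.58]

stop_epoch, best_epoch = early_stopping_epoch(val_losses, patience=3)
print(stop_epoch, best_epoch)
```

In this example training halts before the later improvement at the final epoch is ever seen, which shows the trade-off in choosing a small patience value.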