Technology stack:
Python
Keras (TensorFlow backend)
Matplotlib
NumPy
scikit-learn (sklearn)
The dataset for this project is hosted by Kaggle.
https://www.kaggle.com/c/new-york-city-taxi-fare-prediction/data
It includes features such as pickup_datetime, pickup_longitude, pickup_latitude, dropoff_longitude, dropoff_latitude and passenger_count; the target to predict is fare_amount.
1. visualization.ipynb:
Plots pickup locations in New York.
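A minimal sketch of this kind of plot, assuming the coordinates are available as NumPy arrays. The helper name `plot_pickups` and the bounding box used for the axis limits are hypothetical approximations of New York City, not values taken from the notebook.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Approximate NYC bounding box (an assumption, not taken from the notebook).
NYC_LON = (-74.3, -73.7)
NYC_LAT = (40.5, 41.0)


def plot_pickups(lon, lat, ax=None):
    """Scatter-plot pickup coordinates (hypothetical helper)."""
    if ax is None:
        _, ax = plt.subplots(figsize=(8, 8))
    # Tiny, translucent markers so dense areas (e.g. Manhattan) stand out.
    ax.scatter(lon, lat, s=1, alpha=0.3, c="black")
    ax.set_xlim(*NYC_LON)
    ax.set_ylim(*NYC_LAT)
    ax.set_xlabel("pickup_longitude")
    ax.set_ylabel("pickup_latitude")
    ax.set_title("Pickup locations in New York")
    return ax
```

With millions of rows, small markers and a low alpha act as a rough density map; plotting a random sample of the data keeps rendering fast.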
2. utils.ipynb:
Contains the preprocessing steps: removing rows with missing values, removing outliers, replacing certain outliers with the mode, and feature engineering.
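The steps above can be sketched with NumPy as follows. This is an illustration under stated assumptions, not the notebook's code: the column layout, the NYC bounding box, the rule that fares must be positive, the choice of passenger_count as the column whose outliers (assumed: values outside 1..6) are replaced with the mode, and the haversine trip distance as the engineered feature are all hypothetical.

```python
import numpy as np

# Approximate NYC bounding box (an assumption).
NYC_LON = (-74.3, -73.7)
NYC_LAT = (40.5, 41.0)


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in km between points given in degrees."""
    lon1, lat1, lon2, lat2 = map(np.radians, (lon1, lat1, lon2, lat2))
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = np.sin(dlat / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2) ** 2
    return 2 * 6371.0 * np.arcsin(np.sqrt(a))  # 6371 km: mean Earth radius


def preprocess(data):
    """Hypothetical layout: columns are [fare_amount, pickup_longitude,
    pickup_latitude, dropoff_longitude, dropoff_latitude, passenger_count]."""
    # 1. Drop rows containing any missing value.
    data = data[~np.isnan(data).any(axis=1)]
    fare, plon, plat, dlon, dlat, _ = data.T

    # 2. Drop outliers: non-positive fares and coordinates outside the box.
    keep = fare > 0
    for lon, lat in ((plon, plat), (dlon, dlat)):
        keep &= (lon >= NYC_LON[0]) & (lon <= NYC_LON[1])
        keep &= (lat >= NYC_LAT[0]) & (lat <= NYC_LAT[1])
    data = data[keep]  # boolean indexing returns a copy

    # 3. Replace implausible passenger counts with the mode of the valid ones.
    pax = data[:, 5]
    valid = (pax >= 1) & (pax <= 6)
    values, counts = np.unique(pax[valid], return_counts=True)
    data[~valid, 5] = values[np.argmax(counts)]

    # 4. Feature engineering: append the trip distance as a new column.
    dist = haversine_km(data[:, 1], data[:, 2], data[:, 3], data[:, 4])
    return np.column_stack([data, dist])
```

Dropping rows outright suits clear errors (missing values, coordinates far outside the city), while replacing a value with the mode keeps an otherwise usable row when only one field looks wrong.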
3. main.ipynb:
Uses a Keras model (TensorFlow backend) with five Dense layers: four hidden layers of 128, 64, 32 and 8 units, each with ReLU activation, followed by an output layer with a single unit.
The model is trained for 5 epochs with mean squared error (MSE) as the loss function and the Adam optimizer.
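A minimal sketch of this architecture, assuming tf.keras or Keras 3. The helper `build_model` and its `n_features` argument are hypothetical, since the number of input features depends on the feature engineering in utils.ipynb. The text does not name an activation for the output unit; leaving it linear is an assumption, though it is the usual choice for regression with an MSE loss.

```python
from tensorflow import keras


def build_model(n_features):
    """Dense regression network with the layer sizes described above."""
    model = keras.Sequential([
        keras.Input(shape=(n_features,)),
        keras.layers.Dense(128, activation="relu"),
        keras.layers.Dense(64, activation="relu"),
        keras.layers.Dense(32, activation="relu"),
        keras.layers.Dense(8, activation="relu"),
        keras.layers.Dense(1),  # single linear unit: the predicted fare (assumed)
    ])
    model.compile(
        optimizer="adam",
        loss="mean_squared_error",
        metrics=[keras.metrics.RootMeanSquaredError()],
    )
    return model
```

Training would then be a call such as `model.fit(X_train, y_train, epochs=5)`, where `X_train` and `y_train` are hypothetical names for the preprocessed features and fares; the batch size and any validation split are not stated in the text.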
Root Mean Squared Error:
Train RMSE: 3.34
Test RMSE: 3.29
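RMSE is the square root of the MSE loss used in training, so it is expressed in the same units as fare_amount (US dollars). A small NumPy sketch of the computation; `rmse` is a hypothetical helper, and `np.sqrt(sklearn.metrics.mean_squared_error(y_true, y_pred))` yields the same value.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Square root of the mean of the squared residuals."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


print(rmse([3, 5], [1, 5]))  # residuals 2 and 0 -> sqrt(2), about 1.414
```

Because the residuals are squared before averaging, RMSE penalises a few large fare errors more heavily than many small ones.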
Prediction on test data: