New York City Taxi Fare Prediction

This project trains a machine learning model to predict taxi fares in New York City using a dataset from Kaggle. It is a learning project; the competition has already ended.

Dataset

The dataset is sourced from the Kaggle competition: New York City Taxi Fare Prediction. It contains:

Training and test data with location and fare information.
Date and time of taxi trips.

Project Features

Data Preprocessing:

Sampled 10% of the training data to reduce runtime.
Addressed missing values and outliers.
Engineered features like trip distance, pickup/dropoff landmarks, and datetime components.
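
The preprocessing steps above can be sketched roughly as follows. This is a minimal illustration, not the code in taxi.py. It assumes the column names used by the Kaggle files (pickup_datetime, pickup_latitude, pickup_longitude, dropoff_latitude, dropoff_longitude), uses the haversine formula as one plausible way to compute trip distance, and uses approximate JFK airport coordinates as an example landmark. The function names haversine_km and preprocess are hypothetical.

```python
import numpy as np
import pandas as pd

EARTH_RADIUS_KM = 6371.0
# Approximate JFK airport coordinates (assumed landmark, for illustration only)
JFK_LAT, JFK_LON = 40.6413, -73.7781


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between points given in degrees."""
    lat1, lon1, lat2, lon2 = (np.radians(v) for v in (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))


def preprocess(df, frac=0.1, seed=42):
    """Sample a fraction of rows, drop missing values, and add features."""
    df = df.sample(frac=frac, random_state=seed).dropna().copy()

    # Distance between pickup and dropoff
    df["trip_distance_km"] = haversine_km(
        df["pickup_latitude"], df["pickup_longitude"],
        df["dropoff_latitude"], df["dropoff_longitude"])

    # Distance from the dropoff point to an example landmark
    df["dropoff_jfk_km"] = haversine_km(
        df["dropoff_latitude"], df["dropoff_longitude"], JFK_LAT, JFK_LON)

    # Datetime components of the pickup time
    when = pd.to_datetime(df["pickup_datetime"])
    df["pickup_year"] = when.dt.year
    df["pickup_month"] = when.dt.month
    df["pickup_weekday"] = when.dt.weekday  # Monday = 0
    df["pickup_hour"] = when.dt.hour
    return df
```

Calling preprocess(train_df) on a loaded training frame would return a 10% sample with the extra columns; passing frac=1.0 keeps every row.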

Exploratory Data Analysis (EDA):

Identified data distributions, ranges, and outliers.
Observed that some latitude and longitude values in the dataset were erroneous.
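
One simple way to handle coordinate errors like these is to keep only rows that fall inside a bounding box around the city. The sketch below is an illustration under assumptions: the bounds are hypothetical values chosen loosely around New York City, the column names follow the Kaggle files, and drop_invalid_coordinates is a made-up helper name, not a function from this repository.

```python
import pandas as pd

# Hypothetical bounding box loosely around New York City (degrees)
LAT_RANGE = (40.0, 42.0)
LON_RANGE = (-75.0, -72.0)


def drop_invalid_coordinates(df):
    """Keep rows whose pickup and dropoff points lie inside the bounding box."""
    mask = (
        df["pickup_latitude"].between(*LAT_RANGE)
        & df["dropoff_latitude"].between(*LAT_RANGE)
        & df["pickup_longitude"].between(*LON_RANGE)
        & df["dropoff_longitude"].between(*LON_RANGE)
    )
    return df[mask].reset_index(drop=True)
```

Tighter or looser bounds change how many rows survive, so it is worth checking the row count before and after filtering.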

Model Development:

Implemented baseline models (e.g., Mean Regressor).
Experimented with multiple algorithms:
- Linear Regression
- Ridge Regression
- Lasso
- Random Forest
- Elastic Net
Compared model performances based on RMSE and selected the best-performing model for final tuning.
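
The comparison described above might look like the sketch below. It runs on synthetic data from scikit-learn's make_regression as a stand-in for the engineered taxi features, so the numbers it prints say nothing about the real dataset. The mean baseline is represented here by DummyRegressor, and all hyperparameters are defaults or arbitrary illustrative values.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.dummy import DummyRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import ElasticNet, Lasso, LinearRegression, Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the engineered features (X) and fares (y)
X, y = make_regression(n_samples=500, n_features=5, noise=10.0, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=0)

models = {
    "Mean baseline": DummyRegressor(strategy="mean"),
    "Linear Regression": LinearRegression(),
    "Ridge": Ridge(),
    "Lasso": Lasso(),
    "Elastic Net": ElasticNet(),
    "Random Forest": RandomForestRegressor(n_estimators=50, random_state=0),
}


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return float(np.sqrt(mean_squared_error(y_true, y_pred)))


# Fit every model on the training split and score it on the validation split
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = rmse(y_val, model.predict(X_val))

# Print models from best (lowest RMSE) to worst
for name, score in sorted(scores.items(), key=lambda kv: kv[1]):
    print(f"{name:20s} RMSE = {score:.2f}")
```

Any model that does not beat the mean baseline on the validation split is not learning anything useful from the features.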

Hyperparameter Tuning:

Utilized grid search and manual tuning to optimize model parameters.
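
A grid search of this kind can be sketched with scikit-learn's GridSearchCV. The README does not say which model or parameter grid was tuned, so the random forest and the small grid below are assumptions for illustration, and the data is again synthetic.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the engineered features (X) and fares (y)
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

# Hypothetical grid; real values depend on the model and the data
param_grid = {"n_estimators": [20, 50], "max_depth": [3, None]}

search = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid,
    scoring="neg_root_mean_squared_error",  # scikit-learn maximises scores
    cv=3,
)
search.fit(X, y)

best_rmse = -search.best_score_  # flip the sign back to a plain RMSE
print(search.best_params_, round(best_rmse, 2))
```

Because the scorer is negated, the best combination is the one with the lowest cross-validated RMSE.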

Installation

Ensure Python and pip are installed. Install the required libraries using:

pip install pandas numpy scikit-learn xgboost matplotlib opendatasets

Usage

Download and load the data: Use the opendatasets library to download data directly from Kaggle.
Run preprocessing and feature engineering scripts: Prepare the data by cleaning and creating new features necessary for the models.
Model training and evaluation: Train various models and evaluate them to select the best one.
Hyperparameter tuning: Fine-tune the chosen model to improve accuracy.
Predict and generate submission file: Predict taxi fares for the test dataset and generate a submission file for Kaggle.
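
The first and last steps above can be sketched as follows. This is an illustration under assumptions: the competition URL, the key and fare_amount column names (taken from the Kaggle submission format), and the helper names download_data and make_submission are not stated in this README. Downloading requires Kaggle API credentials, so the download function is defined but not called here.

```python
import pandas as pd

# Assumed competition URL
COMPETITION_URL = "https://www.kaggle.com/c/new-york-city-taxi-fare-prediction"


def download_data(data_dir="."):
    """Download the competition files (prompts for Kaggle credentials)."""
    import opendatasets as od  # imported lazily; only needed for downloading
    od.download(COMPETITION_URL, data_dir=data_dir)


def make_submission(test_df, predictions, path="submission.csv"):
    """Write one predicted fare per test row, identified by its key."""
    submission = pd.DataFrame({
        "key": test_df["key"].to_numpy(),
        "fare_amount": predictions,
    })
    submission.to_csv(path, index=False)
    return submission
```

With a fitted model, a call such as make_submission(test_df, model.predict(test_features)) would produce the CSV to upload, where test_features holds the same engineered columns used for training.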
