This project predicts the total taxi fare (Trip Total) for rides in Chicago from features such as trip duration, distance, tip amount, payment type, and taxi company. The data comes from the City of Chicago Taxi Trips dataset.
The dataset contains records of individual taxi rides in Chicago.
Used columns:
- Trip Miles
- Trip Seconds
- Tips
- Payment Type
- Company
- Trip Total (Target)
Build a Linear Regression model to predict the Trip Total (fare) for a given trip using selected features.
- Python 🐍
- pandas, NumPy
- scikit-learn
- matplotlib
Used pandas to load and inspect the dataset.
Selected only the most relevant columns and removed rows with missing values.
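The loading and cleaning steps can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the inline CSV text is a made-up stand-in for the real export, and the extra `Taxi ID` column is only there to show column selection at work.

```python
import io

import pandas as pd

# Tiny inline stand-in for the real CSV export (values are invented for illustration).
csv_text = """Trip Miles,Trip Seconds,Tips,Payment Type,Company,Trip Total,Taxi ID
1.2,420,2.0,Credit Card,Flash Cab,10.5,a1
3.4,900,,Cash,Sun Taxi,14.25,b2
0.8,300,0.0,Cash,Flash Cab,6.75,c3
5.0,1200,4.0,Credit Card,,22.0,d4
"""

columns = ["Trip Miles", "Trip Seconds", "Tips", "Payment Type", "Company", "Trip Total"]

# With the real data, pass the path of the downloaded CSV instead of StringIO.
df = pd.read_csv(io.StringIO(csv_text))
df.info()  # inspect dtypes and non-null counts

# Keep only the columns used by the model, then drop rows with any missing value.
df = df[columns].dropna()
print(df.shape)
```

In this toy input, the second and fourth rows each have a missing value, so only two rows remain after `dropna()`.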
- Used `OneHotEncoder` to encode categorical columns (`Payment Type`, `Company`)
- Built a `Pipeline` to apply preprocessing + Linear Regression
- Split the data into training and test sets (80/20)
- Trained a Linear Regression model
- Evaluated using:
  - Mean Absolute Error (MAE)
  - R² Score
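The modeling steps above could look something like the sketch below. It runs on synthetic data so that it is self-contained; the generated values, the placeholder company names, `random_state=42`, `handle_unknown="ignore"`, and the use of `ColumnTransformer` to route the categorical columns to the encoder are all assumptions made for illustration, not details confirmed by the project.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Synthetic stand-in for the cleaned taxi data (made-up values, not real trips).
rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({
    "Trip Miles": rng.uniform(0.5, 15.0, n),
    "Trip Seconds": rng.uniform(120, 3600, n),
    "Tips": rng.uniform(0.0, 5.0, n),
    "Payment Type": rng.choice(["Cash", "Credit Card"], n),
    "Company": rng.choice(["Company A", "Company B", "Company C"], n),
})
df["Trip Total"] = (
    3.25 + 2.25 * df["Trip Miles"] + 0.005 * df["Trip Seconds"]
    + df["Tips"] + rng.normal(0.0, 1.0, n)
)

categorical = ["Payment Type", "Company"]
numeric = ["Trip Miles", "Trip Seconds", "Tips"]
X = df[numeric + categorical]
y = df["Trip Total"]

# One-hot encode the categorical columns; numeric columns pass through unchanged.
preprocess = ColumnTransformer(
    transformers=[("onehot", OneHotEncoder(handle_unknown="ignore"), categorical)],
    remainder="passthrough",
)
model = Pipeline(steps=[("preprocess", preprocess), ("regressor", LinearRegression())])

# 80/20 train/test split, then fit and evaluate on the held-out 20%.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

mae = mean_absolute_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print(f"MAE: {mae:.2f}")
print(f"R^2: {r2:.3f}")
```

Wrapping the encoder and the regressor in one `Pipeline` means the encoding is learned from the training split only and then reused unchanged on the test split.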
Plotted Actual vs Predicted Fare to visualize performance.
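One way to draw such a plot is sketched here. The `y_test` and `y_pred` arrays are synthetic stand-ins for the real test targets and model predictions, and the output filename is a hypothetical choice.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the script runs anywhere; remove for an interactive window
import matplotlib.pyplot as plt
import numpy as np

# Synthetic stand-ins for the true fares and the model's predictions.
rng = np.random.default_rng(0)
y_test = rng.uniform(5.0, 60.0, 200)
y_pred = y_test + rng.normal(0.0, 3.0, 200)

fig, ax = plt.subplots(figsize=(6, 6))
ax.scatter(y_test, y_pred, alpha=0.4, s=12)

# Diagonal reference line: points on it are predicted exactly.
lims = [min(y_test.min(), y_pred.min()), max(y_test.max(), y_pred.max())]
ax.plot(lims, lims, color="red", linestyle="--", label="Perfect prediction")

ax.set_xlabel("Actual fare")
ax.set_ylabel("Predicted fare")
ax.set_title("Actual vs Predicted Fare")
ax.legend()
fig.savefig("actual_vs_predicted.png", dpi=150)  # hypothetical output filename
```

Points close to the dashed diagonal are well predicted; systematic drift above or below it suggests the model over- or under-estimates fares in that range.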