"Chetenkoe Taxi" company collects historical data on taxi orders at airports. To attract more drivers during peak hours, it's essential to forecast the number of taxi orders for the next hour. This project develops a time series forecasting model to predict taxi demand with high accuracy.
Build a model that accurately predicts hourly taxi order demand with:
- RMSE ≤ 48 on the test set
- Ability to forecast demand for multiple future hours
- Timely identification of peak demand periods to optimize driver allocation
- Source: taxi.csv (historical taxi order data)
- Time period: March 1, 2018 to August 31, 2018 (spring-summer season)
- Original frequency: Every 10 minutes
- Resampled frequency: Hourly (after processing)
- Key feature: num_orders (number of taxi orders)
The dataset contains 26,496 entries with no missing values or anomalies. The time series shows:
- Mild upward trend
- Daily seasonality patterns
- Yearly seasonality cannot be assessed (the data covers a single spring-summer period)
- Resampled original 10-minute data to hourly frequency
- Conducted stationarity analysis using Augmented Dickey-Fuller test (confirmed stationarity)
- Created time-based features:
  - Hour of day
  - Day of week
  - Weekend indicator
  - Lag features (up to 248 hours)
  - Rolling mean (window sizes: 19, 24, 29 hours)
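
The preprocessing steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual pipeline: the helper name `make_features`, the column names other than `num_orders`, and the synthetic data standing in for taxi.csv are all assumptions, and the small `max_lag` and `rolling_window` values are chosen only to keep the example short.

```python
import numpy as np
import pandas as pd


def make_features(hourly: pd.DataFrame, max_lag: int, rolling_window: int) -> pd.DataFrame:
    """Add calendar, lag, and rolling-mean features to an hourly series (hypothetical helper)."""
    out = hourly.copy()
    out["hour"] = out.index.hour
    out["dayofweek"] = out.index.dayofweek
    out["is_weekend"] = (out.index.dayofweek >= 5).astype(int)

    # Build all lag columns at once to avoid fragmenting the frame when max_lag is large.
    lags = {f"lag_{k}": out["num_orders"].shift(k) for k in range(1, max_lag + 1)}
    out = pd.concat([out, pd.DataFrame(lags)], axis=1)

    # Shift by one hour first so the rolling mean never includes the current target value.
    out["rolling_mean"] = out["num_orders"].shift(1).rolling(rolling_window).mean()

    # The first rows have incomplete lags or rolling windows and are dropped.
    return out.dropna()


# Synthetic stand-in for taxi.csv: two days of 10-minute order counts.
index = pd.date_range("2018-03-01", periods=2 * 144, freq="10min")
raw = pd.DataFrame({"num_orders": np.arange(len(index)) % 7}, index=index)

# Sum the six 10-minute counts in each hour to get hourly demand.
hourly = raw.resample("1h").sum()
features = make_features(hourly, max_lag=3, rolling_window=4)
```

Shifting before taking the rolling mean matters: without it, the feature for a given hour would contain that hour's own order count and leak the target into the inputs.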
Implemented a custom time series forecasting framework with:
- TimeSeriesSplit cross-validation
- Feature engineering pipeline
- Two candidate models with hyperparameter optimization:
  - Ridge Regression (L2-regularized linear model)
  - LGBMRegressor (LightGBM gradient boosting)
- Tested multiple combinations of lag features and window sizes
- Evaluated models using RMSE metric
- Selected best model based on cross-validation performance
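
As an illustration of the selection procedure described above, the sketch below scores a Ridge model with `TimeSeriesSplit` and picks the regularization strength with the lowest mean RMSE. The helper `cv_rmse`, the synthetic feature matrix, the number of splits, and the `alpha` grid are assumptions made for the example; the project's actual grid and fold count are not stated.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import TimeSeriesSplit


def cv_rmse(model, X, y, n_splits=5):
    """Mean RMSE over expanding-window folds; validation rows always come after training rows."""
    scores = []
    for train_idx, valid_idx in TimeSeriesSplit(n_splits=n_splits).split(X):
        model.fit(X[train_idx], y[train_idx])
        pred = model.predict(X[valid_idx])
        scores.append(np.sqrt(mean_squared_error(y[valid_idx], pred)))
    return float(np.mean(scores))


# Synthetic stand-in for the engineered feature matrix and the hourly target.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))
y = X @ np.array([3.0, -2.0, 0.5, 0.0]) + rng.normal(scale=0.1, size=500)

# Hypothetical grid of regularization strengths.
results = {alpha: cv_rmse(Ridge(alpha=alpha), X, y) for alpha in (0.1, 1.0, 10.0)}
best_alpha = min(results, key=results.get)
```

Unlike shuffled K-fold, `TimeSeriesSplit` never trains on observations that come after the validation fold, which keeps the cross-validation estimate honest for forecasting.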
Best Model: Ridge Regression with hyperparameters tuned by cross-validation
Performance Metrics:
- Cross-validation RMSE: 25
- Test set RMSE: 33.79 (well below the 48 threshold)
- Successfully captures daily demand patterns and peak hours
Practical Application:
- Model can forecast demand for multiple future hours
- Example prediction for September 1, 2018:
  - First hour: ~279 orders
  - Full day forecast shows clear daily pattern with morning and evening peaks
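
One common way to forecast several hours ahead with a model trained on lag features is the recursive strategy: predict one hour, append the prediction to the history, and repeat. The source does not say which strategy the project uses, so the sketch below is only an assumption, with a hypothetical `recursive_forecast` helper, lag-only features, and a synthetic sinusoidal series in place of the real data.

```python
import numpy as np
from sklearn.linear_model import Ridge


def recursive_forecast(model, history, n_lags, horizon):
    """Predict `horizon` steps ahead, feeding each prediction back in as the newest lag."""
    window = list(history[-n_lags:])  # chronological order, oldest first
    forecasts = []
    for _ in range(horizon):
        # Feature order must match training: lag_1 (most recent value) comes first.
        x = np.array(window[::-1]).reshape(1, -1)
        y_hat = float(model.predict(x)[0])
        forecasts.append(y_hat)
        # Drop the oldest value and treat the prediction as the latest observation.
        window = window[1:] + [y_hat]
    return np.array(forecasts)


# Synthetic hourly series with a 24-hour cycle (14 days), standing in for num_orders.
n_lags = 24
t = np.arange(14 * 24)
series = 100 + 50 * np.sin(2 * np.pi * t / 24)

# Row i holds the n_lags values preceding target y[i], most recent first.
X = np.column_stack([series[n_lags - k : len(series) - k] for k in range(1, n_lags + 1)])
y = series[n_lags:]

model = Ridge(alpha=1.0).fit(X, y)
forecast = recursive_forecast(model, series, n_lags, horizon=24)
```

With real data, calendar and rolling-mean features would have to be recomputed at every step as well, and errors can accumulate as the horizon grows because later predictions depend on earlier ones.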
The solution enables "Chetenkoe Taxi" to proactively allocate drivers based on predicted demand, improving service quality during peak hours while optimizing operational costs.