
AhmedAli58/London_Bike_Rides


London Bike Rides

Overview

I built this project to turn the London bike sharing data into a clear, reproducible analysis with a small KPI set, a baseline predictive model, and a Tableau-ready export.

Objectives

  • Quantify ride demand patterns over time and by weather
  • Report a concise KPI summary (average, median, max, min daily rides)
  • Build a baseline model to predict ride counts
  • Export Tableau-ready files and a dashboard

Data

Source

Kaggle dataset hmavrodiev/london-bike-sharing-dataset (downloaded via Kaggle CLI).

Files used

  • Input: london_merged.csv
  • Raw download: london-bike-sharing-dataset.zip
  • Output (notebook): london_bikes_final.xlsx, london_bikes_final.csv
  • Output (script): outputs/data_summary.json, outputs/kpis.json, outputs/model_metrics.json, outputs/model_cv_metrics.json
  • Dashboard: London Bike Rides.twbx

Key columns

  • Input columns: timestamp, cnt, t1, t2, hum, wind_speed, weather_code, is_holiday, is_weekend, season
  • Renamed columns: time, count, temp_real_C, temp_feels_like_C, humidity_percent, wind_speed_kph, weather, is_holiday, is_weekend, season
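The rename and the humidity conversion (method steps 8–9 below) can be sketched together; the mapping is reconstructed from the column lists above, so the notebook's exact code may differ:

```python
import pandas as pd

# Rename map reconstructed from the README's input/renamed column lists.
RENAME_MAP = {
    "timestamp": "time",
    "cnt": "count",
    "t1": "temp_real_C",
    "t2": "temp_feels_like_C",
    "hum": "humidity_percent",
    "wind_speed": "wind_speed_kph",
    "weather_code": "weather",
}

def tidy_columns(bikes: pd.DataFrame) -> pd.DataFrame:
    """Rename raw Kaggle columns and convert humidity to a 0-1 proportion."""
    bikes = bikes.rename(columns=RENAME_MAP)
    bikes["humidity_percent"] = bikes["humidity_percent"] / 100.0
    return bikes
```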

Time coverage

2015-01-04 00:00:00 to 2017-01-03 23:00:00

Method

  1. Import pandas, zipfile, and kaggle libraries.
  2. Download the dataset from Kaggle via CLI.
  3. Extract the downloaded zip file.
  4. Load london_merged.csv into a pandas DataFrame.
  5. Parse timestamps to datetime.
  6. Inspect structure and size (info, shape, preview).
  7. Check category distributions for weather_code and season.
  8. Rename columns for readability and compute time coverage.
  9. Convert humidity to a 0–1 proportion.
  10. Map season and weather codes to descriptive labels.
  11. Run validation and range checks (missing values, duplicates, negative counts, humidity bounds).
  12. Engineer lagged and rolling features from historical counts.
  13. Train a baseline regression model and record evaluation metrics.
  14. Export the cleaned dataset to london_bikes_final.xlsx and london_bikes_final.csv.
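The validation step (11) can be sketched as a small check function; this is illustrative and assumes the renamed columns count and humidity_percent (stored as a 0–1 proportion), so the notebook's exact checks may differ:

```python
import pandas as pd

def validate(bikes: pd.DataFrame) -> dict:
    """Missing-value, duplicate, and range checks from method step 11."""
    return {
        "missing_values": int(bikes.isna().sum().sum()),
        "duplicates": int(bikes.duplicated().sum()),
        "negative_counts": int((bikes["count"] < 0).sum()),
        "humidity_out_of_range": int(
            (~bikes["humidity_percent"].between(0, 1)).sum()
        ),
    }
```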

KPIs

From outputs/kpis.json:

  • Average daily rides: 27,268.45
  • Median daily rides: 27,011.50
  • Max daily rides: 72,504 (2015-07-09)
  • Min daily rides: 4,869 (2016-01-03)
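The KPI set above comes from resampling the hourly counts to daily totals; a minimal sketch (function and key names are illustrative, not taken from the repository):

```python
import pandas as pd

def daily_kpis(bikes: pd.DataFrame) -> dict:
    """Aggregate hourly counts to daily totals and compute the KPI set."""
    daily = bikes.set_index("time")["count"].resample("D").sum()
    return {
        "avg_daily_rides": round(float(daily.mean()), 2),
        "median_daily_rides": float(daily.median()),
        "max_daily_rides": int(daily.max()),
        "min_daily_rides": int(daily.min()),
    }
```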

Model

I used a Random Forest regressor with a time-based split and lag features, and compared it against Gradient Boosting with time-series cross-validation. Metrics are stored in outputs/model_metrics.json and outputs/model_cv_metrics.json.
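A minimal sketch of the lag features and the time-based split, assuming daily totals as input; hyperparameters, lag choices, and function names here are illustrative, not the notebook's:

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

def add_lag_features(daily: pd.Series, lags=(1, 7), window=7) -> pd.DataFrame:
    """Lagged and rolling-mean features from historical counts (step 12)."""
    feats = pd.DataFrame({"count": daily})
    for lag in lags:
        feats[f"lag_{lag}"] = daily.shift(lag)
    # Shift by 1 so the rolling mean uses only past values (no leakage).
    feats[f"roll_mean_{window}"] = daily.shift(1).rolling(window).mean()
    return feats.dropna()

def train_baseline(feats: pd.DataFrame, test_frac=0.2) -> dict:
    """Time-based split: the last 20% of rows are held out, no shuffling."""
    split = int(len(feats) * (1 - test_frac))
    X, y = feats.drop(columns="count"), feats["count"]
    model = RandomForestRegressor(n_estimators=200, random_state=42)
    model.fit(X.iloc[:split], y.iloc[:split])
    pred = model.predict(X.iloc[split:])
    y_test = y.iloc[split:]
    return {
        "rmse": float(np.sqrt(mean_squared_error(y_test, pred))),
        "mae": float(mean_absolute_error(y_test, pred)),
        "r2": float(r2_score(y_test, pred)),
    }
```

The cross-validation comparison would follow the same pattern with sklearn's TimeSeriesSplit, which preserves temporal order across folds.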

Visuals

  • Daily total rides
  • Rides by weather category
  • Actual vs predicted rides

Repository structure

  • london_bikes.ipynb: primary analysis notebook.
  • london_merged.csv: raw dataset used in the notebook.
  • london-bike-sharing-dataset.zip: downloaded dataset archive.
  • london_bikes_final.xlsx: cleaned output used for Tableau.
  • London Bike Rides - Moving Average and Heatmap.twbx: Tableau workbook.
  • London Bike Rides.twbx: Tableau dashboard.
  • requirements.txt: Python dependencies inferred from notebook imports.
  • scripts/run_analysis.py: standalone analysis script for KPIs, plots, and ML baseline.
  • outputs/: generated KPI and model metrics.
  • assets/: generated plots.
  • .DS_Store: macOS metadata file, not used by the analysis.

How to run locally

Dependencies inferred from imports

  • pandas
  • kaggle
  • zipfile (Python standard library)
  • scikit-learn, numpy, matplotlib, seaborn
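A requirements.txt consistent with these imports might look like the following; the actual file in the repository may pin versions or differ (openpyxl is an assumption, as pandas needs it for the .xlsx export):

```
pandas
numpy
scikit-learn
matplotlib
seaborn
kaggle
openpyxl
```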

Execution steps

  1. Install dependencies from requirements.txt.
  2. Open london_bikes.ipynb in Jupyter.
  3. Ensure Kaggle CLI is configured, then run the Kaggle download cell.
  4. Run the extraction, loading, transformation, and validation cells in order.
  5. Run the analysis script to generate KPIs, plots, and metrics: python scripts/run_analysis.py.
  6. Confirm london_bikes_final.xlsx, london_bikes_final.csv, outputs/*.json, and assets/*.png are created.

Results summary

  • Dataset size reported as 17,414 rows and 10 columns.
  • No missing values reported across all 10 columns in bikes.info().
  • Weather code counts and season counts are displayed for categorical checks.
  • Preview of renamed and mapped columns shown via bikes.head().
  • Time coverage: 2015-01-04 00:00:00 to 2017-01-03 23:00:00.
  • Validation results: 0 duplicates, 0 negative counts, 0 humidity out-of-range.
  • KPI highlights: average daily rides 27,268.45; max daily rides 72,504 on 2015-07-09.
  • Baseline model metrics: RMSE 149.54, MAE 86.23, R2 0.982 (time-based split).
  • Cross-validation (time-series) averages: Random Forest RMSE 183.60, MAE 104.41, R2 0.971; Gradient Boosting RMSE 261.36, MAE 172.61, R2 0.942.

Limitations

  • Validation outputs are not saved; results require running the notebook.
  • Headline baseline metrics come from a single train/test split, and no hyperparameter tuning was performed.
  • Data source details beyond the Kaggle dataset identifier are not documented in the notebook.

Next improvements

  • Persist validation outputs (e.g., save a CSV/JSON report) for reproducibility.
  • Add a brief data dictionary or source notes based on the Kaggle dataset page.
  • Parameterize file paths and outputs for easier reuse.

License status

Unknown.
How to verify: check for a LICENSE file in the repository root or repository metadata.

