I built this project to turn the London bike sharing data into a clear, reproducible analysis with a small KPI set, a baseline predictive model, and a Tableau-ready export.
- Quantify ride demand patterns over time and by weather
- Report a concise KPI summary (average, median, max, min daily rides)
- Build a baseline model to predict ride counts
- Export Tableau-ready files and a dashboard
Data source: the Kaggle dataset `hmavrodiev/london-bike-sharing-dataset`, downloaded via the Kaggle CLI.
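If you prefer to script the download rather than call the CLI directly, a minimal sketch with the Kaggle Python API looks like this (it assumes credentials are configured in `~/.kaggle/kaggle.json`, the same setup the CLI uses):

```python
import zipfile

import kaggle  # authenticates on import using ~/.kaggle/kaggle.json

# Download the dataset archive into the current directory, then extract it.
kaggle.api.dataset_download_files("hmavrodiev/london-bike-sharing-dataset", path=".")
with zipfile.ZipFile("london-bike-sharing-dataset.zip") as zf:
    zf.extractall()
```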
- Input: `london_merged.csv`
- Raw download: `london-bike-sharing-dataset.zip`
- Output (notebook): `london_bikes_final.xlsx`, `london_bikes_final.csv`
- Output (script): `outputs/data_summary.json`, `outputs/kpis.json`, `outputs/model_metrics.json`, `outputs/model_cv_metrics.json`
- Dashboard: `London Bike Rides.twbx`
- Input columns: `timestamp`, `cnt`, `t1`, `t2`, `hum`, `wind_speed`, `weather_code`, `is_holiday`, `is_weekend`, `season`
- Renamed columns: `time`, `count`, `temp_real_C`, `temp_feels_like_C`, `humidity_percent`, `wind_speed_kph`, `weather`, `is_holiday`, `is_weekend`, `season` (the rename step is sketched after this list)
- Time coverage: 2015-01-04 00:00:00 to 2017-01-03 23:00:00
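A minimal sketch of the load-and-rename step implied by the mapping above (the column pairs come from the lists, not from the notebook's exact code):

```python
import pandas as pd

# Load the raw CSV and parse timestamps to datetime.
bikes = pd.read_csv("london_merged.csv")
bikes["timestamp"] = pd.to_datetime(bikes["timestamp"])

# Rename columns for readability, per the mapping documented above.
bikes = bikes.rename(columns={
    "timestamp": "time",
    "cnt": "count",
    "t1": "temp_real_C",
    "t2": "temp_feels_like_C",
    "hum": "humidity_percent",
    "wind_speed": "wind_speed_kph",
    "weather_code": "weather",
})

# Compute time coverage.
print(bikes["time"].min(), "to", bikes["time"].max())
```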
- Import pandas, zipfile, and kaggle libraries.
- Download the dataset from Kaggle via CLI.
- Extract the downloaded zip file.
- Load `london_merged.csv` into a pandas DataFrame.
- Parse timestamps to datetime.
- Inspect structure and size (`info`, `shape`, preview).
- Check category distributions for `weather_code` and `season`.
- Rename columns for readability and compute time coverage.
- Convert humidity to a 0–1 proportion.
- Map season and weather codes to descriptive labels.
- Run validation and range checks (missing values, duplicates, negative counts, humidity bounds).
- Engineer lagged and rolling features from historical counts.
- Train a baseline regression model and record evaluation metrics.
- Export the cleaned dataset to `london_bikes_final.xlsx` and `london_bikes_final.csv` (the transformation, validation, and export steps are sketched after this list).
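The following sketch covers the transformation, validation, and export steps. The season and weather label strings follow the Kaggle dataset description and are my reading of it, not necessarily the notebook's exact wording:

```python
# Label maps as described on the Kaggle dataset page (assumed, not verified
# against the notebook's exact strings).
season_map = {0: "spring", 1: "summer", 2: "autumn", 3: "winter"}
weather_map = {1: "Clear", 2: "Scattered clouds", 3: "Broken clouds", 4: "Cloudy",
               7: "Rain", 10: "Rain with thunderstorm", 26: "Snowfall",
               94: "Freezing fog"}

# Convert humidity to a 0-1 proportion and map codes to descriptive labels.
bikes["humidity_percent"] = bikes["humidity_percent"] / 100
bikes["season"] = bikes["season"].astype(int).map(season_map)
bikes["weather"] = bikes["weather"].astype(int).map(weather_map)

# Validation and range checks.
assert bikes.isna().sum().sum() == 0, "missing values found"
assert bikes.duplicated().sum() == 0, "duplicate rows found"
assert (bikes["count"] >= 0).all(), "negative counts found"
assert bikes["humidity_percent"].between(0, 1).all(), "humidity out of range"

# Export the cleaned dataset for Tableau.
bikes.to_excel("london_bikes_final.xlsx", sheet_name="Data", index=False)
bikes.to_csv("london_bikes_final.csv", index=False)
```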
From `outputs/kpis.json` (a reproduction sketch follows this list):
- Average daily rides: 27,268.45
- Median daily rides: 27,011.50
- Max daily rides: 72,504 (2015-07-09)
- Min daily rides: 4,869 (2016-01-03)
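A hedged sketch of how these daily figures could be reproduced from the hourly data; the dictionary keys are illustrative, not confirmed against `outputs/kpis.json`:

```python
# Aggregate hourly counts to daily totals, then summarize.
daily = bikes.set_index("time")["count"].resample("D").sum()
kpis = {
    "avg_daily_rides": round(float(daily.mean()), 2),
    "median_daily_rides": float(daily.median()),
    "max_daily_rides": int(daily.max()),
    "min_daily_rides": int(daily.min()),
}
print(daily.idxmax().date(), daily.max())  # busiest day and its count
```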
I used a Random Forest regressor with a time-based split and lag features, and compared it against Gradient Boosting with time-series cross-validation. Metrics are stored in `outputs/model_metrics.json` and `outputs/model_cv_metrics.json`.
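A sketch of that baseline under assumptions about the feature set: the specific lags, rolling window, and hyperparameters here are mine, not the notebook's exact configuration:

```python
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Engineer lagged and rolling features from historical counts (assumed lag set).
feats = bikes[["count"]].copy()
for lag in (1, 2, 3, 24):
    feats[f"lag_{lag}"] = feats["count"].shift(lag)
feats["rolling_24h"] = feats["count"].shift(1).rolling(24).mean()
feats = feats.dropna()

# Time-based split: train strictly on the past, evaluate on the future.
X, y = feats.drop(columns="count"), feats["count"]
split = int(len(X) * 0.8)
model = RandomForestRegressor(n_estimators=200, random_state=42, n_jobs=-1)
model.fit(X.iloc[:split], y.iloc[:split])
pred = model.predict(X.iloc[split:])

rmse = mean_squared_error(y.iloc[split:], pred) ** 0.5
mae = mean_absolute_error(y.iloc[split:], pred)
r2 = r2_score(y.iloc[split:], pred)
```

For the cross-validated comparison, `sklearn.model_selection.TimeSeriesSplit` provides the expanding-window folds: each fold trains on earlier data and evaluates on the segment that follows it.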
- `london_bikes.ipynb`: primary analysis notebook.
- `london_merged.csv`: raw dataset used in the notebook.
- `london-bike-sharing-dataset.zip`: downloaded dataset archive.
- `london_bikes_final.xlsx`: cleaned output used for Tableau.
- `London Bike Rides - Moving Average and Heatmap.twbx`: Tableau workbook.
- `London Bike Rides.twbx`: Tableau dashboard.
- `requirements.txt`: Python dependencies inferred from notebook imports.
- `scripts/run_analysis.py`: standalone analysis script for KPIs, plots, and ML baseline.
- `outputs/`: generated KPI and model metrics.
- `assets/`: generated plots.
- `.DS_Store`: macOS metadata file, not used by the analysis.
- pandas
- kaggle
- zipfile (Python standard library)
- scikit-learn
- numpy
- matplotlib
- seaborn
- Install dependencies from `requirements.txt`.
- Open `london_bikes.ipynb` in Jupyter.
- Ensure the Kaggle CLI is configured, then run the Kaggle download cell.
- Run the extraction, loading, transformation, and validation cells in order.
- Run the analysis script to generate KPIs, plots, and metrics: `python scripts/run_analysis.py`.
- Confirm `london_bikes_final.xlsx`, `london_bikes_final.csv`, `outputs/*.json`, and `assets/*.png` are created (a quick existence check is sketched after this list).
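One way to script the final confirmation step, as a small sketch:

```python
from pathlib import Path

# Check that the expected artifacts exist after a full run.
for name in ("london_bikes_final.xlsx", "london_bikes_final.csv"):
    assert Path(name).exists(), f"missing {name}"
assert list(Path("outputs").glob("*.json")), "no JSON metrics in outputs/"
assert list(Path("assets").glob("*.png")), "no plots in assets/"
```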
- Dataset size reported as 17,414 rows and 10 columns.
- No missing values reported across all 10 columns in `bikes.info()`.
- Weather code counts and season counts are displayed for categorical checks.
- Preview of renamed and mapped columns shown via `bikes.head()`.
- Time coverage: 2015-01-04 00:00:00 to 2017-01-03 23:00:00.
- Validation results: 0 duplicates, 0 negative counts, no out-of-range humidity values.
- KPI highlights: average daily rides 27,268.45; max daily rides 72,504 on 2015-07-09.
- Baseline model metrics: RMSE 149.54, MAE 86.23, R2 0.982 (time-based split).
- Cross-validation (time-series) averages: Random Forest RMSE 183.60, MAE 104.41, R2 0.971; Gradient Boosting RMSE 261.36, MAE 172.61, R2 0.942.
- Validation outputs are not saved; results require running the notebook.
- Baseline model uses a single train/test split without hyperparameter tuning.
- Data source details beyond the Kaggle dataset identifier are not documented in the notebook.
- Persist validation outputs (e.g., save a CSV/JSON report) for reproducibility; see the sketch after this list.
- Add a brief data dictionary or source notes based on the Kaggle dataset page.
- Parameterize file paths and outputs for easier reuse.
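For the first item, a minimal sketch of persisting the validation checks as JSON; the file name and keys are illustrative, not an existing part of the project:

```python
import json
from pathlib import Path

# Summarize the notebook's validation checks in a machine-readable report.
report = {
    "duplicates": int(bikes.duplicated().sum()),
    "negative_counts": int((bikes["count"] < 0).sum()),
    "humidity_out_of_range": int((~bikes["humidity_percent"].between(0, 1)).sum()),
}
Path("outputs").mkdir(exist_ok=True)
Path("outputs/validation_report.json").write_text(json.dumps(report, indent=2))
```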
License: unknown. To verify, check for a LICENSE file in the repository root or in the repository metadata.


