nammnjoshii/Canadian-Healthcare-Delivery-Analytics-Budget-Impact-on-Patient-Wait-Times-at-Provincial-Scale

Why Increased Healthcare Spending Fails to Reduce Wait Times in Canada

Modeled provincial healthcare investment efficiency across 10 Canadian provinces to uncover where funding breaks down, enabling evidence-based reallocation recommendations for system planners.

Python · R · XGBoost · SQLite


The Finding in 30 Seconds

At the national level, higher provincial budgets correlate with shorter wait times (r = −0.50, p < 0.001). Break the data down by province, and the relationship reverses — provinces with the highest budgets often have the longest waits.

This is an instance of Simpson's paradox: a trend visible in pooled data reverses once the data are split into groups. The pattern is consistent with reactive government funding: provinces with structurally long wait times receive budget increases, but aging populations, physician shortages, and facility constraints absorb the investment without producing wait time improvement.

The budget signal is real. It is not strong enough to be the primary lever.


Quick Numbers

| Metric | Value |
| --- | --- |
| Pearson correlation | r = −0.50, p < 0.001 |
| Variance explained (OLS) | R² = 0.205; budget explains 20.5% of wait variation |
| Effect size | ~0.2 days per $1B (directional estimate, not a causal coefficient) |
| Unexplained variance | 79.5%; structural factors are the likely driver |
| Observations | n = 54 province-year observations from a 10-province × 6-year panel, 2013–2018 (60 cells if complete) |
| Top predictive feature (XGBoost) | budget_rank: provincial structural position, not raw spend |

What Should Be Done

This analysis does not establish causality. All recommendations are directional hypotheses requiring quasi-experimental validation.

| Recommendation | Evidence Basis | Trade-off |
| --- | --- | --- |
| Shift KPI to wait-days per $M CAD (efficiency, not total spend) | R² = 0.205: budget level alone is an insufficient performance signal | Efficiency metrics require consistent cost accounting across provinces |
| Target BC, NB, PEI first (highest unexplained within-province variance) | Province-level analysis shows these provinces diverge most from budget-predicted wait times | Requires provincial buy-in; not a funding conversation |
| Invest in structural capacity (physician supply, facility distribution) over aggregate budget transfers | 79.5% of variance is not explained by budget; structural factors are the likely source of ROI | Longer payback cycle; harder to show as a political win |

Repository Structure

.
├── src/                          # Python pipeline
│   ├── config.py                 # Path constants, province maps, analysis params
│   ├── data_ingestion.py         # CIHI download + SQLite write; synthetic fallback
│   ├── data_cleaning.py          # Budget + wait time cleaning + merge
│   ├── feature_engineering.py   # 7 features: per-capita, lag, rank, trend
│   ├── modeling.py               # 3-model strategy: OLS → Ridge/Lasso → XGBoost
│   ├── evaluation.py             # Metrics, partial dependence, decision output
│   ├── run_pipeline.py           # Orchestrator — run this
│   └── requirements.txt
│
├── tests/
│   ├── conftest.py               # Shared pytest fixtures (synthetic data, session-scoped)
│   └── test_pipeline.py          # 25 smoke tests: schema, merge ~60 rows, 7 features, R² gate
│
├── docs/
│   ├── executive_brief.md        # 1-page standalone brief for system planners
│   ├── executive_one_pager.md    # Single-slide summary for senior executives
│   ├── decision_output.md        # Explicit recommendations with evidence + trade-offs
│   ├── program_narrative.md      # TPM framing: decisions, trade-offs, stakeholders
│   ├── slide_deck_outline.md     # 5-slide consulting-grade deck outline
│   ├── communications_guide.md   # Audience-specific briefing summaries and analytical FAQ
│   └── program_delivery.md       # Delivery plan: charter, WBS, gates, RACI, risk register
│
├── notebooks/
│   ├── canadian_healthcare_analysis.Rmd   # R analysis
│   └── canadian_healthcare_analysis.md    # Knitted markdown output
│
├── data/
│   ├── README.md                 # Data dictionary, schemas, province codes, assumptions
│   ├── input/                    # Raw CIHI xlsx files (gitignored — download fresh)
│   └── processed/                # Cleaned CSVs (gitignored — regenerated by pipeline)
│
├── outputs/                      # Rendered R outputs (PDF, HTML)
├── pipelines/
│   ├── README.md                 # Pipeline execution guide and deployment instructions
│   └── github_actions_pipeline.yml   # Draft CI/CD workflow (Phase 2 reference)
├── pytest.ini                    # pytest configuration (testpaths = tests)
├── .gitignore
└── README.md

Model Architecture

| Model | Features | Purpose | Limitation |
| --- | --- | --- | --- |
| Baseline OLS | Budget only | Replicates R analysis; sanity-check gate (R² ≈ 0.205) | Omitted variable bias; no non-linearity |
| Ridge / Lasso | All 7 engineered features | Stability with n = 54; Lasso auto-selects features | Less interpretable; penalised coefficients |
| XGBoost* | All 7 engineered features | Non-linear pattern exploration; feature importance | Pattern exploration only; not for prediction or deployment |

*XGBoost falls back to RandomForestRegressor if libomp is not installed (brew install libomp on macOS).
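
One way such a fallback could be wired up is sketched below. This is an assumption about the mechanism, not the code in `src/modeling.py`; `load_first_available` and `REGRESSOR_CANDIDATES` are hypothetical names. The broad `except` is deliberate, because a missing native library such as libomp may not surface as a plain `ImportError`.

```python
import importlib


def load_first_available(candidates):
    """Return (module_name, attribute) for the first candidate that loads.

    `candidates` is a list of (module_name, attribute_name) pairs tried in order.
    """
    errors = []
    for module_name, attr_name in candidates:
        try:
            module = importlib.import_module(module_name)
            return module_name, getattr(module, attr_name)
        except Exception as exc:  # broad on purpose: native-library failures vary
            errors.append(f"{module_name}: {exc!r}")
    raise ImportError("no candidate could be loaded: " + "; ".join(errors))


# Preferred model first, fallback second (hypothetical constant name).
REGRESSOR_CANDIDATES = [
    ("xgboost", "XGBRegressor"),
    ("sklearn.ensemble", "RandomForestRegressor"),
]
```

Calling `name, Regressor = load_first_available(REGRESSOR_CANDIDATES)` would then yield whichever regressor class is usable on the current machine, and `name` can be logged so the model comparison table records which one actually ran.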


Feature Engineering

| | Simple baseline | This analysis |
| --- | --- | --- |
| Features | Budget (raw, millions CAD) | 7 features capturing per-capita normalization, temporal dynamics, structural position |
| Problem | Confounded by province size; ON ($59B) vs PEI ($680M) not comparable | Per-capita budget addresses scale; lag addresses timing; rank addresses structural position |

All 7 features:

| Feature | Addresses |
| --- | --- |
| budget_per_capita | Province size confound (the most important correction) |
| volume_per_capita | Demand-side pressure differences across provinces |
| budget_lag1 | Tests reactive vs. proactive funding hypothesis |
| province_encoded | Structural position (fiscal scale, ordinal) |
| year_trend | Secular time trend (aging, technology) |
| budget_yoy_change | Direction of investment, not just level |
| budget_rank | Relative provincial position within each year |

Population normalization uses a static 2016 Census baseline to avoid introducing temporal bias from interpolated population estimates.
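
As a rough illustration of four of these features, the sketch below derives `budget_per_capita`, `budget_lag1`, `budget_yoy_change`, and `budget_rank` with the standard library only. It is not the code in `src/feature_engineering.py`. The input column names, the 2017 budgets, and the rounded populations are placeholders; defining year-over-year change as a relative change and rank 1 as the largest budget are also assumptions.

```python
from collections import defaultdict

# Illustrative rows, budgets in millions CAD. The 59,000 and 680 figures echo the
# ON vs PEI magnitudes quoted above; the years, the 2017 values, and the
# populations are placeholders rather than pipeline data.
rows = [
    {"province": "ON", "year": 2017, "budget": 57_000.0},
    {"province": "ON", "year": 2018, "budget": 59_000.0},
    {"province": "PEI", "year": 2017, "budget": 650.0},
    {"province": "PEI", "year": 2018, "budget": 680.0},
]
population_2016 = {"ON": 13_400_000, "PEI": 143_000}  # approximate static baseline


def add_features(rows, population):
    """Return copies of `rows` with four budget-derived features added."""
    out = [dict(r) for r in sorted(rows, key=lambda r: (r["province"], r["year"]))]

    previous_budget = {}
    for r in out:
        # Dollars per person: budget is in millions, so scale up before dividing.
        r["budget_per_capita"] = r["budget"] * 1_000_000 / population[r["province"]]
        # Lag and change use the province's previous row; this assumes consecutive years.
        last = previous_budget.get(r["province"])
        r["budget_lag1"] = last
        r["budget_yoy_change"] = None if last is None else (r["budget"] - last) / last
        previous_budget[r["province"]] = r["budget"]

    # Rank provinces within each year; rank 1 is the largest budget.
    by_year = defaultdict(list)
    for r in out:
        by_year[r["year"]].append(r)
    for group in by_year.values():
        ordered = sorted(group, key=lambda r: r["budget"], reverse=True)
        for rank, r in enumerate(ordered, start=1):
            r["budget_rank"] = rank
    return out


for row in add_features(rows, population_2016):
    print(row)
```

With these placeholder inputs the raw budgets differ by almost two orders of magnitude while the per-capita figures land in the same range, which is the scale problem the normalization is meant to remove.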


Key Insights

  1. Budget explains 20.5% of wait time variance: statistically significant, practically limited. The other 79.5% is the more important signal.

  2. Simpson's paradox at the provincial level: the national negative trend reverses province by province, consistent with reactive government funding into structurally constrained systems.

  3. Low marginal return: approximately 0.2 days per $1B on the observed dataset. Large funding increases are associated with only small changes in wait times.

  4. Structural position outperforms raw budget: XGBoost ranks budget_rank above raw Budget in feature importance, which is consistent with the structural hypothesis but does not by itself confirm it.

  5. Diminishing returns: partial dependence analysis suggests that beyond ~$5,000–6,000 per capita, additional spending is associated with minimal further wait time reduction (directional, n = 54, not a policy rule).


What Was Not Done and Why

| Approach | Why Not Used |
| --- | --- |
| Deep learning | n = 54; no latent structure; no generalisation basis; overkill |
| Causal inference (IV / DiD / RDD) | No valid instrumental variable; no policy discontinuity to exploit; the data are an observational panel |
| 50+ feature models | Noise risk overwhelms n = 54; parsimony is the correct call, not a limitation |
| Individual patient-level analysis | Not in CIHI public data; aggregate-to-individual inference is the ecological fallacy |

Choosing not to use a technique, with documented reasoning, is the senior analytical move.


Documents

| Document | Purpose |
| --- | --- |
| docs/executive_brief.md | 1-page brief (read this first) |
| docs/executive_one_pager.md | Single-slide summary for senior executives |
| docs/decision_output.md | Full recommendation set with evidence and trade-offs |
| docs/program_narrative.md | TPM framing: analytical decisions, trade-offs, stakeholder context |
| docs/slide_deck_outline.md | 5-slide consulting-grade deck outline |
| docs/communications_guide.md | Audience-specific briefing summaries for system planners and technical reviewers |
| docs/program_delivery.md | Program delivery plan: charter, WBS, milestones, gate criteria, RACI, risk register |
| data/README.md | Data dictionary, sourcing, schemas, assumptions |


Setup

Prerequisites: Python 3.10+, pip

# 1. Clone the repo
git clone <repo-url>
cd <repo-name>

# 2. Install dependencies (all free / open-source)
pip install -r src/requirements.txt

# macOS only — required for XGBoost:
brew install libomp

# 3. Run the pipeline
python src/run_pipeline.py

# Optional: attempt live CIHI download
python src/run_pipeline.py --live

No server setup required: SQLite is accessed through Python's standard-library sqlite3 module, and the pipeline needs no .env file or credentials.

Output:

  • data/healthcare.db — SQLite database with raw and processed tables
  • data/processed/merged_final.csv — read by R notebook
  • Terminal: model comparison table, feature importance, decision output
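
To check what the pipeline wrote without assuming any table names, the database can be inspected through SQLite's own catalogue. The sketch below is a generic sqlite3 snippet, not part of the repository; `list_tables`, `row_counts`, and `summarize` are hypothetical helper names.

```python
import sqlite3
from contextlib import closing


def list_tables(conn):
    """Names of all user tables, read from SQLite's built-in catalogue."""
    cur = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    return [row[0] for row in cur.fetchall()]


def row_counts(conn):
    """Map each table name to its row count."""
    # Names come from sqlite_master itself, so quoting them as identifiers is safe.
    return {
        table: conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]
        for table in list_tables(conn)
    }


def summarize(db_path="data/healthcare.db"):
    """Print one line per table. Run the pipeline first: connecting to a
    path that does not exist yet would create an empty database file."""
    with closing(sqlite3.connect(db_path)) as conn:
        for table, count in row_counts(conn).items():
            print(f"{table}: {count} rows")
```

Calling `summarize()` from the repository root after a pipeline run should list the raw and processed tables mentioned above along with their sizes.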

R notebook (optional):

# Run Python pipeline first to generate data/processed/merged_final.csv
# Then open notebooks/canadian_healthcare_analysis.Rmd in RStudio and knit

Running the test suite:

pytest

Tests run against synthetic data. No CIHI connection required. Expected: 25 tests pass in under 30 seconds.


Data

Both datasets are free public data from the Canadian Institute for Health Information (CIHI).

| Dataset | Source |
| --- | --- |
| National Health Expenditure Trends | CIHI data catalogue |
| Wait Times for Priority Procedures | CIHI data catalogue |

Limitations

  • Correlational, not causal. This analysis does not establish causality. All policy recommendations are framed as directional hypotheses requiring quasi-experimental validation.
  • Small sample. n = 54 province-year observations. Results are directional; cross-validated R² reflects generalisation limits.
  • Pre-COVID data. 2013–2018 only. Post-2020 disruption likely changed these dynamics significantly.
  • Aggregate level. Province-year aggregates mask within-province variation. Ecological fallacy risk prohibits individual-level inference.
  • Budget is total expenditure, not procedure-specific. Targeted capacity investment analysis requires procedure-level budget data not in the CIHI public dataset.

Author

Nammn Joshii | LinkedIn | GitHub

Provenance: Original analysis: October 2019. Repository structured for public portfolio: April 2026. The analytical findings, data, and code are unchanged from the original analysis. Documentation (program narrative, decision output, delivery plan) reflects structured retrospective framing of the 2019 work.

About

This project investigates the root cause of delays in getting treatment in Canada
