Access, retrieve, and work with Canadian Census data and geography.
pycancensus is a Python package that provides integrated, convenient, and uniform access to Canadian Census data and geography retrieved using the CensusMapper API. This package produces analysis-ready tidy DataFrames and spatial data in multiple formats, with full equivalence to the R cancensus library.
Synchronized with R cancensus 0.6.1 — see CHANGELOG.md for details:
- Full hierarchy traversal: parent/child_census_vectors() return complete ancestor/descendant trees, verified identical to R
- Semantic variable search: typo-tolerant find_census_vectors(query_type="semantic"), now with the R-parity signature (query, dataset, ...) (breaking change)
- StatCan recall detection: cached data is checked against published data recalls
- New helpers: visualize_vector_hierarchy(), as_census_region_list(), add_unique_names_to_region_list(), explore_census_vectors()/regions()
- Reliability: retries honor Retry-After; error payloads can no longer poison the cache; in-memory session cache for metadata
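To give a feel for what "typo-tolerant" search means in the list above, here is a minimal standard-library sketch. It is not how pycancensus implements find_census_vectors(); the function fuzzy_find, the vector IDs, and the labels are all hypothetical, and the scoring rule (average best word-level similarity) is just one plausible choice.

```python
import difflib

# Hypothetical variable labels; real labels come from the CensusMapper API.
LABELS = {
    "v_demo_1": "Median total income of household",
    "v_demo_2": "Average household size",
    "v_demo_3": "Total private dwellings",
}


def fuzzy_find(query, labels, cutoff=0.7):
    """Return (vector_id, label, score) tuples, best match first.

    Each query word is compared with every word of a label; the score is
    the mean of the best per-word similarity ratios.
    """
    query_words = query.lower().split()
    scored = []
    for vector_id, label in labels.items():
        label_words = label.lower().split()
        best = [
            max(difflib.SequenceMatcher(None, q, w).ratio() for w in label_words)
            for q in query_words
        ]
        score = sum(best) / len(best)
        if score >= cutoff:
            scored.append((vector_id, label, round(score, 3)))
    return sorted(scored, key=lambda row: row[2], reverse=True)


# "houshold" is misspelled on purpose.
matches = fuzzy_find("median houshold income", LABELS)
```

Scoring word by word keeps a single misspelled word from dragging down an otherwise exact query, which whole-string comparison tends to do.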
- Download Census data and geography in analysis-ready format
- Support for multiple Census years: 2021, 2016, 2011, 2006, 2001, 1996
- All Census geographic levels: PR, CMA, CD, CSD, CT, DA, EA, DB
- Taxfiler data at Census Tract level (2000-2018)
- list_census_vectors() - Browse all available variables
- search_census_vectors() - Search variables by keyword
- find_census_vectors() - Exact, semantic (fuzzy), and keyword search
- parent_census_vectors() - Full ancestry of a variable, like R cancensus
- child_census_vectors() - Full descendant tree, with leaves_only and max_level
- visualize_vector_hierarchy() - ASCII tree view of variable hierarchies
- explore_census_vectors() - Open the interactive CensusMapper explorer
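The ancestry and descendant functions above walk a parent/child tree of variables. The sketch below illustrates that idea on a toy tree using only the standard library. The names descendants, ancestors, and the tree contents are hypothetical, and the way leaves_only and max_level interact here is an assumption rather than a description of the library's behavior.

```python
from collections import deque

# Hypothetical parent -> children mapping; real hierarchies come from the API.
CHILDREN = {
    "total": ["men", "women"],
    "men": ["men_0_14", "men_15_64"],
    "women": [],
}


def descendants(root, children, leaves_only=False, max_level=None):
    """Breadth-first walk below root; max_level counts levels under the root."""
    found = []
    queue = deque((child, 1) for child in children.get(root, []))
    while queue:
        node, level = queue.popleft()
        if max_level is not None and level > max_level:
            continue  # too deep: skip this node and everything below it
        kids = children.get(node, [])
        if not leaves_only or not kids:
            found.append(node)
        queue.extend((kid, level + 1) for kid in kids)
    return found


def ancestors(node, children):
    """Return the chain of parents from node up to the top of the tree."""
    parent_of = {kid: parent for parent, kids in children.items() for kid in kids}
    chain = []
    while node in parent_of:
        node = parent_of[node]
        chain.append(node)
    return chain


all_below = descendants("total", CHILDREN)
leaves = descendants("total", CHILDREN, leaves_only=True)
first_level = descendants("total", CHILDREN, max_level=1)
lineage = ancestors("men_0_14", CHILDREN)
```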
- list_census_regions()/search_census_regions() - Browse and search regions
- as_census_region_list() - Convert filtered region lists into get_census() input
- add_unique_names_to_region_list() - De-duplicate ambiguous municipality names
- explore_census_regions() - Open the interactive CensusMapper explorer
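The two region helpers above do simple bookkeeping: one groups region IDs by geographic level into the dictionary shape that get_census() accepts for regions, and the other disambiguates repeated names. The following is a rough standard-library sketch of both ideas. The functions to_region_dict and unique_names, the row layout, and the "(LEVEL)" suffix format are assumptions for illustration, not the library's actual output.

```python
from collections import Counter, defaultdict

# Hypothetical rows shaped like a filtered region listing: (level, region id, name).
rows = [
    ("CSD", "5915022", "Vancouver"),
    ("CSD", "3520005", "Toronto"),
    ("CMA", "35535", "Toronto"),
]


def to_region_dict(rows):
    """Group region IDs by level, e.g. {"CSD": [...], "CMA": [...]}."""
    grouped = defaultdict(list)
    for level, region_id, _name in rows:
        grouped[level].append(region_id)
    return dict(grouped)


def unique_names(rows):
    """Append the level to any name that appears more than once."""
    counts = Counter(name for _level, _id, name in rows)
    return [
        f"{name} ({level})" if counts[name] > 1 else name
        for level, _id, name in rows
    ]


regions = to_region_dict(rows)
names = unique_names(rows)
```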
- GeoPandas integration for spatial analysis
- Multiple resolution options (simplified/high)
- Seamless geometry + data integration
- Production-grade error handling with helpful messages
- Automatic retry with exponential backoff, honoring server Retry-After headers
- Connection pooling and in-memory session caching for metadata
- Rate limiting to respect API constraints
- Comprehensive file caching with StatCan data-recall detection (list_recalled_cached_data()/remove_recalled_cached_data())
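The retry behavior described in the list above (exponential backoff that defers to a server-supplied Retry-After value) can be sketched roughly as follows. This is a generic illustration, not the package's internal code: RateLimited, call_with_retry, and the default delays are hypothetical names and values.

```python
import time


class RateLimited(Exception):
    """Stand-in for a rate-limit error carrying an optional Retry-After value."""

    def __init__(self, retry_after=None):
        super().__init__("rate limited")
        self.retry_after = retry_after


def call_with_retry(func, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call func, retrying on RateLimited with exponential backoff.

    If the error carries retry_after, that value is used instead of the
    computed backoff delay.
    """
    for attempt in range(max_attempts):
        try:
            return func()
        except RateLimited as err:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            backoff = base_delay * 2 ** attempt
            delay = err.retry_after if err.retry_after is not None else backoff
            sleep(delay)


# Demo: fail twice, then succeed; record delays instead of really sleeping.
calls = {"n": 0}


def flaky():
    calls["n"] += 1
    if calls["n"] == 1:
        raise RateLimited(retry_after=5)  # server asked for 5 seconds
    if calls["n"] == 2:
        raise RateLimited()  # no hint: fall back to backoff
    return "ok"


waits = []
result = call_with_retry(flaky, sleep=waits.append)
# waits == [5, 2.0], result == "ok"
```

Passing sleep as a parameter keeps the sketch testable without real waiting.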
Install from PyPI:
pip install pycancensus
Or install the latest development version from GitHub:
pip install git+https://github.com/dshkol/pycancensus.git
For development:
git clone https://github.com/dshkol/pycancensus.git
cd pycancensus
pip install -e ".[dev]"
pycancensus requires a valid CensusMapper API key. You can obtain a free API key by signing up for a CensusMapper account.
Set your API key as an environment variable:
export CANCENSUS_API_KEY="your_api_key_here"
Or set it programmatically:
import pycancensus as pc
pc.set_api_key("your_api_key_here")
Full documentation is available at pycancensus.readthedocs.io
The documentation includes:
- Getting Started Tutorial - Learn the basics
- Working with Geographic Data - Maps and spatial analysis
- Example Gallery - Real-world usage examples
- API Reference - Complete function documentation
- R to Python Migration Guide - For R cancensus users
- LLM Usage Guide - For AI agents using the library (llms.txt)
import pycancensus as pc
# Set your API key
pc.set_api_key("your_api_key_here")
# List available datasets
datasets = pc.list_census_datasets()
# Discover variables with new hierarchy functions
vectors = pc.list_census_vectors("CA21")
income_vars = pc.search_census_vectors("income", "CA21")
related_vars = pc.child_census_vectors("v_CA21_1", dataset="CA21")
# Get census data
data = pc.get_census(
    dataset="CA21",
    regions={"CMA": "35535"},  # Toronto CMA
    vectors=["v_CA21_1", "v_CA21_2", "v_CA21_3"],  # Population by gender
    level="CSD",
)
# Get census data with geography for mapping
geo_data = pc.get_census(
    dataset="CA21",
    regions={"PR": "35"},  # Ontario
    vectors=["v_CA21_1"],  # Total population
    level="CSD",
    geo_format="geopandas",  # Returns GeoDataFrame
)
# Advanced: Compare multiple Census years
data_2021 = pc.get_census("CA21", {"CSD": "5915022"}, ["v_CA21_1"], "CSD")
data_2016 = pc.get_census("CA16", {"CSD": "5915022"}, ["v_CA16_401"], "CSD")

# Search for housing-related variables
housing = pc.search_census_vectors("dwelling", "CA21")
# Navigate variable hierarchies
population_base = "v_CA21_1"
breakdowns = pc.child_census_vectors(population_base, dataset="CA21")
parent_categories = pc.parent_census_vectors(population_base, dataset="CA21")
# Enhanced search: exact, semantic (typo-tolerant), or keyword
income_vectors = pc.find_census_vectors(
    "median household income", "CA21", query_type="semantic"
)

pycancensus includes production-grade error handling:
import pycancensus as pc
from pycancensus.resilience import CensusAPIError, RateLimitError

try:
    data = pc.get_census("CA21", {"PR": "35"}, ["v_CA21_1"], "PR")
except RateLimitError as e:
    # The API asked us to slow down; retry_after is the suggested wait
    print(f"Rate limited: {e}")
    print(f"Retry after: {e.retry_after} seconds")
except CensusAPIError as e:
    # Any other API failure, with a hint on how to fix it
    print(f"API error: {e}")
    print(f"Suggestion: {e.suggestion}")

pycancensus includes comprehensive testing to ensure reliability and R equivalence:
- 114 unit tests covering retry behavior, hierarchy traversal, search modes, caching semantics, recall detection, and region helpers
- CI runs on Python 3.8-3.11 with formatting and lint checks
- Hierarchy traversal, search modes, and name de-duplication verified byte-identical to R cancensus 0.6.1 on live data
- Automated example validator runs the R documentation examples against the Python implementation on every PR
- Real-world scenarios: demographic breakdowns, time series comparisons, geographic analysis with live API calls
- Error handling with invalid regions/vectors, large-dataset performance, retry logic validation
# Run the test suite
python -m pytest tests/ -v
# Run cross-validation against R
python tests/cross_validation/test_r_equivalence.py
# Run integration scenarios
python tests/integration/test_comprehensive_scenarios.py
See tests/cross_validation/results/ for detailed test results and validation reports.
Contributions are welcome! Please see CONTRIBUTING.md for guidelines on:
- Development setup
- Running tests
- Code style (Black, flake8)
- Submitting pull requests
- Reporting issues
If you use pycancensus in your research, please cite it (see CITATION.cff, or use GitHub's "Cite this repository" button):
Shkolnik, D. and J. von Bergmann (2026). pycancensus: access, retrieve, and work with Canadian Census data and geography in Python. v0.2.0. https://github.com/dshkol/pycancensus
This project is licensed under the MIT License - see the LICENSE file for details. The license covers the pycancensus code; census data retrieved with it is subject to the Statistics Canada Open Licence — see the attribution requirements below.
This package is explicitly a Python port of the R cancensus package.
Subject to the Statistics Canada Open Data License Agreement, licensed products using Statistics Canada data should employ the following acknowledgement of source.
pycancensus can generate the correct attribution text for the datasets you
used: pc.dataset_attribution(["CA16", "CA21"]).
Acknowledgment of Source
(a) You shall include and maintain the following notice on all licensed rights of the Information:
- Source: Statistics Canada, name of product, reference date. Reproduced and distributed on an "as is" basis with the permission of Statistics Canada.
(b) Where any Information is contained within a Value-added Product, you shall include on such Value-added Product the following notice:
- Adapted from Statistics Canada, name of product, reference date. This does not constitute an endorsement by Statistics Canada of this product.
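As an illustration of how the two notice templates above are filled in, here is a small sketch that substitutes a product name and reference date into each. It is independent of pc.dataset_attribution(), whose exact output is not shown here; the function source_notice and the example values are hypothetical placeholders.

```python
def source_notice(product, reference_date, adapted=False):
    """Fill in the Statistics Canada notice templates quoted above.

    adapted=False gives the notice for reproduced information (a);
    adapted=True gives the notice for value-added products (b).
    """
    if adapted:
        return (
            f"Adapted from Statistics Canada, {product}, {reference_date}. "
            "This does not constitute an endorsement by Statistics Canada "
            "of this product."
        )
    return (
        f"Source: Statistics Canada, {product}, {reference_date}. "
        'Reproduced and distributed on an "as is" basis with the '
        "permission of Statistics Canada."
    )


# Placeholder values; substitute the real product name and reference date.
notice = source_notice("2021 Census of Population", "YYYY-MM-DD")
adapted_notice = source_notice("2021 Census of Population", "YYYY-MM-DD", adapted=True)
```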