All notable changes to pycancensus will be documented in this file.
- Added support for national-level census data retrieval (level='C')
- Updated validation in
utils.pyto include 'C' in valid levels - Created comprehensive test suite in
tests/test_national_level.py - Achieves feature parity with R cancensus for national baseline comparisons
- CRITICAL: Fixed
list_census_regions()404 error - Changed endpoint from
/api/v1/list_regionsto/data_sets/{dataset}/place_names.csv - Updated response parsing from JSON to CSV format
- Now successfully retrieves all regions (5,518 for CA16/CA21)
- Fixed CI test failures across Python 3.9 and 3.10
- Made psutil dependency optional in performance tests
- Updated mocked API tests to match new CSV response format
- Improved cross-validation test assertions for better R compatibility checking
- Removed emojis from all public-facing documentation
- Cleaned up README.md, tutorials, and examples for professional tone
- Improved documentation clarity and readability
- BREAKING: Achieved 100% data compatibility with R cancensus library
- Cross-validation testing shows identical results across all core functions
- Fixed API request formatting to match R package exactly
- Added
parent_census_vectors()- Navigate variable hierarchies upward - Added
child_census_vectors()- Navigate variable hierarchies downward - Added
find_census_vectors()- Enhanced variable search with fuzzy matching - Full compatibility with R cancensus hierarchy navigation
- New
resilience.pymodule with enterprise-level error handling - Custom exception classes with helpful error messages and suggestions
- Automatic retry logic with exponential backoff and jitter
- Connection pooling for improved performance
- Rate limiting to respect API constraints
- Comprehensive timeout handling
- CRITICAL: Fixed vector listing endpoint (
/list_vectors→/vector_info.csv) - CRITICAL: Fixed column name issues (trailing spaces in API responses)
- Fixed region parameter formatting to match R package exactly
- Added missing
geo_hierarchy: "true"parameter
- Fixed numeric conversion of census data columns
- Improved handling of census-specific NA values (
x,X,F,...,-) - Better categorical column detection and conversion
- Enhanced CSV and JSON response processing
- Added cross-validation testing against R cancensus library (4/4 tests passing)
- Added integration testing with 6 real-world scenarios
- Added robustness testing for error handling and large datasets
- Added performance benchmarking
- Total: 450+ test cases covering all functionality
- Provincial population analysis
- Toronto demographic breakdown
- Income inequality analysis
- Vector hierarchy navigation
- Multi-dataset time series comparison
- Geographic data analysis with boundaries
- Added comprehensive feature overview and sections
- Added variable discovery examples showcasing new hierarchy functions
- Added error handling examples with resilience features
- Added testing & verification section with cross-validation results
- Updated quick start examples with latest best practices
IMPLEMENTATION_GUIDE.md- Detailed technical implementation notesCHANGELOG.md- This comprehensive changelog- Test result documentation in
tests/cross_validation/results/
- Moved test files to proper directory structure (
tests/cross_validation/) - Created
MANIFEST.infor proper packaging - Updated
pyproject.tomlwith modern packaging standards - Updated
setup.pyfor test exclusion from builds
- Implemented connection pooling for API requests
- Added intelligent caching system
- Optimized CSV parsing and data processing
- Reduced API calls through better caching strategies
Most existing code will continue to work without changes. However, you may notice:
- Improved Error Messages: More helpful error messages with suggestions
- Better Performance: Faster API calls due to connection pooling
- Cleaner Data: Column names are now properly cleaned (no trailing spaces)
# New hierarchy functions (not available in v0.1.x)
parents = pc.parent_census_vectors("v_CA21_1", dataset="CA21")
children = pc.child_census_vectors("v_CA21_1", dataset="CA21")
search_results = pc.find_census_vectors("CA21", "income")
# Enhanced error handling
from pycancensus.resilience import CensusAPIError, RateLimitError
try:
data = pc.get_census(...)
except RateLimitError as e:
print(f"Rate limited: retry after {e.retry_after}s")- Python: 3.7+ (unchanged)
- R cancensus: Full equivalence verified with v0.5.7
- API: CensusMapper API v1
- Dependencies: pandas, requests, geopandas (optional)
- Basic census data retrieval functionality
- Geographic data support
- Caching system
- Basic vector and region listing
- GeoPandas integration
Format: This changelog follows Keep a Changelog principles.