Releases: dshkol/pycancensus
Releases · dshkol/pycancensus
pycancensus 0.1.0 - Initial Public Release
[0.1.0] - 2025-06-18
Major Improvements
Full R Library Equivalence
- BREAKING: Achieved 100% data compatibility with R cancensus library
- Cross-validation testing shows identical results across all core functions
- Fixed API request formatting to match R package exactly
New Vector Hierarchy Functions
- Added
parent_census_vectors()- Navigate variable hierarchies upward - Added
child_census_vectors()- Navigate variable hierarchies downward - Added
find_census_vectors()- Enhanced variable search with fuzzy matching - Full compatibility with R cancensus hierarchy navigation
Production-Grade Reliability
- New
resilience.pymodule with enterprise-level error handling - Custom exception classes with helpful error messages and suggestions
- Automatic retry logic with exponential backoff and jitter
- Connection pooling for improved performance
- Rate limiting to respect API constraints
- Comprehensive timeout handling
Bug Fixes
API Compatibility
- CRITICAL: Fixed vector listing endpoint (
/list_vectors→/vector_info.csv) - CRITICAL: Fixed column name issues (trailing spaces in API responses)
- Fixed region parameter formatting to match R package exactly
- Added missing
geo_hierarchy: "true"parameter
Data Quality
- Fixed numeric conversion of census data columns
- Improved handling of census-specific NA values (
x,X,F,...,-) - Better categorical column detection and conversion
- Enhanced CSV and JSON response processing
Testing & Validation
Comprehensive Test Suite
- Added cross-validation testing against R cancensus library (4/4 tests passing)
- Added integration testing with 6 real-world scenarios
- Added robustness testing for error handling and large datasets
- Added performance benchmarking
- Total: 450+ test cases covering all functionality
Test Coverage
- Provincial population analysis
- Toronto demographic breakdown
- Income inequality analysis
- Vector hierarchy navigation
- Multi-dataset time series comparison
- Geographic data analysis with boundaries
Documentation
Enhanced README
- Added comprehensive feature overview and sections
- Added variable discovery examples showcasing new hierarchy functions
- Added error handling examples with resilience features
- Added testing & verification section with cross-validation results
- Updated quick start examples with latest best practices
New Documentation Files
IMPLEMENTATION_GUIDE.md- Detailed technical implementation notesCHANGELOG.md- This comprehensive changelog- Test result documentation in
tests/cross_validation/results/
Internal Improvements
Code Organization
- Moved test files to proper directory structure (
tests/cross_validation/) - Created
MANIFEST.infor proper packaging - Updated
pyproject.tomlwith modern packaging standards - Updated
setup.pyfor test exclusion from builds
Performance
- Implemented connection pooling for API requests
- Added intelligent caching system
- Optimized CSV parsing and data processing
- Reduced API calls through better caching strategies
Migration Guide
For Existing Users
Most existing code will continue to work without changes. However, you may notice:
- Improved Error Messages: More helpful error messages with suggestions
- Better Performance: Faster API calls due to connection pooling
- Cleaner Data: Column names are now properly cleaned (no trailing spaces)
New Capabilities
# New hierarchy functions (not available in v0.1.x)
parents = pc.parent_census_vectors("v_CA21_1", dataset="CA21")
children = pc.child_census_vectors("v_CA21_1", dataset="CA21")
search_results = pc.find_census_vectors("CA21", "income")
# Enhanced error handling
from pycancensus.resilience import CensusAPIError, RateLimitError
try:
data = pc.get_census(...)
except RateLimitError as e:
print(f"Rate limited: retry after {e.retry_after}s")Compatibility
- Python: 3.7+ (unchanged)
- R cancensus: Full equivalence verified with v0.5.7
- API: CensusMapper API v1
- Dependencies: pandas, requests, geopandas (optional)
[0.1.0] - Initial Release
- Basic census data retrieval functionality
- Geographic data support
- Caching system
- Basic vector and region listing
- GeoPandas integration