
Commit f740574

dshkol and claude authored
docs: Refresh README for 0.2.0; strengthen CONTRIBUTING conventions (#22)
README:
- Fix broken find_census_vectors example still using the pre-0.2.0 argument order (verified the corrected call against the live API)
- Replace hardcoded tests/R-equivalence badges with the live CI badge and a PyPI version badge; Python floor corrected to 3.8
- Replace stale "Recent Updates" with a 0.2.0 summary linking the CHANGELOG; refresh testing-section claims to match reality
- Link the LLM usage guide and llms.txt from the documentation list

CONTRIBUTING:
- State the R-compatibility north star and core conventions (ResilientSession for all HTTP, string identifiers, public-function checklist); point to AGENTS.md for the architecture map

Co-authored-by: Claude Fable 5 <noreply@anthropic.com>
1 parent c951755 commit f740574

2 files changed

Lines changed: 45 additions & 26 deletions

CONTRIBUTING.md

Lines changed: 11 additions & 0 deletions
@@ -125,12 +125,23 @@ Add support for national-level census data
 ### Code Guidelines

 **General Principles:**
+- **R compatibility is the north star**: function names, signatures, and
+behavior mirror the R [cancensus](https://github.com/mountainMath/cancensus)
+package; check the R implementation before changing ported behavior, and
+cross-validate results against it where feasible
+- All HTTP goes through `get_session()` from `resilience.py` — never raw `requests`
+- Identifier columns (region IDs, UIDs) are strings, never numeric
+- When adding a public function: export it in `__init__.py`'s `__all__`, add
+it to `docs/api/index.rst`, and commit the autosummary stub
 - Write clear, readable code
 - Add docstrings to all public functions and classes
 - Follow existing code patterns and conventions
 - Keep functions focused and single-purpose
 - Add type hints where appropriate

+See [AGENTS.md](AGENTS.md) for a fuller architecture map and known gotchas
+(written for AI coding agents, equally useful for humans).
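The "identifiers are strings, never numeric" convention added in this hunk can be illustrated with a small standalone sketch. It uses only the standard library; the CSV content is made up for illustration (the second value is merely shaped like a census tract UID), and `read_ids` is a hypothetical helper, not part of pycancensus. The point is only that a numeric round trip can silently change an identifier.

```python
import csv
import io

# Illustrative data only: the second GeoUID is shaped like a census tract ID,
# where the trailing ".00" is part of the identifier itself.
RAW = "GeoUID,Type\n59,PR\n9330001.00,CT\n"


def read_ids(text):
    """Hypothetical helper: read the GeoUID column, keeping every value a str."""
    reader = csv.DictReader(io.StringIO(text))
    return [row["GeoUID"] for row in reader]


ids = read_ids(RAW)

# Round-tripping through float alters both identifiers.
as_numbers = [str(float(value)) for value in ids]

print(ids)         # ['59', '9330001.00']
print(as_numbers)  # ['59.0', '9330001.0']
```

Once the trailing zero is gone, a join against another table keyed on the original string no longer matches, which is one plausible reason for keeping these columns as strings end to end.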
+
 **Docstring Format:**

 ```python

README.md

Lines changed: 34 additions & 26 deletions
@@ -1,23 +1,28 @@
 # pycancensus

+[![PyPI](https://img.shields.io/pypi/v/pycancensus.svg)](https://pypi.org/project/pycancensus/)
 [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
-[![Python 3.7+](https://img.shields.io/badge/python-3.7+-blue.svg)](https://www.python.org/downloads/)
+[![Python 3.8+](https://img.shields.io/badge/python-3.8+-blue.svg)](https://www.python.org/downloads/)
+[![CI](https://github.com/dshkol/pycancensus/actions/workflows/ci.yml/badge.svg)](https://github.com/dshkol/pycancensus/actions/workflows/ci.yml)
 [![Documentation Status](https://readthedocs.org/projects/pycancensus/badge/?version=latest)](https://pycancensus.readthedocs.io/en/latest/?badge=latest)
-[![Tests](https://img.shields.io/badge/tests-passing-green.svg)](tests/)
-[![R Equivalence](https://img.shields.io/badge/R%20equivalence-verified-blue.svg)](tests/cross_validation/)

 Access, retrieve, and work with Canadian Census data and geography.

 **pycancensus** is a Python package that provides integrated, convenient, and uniform access to Canadian Census data and geography retrieved using the CensusMapper API. This package produces analysis-ready tidy DataFrames and spatial data in multiple formats, with full equivalence to the R cancensus library.

-## Recent Updates
+## What's New in 0.2.0

-- **Full R Library Equivalence**: Verified 100% data compatibility with R cancensus
-- **Enhanced API Reliability**: Production-grade error handling and retry logic
-- **Vector Hierarchy Functions**: Navigate census variable relationships like R
-- **Improved Data Quality**: Fixed column naming and data processing issues
-- **Comprehensive Testing**: 450+ integration tests covering real-world scenarios
-- **National-Level Support**: Added level='C' for Canada-wide baseline comparisons
+Synchronized with R cancensus 0.6.1 — see [CHANGELOG.md](CHANGELOG.md) for details:
+
+- **Full hierarchy traversal**: `parent/child_census_vectors()` return complete
+ancestor/descendant trees, verified identical to R
+- **Semantic variable search**: typo-tolerant `find_census_vectors(query_type="semantic")`,
+now with the R-parity signature `(query, dataset, ...)` (breaking change)
+- **StatCan recall detection**: cached data is checked against published data recalls
+- **New helpers**: `visualize_vector_hierarchy()`, `as_census_region_list()`,
+`add_unique_names_to_region_list()`, `explore_census_vectors()/regions()`
+- **Reliability**: retries honor Retry-After; error payloads can no longer poison
+the cache; in-memory session cache for metadata

 ## Features

@@ -104,6 +109,7 @@ The documentation includes:
 - **[Example Gallery](https://pycancensus.readthedocs.io/en/latest/auto_examples/index.html)** - Real-world usage examples
 - **[API Reference](https://pycancensus.readthedocs.io/en/latest/api/index.html)** - Complete function documentation
 - **[R to Python Migration Guide](https://pycancensus.readthedocs.io/en/latest/migration.html)** - For R cancensus users
+- **[LLM Usage Guide](https://pycancensus.readthedocs.io/en/latest/llm_usage.html)** - For AI agents using the library ([llms.txt](https://pycancensus.readthedocs.io/en/latest/llms.txt))

 ## Quick Start

@@ -154,8 +160,9 @@ population_base = "v_CA21_1"
 breakdowns = pc.child_census_vectors(population_base, dataset="CA21")
 parent_categories = pc.parent_census_vectors(population_base, dataset="CA21")

-# Enhanced search with fuzzy matching
-income_vectors = pc.find_census_vectors("CA21", "median household income")
+# Enhanced search: exact, semantic (typo-tolerant), or keyword
+income_vectors = pc.find_census_vectors("median household income", "CA21",
+query_type="semantic")
 ```
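To give a feel for what "typo-tolerant" matching means in the example above, here is a minimal standard-library sketch. It is not the search algorithm used by pycancensus or R cancensus: `difflib` is swapped in purely for illustration, the label list is made up, and `fuzzy_find` is a hypothetical helper.

```python
import difflib

# Made-up labels standing in for census vector descriptions.
LABELS = [
    "Median household income",
    "Average household size",
    "Total population",
]


def fuzzy_find(query, labels, cutoff=0.6):
    """Hypothetical helper: return labels ranked by similarity to the query."""
    # Compare case-insensitively, then map hits back to the original labels.
    lowered = {label.lower(): label for label in labels}
    hits = difflib.get_close_matches(query.lower(), list(lowered), n=3, cutoff=cutoff)
    return [lowered[hit] for hit in hits]


# A query with two typos still ranks the intended label first.
matches = fuzzy_find("median houshold incme", LABELS)
print(matches[0])
```

Note that the query comes first in this sketch too, mirroring the `(query, dataset, ...)` order that the corrected README call uses.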

 ## Error Handling & Resilience
@@ -179,21 +186,22 @@ except CensusAPIError as e:

 pycancensus includes comprehensive testing to ensure reliability and R equivalence:

+### Unit Testing
+- **114 unit tests** covering retry behavior, hierarchy traversal, search modes,
+caching semantics, recall detection, and region helpers
+- CI runs on Python 3.8-3.11 with formatting and lint checks
+
 ### Cross-Validation with R cancensus
-- **4/4 tests passing** with full data equivalence
-- Identical results for vector listing, data retrieval, and multi-region queries
-- Automated testing against R cancensus library
-
-### Integration Testing
-- **6 real-world scenarios** covering typical data analysis workflows
-- Provincial population analysis, demographic breakdowns, income analysis
-- Vector hierarchy navigation, time series comparisons, geographic analysis
-- Performance benchmarking with large datasets
-
-### Robustness Testing
-- Error handling with invalid regions/vectors
-- Large dataset performance testing
-- API resilience and retry logic validation
+- Hierarchy traversal, search modes, and name de-duplication verified
+**byte-identical** to R cancensus 0.6.1 on live data
+- Automated example validator runs the R documentation examples against
+the Python implementation on every PR
+
+### Integration & Robustness Testing
+- Real-world scenarios: demographic breakdowns, time series comparisons,
+geographic analysis with live API calls
+- Error handling with invalid regions/vectors, large-dataset performance,
+retry logic validation
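As a rough idea of what retry-logic validation could exercise, the sketch below parses a `Retry-After` header, which the 0.2.0 summary says retries now honor. This is an assumption-laden illustration, not the package's implementation: `retry_after_seconds` and its `default` fallback are hypothetical. It handles the two header forms defined by HTTP, a delay in seconds and an HTTP date.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def retry_after_seconds(header_value, now=None, default=1.0):
    """Hypothetical helper: turn a Retry-After header into a wait in seconds."""
    if header_value is None:
        return default
    value = header_value.strip()
    if value.isdigit():
        # Form 1: delay-seconds, a non-negative integer.
        return float(value)
    try:
        # Form 2: an HTTP date such as "Wed, 21 Oct 2015 07:28:00 GMT".
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        # Unparseable header: fall back to the assumed default delay.
        return default
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    # A date in the past means no wait is needed.
    return max(0.0, (when - now).total_seconds())


fixed_now = datetime(2015, 10, 21, 7, 27, 30, tzinfo=timezone.utc)
print(retry_after_seconds("120"))
print(retry_after_seconds("Wed, 21 Oct 2015 07:28:00 GMT", now=fixed_now))
```

Passing `now` explicitly keeps the date branch deterministic, so a unit test can check it without touching the clock or the network.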

 ```bash
 # Run the test suite
