Commit 64aeca7

dshkol and claude committed
Fix critical bugs in documentation and library code

## Documentation Fixes
- Fixed Toronto CMA region code from '535' to '35535' in getting_started.md
- Added labels="short" to all get_census calls in working_with_geometry.md
- Removed all sample data fallbacks from tutorials and examples

## Library Bugs Fixed

### 1. labels="short" not working with cached data
- **Bug**: When data was returned from cache, the labels parameter was ignored
- **Fix**: Call _extract_vector_metadata() on cached data in core.py:121
- **Impact**: Vector columns now correctly named 'v_CA21_1' instead of 'v_CA21_1: Population, 2021'

### 2. Missing CRS in GeoDataFrames
- **Bug**: GeoDataFrames had CRS=None, causing incorrect area calculations
- **Fix**: Explicitly set crs="EPSG:4326" when creating GeoDataFrame in core.py:446
- **Impact**: Geographic operations now work correctly with proper coordinate system

## Build Results
- Documentation builds successfully with visualizations now displaying
- All tutorials execute without errors
- Maps and plots now render correctly in HTML output

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
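
Library bug 1 in the message above comes down to a post-processing step that ran on fresh results but was skipped on a cache hit. The following is a minimal sketch of that pattern in plain Python, not the library's code: `short_label`, `apply_labels`, and `get_data` are hypothetical names, the cache is a plain dict, and the stored value is a placeholder.

```python
def short_label(column):
    """Return the vector code from a 'code: description' column name."""
    return column.split(":", 1)[0].strip()


def apply_labels(row, labels):
    """Rename vector columns to bare codes when labels == 'short'."""
    if labels != "short":
        return dict(row)
    return {
        (short_label(key) if key.startswith("v_") else key): value
        for key, value in row.items()
    }


def get_data(cache, key, fetch, labels="detailed"):
    """Look up `key` in `cache`, calling `fetch` on a miss; label either way."""
    data = cache.get(key)
    if data is None:
        data = fetch(key)
        cache[key] = data
    # The point of the fix: cached results go through labelling too.
    return apply_labels(data, labels)


# Placeholder row stored with a detailed label, as a cache might hold it.
cache = {"demo": {"GeoUID": "59933", "v_CA21_1: Population, 2021": 0}}
row = get_data(cache, "demo", fetch=lambda key: {}, labels="short")
print(sorted(row))  # ['GeoUID', 'v_CA21_1']
```

Applying the labelling after the cache lookup, rather than before storing, means one cached entry can serve both `labels="short"` and detailed requests.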
1 parent 5bf3411 commit 64aeca7

7 files changed

Lines changed: 36 additions & 62 deletions

Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
+pycancensus.dataset\_attribution
+================================
+
+.. currentmodule:: pycancensus
+
+.. autofunction:: dataset_attribution

Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
+pycancensus.get\_intersecting\_geometries
+=========================================
+
+.. currentmodule:: pycancensus
+
+.. autofunction:: get_intersecting_geometries

Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
+pycancensus.label\_vectors
+==========================
+
+.. currentmodule:: pycancensus
+
+.. autofunction:: label_vectors

docs/examples/plot_geographic_analysis.py

Lines changed: 1 addition & 22 deletions
@@ -41,28 +41,7 @@
 
 except Exception as e:
     print(f"Error retrieving data: {e}")
-    print("Creating sample data for demonstration...")
-
-    # Create sample data for demonstration when API is not available
-    import numpy as np
-    from shapely.geometry import Point
-    import geopandas as gpd
-
-    # Sample coordinates around Vancouver area
-    n_points = 50
-    np.random.seed(42)
-    lons = np.random.uniform(-123.3, -122.9, n_points)
-    lats = np.random.uniform(49.15, 49.35, n_points)
-
-    geo_data = gpd.GeoDataFrame({
-        'GeoUID': [f'59933{i:03d}' for i in range(n_points)],
-        'name': [f'Census Tract {i}' for i in range(n_points)],
-        'v_CA21_1': np.random.randint(1000, 8000, n_points),  # Population
-        'v_CA21_434': np.random.randint(30000, 120000, n_points),  # Median income
-        'geometry': [Point(lon, lat) for lon, lat in zip(lons, lats)]
-    }, crs='EPSG:4326')
-
-    print("Using sample data for demonstration")
+    raise  # Fail if API call doesn't work - no fallbacks
 
 # %%
 # Creating a Basic Map
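
The added line is a bare `raise`, which re-raises the exception currently being handled with its original type and traceback, so the example stops instead of continuing on invented data. A minimal sketch of that behaviour in plain Python follows; `fetch_or_fail` and `broken_fetch` are hypothetical stand-ins, not part of pycancensus.

```python
def fetch_or_fail(fetch):
    """Run `fetch`; on failure, report the error and re-raise it unchanged."""
    try:
        return fetch()
    except Exception as e:
        print(f"Error retrieving data: {e}")
        raise  # bare raise keeps the original exception type and traceback


def broken_fetch():
    """Stand-in for an API call that fails."""
    raise ConnectionError("API unavailable")


try:
    fetch_or_fail(broken_fetch)
except ConnectionError as err:
    caught = err  # the caller still sees the original exception
```

With the fallback removed, a failed API call surfaces as a build error rather than as a map of random points.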

docs/tutorials/getting_started.md

Lines changed: 1 addition & 1 deletion
@@ -154,7 +154,7 @@ try:
     # Get real data for Toronto CMA using our income hierarchy vectors
     toronto_data = get_census(
         dataset='CA21',
-        regions={'CMA': '535'},  # Toronto CMA
+        regions={'CMA': '35535'},  # Toronto CMA
         vectors=['v_CA21_923', 'v_CA21_939', 'v_CA21_942', 'v_CA21_943'],  # Income categories
         level='CMA',
         use_cache=False
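
The corrected code matches the shape of the Vancouver identifier used elsewhere in this commit (`59933`): five digits, which reads as a two-digit province code followed by a three-digit CMA code. That interpretation is an assumption drawn from the two codes shown here, not something the commit states. A small sketch under that assumption, with a hypothetical helper `cma_geouid`:

```python
def cma_geouid(province_code, cma_code):
    """Join a 2-digit province code and a 3-digit CMA code into one identifier.

    Assumed convention, inferred from the codes in this commit.
    """
    province = str(province_code)
    cma = str(cma_code)
    if not (province.isdigit() and len(province) == 2):
        raise ValueError(f"expected a 2-digit province code, got {province!r}")
    if not (cma.isdigit() and len(cma) == 3):
        raise ValueError(f"expected a 3-digit CMA code, got {cma!r}")
    return province + cma


toronto = cma_geouid("35", "535")    # the corrected value in this diff
vancouver = cma_geouid("59", "933")  # the Vancouver code used in the tutorials
print(toronto, vancouver)  # 35535 59933
```

Under this reading, the old value `'535'` was the CMA code on its own, without the province prefix.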

docs/tutorials/working_with_geometry.md

Lines changed: 13 additions & 38 deletions
@@ -29,8 +29,10 @@ import pycancensus as pc
 import pandas as pd
 import matplotlib.pyplot as plt
 import geopandas as gpd
+from IPython.display import display
 
-# Set up plotting
+# Set up plotting for notebook display
+%matplotlib inline
 plt.style.use('default')
 plt.rcParams['figure.figsize'] = (12, 8)
 
@@ -70,41 +72,19 @@ try:
         regions={"CMA": "59933"},  # Vancouver CMA
         vectors=["v_CA21_1", "v_CA21_434"],  # Population and median income
         level="CSD",  # Municipality level
-        geo_format="geopandas"
+        geo_format="geopandas",
+        labels="short"
     )
 
     print(f"Retrieved data for {len(vancouver_data)} municipalities")
-    print(f"Columns: {list(vancouver_data.columns)}")
     print(f"CRS: {vancouver_data.crs}")
-
+
     # Show sample data
     display(vancouver_data[['name', 'v_CA21_1', 'v_CA21_434']].head())
 
 except Exception as e:
     print(f"Error retrieving data: {e}")
-    print("Creating sample data for demonstration...")
-
-    # Create sample data when API is not available
-    import numpy as np
-    from shapely.geometry import Polygon
-
-    # Sample Vancouver area municipalities
-    municipalities = [
-        'Vancouver', 'Surrey', 'Burnaby', 'Richmond', 'Coquitlam',
-        'Langley', 'North Vancouver', 'West Vancouver', 'New Westminster'
-    ]
-
-    np.random.seed(42)
-    vancouver_data = gpd.GeoDataFrame({
-        'GeoUID': [f'59933{i:02d}' for i in range(len(municipalities))],
-        'name': municipalities,
-        'v_CA21_1': np.random.randint(50000, 650000, len(municipalities)),
-        'v_CA21_434': np.random.randint(40000, 100000, len(municipalities)),
-        'geometry': [Polygon([(i, j), (i+1, j), (i+1, j+1), (i, j+1)])
-                     for i, j in enumerate(range(len(municipalities)))]
-    }, crs='EPSG:4326')
-
-    print("Using sample data for demonstration")
+    raise  # Fail if API call doesn't work - no fallbacks
 ```
 
 ## Creating Basic Maps
@@ -129,7 +109,7 @@ ax.set_title('Population by Municipality\nVancouver CMA, 2021', fontsize=16)
 ax.axis('off')  # Remove axes for cleaner look
 
 plt.tight_layout()
-fig  # Display the figure
+display(fig)
 ```
 
 ## Multi-Variable Mapping
@@ -167,7 +147,7 @@ ax2.axis('off')
 
 plt.suptitle('Vancouver CMA: Population vs Income', fontsize=16, y=1.02)
 plt.tight_layout()
-fig  # Display the figure
+display(fig)
 ```
 
 ## Working with Different Geographic Levels
@@ -190,7 +170,8 @@ try:
                 dataset="CA21",
                 regions={"CMA": "59933"},
                 vectors=["v_CA21_1"],
-                level=level
+                level=level,
+                labels="short"
             )
             print(f"{name:15} ({level}): {len(data):4,} regions")
         except Exception as e:
@@ -228,17 +209,11 @@ try:
     boundaries.plot(ax=ax, edgecolor='blue', facecolor='lightblue', alpha=0.7)
     ax.set_title('Vancouver CMA Municipal Boundaries')
     ax.axis('off')
-    fig  # Display the figure
+    display(fig)
 
 except Exception as e:
     print(f"Error getting boundaries: {e}")
-
-    # Use our sample data
-    fig, ax = plt.subplots(1, 1, figsize=(10, 8))
-    vancouver_data.boundary.plot(ax=ax, color='blue', linewidth=2)
-    ax.set_title('Sample Municipal Boundaries')
-    ax.axis('off')
-    fig  # Display the figure
+    raise  # Fail if API call doesn't work - no fallbacks
 ```
 
 ## Spatial Analysis

pycancensus/core.py

Lines changed: 3 additions & 1 deletion
@@ -117,6 +117,8 @@ def get_census(
         if cached_data is not None:
             if not quiet:
                 print(f"Reading data from cache...")
+            # Process labels for cached data
+            cached_data = _extract_vector_metadata(cached_data, vectors, labels)
             return cached_data
 
     # Build API request exactly like the R package
@@ -441,7 +443,7 @@ def _process_geojson_response(data, vectors, labels):
     if "features" not in data:
         raise ValueError("Invalid GeoJSON response: missing 'features' field")
 
-    gdf = gpd.GeoDataFrame.from_features(data["features"])
+    gdf = gpd.GeoDataFrame.from_features(data["features"], crs="EPSG:4326")
 
     # Apply the same numeric conversion logic as CSV processing
     # This was missing and causing all columns to remain as strings
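
The CRS matters for area calculations because GeoJSON coordinates are longitude and latitude in degrees, so a planar area computed on them is in square degrees, and geopandas can only reproject with `to_crs` once a CRS is set. The sketch below illustrates the unit problem in plain Python without geopandas. The helper names are hypothetical, and the kilometre conversion is a rough spherical approximation chosen for illustration, not a proper projection.

```python
import math

KM_PER_DEGREE = 111.32  # rough length of one degree of latitude (approximation)


def shoelace_area(ring):
    """Planar area of a closed ring of (x, y) pairs, in squared input units."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def approx_area_km2(ring):
    """Rough ground area: scale degrees to km at the ring's mean latitude."""
    mean_lat = sum(lat for _, lat in ring[:-1]) / (len(ring) - 1)
    lon_scale = KM_PER_DEGREE * math.cos(math.radians(mean_lat))
    scaled = [(lon * lon_scale, lat * KM_PER_DEGREE) for lon, lat in ring]
    return shoelace_area(scaled)


def unit_cell(lon, lat):
    """Closed 1 degree by 1 degree ring with its south-west corner at (lon, lat)."""
    return [(lon, lat), (lon + 1, lat), (lon + 1, lat + 1), (lon, lat + 1), (lon, lat)]


equator_cell = unit_cell(-123.0, 0.0)
vancouver_cell = unit_cell(-123.0, 49.0)

# Both cells cover exactly one "square degree"...
print(shoelace_area(equator_cell), shoelace_area(vancouver_cell))  # 1.0 1.0
# ...but the cell at 49 degrees north covers noticeably less ground.
print(approx_area_km2(vancouver_cell) < approx_area_km2(equator_cell))  # True
```

The degree-based area is identical for both cells, which is why areas need a projected CRS and why the frame must first declare which CRS its coordinates are in.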
