
Commit 6360016

dshkol and claude committed
Add bulletproof verification JSON system
- Extend fetch_cansim_enhanced.R with --simple mode for single-series indicators, supporting --filter and --name flags
- Delete save_verification_json.R (functionality merged into enhanced fetcher)
- Rewrite check_verification_coverage.R to read from article frontmatter instead of hardcoded mappings
- Add scripts/validate-verification.js for build-time validation
- Update npm build to run validation before building (fails if missing)
- Add npm validate command for checking without building
- Update SKILL.md and data-workflow.md with new workflow

Verification JSON requirement:
- Every article must declare verification_json in frontmatter
- Referenced JSON file must exist
- Build blocked until all 122 articles have valid verification

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
1 parent b483c84 commit 6360016

7 files changed

Lines changed: 461 additions & 491 deletions


.claude/skills/the-daily-generator/SKILL.md

Lines changed: 54 additions & 76 deletions
@@ -76,121 +76,100 @@ Before finalizing ANY article, verify all checks pass.
 
 ```
 1. FETCH DATA
+For complex tables (CPI, LFS, GDP, Retail):
 Rscript r-tools/fetch_cansim_enhanced.R <table-number> output
 → output/data_<table>_enhanced.json
 
-2. VERIFY JSON EXISTS (MANDATORY)
-Check that verification JSON file exists in output/
-If missing, create it using save_verification_json.R:
+For simple single-series indicators:
+Rscript r-tools/fetch_cansim_enhanced.R <table-number> output --simple \
+--filter "GEO=Canada" \
+--filter "Column Name=Filter Value" \
+--name "indicator_name"
+→ output/indicator_name.json
 
-Rscript -e "source('r-tools/save_verification_json.R'); ..."
-
-See "Verification JSON Requirement" section below.
-
-3. READ AND CONFIRM DATA (MANDATORY)
+2. READ AND CONFIRM DATA (MANDATORY)
 Before writing ANY article content:
 - Read the JSON file completely
-- State the headline value: "latest.yoy_pct_change is X.X%"
-- State the reference period: "metadata.reference_period is YYYY-MM"
+- State the headline value: "yoy_pct is X.X%"
+- State the reference period: "ref_date is YYYY-MM"
 - If JSON doesn't exist or is stale, STOP - do not proceed
 
-4. CREATE ENGLISH ARTICLE
+3. CREATE ENGLISH ARTICLE WITH FRONTMATTER (MANDATORY)
 docs/en/<slug>/index.md
 
-5. CREATE FRENCH ARTICLE
-docs/fr/<slug-fr>/index.md
-
-6. LINK ARTICLE TO JSON (MANDATORY)
-Record which JSON file this article uses in the article's source-info div:
+MUST include verification_json in frontmatter:
+---
+title: Manufacturing sales down 1.0% in October 2025
+verification_json: output/manufacturing_sales.json
+toc: false
+---
 
-**Verification JSON:** `output/<indicator>.json`
+4. CREATE FRENCH ARTICLE
+docs/fr/<slug-fr>/index.md (same frontmatter requirement)
 
-This creates the audit trail from article → data source.
-
-7. UPDATE LANGUAGE MAP
+5. UPDATE LANGUAGE MAP
 Add slug pair to src/lang-map.js
 
-8. UPDATE INDEX PAGES
+6. UPDATE INDEX PAGES
 Add entry to docs/en/index.md and docs/fr/index.md
 
-9. PREVIEW AND VERIFY
+7. PREVIEW AND VERIFY
 npm run dev → http://localhost:3000
 ```
 
 ## Verification JSON Requirement
 
-**Every article MUST have a corresponding JSON verification file.** This enables:
+**Every article MUST declare its verification JSON in frontmatter.** This enables:
+- Build-time validation (fails if JSON missing)
 - Audit trail for data provenance
-- Post-publication verification
 - Detection of fabricated data
 
-### For Tables with Enhanced Fetcher Support
+### Fetching Data
+
+**One tool for all tables** - use `fetch_cansim_enhanced.R`:
 
-Tables like CPI (18-10-0004), LFS (14-10-0287), Retail (20-10-0008) use the enhanced fetcher:
 ```bash
+# Complex tables with subseries/provincial breakdowns
 Rscript r-tools/fetch_cansim_enhanced.R 18-10-0004 output
-```
-This automatically creates `output/data_18_10_0004_enhanced.json`.
-
-### For Other Tables (Simpler Indicators)
 
-Use the verification JSON utility:
-```r
-source("r-tools/save_verification_json.R")
-
-# Fetch and save in one step
-fetch_and_save_verification(
-series_name = "Manufacturing Sales",
-table_number = "16-10-0047",
-GEO == "Canada",
-`Seasonal adjustment` == "Seasonally adjusted",
-`Principal statistics` == "Sales of goods manufactured (shipments)",
-`North American Industry Classification System (NAICS)` == "Manufacturing",
-unit = "millions"
-)
+# Simple single-series indicators
+Rscript r-tools/fetch_cansim_enhanced.R 16-10-0047 output --simple \
+--filter "GEO=Canada" \
+--filter "Seasonal adjustment=Seasonally adjusted" \
+--filter "Principal statistics=Sales of goods manufactured (shipments)" \
+--filter "North American Industry Classification System (NAICS)=Manufacturing" \
+--name "manufacturing_sales"
 ```
 
-Or manually:
-```r
-source("r-tools/save_verification_json.R")
+### Frontmatter Declaration
 
-# Your custom fetch
-data <- get_cansim("16-10-0047") %>%
-filter(...) %>%
-select(REF_DATE, VALUE) %>%
-arrange(REF_DATE)
-
-# Save verification JSON
-save_verification_json(
-series_name = "Manufacturing Sales",
-table_number = "16-10-0047",
-data = data,
-unit = "millions",
-article_slug = "manufacturing-sales-october-2025"
-)
-```
+Every article must include `verification_json` in its YAML frontmatter:
 
-### JSON File Naming Convention
+```yaml
+---
+title: Manufacturing sales down 1.0% in October 2025
+verification_json: output/manufacturing_sales.json
+toc: false
+---
+```
 
-| Series Name | JSON Filename |
-|-------------|---------------|
-| Manufacturing Sales | `manufacturing_sales.json` |
-| Consumer Price Index | `data_18_10_0004_enhanced.json` |
-| EI Claims | `ei_claims.json` |
-| Industrial Product Prices | `ippi.json` |
+This is **enforced at build time** - the site will not build if:
+- `verification_json` field is missing
+- The referenced JSON file doesn't exist
 
-### Verification Before Publishing
+### Verification Audit
 
-Before marking an article complete:
-1. Confirm JSON file exists in `output/`
-2. Confirm article values match JSON values
-3. Confirm article period ≤ JSON reference period
+Run the coverage checker to see which articles have valid verification:
+```bash
+Rscript r-tools/check_verification_coverage.R
+```
 
 ## Article Structure
 
 ```markdown
 ---
 title: Consumer prices up 2.2% year over year in November 2025
+verification_json: output/data_18_10_0004_enhanced.json
 toc: false
 ---
 

@@ -327,15 +306,14 @@ import * as Plot from "npm:@observablehq/plot";
 ## Quality Checklist
 
 Before publishing:
-- [ ] **Verification JSON exists** in `output/` for this indicator
+- [ ] **`verification_json` in frontmatter** pointing to valid JSON file
 - [ ] All values from fetched JSON (no made-up data)
 - [ ] Headline leads with key statistic
 - [ ] Charts render with #AF3C43 color
 - [ ] Language switcher works (slug in lang-map.js)
 - [ ] Voice is neutral (no "surged", "plummeted")
 - [ ] French uses comma decimals (2,2 %)
 - [ ] R code reproducibility section included with correct table number
-- [ ] **Verification JSON path** noted in source-info div
 
 ## Review Mode
 

.claude/skills/the-daily-generator/references/data-workflow.md

Lines changed: 34 additions & 43 deletions
@@ -349,73 +349,64 @@ Never substitute synthetic data.
 - If `output/<indicator>.json` doesn't exist → audit trail gap
 
 **Prevention**:
-1. **EVERY article MUST have a verification JSON file**
-2. Use `save_verification_json.R` for indicators not covered by enhanced fetcher
-3. Record JSON path in article's source-info div
-4. Pre-commit hook should verify JSON exists for each article
+1. **EVERY article MUST declare `verification_json` in frontmatter**
+2. Use `fetch_cansim_enhanced.R --simple` for single-series indicators
+3. Build fails if verification JSON is missing or invalid
 
 ## Verification JSON System
 
-Every article must have a corresponding JSON file to enable post-publication verification.
+Every article must declare its verification JSON in frontmatter. Build-time validation enforces this.
 
-### Required JSON Files
+### Fetching Data
 
-| Indicator | JSON File | Source |
-|-----------|-----------|--------|
-| CPI | `data_18_10_0004_enhanced.json` | Enhanced fetcher |
-| LFS | `lfs_real.json` | Enhanced fetcher |
-| GDP | `gdp_real.json` | Enhanced fetcher |
-| Retail | `retail_real.json` | Enhanced fetcher |
-| Trade | `trade_real.json` | Enhanced fetcher |
-| Manufacturing | `manufacturing_sales.json` | save_verification_json.R |
-| Food Services | `food_services.json` | save_verification_json.R |
-| IPPI | `ippi.json` | save_verification_json.R |
-| RMPI | `rmpi.json` | save_verification_json.R |
-| Electricity | `electricity.json` | save_verification_json.R |
-| EI Claims | `ei_claims.json` | save_verification_json.R |
+**One tool for all tables:**
 
-### Using save_verification_json.R
+```bash
+# Complex tables (CPI, LFS, GDP, Retail)
+Rscript r-tools/fetch_cansim_enhanced.R 18-10-0004 output
 
-For indicators not covered by the enhanced fetcher:
+# Simple single-series indicators
+Rscript r-tools/fetch_cansim_enhanced.R 14-10-0005 output --simple \
+--filter "GEO=Canada" \
+--filter "Type of claim=Initial and renewal claims, seasonally adjusted" \
+--filter "Claim detail=Received" \
+--name "ei_claims"
+```
 
-```r
-source("r-tools/save_verification_json.R")
-
-# One-step fetch and save
-fetch_and_save_verification(
-series_name = "EI Claims",
-table_number = "14-10-0005",
-GEO == "Canada",
-`Type of claim` == "Initial and renewal claims, seasonally adjusted",
-`Claim detail` == "Received",
-unit = "claims"
-)
-# → Creates output/ei_claims.json
+### Article Frontmatter
+
+```yaml
+---
+title: EI claims down 1.1% in October 2025
+verification_json: output/ei_claims.json
+toc: false
+---
 ```
 
-### JSON Structure
+### JSON Structure (Simple Mode)
 
 ```json
 {
-"series": "EI Claims",
+"series": "Ei claims",
 "ref_date": "2025-10",
 "value": 267280,
 "mom_pct": -1.1,
 "yoy_pct": 2.1,
 "time_series": [...],
 "provenance": {
 "table_number": "14-10-0005",
-"fetched_at": "2026-01-10 14:30:00 EST",
+"fetched_at": "2026-01-10 14:30:00 PST",
 "statcan_url": "https://www150.statcan.gc.ca/t1/tbl1/en/tv.action?pid=1410000501",
-"filters_applied": {...},
-"article_slug": "ei-claims-october-2025"
+"filters_applied": {"GEO": "Canada", ...},
+"r_version": "4.5.0",
+"cansim_package_version": "0.4.4"
 }
 }
 ```
 
 ### Verification Workflow
 
-1. **Before generating**: Confirm JSON exists or create it
-2. **During generation**: Pull all values from JSON
-3. **After generation**: Record JSON path in article
-4. **During audit**: Compare article claims to JSON values
+1. **Fetch**: Run `fetch_cansim_enhanced.R` (with --simple if needed)
+2. **Declare**: Add `verification_json` to article frontmatter
+3. **Build**: Site validates JSON exists (fails if missing)
+4. **Audit**: Run `Rscript r-tools/check_verification_coverage.R`

package.json

Lines changed: 2 additions & 1 deletion
@@ -5,7 +5,8 @@
 "type": "module",
 "scripts": {
 "dev": "observable preview",
-"build": "observable build && node scripts/fix-paths.js",
+"build": "node scripts/validate-verification.js && observable build && node scripts/fix-paths.js",
+"validate": "node scripts/validate-verification.js",
 "clean": "rm -rf dist .observablehq",
 "test": "node --test scripts/*.test.js"
 },
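
Note on the build-time check: the contents of scripts/validate-verification.js are not shown in this view. As a rough sketch of the two rules stated in the commit message (the `verification_json` field is declared, and the referenced file exists), a check along the following lines would do. The function names `parseFrontmatter` and `validateArticle` and the injected `fileExists` callback are illustrative assumptions, not the script's actual API.

```javascript
// Hypothetical sketch; NOT the actual scripts/validate-verification.js.

// Extract flat "key: value" pairs from a leading YAML frontmatter block.
// Assumption: only simple unquoted scalar fields matter for this check,
// so nested YAML and quoted values are deliberately not handled.
function parseFrontmatter(markdown) {
  const match = /^---\r?\n([\s\S]*?)\r?\n---/.exec(markdown);
  if (!match) return null;
  const fields = {};
  for (const line of match[1].split(/\r?\n/)) {
    const sep = line.indexOf(":");
    if (sep === -1) continue;
    const key = line.slice(0, sep).trim();
    const value = line.slice(sep + 1).trim();
    if (key) fields[key] = value;
  }
  return fields;
}

// Return a list of problems for one article; an empty list means it passes.
// `fileExists` is injected so the rule can be exercised without touching disk.
function validateArticle(articlePath, markdown, fileExists) {
  const fields = parseFrontmatter(markdown);
  if (!fields || !fields.verification_json) {
    return [`${articlePath}: missing verification_json in frontmatter`];
  }
  if (!fileExists(fields.verification_json)) {
    return [`${articlePath}: ${fields.verification_json} does not exist`];
  }
  return [];
}
```

In a real build script one would presumably read each article with Node's `fs.readFileSync`, pass `fs.existsSync` as `fileExists`, and exit with a non-zero status when any problems are collected, so that the `&&` chain in the `build` script stops before `observable build` runs.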
