
Commit e9758ce

dshkol and claude committed
Add data verification checkpoint to article generation skill
Prevent data errors by enforcing a mandatory step to read and confirm JSON data before writing any article content.

Changes:
- Add step 2 "READ AND CONFIRM DATA" to workflow requiring explicit confirmation of headline value and reference period before writing
- Add "How to Use Fetched Data" section with JSON field mappings
- Add forbidden patterns (writing from memory, estimating values)
- Document Jan 2026 failure mode where articles had plausible but incorrect hardcoded values (2.50% vs 2.25%, 80.8% vs 80.7%)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
1 parent c6cff32 commit e9758ce

2 files changed

Lines changed: 54 additions & 5 deletions


.claude/skills/the-daily-generator/SKILL.md

Lines changed: 33 additions & 5 deletions
@@ -11,26 +11,54 @@ Generate StatCan "The Daily"-style statistical bulletins from CANSIM data tables
 
 **NEVER use synthetic, made-up, or placeholder data.** Every number must come from real Statistics Canada data fetched via the `cansim` R package. If data fetch fails, do not generate the article.
 
+## How to Use Fetched Data
+
+Every number in the article MUST come from the JSON file:
+
+| Article Element | JSON Source |
+|-----------------|-------------|
+| Headline percentage | `latest.yoy_pct_change` or `latest.mom_pct_change` |
+| Headline value | `latest.value` |
+| Chart data array | `time_series[]` - copy date and value exactly |
+| Component breakdown | `subseries[]` |
+| Provincial table | `provincial[]` |
+| Reference period | `metadata.reference_period` |
+
+**Forbidden patterns:**
+- ❌ Writing a number from memory
+- ❌ Estimating values not in JSON
+- ❌ Using "plausible" numbers to fill gaps
+- ❌ Confusing similar terms (e.g., "Bank Rate" vs "Policy Rate")
+
+**If a value isn't in the JSON, don't include it in the article.**
+
 ## Workflow
 
 ```
 1. FETCH DATA
 Rscript r-tools/fetch_cansim_enhanced.R <table-number> output
 → output/data_<table>_enhanced.json
 
-2. CREATE ENGLISH ARTICLE
+2. READ AND CONFIRM DATA (MANDATORY)
+Before writing ANY article content:
+- Read the JSON file completely
+- State the headline value: "latest.yoy_pct_change is X.X%"
+- State the reference period: "metadata.reference_period is YYYY-MM"
+- If JSON doesn't exist or is stale, STOP - do not proceed
+
+3. CREATE ENGLISH ARTICLE
 docs/en/<slug>/index.md
 
-3. CREATE FRENCH ARTICLE
+4. CREATE FRENCH ARTICLE
 docs/fr/<slug-fr>/index.md
 
-4. UPDATE LANGUAGE MAP
+5. UPDATE LANGUAGE MAP
 Add slug pair to src/lang-map.js
 
-5. UPDATE INDEX PAGES
+6. UPDATE INDEX PAGES
 Add entry to docs/en/index.md and docs/fr/index.md
 
-6. PREVIEW AND VERIFY
+7. PREVIEW AND VERIFY
 npm run dev → http://localhost:3000
 ```
 
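Step 2 is written as an instruction to the model, but its checks are mechanical enough to script. The following is a minimal Python sketch, not part of this commit, assuming the JSON layout implied by the field-mapping table (`latest.yoy_pct_change`, `latest.mom_pct_change`, `metadata.reference_period`). The helper names `load_fresh_json` and `confirmation_lines`, the preference for the year-over-year change when both fields are present, and the one-day staleness threshold (judged by file modification time) are hypothetical choices, since the skill does not define "stale".

```python
import json
import os
import time


class DataCheckError(RuntimeError):
    """Raised when the fetched JSON is missing, stale, or lacks a required field."""


def load_fresh_json(path, max_age_days=1.0, now=None):
    """Load the fetched JSON, refusing a missing or stale file.

    "Stale" is judged by file modification time; the one-day default is a
    hypothetical threshold, not something the skill specifies.
    """
    if not os.path.exists(path):
        raise DataCheckError(f"{path} does not exist: run step 1 (FETCH DATA) first")
    now = time.time() if now is None else now
    age_days = (now - os.path.getmtime(path)) / 86400
    if age_days > max_age_days:
        raise DataCheckError(f"{path} is {age_days:.1f} days old: re-fetch before writing")
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def confirmation_lines(data):
    """Build the two statements step 2 requires, using only values from the JSON."""
    latest = data.get("latest") or {}
    # Assumption: prefer the year-over-year change when both fields are present.
    for field in ("yoy_pct_change", "mom_pct_change"):
        if latest.get(field) is not None:
            headline_field, headline = field, latest[field]
            break
    else:
        raise DataCheckError(
            "no headline value in latest.yoy_pct_change or latest.mom_pct_change"
        )

    period = (data.get("metadata") or {}).get("reference_period")
    if period is None:
        raise DataCheckError("metadata.reference_period is missing")

    # Values are echoed exactly as stored (no rounding), so the statements match the file.
    return [
        f"latest.{headline_field} is {headline}%",
        f"metadata.reference_period is {period}",
    ]
```

Calling `confirmation_lines(load_fresh_json(path))` on `output/data_<table>_enhanced.json` returns the two statements step 2 asks for, taken from the file rather than retyped; any `DataCheckError` corresponds to the STOP condition.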
.claude/skills/the-daily-generator/references/data-workflow.md

Lines changed: 21 additions & 0 deletions
@@ -224,3 +224,24 @@ If the R script cannot fetch data:
 5. Try again later
 
 Never substitute synthetic data.
+
+## Known Failure Modes
+
+### Hardcoded Plausible Values (Jan 2026)
+
+**What happened**: Articles were generated with numbers that looked reasonable but weren't from the fetched data.
+
+**Examples**:
+- Interest rates article used 2.50% (the "Bank Rate") instead of 2.25% (the "Policy Rate" from JSON)
+- Manufacturing capacity used 80.8% instead of actual 80.7% from JSON
+
+**Root causes**:
+1. JSON file wasn't read before generating article text
+2. LLM used approximate values from training data instead of exact JSON values
+3. Similar-sounding terms confused (Bank Rate ≠ Policy Rate)
+
+**Prevention**:
+1. ALWAYS read JSON file before writing ANY numbers
+2. ALWAYS state headline value out loud: "The JSON shows X.X%"
+3. Copy-paste values from JSON, don't type from memory
+4. For financial data: verify exact terminology matches the JSON field name
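Prevention steps 1 and 3 rely on discipline; a mechanical check after drafting could back them up. The minimal Python sketch below (hypothetical helper names, not part of the skill) scans article text for percentages and reports any figure that no numeric value in the JSON rounds to. The tolerance rule, half-up rounding of the JSON value to the number of decimals shown in the text, and the comparison of absolute values (so a JSON value of -0.3 can appear as "down 0.3%") are assumptions. For instance, if the JSON holds 80.7 and no field that rounds to 80.8, the text "80.8%" is flagged. The check is deliberately coarse: a number that appears anywhere in the JSON passes, so it cannot catch a real value attached to the wrong term (the Bank Rate vs Policy Rate case), which is why prevention step 4 still needs a careful read.

```python
import math
import re
from decimal import Decimal, ROUND_HALF_UP

# Percentages written in prose, e.g. "80.7%" or "2.25 %".
PERCENT_RE = re.compile(r"(\d+(?:\.\d+)?)\s?%")


def json_numbers(node):
    """Yield every finite numeric leaf in parsed JSON (booleans excluded)."""
    if isinstance(node, dict):
        for value in node.values():
            yield from json_numbers(value)
    elif isinstance(node, list):
        for value in node:
            yield from json_numbers(value)
    elif isinstance(node, bool):
        return  # JSON true/false are not data values
    elif isinstance(node, int):
        yield node
    elif isinstance(node, float) and math.isfinite(node):
        yield node


def unsupported_percentages(article_text, data):
    """Return each percentage in the article that no JSON value rounds to.

    A JSON value supports a figure shown with d decimals if its absolute value,
    rounded half-up to d decimals, equals the figure.
    """
    candidates = [abs(Decimal(str(v))) for v in json_numbers(data)]
    flagged = []
    for match in PERCENT_RE.finditer(article_text):
        shown = Decimal(match.group(1))
        # Rounding step matching the decimals in the text: "80.7" -> 0.1, "2.25" -> 0.01.
        step = Decimal(1).scaleb(shown.as_tuple().exponent)
        if not any(c.quantize(step, rounding=ROUND_HALF_UP) == shown for c in candidates):
            flagged.append(match.group(0))
    return flagged
```

An empty return value means every percentage in the draft is backed by some value in the fetched file; anything returned should be traced back to the JSON before publishing.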
