Commit ba81c46

dshkol and claude committed
Fix: Prevent data fabrication in historical articles
- Generator now strips subseries/provincial data when rebasing to historical periods (these sections only contain latest-period data)
- Added "Red Flags: Fabrication Detection" section to SKILL.md with validation checklists for component, provincial, YoY, and period match
- Documented two new failure modes in data-workflow.md:
  - Stale breakdown data in historical articles (identical values across months)
  - YoY calculation errors (10x decimal place mistakes)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
1 parent 444de03 commit ba81c46

3 files changed

Lines changed: 75 additions & 0 deletions

.claude/skills/the-daily-generator/SKILL.md

Lines changed: 23 additions & 0 deletions
@@ -32,6 +32,29 @@ Every number in the article MUST come from the JSON file:
 
 **If a value isn't in the JSON, don't include it in the article.**
 
+## Red Flags: Fabrication Detection
+
+Before publishing ANY article, verify these checks pass:
+
+### 1. Subseries/Component Data
+- [ ] Does `subseries[]` array exist in JSON? If empty/missing → omit breakdown from article
+- [ ] Can you cite the exact JSON path for EACH component value?
+- [ ] If multiple months generated: are component values DIFFERENT across months? (Identical = fabricated)
+
+### 2. Provincial Data
+- [ ] Does `provincial[]` array exist in JSON? If empty/missing → omit provincial table
+- [ ] Can you cite the exact JSON path for EACH provincial value?
+- [ ] If multiple months generated: are provincial values DIFFERENT across months? (Identical = fabricated)
+
+### 3. YoY Calculations
+- [ ] Cross-validate: Calculate YoY manually from time_series values
+- [ ] Formula: (current_value - year_ago_value) / year_ago_value × 100
+- [ ] If calculated YoY differs from claimed YoY by >0.1pp → STOP and investigate
+
+### 4. Data-Article Period Match
+- [ ] Does JSON `metadata.reference_period` match the article's reference period?
+- [ ] If generating historical article: verify subseries/provincial data matches that period (not latest)
+
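Checks 1, 2 and 4 in the checklist above are structural and could be automated. The following is only a minimal sketch, not part of the commit: the helper names `sections_to_omit` and `period_matches` are hypothetical, and the `YYYY-MM` period format in the example payload is an assumption.

```python
from typing import Any, Dict, List


def sections_to_omit(data: Dict[str, Any]) -> List[str]:
    """Checks 1 and 2: breakdown sections that are missing or empty in the JSON."""
    return [key for key in ("subseries", "provincial") if not data.get(key)]


def period_matches(data: Dict[str, Any], article_period: str) -> bool:
    """Check 4: the JSON reference period must equal the article's period."""
    json_period = data.get("metadata", {}).get("reference_period", "")
    return json_period == article_period


# Hypothetical payload; the "YYYY-MM" period format is an assumption.
example = {
    "metadata": {"reference_period": "2025-11"},
    "subseries": [],  # present but empty, so the breakdown should be omitted
    "time_series": [],
}

print(sections_to_omit(example))            # ['subseries', 'provincial']
print(period_matches(example, "2025-10"))   # False
```

An empty list from `sections_to_omit` would mean both breakdown sections are at least present; it says nothing about whether their values belong to the right period, which is why the period check is kept separate.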
 ## Workflow
 

.claude/skills/the-daily-generator/references/data-workflow.md

Lines changed: 39 additions & 0 deletions
@@ -245,3 +245,42 @@ Never substitute synthetic data.
 2. ALWAYS state headline value out loud: "The JSON shows X.X%"
 3. Copy-paste values from JSON, don't type from memory
 4. For financial data: verify exact terminology matches the JSON field name
+
+### Stale Breakdown Data in Historical Articles (Jan 2026)
+
+**What happened**: CPI articles for July-October 2025 were generated using `--ref-date` parameter. Headlines and time series correctly showed historical periods, BUT component breakdowns and provincial tables showed November 2025 data.
+
+**Result**: 5 articles had identical fabricated breakdown data - same component percentages (Food: 4.2%, Household: 3.3%, etc.) and same provincial values across all months.
+
+**Root cause**:
+1. `rebase_data_to_period()` in Python generator only rebased headlines, not breakdowns
+2. JSON file only contained latest period's `subseries[]` and `provincial[]` data
+3. Function copied stale data without validation
+
+**Detection red flags**:
+- Component percentages **identical** across multiple months → fabricated
+- Provincial YoY values **identical** across multiple months → fabricated
+- Real economic data has natural month-to-month variation
+
+**Prevention**:
+1. Generator now strips `subseries`/`provincial` when rebasing to historical period
+2. Always verify JSON `metadata.reference_period` matches article period
+3. For historical articles: only include headline + trend, not breakdowns
+4. If generating multiple months: verify values DIFFER between months
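The "identical across months" red flag lends itself to a mechanical check. The sketch below is an illustration only, not code from the commit: `find_identical_breakdowns` is a hypothetical helper, the `name`/`value` row fields and `YYYY-MM` keys are assumptions, and the September figures are invented for contrast (only Food 4.2 and Household 3.3 come from the incident description).

```python
from itertools import combinations
from typing import Any, Dict, List, Tuple


def breakdown_signature(rows: List[Dict[str, Any]]) -> Tuple:
    """Order-independent fingerprint of one period's breakdown rows."""
    return tuple(sorted((row["name"], row["value"]) for row in rows))


def find_identical_breakdowns(
    breakdowns_by_period: Dict[str, List[Dict[str, Any]]]
) -> List[Tuple[str, str]]:
    """Return every pair of periods whose non-empty breakdowns are identical."""
    signatures = {
        period: breakdown_signature(rows)
        for period, rows in breakdowns_by_period.items()
        if rows  # stripped or empty breakdowns are not suspicious
    }
    return [
        (a, b)
        for a, b in combinations(sorted(signatures), 2)
        if signatures[a] == signatures[b]
    ]


# Illustrative data: field names and the September values are assumptions.
breakdowns = {
    "2025-07": [{"name": "Food", "value": 4.2}, {"name": "Household", "value": 3.3}],
    "2025-08": [{"name": "Household", "value": 3.3}, {"name": "Food", "value": 4.2}],
    "2025-09": [{"name": "Food", "value": 3.9}, {"name": "Household", "value": 3.1}],
}

print(find_identical_breakdowns(breakdowns))  # [('2025-07', '2025-08')]
```

Any pair returned is a candidate for stale data and worth investigating before publishing; an empty list does not prove the values are correct, only that they are not verbatim copies.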
+
+### YoY Calculation Errors (Jan 2026)
+
+**What happened**: GDP October 2025 article claimed +0.4% YoY, but actual calculation from time_series was +0.04% (10x error).
+
+**The data**:
+- time_series: Oct 2024 = 2317.1B, Oct 2025 = 2318.0B
+- Correct YoY: (2318.0 - 2317.1) / 2317.1 × 100 = **0.04%**
+- Article incorrectly stated: **0.4%**
+
+**Root cause**: Decimal place error when transcribing small percentage changes.
+
+**Prevention**:
+1. Always cross-validate YoY by manual calculation from time_series
+2. If time_series shows Oct 2024 = X and Oct 2025 = Y, verify (Y-X)/X × 100 matches claimed YoY
+3. Be especially careful with small percentage changes (<1%)
+4. Double-check decimal places: 0.04% ≠ 0.4% ≠ 4%
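The cross-validation described above can be reproduced with the GDP figures from this incident. This is a minimal sketch under stated assumptions: `yoy_pct` and `yoy_matches` are hypothetical helper names, and the 0.1 percentage-point tolerance is taken from the SKILL.md checklist in this commit.

```python
def yoy_pct(current: float, year_ago: float) -> float:
    """Year-over-year change in percent: (current - year_ago) / year_ago * 100."""
    return (current - year_ago) / year_ago * 100


def yoy_matches(claimed: float, current: float, year_ago: float,
                tolerance_pp: float = 0.1) -> bool:
    """True when the claimed YoY is within tolerance_pp of the recomputed value."""
    return abs(yoy_pct(current, year_ago) - claimed) <= tolerance_pp


# Values from the GDP October 2025 incident (billions).
oct_2024, oct_2025 = 2317.1, 2318.0

print(round(yoy_pct(oct_2025, oct_2024), 2))    # 0.04
print(yoy_matches(0.4, oct_2025, oct_2024))     # False: the published figure
print(yoy_matches(0.04, oct_2025, oct_2024))    # True: the corrected figure
```

With these inputs the recomputed change is roughly 0.039%, so the published 0.4% is off by about 0.36 percentage points and fails the check.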

generate_article_observable.py

Lines changed: 13 additions & 0 deletions
@@ -210,6 +210,9 @@ def rebase_data_to_period(data: Dict[str, Any], target_ref_date: str) -> Dict[st
     import copy
     data = copy.deepcopy(data)  # Don't mutate original
 
+    # Save original reference period before any modifications
+    original_ref_period = data.get("metadata", {}).get("reference_period", "")
+
     time_series = data.get("time_series", [])
 
     # Find the target period in time series
@@ -247,6 +250,16 @@ def rebase_data_to_period(data: Dict[str, Any], target_ref_date: str) -> Dict[st
     trimmed_series = [e for e in time_series if e.get("ref_date", "") <= target_ref_date]
     data["time_series"] = trimmed_series
 
+    # Strip subseries and provincial data when rebasing to historical period
+    # These sections contain only latest-period breakdowns and cannot be rebased
+    if target_ref_date != original_ref_period:
+        if "subseries" in data:
+            del data["subseries"]
+            logger.warning(f"Removed subseries (only had {original_ref_period} data, not {target_ref_date})")
+        if "provincial" in data:
+            del data["provincial"]
+            logger.warning(f"Removed provincial (only had {original_ref_period} data, not {target_ref_date})")
+
     logger.info(f"Rebased data to reference period: {target_ref_date}")
     logger.info(f" Value: {new_latest['value']}, YoY: {new_latest['yoy_pct_change']}%")
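To see the effect of the added stripping step in isolation, the logic can be lifted into a small standalone function. This is a hedged sketch, not the generator itself: `strip_breakdowns_if_historical` is a hypothetical name, logging is omitted, and the sample payload (including the `YYYY-MM` period format) is an assumption.

```python
import copy
from typing import Any, Dict


def strip_breakdowns_if_historical(data: Dict[str, Any],
                                   target_ref_date: str) -> Dict[str, Any]:
    """Return a copy of data without breakdowns when rebasing away from the latest period."""
    data = copy.deepcopy(data)  # leave the caller's dict untouched
    original_ref_period = data.get("metadata", {}).get("reference_period", "")
    if target_ref_date != original_ref_period:
        # Breakdowns only describe the latest period, so they cannot be reused.
        for key in ("subseries", "provincial"):
            data.pop(key, None)
    return data


# Hypothetical payload for illustration.
latest = {
    "metadata": {"reference_period": "2025-11"},
    "subseries": [{"name": "Food", "value": 4.2}],
    "provincial": [{"name": "Province A", "value": 2.0}],
}

historical = strip_breakdowns_if_historical(latest, "2025-10")
print(sorted(historical))   # ['metadata']
print(sorted(latest))       # ['metadata', 'provincial', 'subseries']
```

Note that the comparison is a plain string inequality, so it presumes `target_ref_date` and `metadata.reference_period` use the same format; if they did not, breakdowns would be stripped even for the latest period.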