Commit 94c8a33

Updates
0 parents  commit 94c8a33

312 files changed

Lines changed: 76830 additions & 0 deletions

Lines changed: 102 additions & 0 deletions
@@ -0,0 +1,102 @@
---
name: the-daily-discover
description: Discover and rank StatCan CANSIM tables for Daily article generation. Use when asked to find new topics, scan for recent releases, identify coverage gaps, prioritize which tables to cover, or explore what data is available.
---

# The D-AI-LY Topic Discovery

Discover and prioritize StatCan CANSIM tables for article generation.

## Quick Start

```bash
# Run discovery scan
Rscript r-tools/discover_topics.R

# Check recently updated tables
Rscript r-tools/discover_stories.R
```

## Discovery Workflow

### 1. Check Existing Coverage (REQUIRED FIRST STEP)

Before scanning CANSIM, identify tables already covered:

```bash
# Extract the table numbers cited in existing articles
grep -h "statcan.gc.ca/t1/tbl1" docs/en/*/index.md | grep -oE "[0-9]{2}-[0-9]{2}-[0-9]{4}" | sort -u
```

**Exclude these table numbers from recommendations.** The goal is diversity: never suggest a table that has already been covered unless explicitly asked for an update.
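The exclusion step can be sketched in JavaScript as a set difference. This is only an illustration under stated assumptions: the helper names `coveredTables` and `excludeCovered`, and the `table` field on each candidate, are hypothetical and not part of the repository's tooling. Like the shell command, it only looks at lines that contain the StatCan table URL.

```javascript
// Matches table numbers of the form NN-NN-NNNN, as in the grep above.
const TABLE_NUMBER = /\b\d{2}-\d{2}-\d{4}\b/g;

// Hypothetical helper: collects table numbers from article texts,
// considering only lines that link to a StatCan table page.
function coveredTables(articleTexts) {
  const covered = new Set();
  for (const text of articleTexts) {
    for (const line of text.split("\n")) {
      if (!line.includes("statcan.gc.ca/t1/tbl1")) continue;
      for (const match of line.match(TABLE_NUMBER) || []) {
        covered.add(match);
      }
    }
  }
  return covered;
}

// Hypothetical helper: drops candidates whose table is already covered.
// Each candidate is assumed to carry its table number in a `table` field.
function excludeCovered(candidates, covered) {
  return candidates.filter((candidate) => !covered.has(candidate.table));
}
```

An explicit update request would simply skip the `excludeCovered` call.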

### 2. Scan Available Data

```r
library(cansim)
library(dplyr)

cubes <- list_cansim_cubes()

# Filter for recent monthly releases
candidates <- cubes %>%
  filter(frequencyCode == "6") %>%            # 6 = monthly
  filter(cubeEndDate >= Sys.Date() - 60) %>%  # latest reference period within 60 days
  arrange(desc(cubeEndDate))
```

### 3. Score Candidates

| Dimension | Weight | Criteria |
|-----------|--------|----------|
| Recency | 25% | Days since release (fresher = higher) |
| Diversity | 25% | Sector gap from existing articles; deprioritize sectors covered in last 7 days |
| Narrative | 25% | Regional variation, trend reversals |
| Public Interest | 15% | Topic relevance to general audiences |
| Data Quality | 10% | Complete coverage, national totals |
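The weighting can be sketched as a small function. This is a minimal illustration, not the logic in `r-tools/discover_topics.R`: the name `scoreCandidate`, the field names, and the 0-100 scale for each sub-score are assumptions made for the example. Only the weights come from the table.

```javascript
// Weights copied from the scoring table; everything else is illustrative.
const WEIGHTS = {
  recency: 0.25,
  diversity: 0.25,
  narrative: 0.25,
  publicInterest: 0.15,
  dataQuality: 0.10,
};

// Hypothetical helper: combines per-dimension sub-scores (assumed 0-100)
// into a single weighted total, rounded to the nearest integer.
function scoreCandidate(subScores) {
  let total = 0;
  for (const [dimension, weight] of Object.entries(WEIGHTS)) {
    const value = subScores[dimension];
    if (typeof value !== "number" || value < 0 || value > 100) {
      throw new RangeError(`${dimension} must be a number between 0 and 100`);
    }
    total += weight * value;
  }
  return Math.round(total);
}
```

For example, sub-scores of 80 (recency), 100 (diversity), 90 (narrative), 70 (public interest) and 90 (data quality) combine to a total of 87.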

### 4. Validate Top Candidates

Before committing:
1. **Verify table numbers**: Run `search_cansim_cubes("keyword")` to confirm table numbers are current; they go stale as StatCan discontinues or replaces tables
2. Fetch sample data to verify structure
3. Check for a corresponding StatCan Daily release
4. Verify national totals exist
5. Confirm the table is current (not deprecated)

## Output Format

```
TOPIC DISCOVERY RESULTS - 2025-12-24
=====================================

RANK  SCORE  TABLE        SECTOR      TITLE
----  -----  -----------  ----------  ----------------------------------
1     87     23-10-0079   Transport   Airline operating statistics
2     82     18-10-0205   Prices      New Housing Price Index
3     78     20-10-0003   Trade       Wholesale trade

Top recommendation: 23-10-0079
- Last release: 2025-12-23
- Narrative hook: Transborder traffic down 9th consecutive month
- Sector gap: No transport coverage in last 30 days
```

## Reference Files

| Reference | When to Use |
|-----------|-------------|
| [sectors.md](references/sectors.md) | Sector categories, regional story patterns, scoring boosts |

## Handoff to Generator

After discovery, generate the article:

```
/the-daily-generator 23-10-0079 --slug airline-passengers-october-2025
```

For regional stories:
```
/the-daily-generator 34-10-0158 --slug ontario-housing-starts-november-2025 --geo Ontario
```
Lines changed: 103 additions & 0 deletions
@@ -0,0 +1,103 @@
# Sector Categories and Regional Stories

Reference material for topic discovery scoring.

## Sector Categories

For diversity scoring, tables are categorized:

| Sector | Example Tables | Notes |
|--------|----------------|-------|
| **Prices** | 18-10-0004 (CPI), 18-10-0001 (Gas), 18-10-0205 (NHPI) | High public interest |
| **Labour** | 14-10-0287 (LFS), 14-10-0355 (SEPH) | Core economic indicator |
| **Trade** | 20-10-0056 (Retail), 20-10-0003 (Wholesale), 12-10-0011 (Intl) | Supply chain coverage |
| **Housing** | 34-10-0158 (Starts), 34-10-0292 (Permits) | Housing market health |
| **Production** | 36-10-0434 (GDP), 16-10-0048 (Manufacturing) | Output indicators |
| **Transport** | 23-10-0079 (Aviation), 23-10-0253 (Rail) | Mobility/logistics |
| **Finance** | 10-10-0006 (Credit), 36-10-0580 (Investment) | Financial conditions |
| **Demographics** | 17-10-0009 (Population), 17-10-0014 (Migration) | Social indicators |
| **Energy** | 25-10-0015 (Electricity), 25-10-0063 (Oil & Gas) | Resource production |

## Narrative Potential Indicators

High-scoring narratives typically have:

- **Trend reversals**: "First increase since...", "Ended X-month streak"
- **Regional divergence**: Provinces moving in opposite directions
- **Component splits**: House vs. land, goods vs. services, domestic vs. international
- **Milestone crossings**: Index hits new high/low, crosses round number
- **Seasonal anomalies**: Unexpected pattern vs. typical seasonality

## Geographic Levels in CANSIM

| Level | Description | Story Potential |
|-------|-------------|-----------------|
| **Canada** | National totals | Headline indicators |
| **Provincial/Territorial** | 13 jurisdictions | Regional divergence, provincial spotlight |
| **CMA** | Census Metropolitan Areas | City comparisons, metro-specific trends |
| **Economic Region** | Sub-provincial regions | Local economic conditions |

## Regional Story Types

### 1. Divergence Stories
When regions move in opposite directions:
```r
# Assumes provincial_data has one row per province and period, with
# columns REF_DATE, GEO, and yoy_change (year-over-year % change)
provincial_data %>%
  group_by(REF_DATE) %>%
  summarise(
    range = max(yoy_change) - min(yoy_change),
    leader = GEO[which.max(yoy_change)],
    laggard = GEO[which.min(yoy_change)]
  ) %>%
  filter(range > 5)  # >5 percentage points spread
```

### 2. Metro Spotlight
Deep-dive on a specific CMA:
- Toronto housing market dynamics
- Vancouver cost of living
- Calgary energy sector employment
- Montreal manufacturing

### 3. Provincial Rankings
League tables comparing provinces:
- Unemployment rates by province
- Housing affordability index
- Retail sales per capita

### 4. Regional Outliers
One region bucking the national trend:
- "Saskatchewan leads provincial gains..."
- "Atlantic Canada bucks national decline..."

## Regional Story Scoring

| Factor | Score Boost | Condition |
|--------|-------------|-----------|
| High provincial variance | +15 | Range > 5 pp |
| Clear leader/laggard | +10 | One province dominates |
| CMA data available | +5 | Metro-level granularity |
| Regional trend reversal | +20 | Province bucks national trend |
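As a rough sketch, the boosts could be applied like this in JavaScript. The table does not say whether boosts stack, so the additive behaviour here is an assumption, as are the function name `regionalBoost` and its field names; only the boost values and the 5 pp threshold come from the table.

```javascript
// Boost values copied from the table above; field names are illustrative.
// Assumes boosts are additive, which the table does not state explicitly.
function regionalBoost({ rangePp, hasClearLeader, hasCmaData, bucksNationalTrend }) {
  let boost = 0;
  if (rangePp > 5) boost += 15;          // high provincial variance
  if (hasClearLeader) boost += 10;       // clear leader/laggard
  if (hasCmaData) boost += 5;            // CMA data available
  if (bucksNationalTrend) boost += 20;   // regional trend reversal
  return boost;
}
```

Under these assumptions, a table with a 7 pp provincial range and CMA data, but no clear leader or trend reversal, would receive a boost of 20.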

## Checking Geographic Coverage

```r
library(cansim)

# Summarise which geographic levels a table covers
check_geo_coverage <- function(table_number) {
  df <- get_cansim(table_number)
  geos <- unique(df$GEO)

  list(
    has_provinces = any(geos %in% c("Ontario", "Quebec", "British Columbia")),
    # StatCan labels use the accented "Montréal", so match both spellings
    has_cmas = any(grepl("CMA|Toronto|Vancouver|Montr[eé]al", geos)),
    geo_count = length(geos),
    geo_list = head(geos, 10)
  )
}
```

## Example Regional Headlines

- "Toronto housing starts surge while Vancouver stalls"
- "Prairie provinces lead employment gains in November"
- "Quebec inflation outpaces national average for 6th month"
- "Atlantic Canada gasoline prices hit 18-month low"
Lines changed: 129 additions & 0 deletions
@@ -0,0 +1,129 @@
---
name: the-daily-generator
description: Generate Statistics Canada Daily-style articles from CANSIM tables. Use when asked to create a Daily article, analyze StatCan data, run the D-AI-LY pipeline, generate a statistical bulletin, write about Canadian economic indicators, or cover a CANSIM table release.
---

# The D-AI-LY Article Generator

Generate StatCan "The Daily"-style statistical bulletins from CANSIM data tables.

## Critical Rule

**NEVER use synthetic, made-up, or placeholder data.** Every number must come from real Statistics Canada data fetched via the `cansim` R package. If data fetch fails, do not generate the article.

## Workflow

```
1. FETCH DATA
   Rscript r-tools/fetch_cansim_enhanced.R <table-number> output
   → output/data_<table>_enhanced.json

2. CREATE ENGLISH ARTICLE
   docs/en/<slug>/index.md

3. CREATE FRENCH ARTICLE
   docs/fr/<slug-fr>/index.md

4. UPDATE LANGUAGE MAP
   Add slug pair to src/lang-map.js

5. UPDATE INDEX PAGES
   Add entry to docs/en/index.md and docs/fr/index.md

6. PREVIEW AND VERIFY
   npm run dev → http://localhost:3000
```

## Article Structure

```markdown
---
title: Consumer prices up 2.2% year over year in November 2025
toc: false
---

# Consumer prices up 2.2% year over year in November 2025

<p class="release-date">Data released: December 5, 2025 | Published: December 22, 2025 <span class="article-type-tag release">New Release</span></p>

<div class="highlights">

**Highlights**
- Key finding with number
- Secondary finding
- Regional highlight

</div>

[Body paragraphs with Observable Plot charts]

<div class="note-to-readers">

## Note to readers
[Methodology]

</div>

<div class="source-info">

**Source:** Statistics Canada, [Table XX-XX-XXXX](url)
**DOI:** [https://doi.org/...](url)

</div>
```

## Date Handling

**For new releases** (covering the most recent data period):
- Extract `release_time` from the fetched JSON metadata (e.g., `"2025-12-05 08:30:00"`)
- Format both dates in the release-date line:
  ```html
  <p class="release-date">Data released: December 5, 2025 | Published: December 22, 2025 <span class="article-type-tag release">New Release</span></p>
  ```
- "Data released" = when StatCan published the data (`metadata.release_time`)
- "Published" = when the article is being generated (today's date)

**For backfill articles** (covering historical periods):
- **Omit the release-date paragraph entirely**
- The `<span class="article-type-tag backfill">Backfill</span>` tag (placed elsewhere) indicates this is historical coverage
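For the new-release case, the date formatting might look like the following sketch. The helper names `formatReleaseDate` and `releaseDateLine` are hypothetical, and the code assumes `release_time` always begins with `YYYY-MM-DD`, as in the example value above. It parses the string directly rather than going through `Date`, which avoids timezone shifts.

```javascript
const MONTHS = [
  "January", "February", "March", "April", "May", "June",
  "July", "August", "September", "October", "November", "December",
];

// Hypothetical helper: turns "2025-12-05 08:30:00" into "December 5, 2025".
// Assumes the input starts with a YYYY-MM-DD date.
function formatReleaseDate(releaseTime) {
  const match = /^(\d{4})-(\d{2})-(\d{2})/.exec(releaseTime);
  if (!match) {
    throw new Error(`Unrecognized date: "${releaseTime}"`);
  }
  const [, year, month, day] = match;
  const monthIndex = Number(month) - 1;
  if (monthIndex < 0 || monthIndex > 11) {
    throw new Error(`Invalid month in date: "${releaseTime}"`);
  }
  return `${MONTHS[monthIndex]} ${Number(day)}, ${year}`;
}

// Hypothetical helper: builds the release-date paragraph for a new release.
function releaseDateLine(releaseTime, publishedOn) {
  return (
    `<p class="release-date">Data released: ${formatReleaseDate(releaseTime)}` +
    ` | Published: ${formatReleaseDate(publishedOn)}` +
    ` <span class="article-type-tag release">New Release</span></p>`
  );
}
```

Calling `releaseDateLine("2025-12-05 08:30:00", "2025-12-22")` reproduces the example paragraph shown above. Backfill articles would not call it at all.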

## Reference Files

Load these as needed:

| Reference | When to Use |
|-----------|-------------|
| [voice.md](references/voice.md) | Tone, headline rules, language guidelines |
| [chart-styles.md](references/chart-styles.md) | Observable Plot patterns, color palette |
| [french.md](references/french.md) | French formatting, translations, province names |
| [tables.md](references/tables.md) | Common CANSIM tables, URL construction |
| [data-workflow.md](references/data-workflow.md) | JSON structure, data validation |
| [content-strategy.md](references/content-strategy.md) | Release-driven vs story-driven approaches |
| [troubleshooting.md](references/troubleshooting.md) | Common errors and solutions |

## Quick Reference

**StatCan red:** `#AF3C43`

**Headline format:** "[Indicator] [up/down] X.X% [comparison] in [Month Year]"

**Chart import:** (once per article, first code block only)
```js
import * as Plot from "npm:@observablehq/plot";
```

**PID from table:** Remove dashes, append "01" → 18-10-0004 becomes 1810000401
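The PID rule is simple enough to express as a one-line transformation. The sketch below is illustrative only; the function name `tableNumberToPid` and the input validation are assumptions, not part of the repository.

```javascript
// Hypothetical helper: converts a table number such as "18-10-0004"
// into its PID by removing the dashes and appending "01".
function tableNumberToPid(tableNumber) {
  if (!/^\d{2}-\d{2}-\d{4}$/.test(tableNumber)) {
    throw new Error(`Expected a table number like 18-10-0004, got "${tableNumber}"`);
  }
  return tableNumber.replace(/-/g, "") + "01";
}
```

With the example from the rule, `tableNumberToPid("18-10-0004")` returns `"1810000401"`.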

## Quality Checklist

Before publishing:
- [ ] All values from fetched JSON (no made-up data)
- [ ] Headline leads with key statistic
- [ ] Charts render with #AF3C43 color
- [ ] Language switcher works (slug in lang-map.js)
- [ ] Voice is neutral (no "surged", "plummeted")
- [ ] French uses comma decimals (2,2 %)
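For the French decimal item, a formatting helper along these lines could reduce manual errors. It is a sketch under assumptions: the name `formatPercentFr` is hypothetical, and it uses a plain space before the percent sign to match the checklist example. The project's `french.md` may call for a non-breaking space instead.

```javascript
// Hypothetical helper: formats a percentage for French articles,
// using a comma as the decimal separator and a space before "%".
function formatPercentFr(value, digits = 1) {
  if (!Number.isFinite(value)) {
    throw new TypeError("value must be a finite number");
  }
  return `${value.toFixed(digits).replace(".", ",")} %`;
}
```

For instance, `formatPercentFr(2.2)` yields `"2,2 %"`, matching the checklist example.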

## Review Mode

If the user requests "review mode", pause after generating the article and ask for approval before publishing.
