Commit 12aa7ee

Merge branch 'feature/autonomous-pipeline'
2 parents 103d016 + eae7e42 commit 12aa7ee

270 files changed

Lines changed: 40054 additions & 1566 deletions

Lines changed: 102 additions & 0 deletions
@@ -0,0 +1,102 @@
---
name: the-daily-discover
description: Discover and rank StatCan CANSIM tables for Daily article generation. Use when asked to find new topics, scan for recent releases, identify coverage gaps, prioritize which tables to cover, or explore what data is available.
---

# The D-AI-LY Topic Discovery

Discover and prioritize StatCan CANSIM tables for article generation.

## Quick Start

```bash
# Run discovery scan
Rscript r-tools/discover_topics.R

# Check recently updated tables
Rscript r-tools/discover_stories.R
```

## Discovery Workflow

### 1. Check Existing Coverage (REQUIRED FIRST STEP)

Before scanning CANSIM, identify tables already covered:

```bash
# Extract the table numbers cited in existing articles
grep -h "statcan.gc.ca/t1/tbl1" docs/en/*/index.md | grep -oE "[0-9]{2}-[0-9]{2}-[0-9]{4}" | sort -u
```

**Exclude these table numbers from recommendations.** The goal is diversity: never suggest a table that has already been covered unless explicitly asked for an update.
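
The exclusion step could be sketched in R roughly as follows. This is a minimal illustration, not the pipeline's actual code: the `covered` vector, the toy `candidates` data frame, and its `table_number` column are all hypothetical stand-ins for the output of the command above and of the scan in step 2.

```r
# Hypothetical inputs: in practice `covered` would come from the grep
# command above and `candidates` from the scan in step 2.
covered <- c("18-10-0004", "14-10-0287")

candidates <- data.frame(
  table_number = c("18-10-0004", "23-10-0079", "14-10-0287", "20-10-0003"),
  title = c("Consumer Price Index", "Airline operating statistics",
            "Labour force characteristics", "Wholesale trade"),
  stringsAsFactors = FALSE
)

# Keep only tables that no existing article cites.
uncovered <- candidates[!candidates$table_number %in% covered, ]
print(uncovered$table_number)
```

Filtering before scoring keeps already-covered tables from crowding out new topics in the ranking.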

### 2. Scan Available Data

```r
library(cansim)
library(dplyr)

# Metadata for every StatCan table (one row per cube)
cubes <- list_cansim_cubes()

# Filter for recent monthly releases.
# Note: cubeEndDate marks the end of the data's reference period, so this
# keeps tables whose latest data point is under 60 days old.
candidates <- cubes %>%
  filter(frequencyCode == "6") %>%            # 6 = monthly
  filter(cubeEndDate >= Sys.Date() - 60) %>%
  arrange(desc(cubeEndDate))
```

### 3. Score Candidates

| Dimension | Weight | Criteria |
|-----------|--------|----------|
| Recency | 25% | Days since release (fresher = higher) |
| Diversity | 25% | Sector gap from existing articles; deprioritize sectors covered in last 7 days |
| Narrative | 25% | Regional variation, trend reversals |
| Public Interest | 15% | Topic relevance to general audiences |
| Data Quality | 10% | Complete coverage, national totals |
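
One way to combine these weights is a plain weighted average. The sketch below assumes each dimension has already been reduced to a 0-100 sub-score; the `score_candidate` helper and the example sub-scores are hypothetical, and the document does not say how the sub-scores themselves are derived.

```r
# Weights from the table above; they sum to 1.
weights <- c(recency = 0.25, diversity = 0.25, narrative = 0.25,
             public_interest = 0.15, data_quality = 0.10)

# Hypothetical helper: `subscores` is a named numeric vector with one
# 0-100 value per dimension (how each sub-score is derived is not shown).
score_candidate <- function(subscores, w = weights) {
  stopifnot(setequal(names(subscores), names(w)))
  stopifnot(all(subscores >= 0 & subscores <= 100))
  # Weighted average, aligned by name so argument order does not matter.
  sum(subscores[names(w)] * w)
}

example <- c(recency = 100, diversity = 80, narrative = 90,
             public_interest = 70, data_quality = 100)
total <- score_candidate(example)
print(round(total))
```

Aligning by name rather than by position guards against silently mixing up dimensions if the sub-scores arrive in a different order.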

### 4. Validate Top Candidates

Before committing:
1. **Verify table numbers**: Run `search_cansim_cubes("keyword")` to confirm table numbers are current. They go stale as StatCan discontinues or replaces tables.
2. Fetch sample data to verify the table's structure
3. Check whether StatCan published a Daily release for the table
4. Verify national totals exist
5. Confirm the table is current (not deprecated)
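
Some of these checks can be run on a sample that has already been fetched. The sketch below is illustrative only: `validate_sample` is a hypothetical helper, and it assumes the sample carries the `REF_DATE`, `GEO`, and `VALUE` columns that StatCan tables usually have, with national totals reported under `GEO == "Canada"`.

```r
# Hypothetical helper: `df` is a data frame already fetched for a
# candidate table, with the REF_DATE, GEO and VALUE columns that
# StatCan tables usually carry.
validate_sample <- function(df) {
  required <- c("REF_DATE", "GEO", "VALUE")
  has_columns <- all(required %in% names(df))
  list(
    has_columns = has_columns,
    # National totals are assumed to appear under GEO == "Canada".
    has_national_total = has_columns && any(df$GEO == "Canada"),
    # A sample made up entirely of missing values is not usable.
    has_values = has_columns && any(!is.na(df$VALUE))
  )
}

sample_df <- data.frame(
  REF_DATE = c("2025-10", "2025-10", "2025-10"),
  GEO = c("Canada", "Ontario", "Quebec"),
  VALUE = c(100.0, 40.0, NA),
  stringsAsFactors = FALSE
)
checks <- validate_sample(sample_df)
str(checks)
```

The remaining checks (table currency and the matching Daily release) depend on external lookups and are not covered here.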

## Output Format

```
TOPIC DISCOVERY RESULTS - 2025-12-24
=====================================

RANK  SCORE  TABLE        SECTOR      TITLE
----  -----  -----------  ----------  ----------------------------------
1     87     23-10-0079   Transport   Airline operating statistics
2     82     18-10-0205   Prices      New Housing Price Index
3     78     20-10-0003   Trade       Wholesale trade

Top recommendation: 23-10-0079
- Last release: 2025-12-23
- Narrative hook: Transborder traffic down 9th consecutive month
- Sector gap: No transport coverage in last 30 days
```

## Reference Files

| Reference | When to Use |
|-----------|-------------|
| [sectors.md](references/sectors.md) | Sector categories, regional story patterns, scoring boosts |

## Handoff to Generator

After discovery, generate the article:

```
/the-daily-generator 23-10-0079 --slug airline-passengers-october-2025
```

For regional stories:
```
/the-daily-generator 34-10-0158 --slug ontario-housing-starts-november-2025 --geo Ontario
```
Lines changed: 103 additions & 0 deletions
@@ -0,0 +1,103 @@
# Sector Categories and Regional Stories

Reference material for topic discovery scoring.

## Sector Categories

For diversity scoring, tables are categorized:

| Sector | Example Tables | Notes |
|--------|----------------|-------|
| **Prices** | 18-10-0004 (CPI), 18-10-0001 (Gas), 18-10-0205 (NHPI) | High public interest |
| **Labour** | 14-10-0287 (LFS), 14-10-0355 (SEPH) | Core economic indicator |
| **Trade** | 20-10-0056 (Retail), 20-10-0003 (Wholesale), 12-10-0011 (Intl) | Supply chain coverage |
| **Housing** | 34-10-0158 (Starts), 34-10-0292 (Permits) | Housing market health |
| **Production** | 36-10-0434 (GDP), 16-10-0048 (Manufacturing) | Output indicators |
| **Transport** | 23-10-0079 (Aviation), 23-10-0253 (Rail) | Mobility/logistics |
| **Finance** | 10-10-0006 (Credit), 36-10-0580 (Investment) | Financial conditions |
| **Demographics** | 17-10-0009 (Population), 17-10-0014 (Migration) | Social indicators |
| **Energy** | 25-10-0015 (Electricity), 25-10-0063 (Oil & Gas) | Resource production |

## Narrative Potential Indicators

High-scoring narratives typically have:

- **Trend reversals**: "First increase since...", "Ended X-month streak"
- **Regional divergence**: Provinces moving in opposite directions
- **Component splits**: House vs. land, goods vs. services, domestic vs. international
- **Milestone crossings**: Index hits new high/low, crosses round number
- **Seasonal anomalies**: Unexpected pattern vs. typical seasonality
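
A trend reversal of the "Ended X-month streak" kind can be spotted by measuring the run of same-direction changes at the end of a series. The sketch below is one possible approach; `trailing_streak` is a hypothetical helper, and it assumes a numeric vector ordered oldest first with no missing values.

```r
# Length of the current run of same-direction changes at the end of a
# series. `values` is a hypothetical numeric vector ordered oldest first.
trailing_streak <- function(values) {
  direction <- sign(diff(values))  # +1 rise, -1 fall, 0 flat
  if (length(direction) == 0) return(list(direction = 0, length = 0L))
  runs <- rle(direction)
  n <- length(runs$lengths)
  list(direction = runs$values[n], length = runs$lengths[n])
}

# A series that fell for three periods and then rose once: the latest
# change ends a three-period decline.
x <- c(105, 103, 101, 100, 102)
current <- trailing_streak(x)
previous <- trailing_streak(head(x, -1))
print(current)
print(previous)
```

Comparing the streak with and without the latest observation shows whether the newest data point broke an existing run, which is the pattern behind "first increase since..." headlines.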

## Geographic Levels in CANSIM

| Level | Description | Story Potential |
|-------|-------------|-----------------|
| **Canada** | National totals | Headline indicators |
| **Provincial/Territorial** | 13 jurisdictions | Regional divergence, provincial spotlight |
| **CMA** | Census Metropolitan Areas | City comparisons, metro-specific trends |
| **Economic Region** | Sub-provincial regions | Local economic conditions |

## Regional Story Types

### 1. Divergence Stories
When regions move in opposite directions:
```r
library(dplyr)

# Assumes provincial_data has REF_DATE, GEO and yoy_change (percent) columns
provincial_data %>%
  group_by(REF_DATE) %>%
  summarise(
    range = max(yoy_change, na.rm = TRUE) - min(yoy_change, na.rm = TRUE),
    leader = GEO[which.max(yoy_change)],
    laggard = GEO[which.min(yoy_change)]
  ) %>%
  filter(range > 5)  # >5 percentage points spread
```
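
The divergence check relies on a `yoy_change` column that the snippet does not compute. A minimal base-R sketch of one way to derive it follows; `add_yoy_change` is a hypothetical helper, and it assumes monthly data with one row per `REF_DATE` and `GEO`, no gaps in the series, and a numeric `VALUE` column.

```r
# Hypothetical input: monthly values per region, one row per REF_DATE
# and GEO, with no missing months.
add_yoy_change <- function(df) {
  df <- df[order(df$GEO, df$REF_DATE), ]
  # Within each region, compare every value with the one 12 rows earlier.
  df$yoy_change <- ave(df$VALUE, df$GEO, FUN = function(v) {
    n <- length(v)
    prior <- if (n > 12) c(rep(NA_real_, 12), v[seq_len(n - 12)]) else rep(NA_real_, n)
    (v / prior - 1) * 100
  })
  df
}

months <- seq(as.Date("2024-01-01"), by = "month", length.out = 13)
toy <- data.frame(
  REF_DATE = rep(months, times = 2),
  GEO = rep(c("Ontario", "Alberta"), each = 13),
  VALUE = c(rep(100, 12), 110, rep(200, 12), 190)
)
result <- add_yoy_change(toy)
print(result[result$REF_DATE == as.Date("2025-01-01"), c("GEO", "yoy_change")])
```

The first twelve months of each region have no prior-year value, so their `yoy_change` is `NA`. If a series has missing months, the row-offset comparison would no longer line up with the calendar and a date-based join would be safer.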

### 2. Metro Spotlight
Deep-dive on a specific CMA:
- Toronto housing market dynamics
- Vancouver cost of living
- Calgary energy sector employment
- Montreal manufacturing

### 3. Provincial Rankings
League tables comparing provinces:
- Unemployment rates by province
- Housing affordability index
- Retail sales per capita

### 4. Regional Outliers
One region bucking the national trend:
- "Saskatchewan leads provincial gains..."
- "Atlantic Canada bucks national decline..."
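
Outliers of this kind can be flagged by comparing the sign of each region's change with the sign of the national change. The following is a sketch under stated assumptions: `find_outliers` is a hypothetical helper, and the input is assumed to be a named numeric vector of period-over-period changes that includes a "Canada" entry for the national figure.

```r
# Flag regions whose change has the opposite sign to the national change.
# `changes` is a hypothetical named numeric vector of period-over-period
# changes that includes a "Canada" entry for the national figure.
find_outliers <- function(changes, national = "Canada") {
  stopifnot(national %in% names(changes))
  regions <- changes[names(changes) != national]
  nat_sign <- sign(changes[[national]])
  # Only opposite-signed moves count; flat regions (zero) are ignored.
  names(regions)[sign(regions) == -nat_sign & nat_sign != 0]
}

changes <- c(Canada = -0.4, Ontario = -0.6, Quebec = -0.2,
             Saskatchewan = 1.1, Alberta = 0.0)
outliers <- find_outliers(changes)
print(outliers)
```

A sign comparison is deliberately strict: a region that merely declines less than the national figure is not treated as bucking the trend.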

## Regional Story Scoring

| Factor | Score Boost | Condition |
|--------|-------------|-----------|
| High provincial variance | +15 | Range > 5 pp |
| Clear leader/laggard | +10 | One province dominates |
| CMA data available | +5 | Metro-level granularity |
| Regional trend reversal | +20 | Province bucks national trend |
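
The boosts could be applied along these lines. This sketch assumes the boosts are additive, which the table does not state explicitly; `regional_boost` and its logical flags are hypothetical, and deciding whether "one province dominates" is left to the caller.

```r
# Boost values copied from the table above. How each condition is detected
# is left to the caller; the flags here are hypothetical inputs.
# Assumption: boosts stack additively.
regional_boost <- function(range_pp, clear_leader, has_cma, bucks_trend) {
  boost <- 0
  if (range_pp > 5) boost <- boost + 15   # high provincial variance
  if (clear_leader) boost <- boost + 10   # clear leader/laggard
  if (has_cma) boost <- boost + 5         # CMA data available
  if (bucks_trend) boost <- boost + 20    # regional trend reversal
  boost
}

b <- regional_boost(range_pp = 6.2, clear_leader = FALSE,
                    has_cma = TRUE, bucks_trend = TRUE)
print(b)
```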

## Checking Geographic Coverage

```r
library(cansim)

# Summarise which geographic levels a table covers
check_geo_coverage <- function(table_number) {
  df <- get_cansim(table_number)  # downloads the full table
  geos <- unique(df$GEO)

  list(
    has_provinces = any(geos %in% c("Ontario", "Quebec", "British Columbia")),
    # StatCan spells the CMA "Montréal", so match both spellings
    has_cmas = any(grepl("CMA|Toronto|Vancouver|Montr[eé]al", geos)),
    geo_count = length(geos),
    geo_list = head(geos, 10)
  )
}
```

## Example Regional Headlines

- "Toronto housing starts surge while Vancouver stalls"
- "Prairie provinces lead employment gains in November"
- "Quebec inflation outpaces national average for 6th month"
- "Atlantic Canada gasoline prices hit 18-month low"
