|
1 | 1 | # The D-AI-LY |
2 | 2 |
|
3 | | -An LLM-driven version of Statistics Canada's "The Daily" - generating automated statistical bulletins from CANSIM data. |
| 3 | +An autonomous, AI-driven version of Statistics Canada's "The Daily" — generating bilingual statistical bulletins from CANSIM data. |
4 | 4 |
|
5 | 5 | ## Overview |
6 | 6 |
|
7 | | -The D-AI-LY fetches data from Statistics Canada's CANSIM database and generates news-style articles following The Daily's distinctive voice: neutral, clinical, and structured using the inverted pyramid format. |
| 7 | +The D-AI-LY runs daily at 8am. Each run automatically: |
| 8 | +1. Discovers newsworthy CANSIM table updates |
| 9 | +2. Fetches real data from Statistics Canada |
| 10 | +3. Generates bilingual articles (EN + FR) following The Daily's voice |
| 11 | +4. Publishes to a static website |
| 12 | + |
| 13 | +## Architecture |
| 14 | + |
| 15 | +``` |
| 16 | +┌─────────────────────────────────────────────────────────────┐ |
| 17 | +│ SCHEDULED TRIGGER (launchd) │ |
| 18 | +│ Runs daily at 8am │ |
| 19 | +└─────────────────────┬───────────────────────────────────────┘ |
| 20 | + │ |
| 21 | + ┌───────────▼────────────┐ |
| 22 | + │ AI LAYER 1 │ ← discover_topics.R |
| 23 | + │ Topic Selection │ Score by recency, sector |
| 24 | + └───────────┬────────────┘ |
| 25 | + │ |
| 26 | + ┌───────────▼────────────┐ |
| 27 | + │ DETERMINISTIC CORE │ ← fetch_table.R |
| 28 | + │ R + cansim package │ Config-driven extraction |
| 29 | + └───────────┬────────────┘ |
| 30 | + │ |
| 31 | + ┌───────────▼────────────┐ |
| 32 | + │ AI LAYER 2 │ ← Claude Code |
| 33 | + │ Article Generation │ /the-daily-generator skill |
| 34 | + └───────────┬────────────┘ |
| 35 | + │ |
| 36 | + ┌───────────▼────────────┐ |
| 37 | + │ DETERMINISTIC CORE │ ← Observable Framework |
| 38 | + │ Build + Publish │ npm run build |
| 39 | + └────────────────────────┘ |
| 40 | +``` |
8 | 41 |
|
9 | 42 | ## Quick Start |
10 | 43 |
|
| 44 | +### Prerequisites |
| 45 | + |
| 46 | +- **R** with packages: `cansim`, `dplyr`, `tidyr`, `jsonlite` |
| 47 | +- **Node.js** 20+ |
| 48 | +- **Claude Code** CLI (`npm install -g @anthropic-ai/claude-code`) |
| 49 | + |
| 50 | +### Installation |
| 51 | + |
11 | 52 | ```bash |
12 | | -# Run the full pipeline (fetches latest CPI data and generates article) |
13 | | -./run_pipeline.sh |
| 53 | +# Clone and install |
| 54 | +git clone https://github.com/mountainmath/the-daily.git |
| 55 | +cd the-daily |
| 56 | +npm install |
14 | 57 |
|
15 | | -# Or specify a different table |
16 | | -./run_pipeline.sh "20-10-0008" |
| 58 | +# Install R packages |
| 59 | +Rscript -e 'install.packages(c("cansim", "dplyr", "tidyr", "jsonlite"))' |
17 | 60 | ``` |
18 | 61 |
|
19 | | -## Pipeline |
| 62 | +### Run Manually |
20 | 63 |
|
21 | | -1. **Data Fetching** (`r-tools/fetch_cansim_data.R`) |
22 | | - - Uses the `cansim` R package to fetch CANSIM tables |
23 | | - - Calculates period-over-period and year-over-year changes |
24 | | - - Exports analysis-ready JSON |
| 64 | +```bash |
| 65 | +# Full pipeline (discovery → fetch → generate → build) |
| 66 | +./automation/run_pipeline.sh |
25 | 67 |
|
26 | | -2. **Article Generation** (`generate_article.py`) |
27 | | - - Generates content in The Daily voice |
28 | | - - Creates headline, highlights, body sections |
29 | | - - Embeds Observable Plot chart |
| 68 | +# Specific table |
| 69 | +./automation/run_pipeline.sh --table=18-10-0004 |
30 | 70 |
|
31 | | -3. **Output** (`output/articles/`) |
32 | | - - Self-contained HTML with inline chart |
33 | | - - StatCan-inspired styling |
| 71 | +# Prep only (no article generation) |
| 72 | +./automation/run_pipeline.sh --prep-only |
| 73 | +``` |
34 | 74 |
|
35 | | -## The Daily Voice |
| 75 | +### Install Daily Automation |
36 | 76 |
|
37 | | -Articles follow strict style guidelines: |
38 | | -- **Neutral and clinical** - no emotional language |
39 | | -- **Inverted pyramid** - most important facts first |
40 | | -- **Plain language** - accessible to general audiences |
41 | | -- Headlines lead with the key number |
42 | | -- Always compare to previous period AND year-over-year |
| 77 | +```bash |
| 78 | +# Install launchd agent (runs at 8am daily) |
| 79 | +./automation/install.sh |
| 80 | + |
| 81 | +# Check status |
| 82 | +./automation/install.sh --status |
| 83 | + |
| 84 | +# Remove automation |
| 85 | +./automation/install.sh --remove |
| 86 | +``` |
43 | 87 |
|
44 | | -## Structure |
| 88 | +## Project Structure |
45 | 89 |
|
46 | 90 | ``` |
47 | 91 | the-daily/ |
| 92 | +├── automation/ |
| 93 | +│ ├── run_pipeline.sh # Daily orchestrator |
| 94 | +│ ├── install.sh # Automation installer |
| 95 | +│ └── com.the-daily.pipeline.plist |
| 96 | +│ |
48 | 97 | ├── r-tools/ |
49 | | -│ └── fetch_cansim_data.R # CANSIM data fetching |
50 | | -├── templates/ |
51 | | -│ └── article.html # HTML template with Observable Plot |
52 | | -├── output/ |
53 | | -│ ├── articles/ # Generated articles |
54 | | -│ └── data_*.json # Cached data |
55 | | -├── generate_article.py # Article generator |
56 | | -└── run_pipeline.sh # Full pipeline script |
| 98 | +│ ├── discover_topics.R # Topic discovery & ranking |
| 99 | +│ ├── fetch_table.R # CANSIM data fetcher |
| 100 | +│ └── table_configs.json # Table extraction configs (25 tables) |
| 101 | +│ |
| 102 | +├── docs/ # Observable Framework site |
| 103 | +│ ├── en/ # English articles |
| 104 | +│ ├── fr/ # French articles |
| 105 | +│ └── style.css # StatCan-inspired styling |
| 106 | +│ |
| 107 | +├── .claude/skills/ |
| 108 | +│ ├── the-daily-generator/ # Article generation skill |
| 109 | +│ ├── the-daily-discover/ # Topic discovery skill |
| 110 | +│ └── the-daily-publish/ # Build & deploy skill |
| 111 | +│ |
| 112 | +├── .github/workflows/ |
| 113 | +│ └── daily.yml # GitHub Action (fallback) |
| 114 | +│ |
| 115 | +└── output/ # Generated data files |
| 116 | +``` |
| 117 | + |
| 118 | +## Skills |
| 119 | + |
| 120 | +The project uses Claude Code skills for AI-driven tasks: |
| 121 | + |
| 122 | +| Skill | Purpose | |
| 123 | +|-------|---------| |
| 124 | +| `/the-daily-generator` | Generate bilingual articles from CANSIM data | |
| 125 | +| `/the-daily-discover` | Identify newsworthy table updates | |
| 126 | +| `/the-daily-publish` | Build and deploy the site | |
| 127 | + |
| 128 | +## Data Pipeline |
| 129 | + |
| 130 | +### Topic Discovery |
| 131 | + |
| 132 | +The R script `discover_topics.R` scans CANSIM for recently updated tables and ranks them by: |
| 133 | + |
| 134 | +- **Recency** (25%) — How recently was data released? |
| 135 | +- **Diversity** (25%) — Avoid covering the same sector repeatedly |
| 136 | +- **Public Interest** (50%) — Labour, prices, housing score highest |
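
As a rough illustration of how the weights above combine, the sketch below computes a single score from three component scores. The function name `score_topic` and the 0 to 1 scale for each component are assumptions made for this example; the actual ranking logic lives in `discover_topics.R` and may differ.

```bash
# Hypothetical sketch, not taken from discover_topics.R.
# Arguments: recency, diversity, public interest, each assumed to be in [0, 1].
# Prints the weighted total using the 25% / 25% / 50% split listed above.
score_topic() {
  awk -v r="$1" -v d="$2" -v p="$3" \
    'BEGIN { printf "%.3f\n", 0.25 * r + 0.25 * d + 0.50 * p }'
}
```

For example, `score_topic 1.0 0.5 0.5` prints `0.625`. Because public interest carries half the weight, it can outweigh the other two components combined.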
| 137 | + |
| 138 | +### Data Fetching |
| 139 | + |
| 140 | +The `fetch_table.R` script uses configs from `table_configs.json` to: |
| 141 | + |
| 142 | +- Fetch data via the `cansim` R package |
| 143 | +- Apply dimension filters (GEO, categories) |
| 144 | +- Calculate month-over-month (MoM) and year-over-year (YoY) changes |
| 145 | +- Export analysis-ready JSON |
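
The change calculations can be sketched as follows. This is a minimal illustration, assuming a monthly series supplied as `period value` lines in chronological order and assuming the changes are expressed as percentages. The helper name `pct_changes` and the input layout are hypothetical; `fetch_table.R` performs the real calculation in R.

```bash
# Hypothetical helper, not part of the repository.
# Reads "period value" lines (oldest first, one per month) on stdin and prints
# the percent change for the latest period against the previous month (MoM)
# and against the same month one year earlier (YoY).
# Zero values in the comparison periods are not handled in this sketch.
pct_changes() {
  awk '
    { period[NR] = $1; value[NR] = $2 }
    END {
      if (NR < 13) {
        print "need at least 13 monthly observations" > "/dev/stderr"
        exit 1
      }
      mom = (value[NR] - value[NR - 1]) / value[NR - 1] * 100
      yoy = (value[NR] - value[NR - 12]) / value[NR - 12] * 100
      printf "%s mom=%.1f yoy=%.1f\n", period[NR], mom, yoy
    }'
}
```

Thirteen observations are the minimum because the YoY comparison reaches back twelve rows from the latest one.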
| 146 | + |
| 147 | +### Article Generation |
| 148 | + |
| 149 | +Claude Code follows the skill documentation to: |
| 150 | + |
| 151 | +- Write in The Daily's neutral, clinical voice |
| 152 | +- Create Observable markdown with embedded charts |
| 153 | +- Generate both English and French versions |
| 154 | +- Verify data integrity against source JSON |
| 155 | + |
| 156 | +## The Daily Voice |
| 157 | + |
| 158 | +Articles follow strict style guidelines: |
| 159 | + |
| 160 | +- **Neutral and clinical** — no emotional language ("increased" not "surged") |
| 161 | +- **Inverted pyramid** — most important facts first |
| 162 | +- **Plain language** — accessible to general audiences |
| 163 | +- Headlines lead with the key statistic |
| 164 | +- Always include MoM and YoY comparisons |
| 165 | +- Hedge causation: "amid", "coinciding with" (not "caused by") |
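
Some of these rules lend themselves to a mechanical check. The sketch below is a hypothetical lint, not something the repository is known to ship: it flags only the two phrases the guidelines explicitly rule out ("surged" and "caused by"), and the function name `check_voice` is an assumption.

```bash
# Hypothetical lint for a generated article file.
# Returns 0 when no flagged phrase is found, 1 when at least one is found
# (matching lines are printed with line numbers), 2 if the file is unreadable.
check_voice() {
  if [ ! -r "$1" ]; then
    echo "cannot read: $1" >&2
    return 2
  fi
  # grep exits 0 on a match, so a match means the check fails.
  if grep -n -i -E 'surged|caused by' "$1"; then
    return 1
  fi
  return 0
}
```

A check like this catches only the listed wording. It does not verify tone or structure, so it would complement the skill's own review rather than replace it.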
| 166 | + |
| 167 | +## Configuration |
| 168 | + |
| 169 | +### Adding New Tables |
| 170 | + |
| 171 | +1. Add entry to `r-tools/table_configs.json`: |
| 172 | + |
| 173 | +```json |
| 174 | +"18-10-0004": { |
| 175 | + "name": "Consumer Price Index", |
| 176 | + "headline": "Consumer prices", |
| 177 | + "unit": "index", |
| 178 | + "filters": { |
| 179 | + "GEO": "Canada", |
| 180 | + "Products and product groups": "All-items" |
| 181 | + } |
| 182 | +} |
| 183 | +``` |
| 184 | + |
| 185 | +2. Test the fetch: |
| 186 | +```bash |
| 187 | +Rscript r-tools/fetch_table.R 18-10-0004 output |
57 | 188 | ``` |
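
When several tables are added at once, the test command can be wrapped in a loop. The sketch below is illustrative only: `valid_table_id`, `fetch_tables`, and the `DRY_RUN` variable are hypothetical names, and the ID pattern assumes tables are written as `NN-NN-NNNN`, as in the example above.

```bash
# Hypothetical wrapper around the documented fetch command.
# Assumption: table IDs follow the NN-NN-NNNN pattern (e.g. 18-10-0004).
valid_table_id() {
  printf '%s\n' "$1" | grep -Eq '^[0-9]{2}-[0-9]{2}-[0-9]{4}$'
}

# Fetch each table ID given as an argument, skipping malformed IDs.
fetch_tables() {
  for id in "$@"; do
    if valid_table_id "$id"; then
      if [ "${DRY_RUN:-0}" = "1" ]; then
        # Dry run: print the command instead of running it.
        echo "Rscript r-tools/fetch_table.R $id output"
      else
        Rscript r-tools/fetch_table.R "$id" output
      fi
    else
      echo "skipping malformed table ID: $id" >&2
    fi
  done
}
```

Running `DRY_RUN=1 fetch_tables 18-10-0004 20-10-0008` prints the two commands without invoking R, which is a cheap way to confirm the IDs before a real fetch.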
58 | 189 |
|
59 | | -## Dependencies |
| 190 | +### Automation Schedule |
60 | 191 |
|
61 | | -**R packages:** |
62 | | -- `cansim` - Statistics Canada data access |
63 | | -- `dplyr`, `tidyr` - Data manipulation |
64 | | -- `jsonlite` - JSON export |
| 192 | +Default: 8:00 AM daily. To change, edit `automation/com.the-daily.pipeline.plist` and reinstall. |
65 | 193 |
|
66 | | -**Python:** Standard library only (json, re, pathlib, datetime) |
| 194 | +## Development |
67 | 195 |
|
68 | | -**Web:** Observable Plot, D3.js (loaded via CDN) |
| 196 | +```bash |
| 197 | +# Start dev server |
| 198 | +npm run dev |
69 | 199 |
|
70 | | -## Next Steps |
| 200 | +# Build site |
| 201 | +npm run build |
71 | 202 |
|
72 | | -- [ ] Autonomous table selection (browse 7,000+ tables) |
73 | | -- [ ] LLM-enhanced article generation |
74 | | -- [ ] Static site with index page |
75 | | -- [ ] Scheduled automation |
| 203 | +# Run discovery only |
| 204 | +Rscript r-tools/discover_topics.R --configured --json |
| 205 | +``` |
| 206 | + |
| 207 | +## Fallback Mechanism |
| 208 | + |
| 209 | +If local automation fails (Mac offline, Claude Code issues): |
| 210 | + |
| 211 | +1. GitHub Action runs at 13:00 UTC (8am Eastern Standard Time, 9am during daylight saving time) |
| 212 | +2. Runs discovery + fetch |
| 213 | +3. Creates GitHub Issue with instructions |
| 214 | +4. User runs `claude "/the-daily-generator TABLE"` when available |
76 | 215 |
|
77 | 216 | ## License |
78 | 217 |
|
79 | | -This is an experimental project. Data comes from Statistics Canada (Crown Copyright). |
| 218 | +MIT License. Data is from Statistics Canada (Crown Copyright). |
| 219 | + |
| 220 | +## Acknowledgments |
| 221 | + |
| 222 | +- Statistics Canada for the CANSIM data |
| 223 | +- The `cansim` R package by Jens von Bergmann |
| 224 | +- Observable Framework for the static site |
| 225 | +- Anthropic Claude for AI capabilities |