
Commit 64b41b8

dshkol and claude committed
Add: About pages and updated README

- Added bilingual About pages (EN/FR) explaining The D-AI-LY
- Updated README with current architecture and quick start
- Added About links to index pages

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
1 parent 4af6418 commit 64b41b8

5 files changed

Lines changed: 375 additions & 49 deletions


README.md

Lines changed: 193 additions & 47 deletions
@@ -1,79 +1,225 @@
11
# The D-AI-LY
22

3-
An LLM-driven version of Statistics Canada's "The Daily" - generating automated statistical bulletins from CANSIM data.
3+
An autonomous, AI-driven version of Statistics Canada's "The Daily" that generates bilingual statistical bulletins from CANSIM data.
44

55
## Overview
66

7-
The D-AI-LY fetches data from Statistics Canada's CANSIM database and generates news-style articles following The Daily's distinctive voice: neutral, clinical, and structured using the inverted pyramid format.
7+
The D-AI-LY runs daily at 8am, automatically:
8+
1. Discovering newsworthy CANSIM table updates
9+
2. Fetching real data from Statistics Canada
10+
3. Generating bilingual articles (EN + FR) following The Daily's voice
11+
4. Publishing to a static website
12+
13+
## Architecture
14+
15+
```
16+
┌─────────────────────────────────────────────────────────────┐
17+
│ SCHEDULED TRIGGER (launchd) │
18+
│ Runs daily at 8am │
19+
└─────────────────────┬───────────────────────────────────────┘
20+
21+
┌───────────▼────────────┐
22+
│ AI LAYER 1 │ ← discover_topics.R
23+
│ Topic Selection │ Score by recency, sector
24+
└───────────┬────────────┘
25+
26+
┌───────────▼────────────┐
27+
│ DETERMINISTIC CORE │ ← fetch_table.R
28+
│ R + cansim package │ Config-driven extraction
29+
└───────────┬────────────┘
30+
31+
┌───────────▼────────────┐
32+
│ AI LAYER 2 │ ← Claude Code
33+
│ Article Generation │ /the-daily-generator skill
34+
└───────────┬────────────┘
35+
36+
┌───────────▼────────────┐
37+
│ DETERMINISTIC CORE │ ← Observable Framework
38+
│ Build + Publish │ npm run build
39+
└────────────────────────┘
40+
```
841

942
## Quick Start
1043

44+
### Prerequisites
45+
46+
- **R** with packages: `cansim`, `dplyr`, `tidyr`, `jsonlite`
47+
- **Node.js** 20+
48+
- **Claude Code** CLI (`npm install -g @anthropic-ai/claude-code`)
49+
50+
### Installation
51+
1152
```bash
12-
# Run the full pipeline (fetches latest CPI data and generates article)
13-
./run_pipeline.sh
53+
# Clone and install
54+
git clone https://github.com/mountainmath/the-daily.git
55+
cd the-daily
56+
npm install
1457

15-
# Or specify a different table
16-
./run_pipeline.sh "20-10-0008"
58+
# Install R packages
59+
Rscript -e 'install.packages(c("cansim", "dplyr", "tidyr", "jsonlite"))'
1760
```
1861

19-
## Pipeline
62+
### Run Manually
2063

21-
1. **Data Fetching** (`r-tools/fetch_cansim_data.R`)
22-
- Uses the `cansim` R package to fetch CANSIM tables
23-
- Calculates period-over-period and year-over-year changes
24-
- Exports analysis-ready JSON
64+
```bash
65+
# Full pipeline (discovery → fetch → generate → build)
66+
./automation/run_pipeline.sh
2567

26-
2. **Article Generation** (`generate_article.py`)
27-
- Generates content in The Daily voice
28-
- Creates headline, highlights, body sections
29-
- Embeds Observable Plot chart
68+
# Specific table
69+
./automation/run_pipeline.sh --table=18-10-0004
3070

31-
3. **Output** (`output/articles/`)
32-
- Self-contained HTML with inline chart
33-
- StatCan-inspired styling
71+
# Prep only (no article generation)
72+
./automation/run_pipeline.sh --prep-only
73+
```
3474

35-
## The Daily Voice
75+
### Install Daily Automation
3676

37-
Articles follow strict style guidelines:
38-
- **Neutral and clinical** - no emotional language
39-
- **Inverted pyramid** - most important facts first
40-
- **Plain language** - accessible to general audiences
41-
- Headlines lead with the key number
42-
- Always compare to previous period AND year-over-year
77+
```bash
78+
# Install launchd agent (runs at 8am daily)
79+
./automation/install.sh
80+
81+
# Check status
82+
./automation/install.sh --status
83+
84+
# Remove automation
85+
./automation/install.sh --remove
86+
```
4387

44-
## Structure
88+
## Project Structure
4589

4690
```
4791
the-daily/
92+
├── automation/
93+
│ ├── run_pipeline.sh # Daily orchestrator
94+
│ ├── install.sh # Automation installer
95+
│ └── com.the-daily.pipeline.plist
96+
4897
├── r-tools/
49-
│ └── fetch_cansim_data.R # CANSIM data fetching
50-
├── templates/
51-
│ └── article.html # HTML template with Observable Plot
52-
├── output/
53-
│ ├── articles/ # Generated articles
54-
│ └── data_*.json # Cached data
55-
├── generate_article.py # Article generator
56-
└── run_pipeline.sh # Full pipeline script
98+
│ ├── discover_topics.R # Topic discovery & ranking
99+
│ ├── fetch_table.R # CANSIM data fetcher
100+
│ └── table_configs.json # Table extraction configs (25 tables)
101+
102+
├── docs/ # Observable Framework site
103+
│ ├── en/ # English articles
104+
│ ├── fr/ # French articles
105+
│ └── style.css # StatCan-inspired styling
106+
107+
├── .claude/skills/
108+
│ ├── the-daily-generator/ # Article generation skill
109+
│ ├── the-daily-discover/ # Topic discovery skill
110+
│ └── the-daily-publish/ # Build & deploy skill
111+
112+
├── .github/workflows/
113+
│ └── daily.yml # GitHub Action (fallback)
114+
115+
└── output/ # Generated data files
116+
```
117+
118+
## Skills
119+
120+
The project uses Claude Code skills for AI-driven tasks:
121+
122+
| Skill | Purpose |
123+
|-------|---------|
124+
| `/the-daily-generator` | Generate bilingual articles from CANSIM data |
125+
| `/the-daily-discover` | Identify newsworthy table updates |
126+
| `/the-daily-publish` | Build and deploy the site |
127+
128+
## Data Pipeline
129+
130+
### Topic Discovery
131+
132+
The R script `discover_topics.R` scans CANSIM for recently updated tables and ranks them by:
133+
134+
- **Recency** (25%): how recently the data were released
134+
- **Diversity** (25%): avoids covering the same sector repeatedly
136+
- **Public Interest** (50%): labour, prices, and housing score highest
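The weighting above can be sketched as a weighted sum of three sub-scores. Only the 25/25/50 weights come from the README; the `Candidate` type, the 7-day linear recency decay, the binary diversity penalty, and the `SECTOR_INTEREST` table are hypothetical stand-ins, not how `discover_topics.R` actually computes them.

```python
from dataclasses import dataclass

# Weights as stated in the README; everything else below is illustrative.
WEIGHTS = {"recency": 0.25, "diversity": 0.25, "interest": 0.50}

# Hypothetical public-interest scores by sector (0 to 1); unknown sectors get 0.5.
SECTOR_INTEREST = {"labour": 1.0, "prices": 1.0, "housing": 1.0}

@dataclass
class Candidate:
    table: str
    sector: str
    days_since_release: int

def score(c: Candidate, recent_sectors: list[str]) -> float:
    # Recency: 1.0 for a same-day release, falling linearly to 0 after 7 days.
    recency = max(0.0, 1.0 - c.days_since_release / 7)
    # Diversity: 0 if this sector was covered recently, otherwise 1.
    diversity = 0.0 if c.sector in recent_sectors else 1.0
    interest = SECTOR_INTEREST.get(c.sector, 0.5)
    return (WEIGHTS["recency"] * recency
            + WEIGHTS["diversity"] * diversity
            + WEIGHTS["interest"] * interest)

def rank(candidates: list[Candidate], recent_sectors: list[str]) -> list[Candidate]:
    """Highest-scoring candidates first."""
    return sorted(candidates, key=lambda c: score(c, recent_sectors), reverse=True)

cpi = Candidate("18-10-0004", "prices", days_since_release=0)
retail = Candidate("20-10-0008", "retail", days_since_release=2)
print([c.table for c in rank([retail, cpi], recent_sectors=["prices"])])
```

Because public interest carries half the weight, a high-interest table can still win after its sector was covered recently; a stronger diversity penalty would need a larger weight.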
137+
138+
### Data Fetching
139+
140+
The `fetch_table.R` script uses configs from `table_configs.json` to:
141+
142+
- Fetch data via the `cansim` R package
143+
- Apply dimension filters (GEO, categories)
144+
- Calculate month-over-month (MoM) and year-over-year (YoY) changes
145+
- Export analysis-ready JSON
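The MoM and YoY calculations reduce to two percentage changes against the previous month and the same month one year earlier. This is a minimal Python sketch, assuming a chronological monthly series of `(period, value)` pairs with no gaps; the `summarize` function, its output keys, and the index values are made up for illustration and are not the actual JSON schema written by `fetch_table.R`.

```python
import json

def pct_change(current: float, previous: float) -> float:
    """Percentage change from previous to current."""
    return (current - previous) / previous * 100

def summarize(series: list[tuple[str, float]]) -> dict:
    """Latest value plus MoM and YoY changes for a gap-free monthly series."""
    if len(series) < 13:
        raise ValueError("need at least 13 monthly observations to compute YoY")
    period, latest = series[-1]
    return {
        "period": period,
        "value": latest,
        "mom_pct": round(pct_change(latest, series[-2][1]), 1),   # vs. previous month
        "yoy_pct": round(pct_change(latest, series[-13][1]), 1),  # vs. same month last year
    }

# Made-up index values for 13 consecutive months.
series = [(f"2024-{m:02d}", 100.0) for m in range(1, 12)]
series += [("2024-12", 101.0), ("2025-01", 102.0)]
print(json.dumps(summarize(series)))
# prints {"period": "2025-01", "value": 102.0, "mom_pct": 1.0, "yoy_pct": 2.0}
```

Indexing back 13 positions only works when every month is present; a real fetcher would match on reference dates instead.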
146+
147+
### Article Generation
148+
149+
Claude Code follows the skill documentation to:
150+
151+
- Write in The Daily's neutral, clinical voice
152+
- Create Observable Framework Markdown pages with embedded charts
153+
- Generate both English and French versions
154+
- Verify data integrity against source JSON
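One way to approach the data-integrity check is to confirm that every decimal figure quoted in an article appears in the source JSON. The sketch below is hypothetical: it assumes a flat source dictionary, compares absolute values rounded to one decimal (so "fell 0.3%" matches `-0.3`), and ignores integers such as years and table numbers. It illustrates the idea and is not the skill's actual verification step.

```python
import re

# Decimal figures such as 2.0, 161.2 or 1,234.5; integers (years, table ids) are skipped.
DECIMAL = re.compile(r"\d+(?:,\d{3})*\.\d+")

def unsupported_numbers(article: str, source: dict) -> set[float]:
    """Decimal figures in the article that match no numeric value in the source."""
    allowed = {round(abs(v), 1) for v in source.values()
               if isinstance(v, (int, float)) and not isinstance(v, bool)}
    found = {float(m.replace(",", "")) for m in DECIMAL.findall(article)}
    return {x for x in found if round(x, 1) not in allowed}

# Made-up source values for illustration.
source = {"period": "2025-03", "value": 161.2, "mom_pct": 0.3, "yoy_pct": 2.0}
print(unsupported_numbers("The CPI rose 2.5% year over year.", source))
```

An empty set means every decimal figure is traceable to the source; anything returned should be reviewed before publishing.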
155+
156+
## The Daily Voice
157+
158+
Articles follow strict style guidelines:
159+
160+
- **Neutral and clinical** — no emotional language ("increased" not "surged")
161+
- **Inverted pyramid** — most important facts first
162+
- **Plain language** — accessible to general audiences
163+
- Headlines lead with the key statistic
164+
- Always include MoM and YoY comparisons
165+
- Hedge causation: use "amid" or "coinciding with", not "caused by"
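Several of these rules are mechanical enough to lint. The following sketch flags emotive verbs and unhedged causal phrases in a draft; the word lists are hypothetical examples chosen for illustration, not the guidelines actually used by the `/the-daily-generator` skill.

```python
import re

# Hypothetical word lists; the skill's real guidelines may differ.
EMOTIVE = {"surged": "increased", "plunged": "decreased", "soared": "rose", "skyrocketed": "rose"}
CAUSAL = ("caused by", "because of", "due to")

def check_voice(text: str) -> list[str]:
    """Return style warnings for emotive verbs and unhedged causal phrasing."""
    lowered = text.lower()
    issues = []
    for word, neutral in EMOTIVE.items():
        if re.search(rf"\b{word}\b", lowered):
            issues.append(f'emotive verb "{word}": consider "{neutral}"')
    for phrase in CAUSAL:
        if re.search(rf"\b{phrase}\b", lowered):
            issues.append(f'causal claim "{phrase}": consider "amid" or "coinciding with"')
    return issues

print(check_voice("Prices surged, caused by higher energy costs."))
```

A check like this catches only surface wording; judging whether a headline actually leads with the key statistic still needs a human or model review.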
166+
167+
## Configuration
168+
169+
### Adding New Tables
170+
171+
1. Add an entry to `r-tools/table_configs.json`:
172+
173+
```json
174+
"18-10-0004": {
175+
"name": "Consumer Price Index",
176+
"headline": "Consumer prices",
177+
"unit": "index",
178+
"filters": {
179+
"GEO": "Canada",
180+
"Products and product groups": "All-items"
181+
}
182+
}
183+
```
184+
185+
2. Test the fetch:
186+
```bash
187+
Rscript r-tools/fetch_table.R 18-10-0004 output
57188
```
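The `filters` block in a config entry can be read as "keep rows whose dimension columns equal these values". This Python sketch applies the README's example entry to made-up rows shaped like a long-format StatCan table (`REF_DATE`, `GEO`, dimension columns, `VALUE`); `apply_filters` is a hypothetical helper, not the logic inside `fetch_table.R`.

```python
import json

# The example config entry from the README, parsed as JSON.
CONFIG = json.loads("""
{
  "18-10-0004": {
    "name": "Consumer Price Index",
    "headline": "Consumer prices",
    "unit": "index",
    "filters": {"GEO": "Canada", "Products and product groups": "All-items"}
  }
}
""")

def apply_filters(rows: list[dict], filters: dict) -> list[dict]:
    """Keep rows whose dimension columns equal every configured filter value."""
    return [r for r in rows if all(r.get(dim) == value for dim, value in filters.items())]

# Made-up rows for illustration.
rows = [
    {"REF_DATE": "2025-01", "GEO": "Canada", "Products and product groups": "All-items", "VALUE": 100.0},
    {"REF_DATE": "2025-01", "GEO": "Ontario", "Products and product groups": "All-items", "VALUE": 101.0},
    {"REF_DATE": "2025-01", "GEO": "Canada", "Products and product groups": "Food", "VALUE": 102.0},
]
kept = apply_filters(rows, CONFIG["18-10-0004"]["filters"])
print(kept)
```

Exact string matching means a filter value must match the member name in the table exactly, so checking a new table's member names before adding its config is worthwhile.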
58189

59-
## Dependencies
190+
### Automation Schedule
60191

61-
**R packages:**
62-
- `cansim` - Statistics Canada data access
63-
- `dplyr`, `tidyr` - Data manipulation
64-
- `jsonlite` - JSON export
192+
Default: 8:00 AM daily. To change, edit `automation/com.the-daily.pipeline.plist` and reinstall.
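The run time lives in the plist's `StartCalendarInterval` dictionary, which launchd matches against the local clock. As a sketch only, this is how a minimal agent with that key could be generated using Python's `plistlib`; the label is assumed from the plist filename and the script path is a hypothetical placeholder, so the repository's actual plist may contain other keys.

```python
import plistlib

def launchd_agent(hour: int = 8, minute: int = 0) -> bytes:
    """Serialize a minimal launchd agent that runs the pipeline once a day."""
    return plistlib.dumps({
        "Label": "com.the-daily.pipeline",  # assumed to match the plist filename
        "ProgramArguments": [
            "/bin/bash",
            "/path/to/the-daily/automation/run_pipeline.sh",  # hypothetical path
        ],
        # launchd starts the job when the local time matches these fields.
        "StartCalendarInterval": {"Hour": hour, "Minute": minute},
    })

print(launchd_agent(hour=7, minute=30).decode())
```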
65193

66-
**Python:** Standard library only (json, re, pathlib, datetime)
194+
## Development
67195

68-
**Web:** Observable Plot, D3.js (loaded via CDN)
196+
```bash
197+
# Start dev server
198+
npm run dev
69199

70-
## Next Steps
200+
# Build site
201+
npm run build
71202

72-
- [ ] Autonomous table selection (browse 7,000+ tables)
73-
- [ ] LLM-enhanced article generation
74-
- [ ] Static site with index page
75-
- [ ] Scheduled automation
203+
# Run discovery only
204+
Rscript r-tools/discover_topics.R --configured --json
205+
```
206+
207+
## Fallback Mechanism
208+
209+
If local automation fails (for example, the Mac is offline or Claude Code is unavailable):
210+
211+
1. The GitHub Action runs at 1pm UTC (8am EST; 9am during Eastern Daylight Time)
212+
2. It runs topic discovery and data fetching
213+
3. It creates a GitHub Issue with instructions
214+
4. The user runs `claude "/the-daily-generator TABLE"` once Claude Code is available
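GitHub Actions schedules are evaluated in UTC, so a fixed 1pm UTC trigger lands at 8am Eastern only during standard time and at 9am while daylight saving time is in effect. The sketch below makes that concrete without relying on a time-zone database; it hard-codes the Eastern rule in use since 2007 (DST from the second Sunday of March to the first Sunday of November) and assumes the UTC hour is late enough that the 2am local switchover has already happened on the given day.

```python
from datetime import date, timedelta

def nth_sunday(year: int, month: int, n: int) -> date:
    """The n-th Sunday of a month (weekday(): Monday=0, Sunday=6)."""
    first = date(year, month, 1)
    first_sunday = first + timedelta(days=(6 - first.weekday()) % 7)
    return first_sunday + timedelta(weeks=n - 1)

def eastern_time_of_utc_hour(utc_hour: int, day: date) -> tuple[int, str]:
    """Eastern local hour for a UTC hour on `day` (assumes utc_hour >= 7)."""
    dst = nth_sunday(day.year, 3, 2) <= day < nth_sunday(day.year, 11, 1)
    offset = 4 if dst else 5
    return (utc_hour - offset) % 24, "EDT" if dst else "EST"

print(eastern_time_of_utc_hour(13, date(2025, 1, 15)))  # (8, 'EST')
print(eastern_time_of_utc_hour(13, date(2025, 7, 15)))  # (9, 'EDT')
```

Keeping an 8am Eastern run year-round with UTC-only cron would require two seasonal schedules, or accepting the one-hour shift.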
76215

77216
## License
78217

79-
This is an experimental project. Data comes from Statistics Canada (Crown Copyright).
218+
MIT License. Data is from Statistics Canada (Crown Copyright).
219+
220+
## Acknowledgments
221+
222+
- Statistics Canada for the CANSIM data
223+
- The `cansim` R package by Jens von Bergmann
224+
- Observable Framework for the static site
225+
- Anthropic Claude for AI capabilities

docs/en/about/index.md

Lines changed: 90 additions & 0 deletions
@@ -0,0 +1,90 @@
1+
---
2+
title: About The D-AI-LY
3+
toc: false
4+
---
5+
6+
# About The D-AI-LY
7+
8+
The D-AI-LY is an experimental project that generates statistical news bulletins using artificial intelligence. It draws inspiration from Statistics Canada's official release bulletin, [The Daily](https://www.statcan.gc.ca/en/dai/dai), which has been the first point of release for new Statistics Canada data since 1932.
9+
10+
## What We Do
11+
12+
Every day, The D-AI-LY:
13+
14+
1. **Scans** Statistics Canada's CANSIM database for recently updated tables
15+
2. **Selects** newsworthy topics based on recency, public interest, and sector diversity
16+
3. **Fetches** the latest data using the official Statistics Canada API
17+
4. **Generates** bilingual articles (English and French) explaining the key findings
18+
5. **Publishes** the articles to this website
19+
20+
All data comes directly from Statistics Canada. The articles are generated by AI and checked against the source data for accuracy.
21+
22+
## How It Works
23+
24+
```
25+
┌─────────────────────────────────────────┐
26+
│ Daily Automation (8am) │
27+
└─────────────────┬───────────────────────┘
28+
29+
┌─────────────▼─────────────┐
30+
│ Topic Discovery (AI) │ What's newsworthy today?
31+
└─────────────┬─────────────┘
32+
33+
┌─────────────▼─────────────┐
34+
│ Data Fetch (R/cansim) │ Get real StatCan data
35+
└─────────────┬─────────────┘
36+
37+
┌─────────────▼─────────────┐
38+
│ Article Generation (AI) │ Write EN + FR articles
39+
└─────────────┬─────────────┘
40+
41+
┌─────────────▼─────────────┐
42+
│ Publish to Website │ Build and deploy
43+
└───────────────────────────┘
44+
```
45+
46+
## Data Sources
47+
48+
All statistical data is sourced from **Statistics Canada's CANSIM database** (CANSIM tables are now published as Statistics Canada data tables under the New Dissemination Model). We use the [cansim R package](https://mountainmath.github.io/cansim/) to access official data tables.
49+
50+
Each article includes:
51+
- The specific CANSIM table number
52+
- A direct link to the source data
53+
- The reference period for the statistics
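A direct source link can be derived from the table number. This sketch assumes Statistics Canada's current table-viewer URL pattern (`/t1/tbl1/{lang}/tv.action?pid=…`), where the `pid` is the table number without hyphens plus a two-digit view suffix; defaulting that suffix to `01` is an assumption, and the `table_url` helper is hypothetical rather than part of the site's code.

```python
def table_url(table: str, lang: str = "en") -> str:
    """Build a Statistics Canada table-viewer URL from a table number like 18-10-0004."""
    digits = table.replace("-", "")
    if len(digits) == 8:
        digits += "01"  # assumption: link to the default "-01" view of the table
    return f"https://www150.statcan.gc.ca/t1/tbl1/{lang}/tv.action?pid={digits}"

print(table_url("18-10-0004"))
print(table_url("18-10-0004", lang="fr"))
```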
54+
55+
## AI Transparency
56+
57+
This project uses AI (Claude by Anthropic) for two purposes:
58+
59+
1. **Topic Selection**: Identifying which statistical releases are most newsworthy
60+
2. **Article Writing**: Generating the text of each article based on the data
61+
62+
The AI follows strict guidelines to maintain the neutral, clinical voice of statistical reporting. It does not editorialize or make predictions—it simply reports the numbers.
63+
64+
**Important**: While we strive for accuracy, AI-generated content may contain errors. Always verify important statistics by consulting the [official Statistics Canada source](https://www.statcan.gc.ca/).
65+
66+
## The Daily Voice
67+
68+
Articles follow the style of Statistics Canada's The Daily:
69+
70+
- **Neutral and clinical** — no emotional language
71+
- **Inverted pyramid** — most important facts first
72+
- **Plain language** — accessible to general audiences
73+
- Headlines lead with the key number
74+
- Always compare to previous period AND year-over-year
75+
76+
## Open Source
77+
78+
The D-AI-LY is an open source project. You can view the code, report issues, or contribute on GitHub:
79+
80+
**[github.com/mountainmath/the-daily](https://github.com/mountainmath/the-daily)**
81+
82+
## Disclaimer
83+
84+
The D-AI-LY is not affiliated with Statistics Canada. This is an independent experimental project that uses publicly available data. For official statistics, please visit [statcan.gc.ca](https://www.statcan.gc.ca/).
85+
86+
---
87+
88+
<p style="text-align: center; color: #666; font-size: 0.875rem;">
89+
<a href="../">← Back to Latest Releases</a>
90+
</p>

docs/en/index.md

Lines changed: 1 addition & 1 deletion
@@ -6,7 +6,7 @@ toc: false
66
# The Daily — Latest Releases
77

88
<p class="feed-links">
9-
<a href="./archive/">Browse all articles in the archive</a>
9+
<a href="./archive/">Browse all articles in the archive</a> · <a href="./about/">About The D-AI-LY</a>
1010
</p>
1111

1212
---
