@@ -11,6 +11,10 @@ This is the Cloudflare Workers version of the Mass Murder Canada application, co
 - Cloudflare D1 database (SQLite-compatible)
 - **Admin Interface**: Secure admin dashboard for managing records and news stories
 - **REST API**: Full CRUD API for programmatic access
+- **Asynchronous AI synthesis pipeline (staging-ready)**:
+  - per-story summaries
+  - record-level synthesis across all linked sources
+  - source classification (`news`, `official`, `social`, `other`) with social-only incidents flagged as `alleged`
 - All public routes preserved:
   - `/` - Home page with all records
   - `/records/group/:group` - Filtered records by group
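
The `alleged` flag from the feature list above can be illustrated with a small TypeScript sketch. This is not the project's code: `SourceClass` and `isAlleged` are hypothetical names, and treating a record with no linked sources as not alleged is an assumption.

```typescript
// The four source classes named in the feature list.
type SourceClass = "news" | "official" | "social" | "other";

// Hypothetical helper: an incident counts as "alleged" when it has at least
// one linked source and every linked source is classified as social.
// Assumption: a record with no sources is not flagged.
function isAlleged(classes: SourceClass[]): boolean {
  return classes.length > 0 && classes.every((c) => c === "social");
}
```

Under this sketch, a record backed by one social post and one news article is not flagged, because a single non-social source is enough to clear it.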
@@ -90,6 +94,44 @@ All environments use Cloudflare D1 databases. The staging database is kept in sy
 The worker runs on Cloudflare Workers with a D1 database. Local development uses `wrangler dev`, which provides a local D1 database for testing.

+## AI Summaries (Staging)
+
+Staging is configured for **manual** AI generation to avoid unnecessary token usage:
+
+- `AI_SUMMARY_ENABLED = "true"`
+- `AI_SUMMARY_AUTO_ON_SAVE = "false"`
+- `AI_FETCH_JINA_FALLBACK = "true"`
+- `AI_FETCH_MARKDOWN_NEW_FALLBACK = "true"`
+- `AI_FETCH_SUMMARIZE_DAEMON_URL = ""` (optional; set only if you run a summarize daemon)
+- Queue binding: `SUMMARY_QUEUE`
+- AI binding: `AI`
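
The values above are quoted strings, so code that reads them has to compare against `"true"` rather than rely on truthiness (the string `"false"` is truthy). The following is a minimal sketch of that parsing; the variable names come from the list above, while `SummaryEnv`, `SummaryConfig`, `flag`, and `readSummaryConfig` are hypothetical names used only for illustration.

```typescript
// Assumed shape of the relevant variables; every value is an optional string.
interface SummaryEnv {
  AI_SUMMARY_ENABLED?: string;
  AI_SUMMARY_AUTO_ON_SAVE?: string;
  AI_FETCH_JINA_FALLBACK?: string;
  AI_FETCH_MARKDOWN_NEW_FALLBACK?: string;
  AI_FETCH_SUMMARIZE_DAEMON_URL?: string;
}

interface SummaryConfig {
  enabled: boolean;
  autoOnSave: boolean;
  jinaFallback: boolean;
  markdownNewFallback: boolean;
  daemonUrl: string | null;
}

// Only the literal string "true" (case-insensitive) switches a flag on.
function flag(value: string | undefined): boolean {
  return (value ?? "").trim().toLowerCase() === "true";
}

function readSummaryConfig(env: SummaryEnv): SummaryConfig {
  const daemonUrl = (env.AI_FETCH_SUMMARIZE_DAEMON_URL ?? "").trim();
  return {
    enabled: flag(env.AI_SUMMARY_ENABLED),
    autoOnSave: flag(env.AI_SUMMARY_AUTO_ON_SAVE),
    jinaFallback: flag(env.AI_FETCH_JINA_FALLBACK),
    markdownNewFallback: flag(env.AI_FETCH_MARKDOWN_NEW_FALLBACK),
    // An empty URL means the optional daemon is not configured.
    daemonUrl: daemonUrl === "" ? null : daemonUrl,
  };
}
```

With the staging values listed above, this sketch yields `enabled: true`, `autoOnSave: false`, and `daemonUrl: null`, which matches the manual-generation setup.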
+
+From the admin dashboard, use the `Generate AI` button on a record row. This enqueues:
+
+- per-story summarization for all linked sources
+- one synthesized summary written to `records.ai_summary`
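
The enqueue step above could look roughly like the sketch below. The message shapes (`SummaryJob`), the helper names, and the one-message-per-story layout are assumptions for illustration; the real payload format is not documented here. The only Cloudflare API assumed is the queue binding's `send()` method, modelled by the local `QueueLike` interface.

```typescript
// Minimal structural type for a queue binding; only send() is assumed.
interface QueueLike<T> {
  send(message: T): Promise<void>;
}

// Hypothetical message shapes: one job per linked story, plus one
// record-level synthesis job.
type SummaryJob =
  | { kind: "story"; recordId: number; storyId: number }
  | { kind: "record"; recordId: number };

// Build the list of jobs for one record (pure, easy to test).
function buildRecordJobs(recordId: number, storyIds: number[]): SummaryJob[] {
  const jobs: SummaryJob[] = storyIds.map(
    (storyId): SummaryJob => ({ kind: "story", recordId, storyId }),
  );
  jobs.push({ kind: "record", recordId });
  return jobs;
}

// Send every job to the queue and report how many were enqueued.
async function enqueueRecordJobs(
  queue: QueueLike<SummaryJob>,
  recordId: number,
  storyIds: number[],
): Promise<number> {
  const jobs = buildRecordJobs(recordId, storyIds);
  for (const job of jobs) {
    await queue.send(job);
  }
  return jobs.length;
}
```

In a Worker this would be called as `enqueueRecordJobs(env.SUMMARY_QUEUE, recordId, storyIds)` from the admin route behind the button; that wiring is likewise an assumption.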
+
+Extraction order for linked stories:
+
+1. Stored `body_text` (if available)
+2. Direct fetch with structured extraction (JSON-LD `articleBody`, meta descriptions, `<article>`/`<main>` blocks)
+3. Optional summarize daemon fallback (`/v1/summarize` + events stream) when configured
+4. Optional fallback readers (`r.jina.ai`, `markdown.new`) when direct extraction is weak
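
The ordered fallback above can be sketched as a loop over extraction steps that stops at the first usable result. Everything named here is hypothetical: `ExtractStep`, `isUsable`, and `extractInOrder` are illustrative, and the 200-character threshold for "weak" extraction is an assumption, since the actual cutoff is not stated.

```typescript
// One extraction attempt: resolves to text, or null when nothing was found.
type ExtractStep = { name: string; run: () => Promise<string | null> };

// Assumed threshold below which extracted text is considered too weak.
const MIN_USABLE_CHARS = 200;

function isUsable(text: string | null, minChars: number = MIN_USABLE_CHARS): text is string {
  return text !== null && text.trim().length >= minChars;
}

// Try each step in order and return the first usable text with its source.
async function extractInOrder(
  steps: ExtractStep[],
): Promise<{ source: string; text: string } | null> {
  for (const step of steps) {
    let text: string | null = null;
    try {
      text = await step.run();
    } catch {
      // A failed step (network error, bad response) falls through to the next.
    }
    if (isUsable(text)) {
      return { source: step.name, text: text.trim() };
    }
  }
  return null; // every step failed or returned weak text
}
```

A caller would pass the steps in the documented order (stored `body_text`, direct fetch, daemon, readers) and leave out the optional ones when their settings are disabled.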
+
+RCMP URLs are normalized from `rcmp-grc.gc.ca` to `rcmp.ca` before fetching to improve hit rate.
+Unsafe source URLs are skipped: only public `http`/`https` URLs are fetched, and localhost, private IP addresses, and local hostnames are blocked.
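
A minimal sketch of both checks follows, using only the standard `URL` class. The function names are hypothetical, keeping subdomains during RCMP normalization is an assumption, and the block lists are illustrative rather than the project's actual rules. It inspects the hostname only, so it does not cover public names that resolve to private addresses.

```typescript
// Rewrite the legacy RCMP domain to rcmp.ca, keeping any subdomain and path.
function normalizeRcmpUrl(raw: string): string {
  const url = new URL(raw);
  const legacy = "rcmp-grc.gc.ca";
  const host = url.hostname.toLowerCase();
  if (host === legacy || host.endsWith("." + legacy)) {
    url.hostname = host.slice(0, host.length - legacy.length) + "rcmp.ca";
  }
  return url.toString();
}

// Loopback, private, link-local, and "this network" IPv4 ranges.
function isPrivateIpv4(host: string): boolean {
  const m = /^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/.exec(host);
  if (!m) return false;
  const a = Number(m[1]);
  const b = Number(m[2]);
  return (
    a === 0 || a === 10 || a === 127 ||
    (a === 169 && b === 254) ||
    (a === 172 && b >= 16 && b <= 31) ||
    (a === 192 && b === 168)
  );
}

function isSafePublicUrl(raw: string): boolean {
  let url: URL;
  try {
    url = new URL(raw);
  } catch {
    return false; // not a parseable absolute URL
  }
  if (url.protocol !== "http:" && url.protocol !== "https:") return false;
  // IPv6 literals come back wrapped in brackets; strip them for the checks.
  const host = url.hostname.toLowerCase().replace(/^\[|\]$/g, "");
  if (host === "localhost" || host.endsWith(".localhost") || host.endsWith(".local")) return false;
  // Single-label names such as "intranet" are treated as local hostnames.
  if (!host.includes(".") && !host.includes(":")) return false;
  if (isPrivateIpv4(host)) return false;
  // IPv6 unspecified, loopback, unique-local (fc00::/7), link-local (fe80::/10).
  if (host.includes(":") && (host === "::" || host === "::1" || /^f[cd]/.test(host) || /^fe[89ab]/.test(host))) {
    return false;
  }
  return true;
}
```

In this sketch a fetcher would call `isSafePublicUrl` first and skip the source on `false`, then pass the URL through `normalizeRcmpUrl` before fetching.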
+
+If the summarize daemon requires authentication, set a secret token:
+
+`npx wrangler secret put AI_FETCH_SUMMARIZE_DAEMON_TOKEN --env staging`
+
+### Queue Setup
+
+Before deploying staging with AI summaries, create the queue once:
+
+1. `npx wrangler queues create massmurdercanada-staging-summary`
+2. `npx wrangler deploy --env staging`
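
Once the queue exists and the Worker is deployed, queued jobs are processed by a consumer. The sketch below shows one plausible per-message handling pattern, assuming the Worker consumes its own queue. `QueueMessage` and `QueueBatch` are local stand-ins that mirror the `body`, `ack()`, and `retry()` members of Cloudflare's queue message objects, and `handleBatch` is a hypothetical helper, not the project's consumer.

```typescript
// Local stand-ins for the parts of a Cloudflare queue batch used here.
interface QueueMessage<T> {
  body: T;
  ack(): void;
  retry(): void;
}

interface QueueBatch<T> {
  messages: readonly QueueMessage<T>[];
}

// Process each message independently: acknowledge on success, ask for a
// redelivery on failure, so one bad job does not fail the whole batch.
async function handleBatch<T>(
  batch: QueueBatch<T>,
  handler: (job: T) => Promise<void>,
): Promise<void> {
  for (const message of batch.messages) {
    try {
      await handler(message.body);
      message.ack();
    } catch {
      message.retry();
    }
  }
}
```

In a Worker, this would be called from the exported `queue(batch, env)` handler, with `handler` doing the actual summarization work.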
+
 ## Notes

 - The UI has been modernized with improved styling