Commit 28eb038
fix(delivery): isolate per-session failures so one bad session can't stall delivery for all

The active and sweep delivery poll loops iterate every session in a plain
for-loop wrapped in a single try/catch. deliverSessionMessages re-threw on
failure, so an unhandled error for one session aborted the entire tick and
silently halted message delivery for every other agent until a daemon restart.

Observed failure: a crashed container left an orphaned hot journal
(outbound.db-journal) beside its outbound.db. drainSession opens outbound.db
read-only (single-writer invariant), but rolling back the hot journal requires
a write, so even the SELECT in getDueOutboundMessages threw "attempt to write a
readonly database" on every tick (~1.3s), poisoning delivery for all sessions
ordered after the broken one. A monitoring agent on another session stopped
receiving its scheduled tasks and stopped delivering alerts for hours.
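
A small diagnostic sketch of how such a session could be spotted, assuming a plain listing of file names from the session directory is available. SQLite names a rollback journal after the database file with a `-journal` suffix, which matches the `outbound.db-journal` seen here. The helper names and the `inbound.db` entry are hypothetical and not part of this commit. A journal file that merely exists is not always hot, so this is only a first-pass hint.

```typescript
// Hypothetical helper: SQLite's rollback journal is "<database file>-journal".
function journalNameFor(dbName: string): string {
  return `${dbName}-journal`;
}

// Hypothetical helper: returns the .db files that have a journal beside them.
// This checks names only; it does not inspect the journal header or locks,
// so a match means "worth a look", not "definitely a hot journal".
function findDatabasesWithJournal(fileNames: string[]): string[] {
  const present = new Set(fileNames);
  return fileNames
    .filter((name) => name.endsWith(".db"))
    .filter((name) => present.has(journalNameFor(name)));
}

// Example listing for one session directory ("inbound.db" is an assumed name).
const suspicious: string[] = findDatabasesWithJournal([
  "inbound.db",
  "outbound.db",
  "outbound.db-journal",
]);
```

With the listing above, `suspicious` contains only `outbound.db`.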

Catch and log per session in deliverSessionMessages so a single unhealthy
session is contained. The broken session self-heals on its next container
start, when the writer opens the DB read-write and rolls the journal back.
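
A minimal sketch of the containment pattern, not the actual diff. The `Session` shape, `deliverContained`, `runTick`, and the injected `deliver` and `logError` callbacks are all assumptions made for illustration. The commit places the catch inside deliverSessionMessages itself, while this sketch uses a wrapper to show the same effect.

```typescript
// Assumed shapes for illustration only.
type Session = { id: string };
type Deliver = (session: Session) => Promise<void>;
type LogError = (sessionId: string, err: unknown) => void;

// Deliver for one session; log and swallow any error so it cannot escape.
async function deliverContained(
  session: Session,
  deliver: Deliver,
  logError: LogError,
): Promise<boolean> {
  try {
    await deliver(session);
    return true;
  } catch (err) {
    logError(session.id, err);
    return false;
  }
}

// One poll tick: every session is attempted, even if an earlier one failed.
async function runTick(
  sessions: Session[],
  deliver: Deliver,
  logError: LogError,
): Promise<string[]> {
  const failed: string[] = [];
  for (const session of sessions) {
    if (!(await deliverContained(session, deliver, logError))) {
      failed.push(session.id);
    }
  }
  return failed;
}

// Demo: session "b" simulates the read-only database error from the incident.
const delivered: string[] = [];
const logged: string[] = [];

const tickResult: Promise<string[]> = runTick(
  [{ id: "a" }, { id: "b" }, { id: "c" }],
  async (session) => {
    if (session.id === "b") {
      throw new Error("attempt to write a readonly database");
    }
    delivered.push(session.id);
  },
  (sessionId, err) => {
    const message = err instanceof Error ? err.message : String(err);
    logged.push(`${sessionId}: ${message}`);
  },
);
```

In the demo, sessions `a` and `c` are still delivered, `b` is reported as failed, and one error line is logged. With a single try/catch around the whole loop, `c` would have been skipped.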

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
1 parent ee7f891 commit 28eb038
1 file changed
Lines changed: 21 additions & 0 deletions
@@ -159,6 +159,27 @@ 21 lines added at new lines 162-182 (added content not preserved)