Problem
Journal events capture what happened but lack the structured metadata needed for post-mortem analysis:
- No duration tracking on unit executions — can't identify slow units
- No error classification — have to grep raw messages to understand failure modes
- No resource metadata — can't correlate issues with specific models or GSD versions
- Stuck-detection recovery is invisible in the journal — no events emitted when it fires
Solution
Enrich journal events with OTel-inspired fields:
- iteration-start: resource block (gsdVersion, model, cwd) + causedBy from stuck recovery
- unit-start: sessionId + messageOffset for session correlation
- unit-end: durationMs and error/errorType classification (structured errorContext preferred over regex)
- New stuck-detected event with level and reason
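A minimal sketch of the enriched event shapes, assuming a discriminated union keyed on a `type` field. The field names (gsdVersion, model, cwd, causedBy, sessionId, messageOffset, durationMs, error, errorType, level, reason) come from the list above. The `ts` timestamp, `unitId`, the `makeUnitEnd` helper, and the exact field types (for example a numeric `level`) are hypothetical and are not taken from the PR.

```typescript
// Hypothetical event shapes. Field names follow the PR description; the
// `type`/`ts` envelope, `unitId`, and the exact field types are assumptions.
interface ResourceBlock {
  gsdVersion: string;
  model: string;
  cwd: string;
}

interface IterationStart {
  type: "iteration-start";
  ts: number;
  resource: ResourceBlock;
  causedBy?: string; // set when this iteration was triggered by stuck recovery
}

interface UnitStart {
  type: "unit-start";
  ts: number;
  unitId: string;
  sessionId: string;
  messageOffset: number; // assumed: offset into the session's message history
}

interface UnitEnd {
  type: "unit-end";
  ts: number;
  unitId: string;
  durationMs: number;
  error?: string;
  errorType?: string;
}

interface StuckDetected {
  type: "stuck-detected";
  ts: number;
  level: number; // assumed numeric escalation level
  reason: string;
}

type JournalEvent = IterationStart | UnitStart | UnitEnd | StuckDetected;

// Hypothetical helper: build the unit-end event for a finished unit,
// deriving durationMs from the matching unit-start timestamp.
function makeUnitEnd(
  start: UnitStart,
  endTs: number,
  failure?: { message: string; errorType: string },
): UnitEnd {
  const event: UnitEnd = {
    type: "unit-end",
    ts: endTs,
    unitId: start.unitId,
    durationMs: endTs - start.ts,
  };
  if (failure) {
    event.error = failure.message;
    event.errorType = failure.errorType;
  }
  return event;
}

const start: UnitStart = { type: "unit-start", ts: 1000, unitId: "u1", sessionId: "s1", messageOffset: 42 };
const end = makeUnitEnd(start, 4500, { message: "request timed out", errorType: "timeout" });
const journal: JournalEvent[] = [start, end];
console.log(end.durationMs); // 3500
```

Using a union keyed on `type` lets a consumer narrow on the event kind and get the matching fields type-checked.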
The forensics scanner surfaces slow units (>2 min) and error distributions as anomalies.
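A rough sketch of how a scanner could derive both anomaly kinds from unit-end events. It flags any unit whose durationMs exceeds 2 minutes and tallies errorType values into a single distribution entry. The `UnitEndEvent`/`Anomaly` shapes, the `scanForAnomalies` name, and the summary string format are illustrative assumptions, not the scanner's actual interface.

```typescript
// Hypothetical minimal view of a unit-end event; only the fields the scanner reads.
interface UnitEndEvent {
  type: "unit-end";
  unitId: string;
  durationMs: number;
  errorType?: string;
}

interface Anomaly {
  kind: "slow-unit" | "error-distribution";
  detail: string;
}

// Units slower than 2 minutes are flagged, per the PR description.
const SLOW_UNIT_MS = 2 * 60 * 1000;

function scanForAnomalies(events: UnitEndEvent[]): Anomaly[] {
  const anomalies: Anomaly[] = [];
  const errorCounts = new Map<string, number>();

  for (const ev of events) {
    if (ev.durationMs > SLOW_UNIT_MS) {
      anomalies.push({ kind: "slow-unit", detail: `${ev.unitId} took ${ev.durationMs}ms` });
    }
    if (ev.errorType) {
      errorCounts.set(ev.errorType, (errorCounts.get(ev.errorType) ?? 0) + 1);
    }
  }

  // Summarise failures by errorType, most frequent first.
  if (errorCounts.size > 0) {
    const summary = Array.from(errorCounts.entries())
      .sort((a, b) => b[1] - a[1])
      .map(([type, count]) => `${type}=${count}`)
      .join(", ");
    anomalies.push({ kind: "error-distribution", detail: summary });
  }
  return anomalies;
}

const sample: UnitEndEvent[] = [
  { type: "unit-end", unitId: "u1", durationMs: 150000 },
  { type: "unit-end", unitId: "u2", durationMs: 5000, errorType: "timeout" },
  { type: "unit-end", unitId: "u3", durationMs: 130000, errorType: "timeout" },
  { type: "unit-end", unitId: "u4", durationMs: 1000, errorType: "provider" },
];
console.log(scanForAnomalies(sample));
```

With this sample, u1 and u3 are reported as slow units, followed by one error-distribution entry reading "timeout=2, provider=1".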
Implementation
PR: gsd-build#2618
Builds on the structured ErrorContext from gsd-build#2612 — error classification in unit-end prefers errorContext.category over regex heuristics.
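A minimal sketch of that precedence rule: when a structured errorContext with a category is present, its category is used as errorType. Otherwise the raw message falls back to regex heuristics. Only `errorContext.category` comes from the description above. The `ErrorContext` shape shown here, the `classifyError` name, the regex patterns, and the category names are hypothetical placeholders.

```typescript
// Hypothetical: only the `category` field of ErrorContext (from #2612) is modelled.
interface ErrorContext {
  category: string;
}

// Illustrative fallback heuristics; patterns and category names are assumptions.
const REGEX_RULES: Array<[RegExp, string]> = [
  [/rate.?limit|429/i, "rate-limit"],
  [/timed? ?out|ETIMEDOUT/i, "timeout"],
  [/ENOENT|no such file/i, "filesystem"],
];

function classifyError(message: string, errorContext?: ErrorContext): string {
  // Structured context wins whenever it carries a category.
  if (errorContext?.category) return errorContext.category;
  // Otherwise, first matching heuristic wins.
  for (const [pattern, errorType] of REGEX_RULES) {
    if (pattern.test(message)) return errorType;
  }
  return "unknown";
}

console.log(classifyError("HTTP 429 Too Many Requests")); // rate-limit
console.log(classifyError("request timed out", { category: "provider" })); // provider
```

The second call shows the precedence: the message alone would match the timeout heuristic, but the structured category takes priority.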