forked from gsd-build/gsd-2
Open
gsd-build/gsd-2
#2618
Description
Problem
Journal events capture what happened but lack the structured metadata needed for post-mortem analysis:
- No duration tracking on unit executions — can't identify slow units
- No error classification — have to grep raw messages to understand failure modes
- No resource metadata — can't correlate issues with specific models or GSD versions
- Stuck-detection recovery is invisible in the journal — no events emitted when it fires
Solution
Enrich journal events with OTel-inspired fields:
- iteration-start: resource block (gsdVersion, model, cwd) + causedBy from stuck recovery
- unit-start: sessionId + messageOffset for session correlation
- unit-end: durationMs, error/errorType classification (structured errorContext preferred over regex)
- New stuck-detected event with level and reason
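As a rough illustration of the fields listed above, the enriched events could be modelled as a discriminated union. This is only a sketch: the event and field names come from this issue, but the field types, the `unitId` field, and the `makeUnitEnd` helper are assumptions and may not match the actual implementation.

```typescript
// Hypothetical shapes. Field names follow the issue text; types are assumed.
interface Resource {
  gsdVersion: string;
  model: string;
  cwd: string;
}

type JournalEvent =
  | { type: "iteration-start"; resource: Resource; causedBy?: string }
  | { type: "unit-start"; unitId: string; sessionId: string; messageOffset: number }
  | { type: "unit-end"; unitId: string; durationMs: number; error?: string; errorType?: string }
  | { type: "stuck-detected"; level: string; reason: string };

// Hypothetical helper: derive durationMs from start/end timestamps (in ms).
function makeUnitEnd(
  unitId: string,
  startedAtMs: number,
  endedAtMs: number,
  error?: string,
  errorType?: string
): JournalEvent {
  return {
    type: "unit-end",
    unitId,
    durationMs: Math.max(0, endedAtMs - startedAtMs),
    error,
    errorType,
  };
}
```

Keeping `error` and `errorType` optional lets successful units omit them, so consumers can treat the presence of `errorType` as the failure signal.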
Forensics scanner surfaces slow units (>2min) and error distributions as anomalies.
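The scanner behaviour described above might look roughly like the following. Only the 2-minute threshold and the two anomaly kinds are taken from this issue; the `Anomaly` shape, the `scanUnitEnds` name, and the `unitId` field are hypothetical.

```typescript
// Assumed minimal view of a unit-end journal event.
interface UnitEndEvent {
  type: "unit-end";
  unitId: string; // hypothetical identifier
  durationMs: number;
  errorType?: string;
}

// Hypothetical anomaly record produced by the scanner.
interface Anomaly {
  kind: "slow-unit" | "error-distribution";
  detail: string;
}

const SLOW_UNIT_MS = 2 * 60 * 1000; // ">2min" threshold from the issue

function scanUnitEnds(events: UnitEndEvent[]): Anomaly[] {
  const anomalies: Anomaly[] = [];
  const errorCounts: Record<string, number> = {};

  for (const e of events) {
    // Flag units that ran strictly longer than the threshold.
    if (e.durationMs > SLOW_UNIT_MS) {
      anomalies.push({ kind: "slow-unit", detail: `${e.unitId} took ${e.durationMs}ms` });
    }
    // Tally failures by classified error type.
    if (e.errorType) {
      errorCounts[e.errorType] = (errorCounts[e.errorType] || 0) + 1;
    }
  }

  // Report the error distribution once, if any errors were seen.
  const types = Object.keys(errorCounts).sort();
  if (types.length > 0) {
    const summary = types.map((t) => `${t}=${errorCounts[t]}`).join(", ");
    anomalies.push({ kind: "error-distribution", detail: summary });
  }
  return anomalies;
}
```

A unit that takes exactly 120000 ms is not flagged in this sketch, because the comparison is strict.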
Implementation
PR: gsd-build#2618
Builds on the structured ErrorContext from gsd-build#2612 — error classification in unit-end prefers errorContext.category over regex heuristics.
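The preference order described above (structured category first, regex as fallback) can be sketched as follows. The `ErrorContext` shape is reduced to the single `category` field mentioned here, and the function name, category strings, and regex patterns are illustrative assumptions rather than the project's actual values.

```typescript
// Assumed minimal shape; the real ErrorContext likely carries more fields.
interface ErrorContext {
  category?: string;
}

// Hypothetical classifier: prefer errorContext.category, fall back to regex heuristics.
function classifyError(message: string, errorContext?: ErrorContext): string {
  if (errorContext && errorContext.category) {
    return errorContext.category;
  }
  // Illustrative heuristics only; the real patterns are not given in this issue.
  if (/timed? ?out/i.test(message)) return "timeout";
  if (/rate.?limit|\b429\b/i.test(message)) return "rate-limit";
  return "unknown";
}
```

With this ordering, a message such as "Request timed out" is classified as "timeout" only when no structured category is available.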