Improve GSD Journal with targeted OpenTelemetry concepts for better forensics

## Summary

The GSD Journal captures structured events for auto-mode iterations, but it lacks several OpenTelemetry-inspired concepts that would significantly improve the quality of forensics diagnosis. This issue tracks five targeted improvements, ordered by value.

---

## 1. Correlate journal events to pi session (highest value)

**Problem:** The journal (`unit-start`/`unit-end`) and the pi session JSONL (LLM calls, tool executions) are completely disconnected. Forensics infers the link by timestamp proximity, which is fragile and wastes LLM context on full-file scanning.

**Fix:** Add `sessionId` and `messageOffset` to `unit-start`:

```typescript
{
  eventType: "unit-start",
  data: {
    unitId, unitType,
    sessionId: "abc123",   // pi session file identifier
    messageOffset: 42      // message count at unit start
  }
}
```

**Impact:** Forensics can jump directly from `unit-end { status: "error" }` to the exact tool call that failed, without scanning the whole session file.
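As a rough sketch of how forensics might use these two fields, the helper below slices a loaded session down to the messages that belong to one unit. Only `sessionId` and `messageOffset` come from the proposal; the `SessionMessage` shape, the `isError` flag, and the `messagesForUnit` name are assumptions made for illustration, not the actual pi session format.

```typescript
// Hypothetical shapes: only sessionId and messageOffset are from the proposal.
interface UnitStart {
  unitId: string;
  sessionId: string;
  messageOffset: number; // message count at unit start
}

interface SessionMessage {
  role: string;
  isError?: boolean; // assumed marker for a failed tool result
}

// Return the messages recorded while one unit was running: from the unit's own
// offset up to the offset of the next unit in the same session, or to the end
// of the session if no later unit exists.
function messagesForUnit(
  unit: UnitStart,
  otherUnits: UnitStart[],
  session: SessionMessage[],
): SessionMessage[] {
  const next = otherUnits
    .filter((u) => u.sessionId === unit.sessionId && u.messageOffset > unit.messageOffset)
    .sort((a, b) => a.messageOffset - b.messageOffset)[0];
  // slice(start, undefined) runs to the end of the array.
  return session.slice(unit.messageOffset, next?.messageOffset);
}
```

With a slice like this, a failed unit narrows the search to a handful of messages instead of the whole session file.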

---

## 2. Explicit durations on unit-end

**Problem:** Duration must be computed by pairing the `ts` timestamps of matching `unit-start` and `unit-end` entries. Forensics can't query slow units directly.

**Fix:** Add `durationMs` to `unit-end`:

```typescript
{ eventType: "unit-end", data: { unitId, status, artifactVerified, durationMs: 142000 } }
```

**Impact:** Timeout anomaly detection (`queryJournal({ eventType: "unit-end" })` plus a filter on `durationMs`) works from the journal alone, without cross-referencing activity logs.
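The filter step could look roughly like the sketch below. It works on an array of entries that has already been loaded, because the return type of `queryJournal` is not specified here; the `UnitEndEntry` shape and the `slowUnits` helper are illustrative assumptions.

```typescript
// Assumed entry shape, trimmed to the fields this check needs.
interface UnitEndEntry {
  eventType: "unit-end";
  data: { unitId: string; status: string; durationMs?: number };
}

// Return the ids of units that ran longer than thresholdMs.
// Entries written before durationMs existed are treated as 0 ms and skipped.
function slowUnits(entries: UnitEndEntry[], thresholdMs: number): string[] {
  return entries
    .filter((e) => (e.data.durationMs ?? 0) > thresholdMs)
    .map((e) => e.data.unitId);
}
```

Treating a missing `durationMs` as zero keeps older journals readable, at the cost of never flagging their units as slow.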

---

## 3. Structured error detail on unit-end

**Problem:** `unit-end { status: "error" }` carries no error detail in the journal. Forensics must parse the pi session JSONL to find what went wrong.

**Fix:**

```typescript
{
  eventType: "unit-end",
  data: {
    unitId, status: "error",
    error: "Bash tool failed: permission denied on /etc/hosts",
    // errorType is one of: "tool-error" | "timeout" | "context-overflow" | "unknown"
    errorType: "tool-error"
  }
}
```

**Impact:** Forensics can classify failure modes and generate a summary section from journal-only data.
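For illustration, a journal-only failure summary might be built by tallying `errorType` across failed units, as in this minimal sketch. The four `errorType` values are the ones proposed above; the `UnitEndData` shape and the `countByErrorType` helper are hypothetical.

```typescript
type ErrorType = "tool-error" | "timeout" | "context-overflow" | "unknown";

// Assumed shape of the data block on unit-end.
interface UnitEndData {
  unitId: string;
  status: string;
  error?: string;
  errorType?: ErrorType;
}

// Count failed units per errorType. Failed units that carry no errorType
// (for example, entries from older journals) are counted as "unknown".
function countByErrorType(units: UnitEndData[]): Record<ErrorType, number> {
  const counts: Record<ErrorType, number> = {
    "tool-error": 0,
    timeout: 0,
    "context-overflow": 0,
    unknown: 0,
  };
  for (const u of units) {
    if (u.status !== "error") continue;
    counts[u.errorType ?? "unknown"] += 1;
  }
  return counts;
}
```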

---

## 4. Resource attributes on iteration-start

**Problem:** Journal entries carry no metadata about the GSD version or model in use. Forensics fetches these separately from the `GSD_VERSION` environment variable and `metrics.json`, so correlating a regression with a version or model change is a manual step.

**Fix:** Add a `resource` block to `iteration-start`:

```typescript
{
  eventType: "iteration-start",
  data: { iteration },
  resource: { gsdVersion: "2.48.0", model: "anthropic/claude-sonnet-4-20250514", cwd: "/..." }
}
```

**Impact:** Forensics can answer "did this regression start after the model changed?" from the journal alone.
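One way that question could be answered is to scan consecutive `iteration-start` entries for a change in `resource.model`. The sketch below assumes the entries are already ordered by iteration; the `IterationStart` interface mirrors the proposed fields, and `modelChanges` is a hypothetical helper name.

```typescript
// Mirrors the proposed iteration-start entry; treated as an assumption.
interface IterationStart {
  eventType: "iteration-start";
  data: { iteration: number };
  resource: { gsdVersion: string; model: string; cwd: string };
}

// Return the iteration numbers whose model differs from the previous
// iteration's model. Entries must be ordered by iteration.
function modelChanges(entries: IterationStart[]): number[] {
  const changed: number[] = [];
  for (let i = 1; i < entries.length; i++) {
    if (entries[i].resource.model !== entries[i - 1].resource.model) {
      changed.push(entries[i].data.iteration);
    }
  }
  return changed;
}
```

The same comparison on `gsdVersion` would flag regressions that line up with an upgrade instead of a model switch.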

---

## 5. Cross-iteration causal links for recovery chains

**Problem:** `causedBy` only works within a single `flowId`. When stuck detection fires and the next iteration is a recovery attempt, there is no journal link between them.

**Fix:** Emit `causedBy` on the recovery iteration's `iteration-start` pointing to the `stuck-detected` event:

```typescript
// iteration N+1 recovery
{ flowId: "flow-bbb", seq: 1, eventType: "iteration-start",
  causedBy: { flowId: "flow-aaa", seq: 5 }  // points to stuck-detected
}
```

**Impact:** Forensics reconstructs the full recovery chain (`stuck → cache-invalidate → retry → still-stuck → hard-stop`) from the data model rather than inferring it from timestamps.
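A minimal sketch of that reconstruction, assuming entries are uniquely identified by the pair (`flowId`, `seq`): starting from any entry, follow `causedBy` backwards until it runs out. The `JournalEntry` shape here is trimmed to the fields used, and `causalChain` is a hypothetical helper.

```typescript
interface EventRef {
  flowId: string;
  seq: number;
}

// Trimmed, assumed entry shape: only the fields needed to follow links.
interface JournalEntry {
  flowId: string;
  seq: number;
  eventType: string;
  causedBy?: EventRef;
}

// Follow causedBy links backwards from `start` and return the chain in
// causal order (oldest first). Stops at a missing target or a repeated entry.
function causalChain(entries: JournalEntry[], start: JournalEntry): JournalEntry[] {
  const byKey = new Map<string, JournalEntry>();
  for (const e of entries) byKey.set(`${e.flowId}:${e.seq}`, e);

  const chain: JournalEntry[] = [];
  const seen = new Set<string>();
  let current: JournalEntry | undefined = start;
  while (current) {
    const key = `${current.flowId}:${current.seq}`;
    if (seen.has(key)) break; // guard against accidental cycles
    seen.add(key);
    chain.unshift(current);
    const ref: EventRef | undefined = current.causedBy;
    current = ref ? byKey.get(`${ref.flowId}:${ref.seq}`) : undefined;
  }
  return chain;
}
```

Because the lookup key includes `flowId`, the walk crosses iteration boundaries without any special casing.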

---

## What NOT to add

- OTLP export / external collectors — GSD is a local tool
- Sampling — 100% event capture is correct for auto-mode frequency
- Combined span-style start/end — the two-event model is *better* for crash forensics (a crash leaves `unit-start` with no matching `unit-end`, which is precisely the signal)

---

## Affected files

- `src/resources/extensions/gsd/journal.ts` — `JournalEntry` type + `emitJournalEvent`
- `src/resources/extensions/gsd/auto/loop.ts` — `iteration-start` emit
- `src/resources/extensions/gsd/auto/phases.ts` — `unit-start`, `unit-end` emits
- `src/resources/extensions/gsd/auto/loop-deps.ts` — `LoopDeps.emitJournalEvent` signature
- `src/resources/extensions/gsd/forensics.ts` — update journal summary section to use new fields
- `src/resources/extensions/gsd/tests/journal*.test.ts` — update fixtures
