fix(transcript): dedupe earlier terminal rows on the activity timeline #65

Merged
chicoxyzzy merged 2 commits into master from fix/transcript-dedupe-terminal-rows on May 9, 2026

Conversation

@chicoxyzzy
Member

Summary

A Hecate Chat run that completes successfully emits two terminal-shaped activity rows:

  • A synced `task_run` mirror surfacing as `run_result` with title "Run finished".
  • An explicit `Activity{Type: status, Title: finalAgentChatActivityTitle(status)}` appended by the agent-chat handler at turn end (`handler_agent_chat.go:635`).

The existing `isTerminalRunSummary` filter only drops rows whose title literally matches `/^run\s+(completed|failed|cancelled)$/i`, so type-only collisions (`type="completed"` + `title="Done"`, `type="run_result"` + `title="Run finished"`, …) survive. The operator sees two side-by-side endings for one run.
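
To make the gap concrete, here is a minimal sketch of a title-only filter using the regex quoted above. The `ActivityRow` shape and the `matchesTerminalTitle` helper are hypothetical stand-ins, not the real `isTerminalRunSummary` code, which may check more than the title.

```typescript
// Hypothetical row shape: only the two fields this sketch reads are modeled.
interface ActivityRow {
  type: string;
  title: string;
}

// The title pattern quoted in the summary.
const TERMINAL_TITLE = /^run\s+(completed|failed|cancelled)$/i;

// Assumed stand-in for isTerminalRunSummary: it inspects the title only.
function matchesTerminalTitle(row: ActivityRow): boolean {
  return TERMINAL_TITLE.test(row.title.trim());
}

const rows: ActivityRow[] = [
  { type: "status", title: "Run completed" }, // title matches, row is dropped
  { type: "run_result", title: "Run finished" }, // terminal type, title does not match
  { type: "completed", title: "Done" }, // terminal type, title does not match
];

// Both type-only terminal rows pass through the title filter.
const survivors = rows.filter((row) => !matchesTerminalTitle(row));
console.log(survivors.map((row) => row.title));
```

Two terminal-shaped rows remain after the filter, which is the duplicate ending described in this PR.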

`terminalAgentActivity` already picks the latest terminal-shaped row; everything else with a terminal shape is redundant. New rule in `compactAgentActivities`: drop earlier terminal rows by index when a survivor exists.
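
A rough sketch of the rule as described in this summary, assuming the terminal-shaped types are the ones named in this PR (`completed`, `failed`, `cancelled`, `run_result`). The helper names `latestTerminalIndex` and `dropEarlierTerminalRows` are hypothetical; the real logic lives in `compactAgentActivities` and may differ.

```typescript
// Hypothetical row shape for illustration.
interface ActivityRow {
  type: string;
  title: string;
}

// Assumption: these types count as terminal-shaped.
const TERMINAL_TYPES = new Set(["completed", "failed", "cancelled", "run_result"]);

function isTerminalShaped(row: ActivityRow): boolean {
  return TERMINAL_TYPES.has(row.type);
}

// Index of the latest terminal-shaped row, or -1 when there is none.
function latestTerminalIndex(rows: ActivityRow[]): number {
  for (let i = rows.length - 1; i >= 0; i--) {
    const row = rows[i];
    if (row !== undefined && isTerminalShaped(row)) return i;
  }
  return -1;
}

// Keep every non-terminal row plus the single surviving terminal row.
function dropEarlierTerminalRows(rows: ActivityRow[]): ActivityRow[] {
  const survivor = latestTerminalIndex(rows);
  if (survivor === -1) return rows.slice();
  return rows.filter((row, index) => index === survivor || !isTerminalShaped(row));
}

const compacted = dropEarlierTerminalRows([
  { type: "tool_call", title: "Read file" },
  { type: "run_result", title: "Run finished" },
  { type: "completed", title: "Done" },
]);
console.log(compacted.map((row) => row.type));
```

Comparing by index rather than by title or object identity keeps the rule independent of what the rows happen to say.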

The line-250 special case (`type="completed"` + `title="final answer"`) is technically subsumed by the new rule, but it predates this change so I left it untouched — narrower scope.

Closes the "duplicate completed / run completed rows" papercut from the Hecate Chat polish list.

Test plan

  • New test pins the regression (`tool_call`, `run_result`, `completed` → only the latest terminal row survives)
  • `npx vitest run src/features/transcript` — 64 passes, 0 fails
  • Full UI suite — 648 passes, 0 fails

A Hecate Chat run that completes successfully emits two
terminal-shaped activity rows: a synced `task_run` mirror that
shows up as `run_result` with title "Run finished", and an
explicit `Activity{Type: status, Title: finalAgentChatActivityTitle(status)}`
appended by the agent-chat handler at turn end. The existing
`isTerminalRunSummary` filter only drops rows whose title
literally matches `/^run\s+(completed|failed|cancelled)$/i`,
which leaves type-only collisions (type=completed title="Done",
type=run_result title="Run finished", etc.) on the timeline. The
operator sees two side-by-side endings for one run.

`terminalAgentActivity` already picks the latest terminal-shaped
row; everything else with a terminal shape is redundant. Drop
earlier terminal rows by index when one was selected.

The line-250 special case (type=completed + title="final answer")
is technically subsumed by the new rule, but I'm leaving it
alone — it predates this change and dropping it would broaden the
dedupe surface beyond what the bug report needed.

Test pins the regression: three activities (tool_call,
run_result, completed) → only the last terminal row survives,
the earlier `run_result` is dropped. UI suite clean: 648 passes.

Copilot AI review requested due to automatic review settings on May 9, 2026 16:59
Contributor

Copilot AI left a comment

Pull request overview

This PR updates the UI transcript activity timeline compaction to prevent “double terminal” endings (e.g., a run_result row plus a later completed/failed/cancelled status row) by dropping earlier terminal-shaped activity rows when a later terminal row exists.

Changes:

  • Add terminal-row dedupe logic in compactAgentActivities to keep only the latest terminal-shaped row.
  • Introduce isTerminalActivity helper to identify terminal-shaped activities.
  • Add a regression test covering tool_call + run_result + completed where only the latest terminal row should survive.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| `ui/src/features/transcript/TranscriptActivityTimeline.tsx` | Adds terminal-row dedupe in activity compaction and a helper for terminal detection. |
| `ui/src/features/transcript/TranscriptActivityTimeline.test.tsx` | Adds a test to pin the duplicate-terminal-row regression. |

Comment thread ui/src/features/transcript/TranscriptActivityTimeline.tsx Outdated
Comment thread ui/src/features/transcript/TranscriptActivityTimeline.tsx Outdated
Comment thread ui/src/features/transcript/TranscriptActivityTimeline.test.tsx
Three follow-ups Copilot raised on the dedupe rule. All real:

1. **Predicate mismatch.** `terminalAgentActivity` walked back-to-front
   over {completed, failed, cancelled} + activity.terminal, but the
   new dedupe filter used `isTerminalActivity` which also includes
   `run_result`. When a run_result-typed terminal arrived after a
   `completed` row, terminalAgentActivity picked the EARLIER row
   and the dedupe (using lastIndexOf on that row) dropped the
   later run_result — backwards. Unify the predicate: both the
   chooser and the dedupe filter now run through `isTerminalActivity`,
   so they cannot disagree about what counts as terminal.

2. **Prefer diagnostic rows.** The synced `task_run` mirror
   carries `terminal: true` AND a useful detail like "LLM call
   failed on turn 3". The agent-chat handler at turn end appends
   a generic `Activity{Type: "failed", Title: "Failed"}` with no
   diagnostic detail. The previous dedupe naïvely kept "the latest
   terminal-shaped row," which dropped the informative diagnostic
   in favor of the bare-bones generic. Replace
   `terminalAgentActivity`'s ad-hoc walk with
   `pickTerminalActivityIndex`, a small chooser that prefers the
   latest `terminal: true` row when one exists, falling back to
   the latest type-only-terminal row otherwise.

3. **Stale test comment.** The fixture title was "Run finished"
   but the comment said "Run completed". The choice of "finished"
   was deliberate (it bypasses the `/^run\s+(completed|failed|cancelled)$/i`
   regex in isTerminalRunSummary so the row reaches the new
   dedupe filter); comment now reflects the actual fixture and
   spells out why the title matters.
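
Points 1 and 2 can be sketched together as follows. This is an illustration under stated assumptions, not the merged code: the row shape, the `TERMINAL_TYPES` set, and the names `isTerminalRow`, `pickTerminalIndexSketch`, and `dedupeTerminalRows` are hypothetical stand-ins for `isTerminalActivity`, `pickTerminalActivityIndex`, and the filter inside `compactAgentActivities`.

```typescript
// Hypothetical row shape for illustration.
interface ActivityRow {
  type: string;
  title: string;
  detail?: string;
  terminal?: boolean;
}

// Assumption: terminal-shaped types named in this PR.
const TERMINAL_TYPES = new Set(["completed", "failed", "cancelled", "run_result"]);

// One predicate shared by the chooser and the filter, so they cannot disagree.
function isTerminalRow(row: ActivityRow): boolean {
  return row.terminal === true || TERMINAL_TYPES.has(row.type);
}

// Prefer the latest row flagged terminal: true; otherwise fall back to the
// latest row that is terminal by type only. Returns -1 when there is none.
function pickTerminalIndexSketch(rows: ActivityRow[]): number {
  let fallback = -1;
  for (let i = rows.length - 1; i >= 0; i--) {
    const row = rows[i];
    if (row === undefined || !isTerminalRow(row)) continue;
    if (row.terminal === true) return i;
    if (fallback === -1) fallback = i;
  }
  return fallback;
}

function dedupeTerminalRows(rows: ActivityRow[]): ActivityRow[] {
  const keep = pickTerminalIndexSketch(rows);
  return rows.filter((row, index) => index === keep || !isTerminalRow(row));
}

const deduped = dedupeTerminalRows([
  { type: "tool_call", title: "Call model" },
  { type: "run_result", title: "Run finished", detail: "LLM call failed on turn 3", terminal: true },
  { type: "failed", title: "Failed" }, // generic row appended at turn end
]);
console.log(deduped.map((row) => row.type));
```

In this sketch the diagnostic `run_result` row survives even though the generic `failed` row arrives later, because the `terminal: true` flag takes priority over position.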

New test pins the prefer-diagnostic rule: a `run_result` row with
`terminal: true` and detail "LLM call failed on turn 3" wins over
a generic `failed` row from the agent-chat handler. UI suite clean:
649 passes.
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated no new comments.

Comments suppressed due to low confidence (1)

ui/src/features/transcript/TranscriptActivityTimeline.tsx:248

  • terminalIndex is picked from the full activities list, but the loop later unconditionally drops the generic agent-chat terminal row when it is type === "completed" && title === "Final answer" (line 250). In the common case [run_result("Run finished"), completed("Final answer")], pickTerminalActivityIndex will choose the "Final answer" row, causing the dedupe rule to drop the run_result row and then the special-case to drop the chosen row as well—leaving no terminal row in the expanded timeline. Consider excluding this special-cased "Final answer" row from pickTerminalActivityIndex/isTerminalActivity, or computing terminalIndex after applying the same visibility filters so the chosen terminal row is guaranteed to survive compaction.
  const terminalIndex = pickTerminalActivityIndex(activities);
  const lastTaskRunIndex = lastIndexOfTaskRunActivity(activities);
  const lastApprovalIndexByID = lastIndexByApprovalID(activities);
  const out: AgentChatActivityRecord[] = [];
  for (const [index, activity] of activities.entries()) {
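
One of the remedies the suppressed comment suggests, choosing the terminal row only among rows that will stay visible, could look roughly like the sketch below. Everything here is hypothetical: `isHiddenFinalAnswer` stands in for the line-250 special case (its case-insensitive title match is an assumption), and the `terminal: true` preference is left out for brevity.

```typescript
// Hypothetical row shape for illustration.
interface ActivityRow {
  type: string;
  title: string;
}

// Assumption: terminal-shaped types named in this PR.
const TERMINAL_TYPES = new Set(["completed", "failed", "cancelled", "run_result"]);

function isTerminalRow(row: ActivityRow): boolean {
  return TERMINAL_TYPES.has(row.type);
}

// Stand-in for the line-250 special case: the generic final-answer row is hidden.
function isHiddenFinalAnswer(row: ActivityRow): boolean {
  return row.type === "completed" && row.title.toLowerCase() === "final answer";
}

// Pick the latest terminal row among rows that survive the visibility filter.
function pickVisibleTerminalIndex(rows: ActivityRow[]): number {
  for (let i = rows.length - 1; i >= 0; i--) {
    const row = rows[i];
    if (row !== undefined && isTerminalRow(row) && !isHiddenFinalAnswer(row)) return i;
  }
  return -1;
}

function compactSketch(rows: ActivityRow[]): ActivityRow[] {
  const keep = pickVisibleTerminalIndex(rows);
  return rows.filter((row, index) => {
    if (isHiddenFinalAnswer(row)) return false;
    return index === keep || !isTerminalRow(row);
  });
}

const timeline = compactSketch([
  { type: "run_result", title: "Run finished" },
  { type: "completed", title: "Final answer" },
]);
console.log(timeline.map((row) => row.title));
```

With the chooser restricted to visible rows, the `run_result` row is kept in the case the comment describes, so the timeline still ends with one terminal row.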

chicoxyzzy merged commit ed9b43f into master on May 9, 2026
14 checks passed
chicoxyzzy deleted the fix/transcript-dedupe-terminal-rows branch on May 9, 2026 18:22
2 participants