Skip to content

fix: integrate ChatSummaryHistory into heartbeat protocol (#334)#473

Draft
lml2468 wants to merge 1 commit into
mainfrom
fix/issue-334-sidebar-heartbeat-v2
Draft

fix: integrate ChatSummaryHistory into heartbeat protocol (#334)#473
lml2468 wants to merge 1 commit into
mainfrom
fix/issue-334-sidebar-heartbeat-v2

Conversation

@lml2468

@lml2468 lml2468 commented Jun 27, 2026

Copy link
Copy Markdown
Contributor

根因 / 实现说明

  • ChatSummaryHistory(侧边栏)的 5s batchStatus 轮询完全游离在 summary-batch-heartbeat 协调协议之外:既不发送心跳,也不监听心跳。当 SummaryListPage(2s)和/或 SummaryDetailPage(15s fallback)已经覆盖相同 taskId 时,sidebar 仍在独立发请求,造成重复的 batch-status 调用。

修复:把 ChatSummaryHistory 接入心跳协议——

  1. dispatch:每次 batchStatus 成功后广播 summary-batch-heartbeat(与 SummaryListPage 行为一致),让 SummaryDetailPage 的 fallback poller 能感知到 sidebar 也在轮询。
  2. listen:监听 summary-batch-heartbeat,当所有活跃 taskId 都已被其他 poller 覆盖且心跳在 10s 内,跳过本周期轮询。心跳过期(≥10s 无新心跳)后自动恢复。

改动摘要

  • packages/dmworksummary/src/components/ChatSummaryHistory.tsx

    • 新增 heartbeatCoveredTaskIdslastHeartbeatTime 实例字段
    • componentDidMount / componentWillUnmount 注册/注销 summary-batch-heartbeat 监听
    • doPoll() 成功后 dispatch summary-batch-heartbeat
    • maybeStartPoll() 定时器回调中增加心跳抑制判断
  • packages/dmworksummary/src/components/__tests__/ChatSummaryHistory.test.tsx

    • 新增 4 个测试:心跳 dispatch、心跳抑制、心跳过期后恢复、部分覆盖不抑制

测试

  • 新增单元测试 4 项 ✓
  • 原有 11 项 polling/render 测试 ✓
  • 全部 15 项 ChatSummaryHistory 测试通过

验证 (baseline diff, charter §6)

  • after − baseline = 0 new failures
  • 噪音 (两侧都有): ChatSummaryPanel resizable splitter (4 tests, pre-existing localStorage/jsdom 环境问题)

Fixes #334

…ol (#334)

ChatSummaryHistory (sidebar) was polling /summaries/batch-status on its own
5s interval without coordinating with SummaryListPage (2s interval) or
SummaryDetailPage (15s fallback). Both already participated in the
summary-batch-heartbeat protocol, but the sidebar neither dispatched nor
listened, causing duplicate batch-status requests for the same taskIds.

- Dispatch summary-batch-heartbeat after each successful batchStatus call
  so SummaryDetailPage can suppress its fallback poller.
- Listen for summary-batch-heartbeat and skip sidebar polls when all active
  taskIds are already covered by another poller (freshness window: 10s).

Adds 4 tests: heartbeat dispatch, suppression, expiry/resume, partial-cover.

@OctoBoooot OctoBoooot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review: fix: integrate ChatSummaryHistory into heartbeat protocol (#334) (#473)

Verdict: Request changes

This fixes #472's two problems but reverts the design change that made them fixable, re-introducing #471's self-suppression 🔴. The fix is oscillating between two designs, each of which trades one bug for another — it needs both properties at once.

#472 issues — resolved:

  • ✅ Partial-coverage: now uses ids.every(id => heartbeatCoveredTaskIds.has(id)) (was .some(...)), so the sidebar only suppresses when all its active tasks are covered — an uncovered task no longer gets stranded.
  • ✅ No-expiry over-suppression: the 10s staleness check (Date.now() - this.lastHeartbeatTime < 10000) is back, so a silent-but-mounted coordinator no longer wedges the sidebar permanently.

But #471's self-suppression 🔴 is back:

Blockers

  • ChatSummaryHistory.tsx:112 + :145 — the sidebar both dispatches summary-batch-heartbeat after each poll (line ~112) and listens for it (handleBatchHeartbeat, :145), with no source id in detail (only taskIds). window.dispatchEvent is synchronous, so the sidebar's own heartbeat immediately re-enters its own listener, setting heartbeatCoveredTaskIds / lastHeartbeatTime to its own poll. Trace as sole poller (no SummaryListPage mounted — the common sidebar-only case), tasks [42]: poll at t=0 → self-heartbeat sets covered={42}, lastHeartbeatTime=0 → t=5s tick: now-last < 10000 AND [42].every(∈{42})suppressed → next real poll only at t≈10s. The intended 5s cadence degrades to ~10s whenever the sidebar polls alone. This is exactly the #471 self-suppression bug (there it was 5s→15s; the design that fixed it in #472 — pure consumer, never dispatch — has been reverted here). Fix: either keep #472's pure-consumer design (don't dispatch from the sidebar) plus #473's every()+expiry, or tag the heartbeat with a source/owner id and have handleBatchHeartbeat ignore self-originated events.
  • Tests miss it again. The 4 new cases inject external heartbeats (window.dispatchEvent(...) then advance) — none simulates "sidebar polls alone across ≥2 intervals with no external heartbeat" and asserts it keeps the 5s cadence. Same gap that let #471 through. Add that sole-poller test and it would fail on this diff.

Praise

  • The every() + 10s-expiry combination is the correct coordination logic — it's strictly better than both #471 (some(), self-suppressing) and #472 (binary flag, no expiry). The two properties #473 gets right (full-coverage-only suppression + staleness recovery) are exactly the two that #471 and #472 each got half of. The remaining task is to keep these while not re-arming suppression from the sidebar's own heartbeat — i.e. combine #472's pure-consumer dispatch discipline with #473's coverage/expiry logic.

Out of scope (informational)

  • Draft PR. This is the 3rd iteration on #334 (471→472→473), each fixing the prior's regression while re-introducing an earlier one. The end state needs: (a) full-coverage-only suppression [#473 ✅], (b) staleness expiry [#473 ✅], (c) no self-suppression [#472 ✅, #473 ✗]. Landing all three together closes it.

@Jerry-Xin Jerry-Xin left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This iterates on #472 (same issue #334, same 2 files). The two #472 residuals are now fixed, but the new self-dispatch reintroduces the #471-class self-suppression bug.

🔴 Blocking

🔴 Critical — sidebar self-suppresses its own polling (5s → 10s when sole poller). ChatSummaryHistory now both dispatches and listens to summary-batch-heartbeat with no source/owner discriminator in the event detail:

  • doPoll() dispatches the heartbeat with its own taskIdspackages/dmworksummary/src/components/ChatSummaryHistory.tsx:114
  • the same component's handleBatchHeartbeat consumes that very event and records lastHeartbeatTime = Date.now() + heartbeatCoveredTaskIds = taskIdsChatSummaryHistory.tsx:147-151
  • next 5s tick, the freshness check skips the poll because all ids are "covered" — ChatSummaryHistory.tsx:89-93

Trace as the sole poller (no SummaryListPage/DetailPage mounted — the common case, since the sidebar lives in the chat panel and the user usually doesn't have the list page open): poll@t=5s → self-heartbeat → skip@t=10s → stale@t=15s polls again → skip@t=20s ... → effective interval degrades to 10s instead of the intended 5s. This is the exact #471 self-suppression class that #472 deliberately fixed by making the sidebar a pure consumer; #473 re-adds self-dispatch without guarding against self-consumption.

Fix: add a source/owner marker to the event detail (e.g. an instance id) and have handleBatchHeartbeat ignore heartbeats emitted by the same poller instance, while still honoring SummaryListPage heartbeats.

💬 Non-blocking

🔵 Missing regression test for the self-suppression path. The new tests cover external full-coverage suppression, stale-heartbeat recovery, and partial-coverage non-suppression — but none assert the standalone sidebar keeps its 5s cadence. Add: after one successful sidebar poll (which self-dispatches), advance another 5s with no external heartbeat and assert batchStatus is called again.

✅ Highlights / verified-fixed residuals

  • Partial-coverage (#472 residual) — FIXED. Suppression predicate is now ids.every(id => this.heartbeatCoveredTaskIds.has(id)) (ChatSummaryHistory.tsx:92), not the old some(). On partial overlap (sidebar [10,20], heartbeat [10]) the whole set [10,20] is still polled. New test "does not skip poll when heartbeat only partially covers active taskIds" pins this.
  • No-expiry / permanent-suppression (#472 residual) — FIXED. Suppression is now time-bounded: Date.now() - this.lastHeartbeatTime < 10000 (ChatSummaryHistory.tsx:91). When SummaryListPage stops polling while mounted (activeIds→0, SummaryListPage.tsx:124-127 stopBatchPoll without unmount) no fresh heartbeat arrives, so within 10s the sidebar self-heals and resumes — no longer dependent on summary-list-unmount. SummaryListPage is not in this PR's changed files and does not need to be: the expiry lives on the consumer side. New test "resumes poll when no new heartbeat arrives after a skipped cycle" pins this.
  • Event shape reuses the existing summary-batch-heartbeat contract used by SummaryListPage (SummaryListPage.tsx:152).

Note: PR is a Draft. CI is green (Build skipped, code-review pending). Could not run the focused vitest suite — this mirror checkout has no installed node_modules.

@mochashanyao mochashanyao left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Octo-Q · automated review]

Verdict: Request changes — blocking findings below (data-flow traced).


octo-web PR#473 Review Report — fix: integrate ChatSummaryHistory into heartbeat protocol (#334)

Reviewer: Octo-Q (automated review)
Head SHA: 075c0d71b509b2112c0f8c34a754680170405d7f
Base: main (a3228c5)

1. Summary

This PR integrates the ChatSummaryHistory sidebar component into the existing summary-batch-heartbeat coordination protocol (already used by SummaryListPageSummaryDetailPage). The intent is to eliminate redundant batchStatus API calls from the sidebar when SummaryListPage (2s interval) already covers the same taskIds.

Files changed (2 files, +178 −0):

  • ChatSummaryHistory.tsx — heartbeat dispatch + listen + suppression logic
  • ChatSummaryHistory.test.tsx — 4 new tests for heartbeat behavior

2. Findings

🔴 P1 — Self-heartbeat causes unintended poll self-suppression

File: ChatSummaryHistory.tsx
Diff-scope: New (introduced by this PR)

ChatSummaryHistory both dispatches and listens to the same summary-batch-heartbeat CustomEvent on window. Since window.dispatchEvent is synchronous and delivers to all registered listeners, the component receives its own heartbeat immediately after dispatching it.

Data-flow trace:

  1. doPoll(taskIds) (line ~114) calls batchStatus(taskIds), then:

    window.dispatchEvent(new CustomEvent('summary-batch-heartbeat', { detail: { taskIds } }));
  2. handleBatchHeartbeat (line ~147) fires synchronously on the same component instance:

    this.heartbeatCoveredTaskIds = new Set(taskIds);  // own taskIds
    this.lastHeartbeatTime = Date.now();                // now
  3. Next setInterval tick (T+5000ms) in maybeStartPoll() (line ~90):

    Date.now() - this.lastHeartbeatTime < 10000   // 5000 < 10000 → true
    ids.every(id => this.heartbeatCoveredTaskIds.has(id))  // own ids → true

    Poll skipped.

  4. T+10000ms: 10000 < 10000 → false → poll fires, dispatches heartbeat again → cycle repeats.

Effect: The sidebar polls every 10 seconds instead of every 5 seconds, and only because of a strict-< edge case (10000 < 10000 = false), not by design. Status updates in the sidebar lag by up to 10s when no other poller is present (e.g., user has the list page closed but the sidebar open).

This is not the PR's intent — the listen path should only suppress in response to other pollers' heartbeats (e.g., SummaryListPage), not the component's own.

R1 check: This makes a previously working path (5s sidebar polling) produce stale data (10s effective interval). The degradation is user-visible: status badges in the sidebar update at half the intended frequency. → P1 blocking.

Fix direction: Either (a) set a _dispatchingHeartbeat flag before dispatch and skip self-events in the handler, (b) add a source field to the heartbeat detail (e.g., { source: 'sidebar', taskIds }) and ignore own-source events, or (c) move the dispatch to after the finally block or use a separate event name for the dispatch path.

⚠️ P2 — Test suite passes despite P1; tests don't isolate self-heartbeat from external heartbeat

File: ChatSummaryHistory.test.tsx
Diff-scope: New tests introduced by this PR

The four new tests all pass, but two of them conflate self-heartbeat with manually dispatched heartbeat:

  • "skips poll when heartbeat covers all active taskIds" (line ~468): The first poll at T+5s dispatches a self-heartbeat (sets lastHeartbeatTime = T+5s). The manual dispatch afterward overwrites lastHeartbeatTime to approximately the same value. The 8s advance then tests lastHeartbeatTime + 8s < 10s — this passes regardless of the manual dispatch, because the self-heartbeat alone produces the same result. The test would still pass if the manual dispatch were removed.

  • "resumes poll when no new heartbeat arrives after a skipped cycle" (line ~508): The suppression at T+5s is caused by the component's own self-heartbeat from loadHistorymaybeStartPoll→first doPoll, not by the manual dispatch. The resume at T+10s happens because 10000 < 10000 = false — a coincidental edge of the strict-< comparison, not a tested design property.

Missing test: A test that calls batchStatus once, then advances time without any manual heartbeat dispatch, and asserts that batchStatus IS called on the next cycle — this would currently fail (proving the P1 bug) and should be added as a regression guard.

P2 — heartbeatCoveredTaskIds wholesale replacement on each heartbeat

File: ChatSummaryHistory.tsx, handleBatchHeartbeat (line ~147)
Diff-scope: New

this.heartbeatCoveredTaskIds = new Set(taskIds);

Each incoming heartbeat replaces the entire covered set rather than merging. If SummaryListPage polls {10, 20} at T and another poller dispatches {30} at T+1s, the {10, 20} coverage is lost. In practice this is unlikely to cause issues (SummaryListPage is the primary heartbeat source and polls all active IDs together), but the design is fragile for a multi-poller protocol.

Nit — Magic number 10000

File: ChatSummaryHistory.tsx, line ~91

The 10-second freshness window is hardcoded. SummaryListPage polls at 2s, so 10s = 5 missed polls before the sidebar considers the heartbeat stale. This should be a named constant (e.g., HEARTBEAT_FRESHNESS_MS) shared with SummaryDetailPage (which uses a 15s threshold at line ~468), to prevent silent drift if one is changed without the other.

3. Verification

Item Status Evidence
Heartbeat dispatch matches SummaryListPage format Both use new CustomEvent('summary-batch-heartbeat', { detail: { taskIds } }) — SummaryListPage.tsx:152, ChatSummaryHistory.tsx:~114
Listener registration/cleanup symmetric (mount/unmount) addEventListener in componentDidMount (line 44), removeEventListener in componentWillUnmount (line 52)
SummaryDetailPage handleBatchHeartbeat unaffected SummaryDetailPage.tsx:399 — only reads, never dispatches. Will now also receive sidebar heartbeats, which is a positive side effect (more fresh heartbeats).
isPolling guard preserved doPoll still has if (this.isPolling) return at the top
Test additions correct in isolation ⚠️ Tests verify the suppression path works but don't verify the non-suppression path still works (see P2 above)

4. Data-Flow Trace

Consumer Data consumed Upstream source Does data actually arrive?
maybeStartPoll suppression check this.heartbeatCoveredTaskIds handleBatchHeartbeat ← self-dispatch OR external dispatch ✅ Yes — but self-dispatch always populates it with own IDs, causing unintended suppression
maybeStartPoll suppression check this.lastHeartbeatTime handleBatchHeartbeat ← self-dispatch OR external dispatch ✅ Yes — but self-dispatch sets it to Date.now() after every poll, making freshness check pass on next tick
doPoll heartbeat dispatch taskIds parameter setInterval callback → getActiveTaskIds()state.items filter ✅ Yes — active task IDs flow correctly
handleBatchHeartbeat event.detail.taskIds CustomEvent from self or other pollers ✅ Yes — properly validated for empty/undefined

5. Blind-spot Checklist (R5)

  • C1 — Double-path parity: Dispatch + listen are symmetric within this component, which is the root cause of the P1 self-suppression bug. The other participant (SummaryListPage) dispatches but does NOT listen, avoiding this problem. The protocol is asymmetric by design; this PR mistakenly added both roles to the same component. → HIT (P1).
  • C2 — Control-flow ordering / nesting: The heartbeat is dispatched before state update in doPoll, meaning handleBatchHeartbeat fires synchronously while isPolling is still true. No double-action risk since handleBatchHeartbeat only writes instance fields. The every() check in the suppression path is correct for empty ids (vacuously true) but is guarded by the ids.length === 0 early return above it. → Clear.
  • C3 — Authorization boundary: N/A (frontend UI, no auth-relevant changes).
  • C4 — Authorization lifecycle: N/A.
  • C5 — Build ≠ runtime: N/A (no build/packaging changes).
  • C6 — Governance docs: N/A.

6. Cross-round Blocker Review (R6)

N/A — first review of this PR.

7. Recommendations

  1. Fix P1: Prevent self-heartbeat suppression. Simplest fix: set a flag before dispatching and check it in the handler:

    private _isDispatchingHeartbeat = false;
    
    // In doPoll():
    this._isDispatchingHeartbeat = true;
    window.dispatchEvent(new CustomEvent('summary-batch-heartbeat', { detail: { taskIds } }));
    this._isDispatchingHeartbeat = false;
    
    // In handleBatchHeartbeat():
    if (this._isDispatchingHeartbeat) return;
  2. Add regression test: A test that verifies the sidebar still polls on consecutive 5s cycles when NO external heartbeat is dispatched:

    // advance 5s → batchStatus called
    // advance 5s more → batchStatus called again (NOT suppressed)
  3. Extract heartbeat freshness threshold as a shared constant with SummaryDetailPage.


[Octo-Q] verdict: REQUEST_CHANGES — P1 self-heartbeat suppression effectively doubles the sidebar polling interval from 5s to 10s, degrading status freshness below the pre-PR baseline. Fix is straightforward (filter self-dispatched events in the handler), and tests need a regression guard for the non-suppression path.

@yujiawei yujiawei left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review — PR #473 (octo-web)

Reviewed at head SHA 075c0d71b509b2112c0f8c34a754680170405d7f. Diff is 2 files: ChatSummaryHistory.tsx (+29 source) and its test (+149). I read the sibling pollers (SummaryListPage.tsx, SummaryDetailPage.tsx) to understand the existing summary-batch-heartbeat protocol, and ran the unit tests locally (15/15 pass).

1. Spec compliance

The linked issue (#334) proposes "fix direction #2 (minimal): bring ChatSummaryHistory into the summary-batch-heartbeat protocol (send + receive)." The diff does implement send + receive. However, it implements only the suppressing half of the protocol and omits the two compensating signals the existing implementation always pairs with the heartbeat — see the P1 below. So as written it does not fully satisfy the intent ("close gap B" without opening a new one).

  • Missing: the heartbeat is emitted, but summary-status-change and summary-list-unmount are not (see P1).
  • Over-build: none.
  • Deviation: none in approach; the gap is incompleteness, not a wrong direction.

2. Code quality

Verdict: Changes Requested

P1 — Asymmetric protocol participation can permanently starve a coexisting detail page

ChatSummaryHistory now emits summary-batch-heartbeat after every successful poll (ChatSummaryHistory.tsx:114). On the receiving side, SummaryDetailPage treats any heartbeat that includes its taskId as authoritative coverage and stops its own fallback poll:

// SummaryDetailPage.tsx:399-407
private handleBatchHeartbeat = (event: Event) => {
    if (this.taskId == null) return;
    const taskIds = (event as CustomEvent).detail?.taskIds;
    if (!taskIds || !taskIds.includes(this.taskId)) return;
    this.listPageActive = true;
    this.lastEventTime = Date.now();
    this.stopFallbackPoll();
};

In the working implementation, SummaryListPage pairs the heartbeat with two compensating signals so suppression stays safe:

  • summary-status-change (with changedIds) on every detected status change (SummaryListPage.tsx:170) → makes the suppressed detail page reload its content.
  • summary-list-unmount on unmount (SummaryListPage.tsx:84) → re-arms the suppressed detail page's fallback (SummaryDetailPage.tsx:454-461).

ChatSummaryHistory emits neither (confirmed: no summary-status-change / summary-list-unmount dispatch anywhere in the new code). The concrete failure path when the sidebar and an embedded/route SummaryDetailPage coexist over the same taskId (exactly the multi-instance scenario #334 documents):

  1. Sidebar doPoll succeeds → dispatches heartbeat.
  2. Detail page receives it → listPageActive = true, stopFallbackPoll().
  3. Task transitions PROCESSING → COMPLETED. The sidebar detects this and updates its own list state (ChatSummaryHistory.tsx:125-126) but emits no summary-status-change, so the detail page never reloads.
  4. The detail page's only self-heal (the Date.now() - lastEventTime > 15000 staleness reset) lives inside startFallbackPoll (SummaryDetailPage.tsx:468-469), which is never re-entered because nothing re-arms it. The sidebar also never emits summary-list-unmount.

Result: the detail view can sit on a stale PROCESSING state indefinitely while the task has actually completed. This is a user-visible regression introduced by the change, so it should be addressed before merge. Fix: either emit the companion summary-status-change (on detected change) and summary-list-unmount (on unmount) from ChatSummaryHistory to match the existing contract, or scope the heartbeat so it does not suppress pollers the sidebar can't compensate for.

P2 — Self-suppression degrades the sidebar's own 5s cadence

ChatSummaryHistory both dispatches (:114) and listens to (:44) the same event, and handleBatchHeartbeat records its own emission (:150-151). Because the freshness window (< 10000, :91) is larger than the poll interval (5000, line 95), each successful poll suppresses the next tick: every poll that covers all active ids will skip the following cycle, dropping the effective sidebar cadence from ~5s to ~10s (often more with request latency). The fix should ignore self-emitted heartbeats — e.g. tag the event detail with a per-instance sender id and skip it in handleBatchHeartbeat when it matches this.

P2 — All-or-nothing, replacement-based coverage leaves redundancy

handleBatchHeartbeat replaces coverage wholesale: this.heartbeatCoveredTaskIds = new Set(taskIds) (:150), and the suppression check requires ids.every(...) (:92). Two consequences:

  • Overlapping subsets clobber each other: heartbeats for [1] then [2] leave coverage {2}, so active [1,2] still polls both.
  • Partial coverage gives no benefit: a heartbeat for [1] while active is [1,2] still polls [1,2], re-duplicating the covered id rather than polling only the uncovered [2].

So the duplicate-polling the PR targets can still persist for overlapping task sets. A taskId -> lastSeenTimestamp map (and polling only the uncovered subset) would address this. Advisory — not blocking on its own.

Positives

  • Listener registration/teardown is correctly paired in componentDidMount/componentWillUnmount — no leak.
  • Tests are well-structured and the freshness/expiry/partial-coverage cases are covered; they pass (15/15). Note the tests verify the sidebar's local behavior only and would not catch the P1 cross-component regression.

3. Overall verdict

CHANGES_REQUESTED — the P1 cross-component starvation is a real, user-visible regression. The two P2s are quality improvements that would make the fix actually eliminate the redundancy it targets.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

size/M PR size: M

Projects

None yet

Development

Successfully merging this pull request may close these issues.

bug: sidebar smart-summary batch-status duplicate polling (uncoordinated multi-poller)

5 participants