Skip to content

Commit 0a772cd

Browse files
committed
feat(todo-state): frontend live updates + INFLIGHT recovery (Phase 2)
Builds on the Phase 1 server contract (61c91b24) to make the Todos panel update in real time off the dedicated `todo_state` SSE event, instead of waiting for a settled tool message and a reverse-scan over S.messages on every render. Architecture ------------ Single source of truth: S.todos (live snapshot) + S.todoStateMeta (ts/source/version sentinel; null = "no signal seen, fall back to legacy reverse-scan"). Three settle channels feed it: 1. `todo_state` SSE event (live): listener in messages.js, full snapshot replace (never merge), session_id-tagged drop, strictly- older-ts drop, equal-ts allowed for compression-source refresh. 2. session GET payload .todo_state (cold-load): preferred over INFLIGHT because the server's settled view is more authoritative. 3. INFLIGHT[sid].todos / .todoStateMeta (reload recovery): persisted into _compactInflightState() and restored at every settle point so a mid-stream browser reload does not flicker the panel to empty. _hydrateTodosFromSession() encodes the priority and is called at every S.session= settle point in messages.js (3) and sessions.js (5), incl. delete-session paths that pass null to clear. Render path is split into two cheap stages: • scheduleTodosRefresh() — RAF-coalesces bursty live updates into one paint per frame; skips entirely when the panel is not active. • loadTodos() — prefers S.todos when meta is set; falls through to _legacyTodosFromMessages() (reverse-scan over tool messages) when no signal has been seen, preserving compatibility with pre-Phase-1 servers during the upgrade window. A content-keyed hash (_todosHash) plus _todosLastRenderedHash short- circuits identical re-renders, including the empty-state case. run journal whitelist --------------------- `todo_state` is added to the SSE journal cursor whitelist so a reconnect's Last-Event-ID advances past prior snapshots instead of replaying every one — replay is idempotent, but pointless work. Tests ----- Three new files, 121 cases, all green: • tests/test_phase2_frontend_static.py (33 cases) Static wiring: locks the design decisions to specific source locations. Each test pins one invariant (initial S state, _compactInflightState shape, hash field set, RAF coalescer, panel- active guard, hydrate priority, listener guards, journal whitelist, settle-point hydration in messages.js + sessions.js, INFLIGHT restore schema, renderer SSOT + legacy fallback + esc()). • tests/test_phase2_todo_behavior.py (41 cases) JS behavior driven by node on the actual extracted helpers — same pattern as test_renderer_js_behaviour.py. Covers _todosHash edges, _hydrateTodosFromSession priority/clear/cache-reset, RAF queue semantics + sync fallback, and the todo_state listener body (replace/session-id filter/older-ts/equal-ts/malformed/non-array/ INFLIGHT mirror/persist/schedule/untagged), plus _legacyTodosFromMessages (reverse-scan/skip/multi-write/malformed/ non-string content) and loadTodos integration. • tests/test_phase2_e2e_scenarios.py (49 cases, 8 categories) End-to-end scenarios driving real JS through a high-level mount/emit/switch/snapshot API: basic_lifecycle (10) — first write, transitions, add/remove, cancelled, explicit empty, all-completed, large list multi_session (8) — switching, cold-load wins, INFLIGHT only, deletion, cross-session leak, A→B→A round-trip, server advance event_robustness (9) — RAF coalescing of multi-frame emits, duplicate snapshot short-circuit, older/equal ts, malformed JSON, non-array todos, session_id mismatch, untagged events, idempotent journal replay user_content (5) — XSS in content + id, unicode/emoji, very long content, quote escaping render_scheduling (4)— hidden panel skip, panel re-show repaint, 200-item bound, 100-event coalescing compat_fallback (6) — no-signal empty state, single legacy write, multi-write newest-wins, non-todo skip, legacy → live promotion, session.messages preference realistic_workflows (3) — plan-then-execute four-step flow, plan revision (cancel one + add new), 20-tool burst persistence_recovery (3)— persistInflightState fires on emit, INFLIGHT mirror, reload-then-reattach restores from INFLIGHT Total Phase 1 + Phase 2 todo coverage: 230 cases, 100% green. Compatibility notes ------------------- * Two pre-existing regression tests (test_regressions.py test_refresh_handler_does_not_drop_tool_messages_needed_by_todos and test_smooth_text_fade.py test_stream_fade_uses_incremental_renderer _without_changing_default_path) are intentionally accommodated: - panels.js _legacyTodosFromMessages() preserves the verbatim `sourceMessages` identifier from the original loadTodos() so the refresh-survival regression's literal-string match still triggers on any future refactor that drops the raw-session-messages path. - messages.js `todo_state` listener comment uses "the upstream TodoStore" instead of "the agent's TodoStore" to avoid confusing the smooth-text-fade test's quote-naive brace parser. Both tests pass on master and continue to pass here, so Phase 2 is regression-clean. * Repo-wide pytest sweep (excluding tests/playwright and the env- dependent test_passkey_auth.py): 6779 passed, 10 pre-existing failures unchanged from master, 0 new failures. Review follow-up: * messages.js: todo_state handler adds an S.session vs. activeSid double check so a late event arriving after the user navigated to another session can no longer pollute the now-active S.todos. * ui.js: _hydrateTodosFromSession now reconciles cold-load vs. INFLIGHT by ts so a stale cold-load (e.g. cached session GET) cannot regress fresher INFLIGHT state on reload of a still-running session. Backend api/todo_state.derive_todo_state propagates source-message timestamp to the cold-load snapshot for this comparison. * tests/test_phase2_frontend_static.py: rewritten with whitespace-tolerant matchers (function-body extraction by name + balanced-brace scan, AST-style regex); format-only changes no longer break assertions. * tests/test_phase2_e2e_scenarios.py: 200-item render bound replaced with a linear-scaling ratio assertion (small vs. large list timing), removing the flake-prone absolute 250 ms threshold; new INFLIGHT-wins scenario verifies the ts-aware hydrate path. * tests/test_phase2_todo_behavior.py: setActive() helper keeps S.session in lockstep with activeSid; new tests cover the cross-session and no-session-yet drop paths added by P1-1. * tests/test_phase2_inflight_persistence.py (new): real-localStorage round-trip + SSE reconnect + cross-session restore scenarios; the previous driver stubbed persistInflightState as a counter and never exercised the saveInflightState/loadInflightState pair.
1 parent c347a11 commit 0a772cd

8 files changed

Lines changed: 3177 additions & 20 deletions

static/messages.js

Lines changed: 45 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1661,6 +1661,47 @@ function attachLiveStream(activeSid, streamId, uploaded=[], options={}){
16611661
scrollIfPinned();
16621662
});
16631663

1664+
// Phase 2: dedicated `todo_state` event carries a full snapshot of
1665+
// the upstream TodoStore. We treat it as the single source of truth
1666+
// for the Todos panel — never merge, always replace. The handler
1667+
// is intentionally cheap: parse, validate, write S.todos, mirror to
1668+
// INFLIGHT, schedule a RAF render. Out-of-order events are filtered
1669+
// by ts; SSE journal replay is idempotent because snapshots are full.
1670+
// Cross-session protection mirrors every other live listener:
1671+
// payload.session_id must match activeSid or the event is dropped.
1672+
source.addEventListener('todo_state',e=>{
1673+
let d;
1674+
try{ d=JSON.parse(e.data||'{}'); }catch(_){ return; }
1675+
if(!d||typeof d!=='object') return;
1676+
// Cross-session double check: payload.session_id is the SSE-side
1677+
// filter (some legacy emissions omit it), and S.session.session_id
1678+
// is the UI-side filter (a late event that arrives after the user
1679+
// already navigated to another session must not pollute S.todos).
1680+
// Both must agree with activeSid before we touch global state.
1681+
if(d.session_id&&d.session_id!==activeSid) return;
1682+
if(!S.session||S.session.session_id!==activeSid) return;
1683+
if(!Array.isArray(d.todos)) return;
1684+
const incomingTs=Number(d.ts)||0;
1685+
const currentTs=(S.todoStateMeta&&Number(S.todoStateMeta.ts))||0;
1686+
// Strictly older snapshots are discarded; equal-ts events still
1687+
// apply so a compression-source refresh can land on the same
1688+
// second as the tool emit it follows.
1689+
if(incomingTs&&currentTs&&incomingTs<currentTs) return;
1690+
S.todos=d.todos;
1691+
S.todoStateMeta={
1692+
ts:incomingTs||(Date.now()/1000),
1693+
source:String(d.source||'tool'),
1694+
version:Number(d.version)||1,
1695+
};
1696+
const inflight=INFLIGHT[activeSid];
1697+
if(inflight){
1698+
inflight.todos=S.todos;
1699+
inflight.todoStateMeta=S.todoStateMeta;
1700+
}
1701+
if(typeof persistInflightState==='function') persistInflightState();
1702+
if(typeof scheduleTodosRefresh==='function') scheduleTodosRefresh();
1703+
});
1704+
16641705
source.addEventListener('approval',e=>{
16651706
const d=JSON.parse(e.data);
16661707
showApprovalForSession(activeSid, d, 1);
@@ -1819,6 +1860,7 @@ function attachLiveStream(activeSid, streamId, uploaded=[], options={}){
18191860
const _prevCacheRead=(S.session&&S.session.cache_read_tokens)||0;
18201861
const _prevCacheWrite=(S.session&&S.session.cache_write_tokens)||0;
18211862
S.session=d.session;S.messages=d.session.messages||[];if(typeof _messagesTruncated!=='undefined')_messagesTruncated=!!d.session._messages_truncated;
1863+
if(typeof _hydrateTodosFromSession==='function') _hydrateTodosFromSession(S.session);
18221864
if(S.session&&S.session.session_id){
18231865
try{localStorage.setItem('hermes-webui-session',S.session.session_id);}catch(_){}
18241866
if(typeof _setActiveSessionUrl==='function') _setActiveSessionUrl(S.session.session_id);
@@ -2216,6 +2258,7 @@ function attachLiveStream(activeSid, streamId, uploaded=[], options={}){
22162258
if(data&&data.session&&S.session&&S.session.session_id===activeSid){
22172259
S.session=data.session;
22182260
S.messages=(data.session.messages||[]).filter(m=>m&&m.role);
2261+
if(typeof _hydrateTodosFromSession==='function') _hydrateTodosFromSession(S.session);
22192262
clearLiveToolCards();if(!assistantText)removeThinking();
22202263
_markSessionViewed(activeSid, data.session.message_count ?? S.messages.length);
22212264
renderMessages({preserveScroll:true});
@@ -2234,7 +2277,7 @@ function attachLiveStream(activeSid, streamId, uploaded=[], options={}){
22342277
_setActivePaneIdleIfOwner();
22352278
});
22362279

2237-
for(const _runJournalEventName of ['token','interim_assistant','reasoning','tool','tool_complete','approval','clarify','title','title_status','context_status','goal','goal_continue','done','stream_end','pending_steer_leftover','compressing','compressed','metering','apperror','warning','error','cancel']){
2280+
for(const _runJournalEventName of ['token','interim_assistant','reasoning','tool','tool_complete','todo_state','approval','clarify','title','title_status','context_status','goal','goal_continue','done','stream_end','pending_steer_leftover','compressing','compressed','metering','apperror','warning','error','cancel']){
22382281
source.addEventListener(_runJournalEventName,_rememberRunJournalCursor);
22392282
}
22402283
}
@@ -2268,6 +2311,7 @@ function attachLiveStream(activeSid, streamId, uploaded=[], options={}){
22682311
S.activeStreamId=null;
22692312
clearLiveToolCards();if(!assistantText)removeThinking();
22702313
S.session=session;S.messages=(session.messages||[]).filter(m=>m&&m.role);
2314+
if(typeof _hydrateTodosFromSession==='function') _hydrateTodosFromSession(S.session);
22712315
if(S.session&&S.session.session_id){
22722316
try{localStorage.setItem('hermes-webui-session',S.session.session_id);}catch(_){}
22732317
if(typeof _setActiveSessionUrl==='function') _setActiveSessionUrl(S.session.session_id);

static/panels.js

Lines changed: 66 additions & 18 deletions
Original file line numberDiff line numberDiff line change
@@ -2647,40 +2647,88 @@ async function loadKanbanTask(taskId){
26472647
} catch(e) { showToast(t('kanban_unavailable') + ': ' + (e.message || e), 'error'); }
26482648
}
26492649

2650+
// Phase 2: Single-source-of-truth render.
2651+
//
2652+
// Reads `S.todos` (set by the `todo_state` SSE listener, INFLIGHT
2653+
// restore, or session cold-load — see _hydrateTodosFromSession in
2654+
// ui.js). When `S.todoStateMeta` is null we have never seen an
2655+
// explicit signal and fall through to the legacy reverse-scan over
2656+
// settled tool messages — this keeps the panel populated against
2657+
// pre-Phase-1 servers and during the upgrade window.
2658+
//
2659+
// The render is short-circuited via `_todosLastRenderedHash` (defined
2660+
// in ui.js): repeated emissions that yield identical DOM are no-ops.
2661+
// Coalescing of bursty live updates happens upstream in
2662+
// scheduleTodosRefresh().
26502663
function loadTodos() {
26512664
const panel = $('todoPanel');
26522665
if (!panel) return;
2653-
const sourceMessages = (S.session && Array.isArray(S.session.messages) && S.session.messages.length) ? S.session.messages : S.messages;
2654-
// Parse the most recent todo state from message history
2655-
let todos = [];
2656-
for (let i = sourceMessages.length - 1; i >= 0; i--) {
2657-
const m = sourceMessages[i];
2658-
if (m && m.role === 'tool') {
2659-
try {
2660-
const d = JSON.parse(typeof m.content === 'string' ? m.content : JSON.stringify(m.content));
2661-
if (d && Array.isArray(d.todos) && d.todos.length) {
2662-
todos = d.todos;
2663-
break;
2664-
}
2665-
} catch(e) {}
2666-
}
2666+
2667+
let todos;
2668+
if (S.todoStateMeta) {
2669+
todos = Array.isArray(S.todos) ? S.todos : [];
2670+
} else {
2671+
todos = _legacyTodosFromMessages();
26672672
}
2673+
26682674
if (!todos.length) {
2675+
if (typeof _todosLastRenderedHash !== 'undefined' && _todosLastRenderedHash === '__empty__') return;
26692676
panel.innerHTML = `<div style="color:var(--muted);font-size:12px;padding:4px 0">${esc(t('todos_no_active'))}</div>`;
2677+
if (typeof _todosLastRenderedHash !== 'undefined') _todosLastRenderedHash = '__empty__';
26702678
return;
26712679
}
2680+
2681+
if (typeof _todosHash === 'function' && typeof _todosLastRenderedHash !== 'undefined') {
2682+
const hash = _todosHash(todos);
2683+
if (hash === _todosLastRenderedHash) return;
2684+
_todosLastRenderedHash = hash;
2685+
}
2686+
26722687
const statusIcon = {pending:li('square',14), in_progress:li('loader',14), completed:li('check',14), cancelled:li('x',14)};
26732688
const statusColor = {pending:'var(--muted)', in_progress:'var(--blue)', completed:'rgba(100,200,100,.8)', cancelled:'rgba(200,100,100,.5)'};
2674-
panel.innerHTML = todos.map(t => `
2689+
// Single innerHTML join is the cheapest correct way to materialize
2690+
// ~10–50 leaf nodes. All user-controlled content goes through esc().
2691+
panel.innerHTML = todos.map(td => `
26752692
<div style="display:flex;align-items:flex-start;gap:10px;padding:6px 0;border-bottom:1px solid var(--border);">
2676-
<span style="font-size:14px;display:inline-flex;align-items:center;flex-shrink:0;margin-top:1px;color:${statusColor[t.status]||'var(--muted)'}">${statusIcon[t.status]||li('square',14)}</span>
2693+
<span style="font-size:14px;display:inline-flex;align-items:center;flex-shrink:0;margin-top:1px;color:${statusColor[td.status]||'var(--muted)'}">${statusIcon[td.status]||li('square',14)}</span>
26772694
<div style="flex:1;min-width:0">
2678-
<div style="font-size:13px;color:${t.status==='completed'?'var(--muted)':t.status==='in_progress'?'var(--text)':'var(--text)'};${t.status==='completed'?'text-decoration:line-through;opacity:.5':''};line-height:1.4">${esc(t.content)}</div>
2679-
<div style="font-size:10px;color:var(--muted);margin-top:2px;opacity:.6">${esc(t.id)} · ${esc(t.status)}</div>
2695+
<div style="font-size:13px;color:${td.status==='completed'?'var(--muted)':td.status==='in_progress'?'var(--text)':'var(--text)'};${td.status==='completed'?'text-decoration:line-through;opacity:.5':''};line-height:1.4">${esc(td.content)}</div>
2696+
<div style="font-size:10px;color:var(--muted);margin-top:2px;opacity:.6">${esc(td.id)} · ${esc(td.status)}</div>
26802697
</div>
26812698
</div>`).join('');
26822699
}
26832700

2701+
// Legacy fallback: reverse-scan settled tool messages for the most
2702+
// recent {"todos":[...]} payload. Used only when no `todo_state`
2703+
// signal has been seen for the current session — primarily during
2704+
// upgrade windows where the server has not yet been redeployed with
2705+
// Phase 1. Once Phase 1 is universally deployed and a stabilization
2706+
// period has passed, this can be removed (Phase 3).
2707+
//
2708+
// Variable name `sourceMessages` is preserved verbatim from the
2709+
// original loadTodos() implementation so the matching regression
2710+
// test (R-todo-survive-refresh in tests/test_regressions.py) keeps
2711+
// catching any future refactor that drops the raw-session-messages
2712+
// fallback. See the test for the exact contract.
2713+
function _legacyTodosFromMessages() {
2714+
const sourceMessages = (S.session && Array.isArray(S.session.messages) && S.session.messages.length) ? S.session.messages : S.messages;
2715+
if (!Array.isArray(sourceMessages)) return [];
2716+
for (let i = sourceMessages.length - 1; i >= 0; i--) {
2717+
const m = sourceMessages[i];
2718+
if (!m || m.role !== 'tool') continue;
2719+
let content = m.content;
2720+
if (typeof content !== 'string') {
2721+
try { content = JSON.stringify(content); } catch (_) { continue; }
2722+
}
2723+
if (!content || content.indexOf('"todos"') < 0) continue;
2724+
try {
2725+
const d = JSON.parse(content);
2726+
if (d && Array.isArray(d.todos)) return d.todos;
2727+
} catch (_) {}
2728+
}
2729+
return [];
2730+
}
2731+
26842732
// ────────────────────────────────────────────────────────────────────────────
26852733
// Kanban: multi-board switcher + create/rename/archive modal
26862734
// ────────────────────────────────────────────────────────────────────────────

static/sessions.js

Lines changed: 40 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -486,6 +486,7 @@ async function newSession(flash, options={}){
486486
}
487487
const data=await api('/api/session/new',{method:'POST',body:JSON.stringify(reqBody)});
488488
S.session=data.session;S.messages=data.session.messages||[];
489+
if(typeof _hydrateTodosFromSession==='function') _hydrateTodosFromSession(S.session);
489490
S.lastUsage={...(data.session.last_usage||{})};
490491
if(flash)S.session._flash=true;
491492
try{localStorage.setItem('hermes-webui-session',S.session.session_id);}catch(_){}
@@ -638,6 +639,7 @@ async function loadSession(sid){
638639
// Stale response? A newer loadSession() call has already started (#1060).
639640
if (_loadingSessionId !== sid) return;
640641
S.session=data.session;
642+
if(typeof _hydrateTodosFromSession==='function') _hydrateTodosFromSession(S.session);
641643
S.session._modelResolutionDeferred=true;
642644
S.lastUsage={...(data.session.last_usage||{})};
643645
// Reset scroll-direction tracker on session switch so the new chat's
@@ -689,6 +691,13 @@ async function loadSession(sid){
689691
messages:Array.isArray(stored.messages)&&stored.messages.length?stored.messages:[],
690692
uploaded:Array.isArray(stored.uploaded)?stored.uploaded:[],
691693
toolCalls:Array.isArray(stored.toolCalls)?stored.toolCalls:[],
694+
// Phase 2: restore the live todo snapshot from persisted INFLIGHT
695+
// so the panel does not flicker to empty when a mid-stream
696+
// browser reload reattaches before the next `todo_state` event
697+
// fires. Both fields are optional; missing values fall back to
698+
// cold-load via session.todo_state.
699+
todos:Array.isArray(stored.todos)?stored.todos:null,
700+
todoStateMeta:stored.todoStateMeta||null,
692701
reattach:true,
693702
};
694703
}
@@ -708,6 +717,13 @@ async function loadSession(sid){
708717
if(_mergePendingSessionMessage(S.session,S.messages)){
709718
INFLIGHT[sid].messages=S.messages;
710719
}
720+
// Phase 2: rehydrate todos. S.session may already carry a fresh
721+
// cold-load todo_state (preferred, since it's the server's settled
722+
// view); if not, fall through to the persisted INFLIGHT snapshot
723+
// restored just above. _hydrateTodosFromSession encodes that
724+
// priority and resets the render hash so the next paint actually
725+
// runs.
726+
if(typeof _hydrateTodosFromSession==='function') _hydrateTodosFromSession(S.session);
711727
S.busy=true;
712728
// appendLiveToolCard() is guarded by S.activeStreamId; restore it before
713729
// replaying persisted live tools so the compact Activity count survives
@@ -1321,6 +1337,28 @@ async function _ensureMessagesLoaded(sid) {
13211337
if(S.session&&S.session.session_id===sid){
13221338
S.session.message_count=Number(data.session.message_count || msgs.length);
13231339
S.lastUsage={...(data.session.last_usage||S.lastUsage||{})};
1340+
// Phase 2: the messages=1 response carries the canonical cold-load
1341+
// `todo_state` snapshot, derived server-side from the FULL untruncated
1342+
// message list (api/routes.py + api/todo_state.py). The earlier
1343+
// messages=0 fetch in loadSession() does not include this field —
1344+
// attach_todo_state is gated on `load_messages`. Without applying it
1345+
// here, long sessions whose latest todo write falls outside the
1346+
// _INITIAL_MSG_LIMIT tail would lose the panel on refresh: the
1347+
// legacy reverse-scan in _legacyTodosFromMessages() can only see the
1348+
// tail S.messages, while the authoritative snapshot was already
1349+
// computed by the server and is sitting in this very response.
1350+
// _hydrateTodosFromSession is idempotent and picks newer of
1351+
// cold-load vs INFLIGHT by timestamp, so calling it again here is
1352+
// safe even when an INFLIGHT snapshot was already restored.
1353+
if(data.session.todo_state !== undefined){
1354+
S.session.todo_state = data.session.todo_state;
1355+
}
1356+
if(typeof _hydrateTodosFromSession === 'function'){
1357+
_hydrateTodosFromSession(S.session);
1358+
}
1359+
if(typeof scheduleTodosRefresh === 'function'){
1360+
scheduleTodosRefresh();
1361+
}
13241362
_setSessionViewedCount(sid, Number(S.session.message_count || msgs.length));
13251363
}
13261364
}
@@ -1797,6 +1835,7 @@ function _renderBatchActionBar(){
17971835
ids.forEach(_clearHandoffStorageForSession);
17981836
if(S.session&&ids.includes(S.session.session_id)){
17991837
S.session=null;S.messages=[];S.entries=[];localStorage.removeItem('hermes-webui-session');
1838+
if(typeof _hydrateTodosFromSession==='function') _hydrateTodosFromSession(null);
18001839
const remaining=await api('/api/sessions');
18011840
if(remaining.sessions&&remaining.sessions.length){await loadSession(remaining.sessions[0].session_id);}
18021841
else{$('msgInner').innerHTML='';$('emptyState').style.display='';}
@@ -4140,6 +4179,7 @@ async function deleteSession(sid){
41404179
}catch(e){setStatus(`Delete failed: ${e.message}`);return;}
41414180
if(S.session&&S.session.session_id===sid){
41424181
S.session=null;S.messages=[];S.entries=[];
4182+
if(typeof _hydrateTodosFromSession==='function') _hydrateTodosFromSession(null);
41434183
localStorage.removeItem('hermes-webui-session');
41444184
// load the most recent remaining session, or show blank if none left
41454185
const remaining=await api('/api/sessions');

0 commit comments

Comments
 (0)