fix(#1361): preserve reasoning, tool calls, and partial output on Stop/Cancel#1375
fix(#1361): preserve reasoning, tool calls, and partial output on Stop/Cancel#1375bergeouss wants to merge 1 commit into
Conversation
…t on cancel
Stop/Cancel was discarding AI's already-streamed reasoning and tool output —
paid tokens lost with no way to recover. Three sub-bugs fixed:
§A: Reasoning trace (on_reasoning callback) was accumulated in a thread-local
variable invisible to cancel_stream(). Added STREAM_REASONING_TEXT shared
dict (same pattern as STREAM_PARTIAL_TEXT) to make it accessible.
§B: Live tool calls (_live_tool_calls) were also thread-local. Added
STREAM_LIVE_TOOL_CALLS shared dict to persist tool call info on cancel.
§C: When the entire streamed output was reasoning-only (no visible tokens),
_stripped was empty after regex cleanup, so no partial assistant message
was appended — only the *Task cancelled.* marker survived. Now appends
a partial message with reasoning/tool_calls fields even when content is
empty.
Tests: 8 new regression tests covering all three sub-bugs plus no-regression
cases. Existing nesquena#1298 and nesquena#1195 tests unaffected (15/15 pass).
…p/Cancel (#1375) Three distinct data-loss paths fixed: §A — Reasoning text was accumulated in a thread-local _reasoning_text inside _run_agent_streaming. cancel_stream() never saw it because it went out of scope when the thread was interrupted. Now mirrored to a new shared dict STREAM_REASONING_TEXT keyed by stream_id, populated in on_reasoning() and the reasoning branch of on_tool(), read in cancel_stream(). §B — Live tool calls in thread-local _live_tool_calls were similarly invisible to cancel_stream(). Now mirrored to STREAM_LIVE_TOOL_CALLS on tool.started + tool.completed. §C — Reasoning-only streams produced no partial message because the thinking-block regex strip returned empty string and the `if _stripped:` guard skipped the append. Now appends the partial message when EITHER content text, reasoning trace, OR tool calls exist. Mirrors the existing STREAM_PARTIAL_TEXT pattern from #893 exactly: same dict creation in _run_agent_streaming, same _live_config fallback in cancel_stream, same cleanup in _periodic_checkpoint. 8 regression tests in tests/test_issue1361_cancel_data_loss.py covering all three sections plus tools+text combinations. Co-authored-by: bergeouss <bergeouss@users.noreply.github.com>
…r change Adds two more contributor PRs to the v0.50.251 batch per user directive (per-PR review + Opus review for #1373; #1375 was clean ship-on-sight). #1375 (@bergeouss, +382 LOC, all CI green) — fixes #1361 paid-token data loss on Stop/Cancel. Mirrors the existing STREAM_PARTIAL_TEXT pattern from #893: adds STREAM_REASONING_TEXT and STREAM_LIVE_TOOL_CALLS shared dicts populated during streaming and read by cancel_stream(). Also fixes the §C reasoning-only-creates-no-message gap where the strip-thinking-blocks regex returned empty string and the if-guard skipped the partial append. 8 regression tests covering all 3 sections plus tools+text combinations. #1373 (@bergeouss, +105 LOC, had CI failures pre-fix) — fixes #1195 new-profile-routes-to-default. The is_dir() guard in get_hermes_home_for_profile() caused new profiles (no session yet) to silently route every session back to the default profile until the directory existed on disk. Removed the guard; profile path is now returned unconditionally. Pre-release fix for #1373's CI failures: the change flipped two behaviors pinned by tests in #798: - R19c (test_get_hermes_home_for_profile_falls_back_for_missing_profile) asserted nonexistent → base. Renamed and updated to assert the new always-return-profile-path behavior. - R19j (test_get_hermes_home_for_profile_rejects_path_traversal) asserted that valid-but-nonexistent profile names → base. Updated to assert profile-scoped path. Also updated docstring: the _PROFILE_ID_RE regex is now the SOLE defense against path traversal (previously is_dir() was a defense-in-depth layer); verified each known-bad shape still returns base. Tests: 3484 passing (3471 → 3484, +13).
…ST-FIX) Opus pass-2 review of v0.50.251 caught a critical regression in PR #1375: The cancel-partial message stored captured tool calls under the 'tool_calls' key. That key is whitelisted by _API_SAFE_MSG_KEYS so _sanitize_messages_for_api forwarded the entries to the next-turn LLM call. But the captured entries use the WebUI internal shape ({name, args, done, duration, is_error}) — they don't have the OpenAI/Anthropic id + function: {name, arguments} envelope. Strict providers (OpenAI, Anthropic, Z.AI/GLM) would 400 on the malformed entries. Net effect: the very cancel-then-continue scenario PR #1375 aimed to improve becomes a hard fail. Fix: - Rename the persisted key to '_partial_tool_calls' (underscore- prefixed private key NOT in _API_SAFE_MSG_KEYS, so sanitize correctly strips it). - Update static/messages.js hasMessageToolMetadata check to also recognize _partial_tool_calls for UI rendering. - Update test_issue1361_cancel_data_loss.py assertion to check _partial_tool_calls (and tool_calls as legacy fallback). Plus 2 NIT fixes from the same Opus review: NIT 1 (api/profiles.py:153): re.match → re.fullmatch for consistency with other _PROFILE_ID_RE callers in the codebase. The trailing- newline footgun ($ matches before final \n in re.match) is now closed. Without #1373's is_dir() guard, a name like 'valid\n' would have created a directory named 'valid\n' on Linux. Doesn't escape <HERMES_HOME>/profiles/ via Path joining, but unintended. NIT 2 (test_issue798.py): R19j coverage gaps — added trailing- newline tests, length-boundary tests (64-char valid, 65-char rejected), single-char minimum, and non-ASCII / Unicode-trick tests. New regression test (tests/test_pr1375_partial_tool_calls_sanitize.py): - test_partial_tool_calls_field_not_forwarded_to_llm: pins that sanitize-for-API strips _partial_tool_calls + reasoning + does NOT have tool_calls on a partial message - test_legitimate_tool_calls_are_preserved_for_completed_turns: pins that real OpenAI-shape tool_calls on completed turns survive sanitize unchanged Tests: 3486 passing (3484 → 3486, +2 sanitize tests).
|
Shipped in v0.50.251 (merge Live at https://github.com/nesquena/hermes-webui/releases/tag/v0.50.251. What landedYour fix for #1361 ships clean — three distinct data-loss paths on Stop/Cancel are now closed:
The pattern conformance to Pre-release fix (Opus pass-2 MUST-FIX)One critical correctness bug Opus caught: Your fix stored captured tool calls under the message key Concrete failure scenario: user starts a turn, agent does some tool calls, user hits Stop, then types a follow-up. The next API call would 400 because the partial message's tool_calls don't match the API contract. So the very cancel-then-continue scenario your PR aimed to improve becomes a hard fail. Fix at This is a "make the structural shape itself enforce the invariant" fix — better than a comment because the field name being non-whitelisted means any future refactor that moves the partial-message logic still gets the right behavior automatically. Tests8 of your contributor tests + 2 sanitize-roundtrip tests = 10 total pinning the cancel-stream behavior end-to-end. |
…t on Stop/Cancel (nesquena#1375) Three distinct data-loss paths fixed: §A — Reasoning text was accumulated in a thread-local _reasoning_text inside _run_agent_streaming. cancel_stream() never saw it because it went out of scope when the thread was interrupted. Now mirrored to a new shared dict STREAM_REASONING_TEXT keyed by stream_id, populated in on_reasoning() and the reasoning branch of on_tool(), read in cancel_stream(). §B — Live tool calls in thread-local _live_tool_calls were similarly invisible to cancel_stream(). Now mirrored to STREAM_LIVE_TOOL_CALLS on tool.started + tool.completed. §C — Reasoning-only streams produced no partial message because the thinking-block regex strip returned empty string and the `if _stripped:` guard skipped the append. Now appends the partial message when EITHER content text, reasoning trace, OR tool calls exist. Mirrors the existing STREAM_PARTIAL_TEXT pattern from nesquena#893 exactly: same dict creation in _run_agent_streaming, same _live_config fallback in cancel_stream, same cleanup in _periodic_checkpoint. 8 regression tests in tests/test_issue1361_cancel_data_loss.py covering all three sections plus tools+text combinations. Co-authored-by: bergeouss <bergeouss@users.noreply.github.com>
nesquena#1373 behavior change Adds two more contributor PRs to the v0.50.251 batch per user directive (per-PR review + Opus review for nesquena#1373; nesquena#1375 was clean ship-on-sight). nesquena#1375 (@bergeouss, +382 LOC, all CI green) — fixes nesquena#1361 paid-token data loss on Stop/Cancel. Mirrors the existing STREAM_PARTIAL_TEXT pattern from nesquena#893: adds STREAM_REASONING_TEXT and STREAM_LIVE_TOOL_CALLS shared dicts populated during streaming and read by cancel_stream(). Also fixes the §C reasoning-only-creates-no-message gap where the strip-thinking-blocks regex returned empty string and the if-guard skipped the partial append. 8 regression tests covering all 3 sections plus tools+text combinations. nesquena#1373 (@bergeouss, +105 LOC, had CI failures pre-fix) — fixes nesquena#1195 new-profile-routes-to-default. The is_dir() guard in get_hermes_home_for_profile() caused new profiles (no session yet) to silently route every session back to the default profile until the directory existed on disk. Removed the guard; profile path is now returned unconditionally. Pre-release fix for nesquena#1373's CI failures: the change flipped two behaviors pinned by tests in nesquena#798: - R19c (test_get_hermes_home_for_profile_falls_back_for_missing_profile) asserted nonexistent → base. Renamed and updated to assert the new always-return-profile-path behavior. - R19j (test_get_hermes_home_for_profile_rejects_path_traversal) asserted that valid-but-nonexistent profile names → base. Updated to assert profile-scoped path. Also updated docstring: the _PROFILE_ID_RE regex is now the SOLE defense against path traversal (previously is_dir() was a defense-in-depth layer); verified each known-bad shape still returns base. Tests: 3484 passing (3471 → 3484, +13).
…ST-FIX) Opus pass-2 review of v0.50.251 caught a critical regression in PR nesquena#1375: The cancel-partial message stored captured tool calls under the 'tool_calls' key. That key is whitelisted by _API_SAFE_MSG_KEYS so _sanitize_messages_for_api forwarded the entries to the next-turn LLM call. But the captured entries use the WebUI internal shape ({name, args, done, duration, is_error}) — they don't have the OpenAI/Anthropic id + function: {name, arguments} envelope. Strict providers (OpenAI, Anthropic, Z.AI/GLM) would 400 on the malformed entries. Net effect: the very cancel-then-continue scenario PR nesquena#1375 aimed to improve becomes a hard fail. Fix: - Rename the persisted key to '_partial_tool_calls' (underscore- prefixed private key NOT in _API_SAFE_MSG_KEYS, so sanitize correctly strips it). - Update static/messages.js hasMessageToolMetadata check to also recognize _partial_tool_calls for UI rendering. - Update test_issue1361_cancel_data_loss.py assertion to check _partial_tool_calls (and tool_calls as legacy fallback). Plus 2 NIT fixes from the same Opus review: NIT 1 (api/profiles.py:153): re.match → re.fullmatch for consistency with other _PROFILE_ID_RE callers in the codebase. The trailing- newline footgun ($ matches before final \n in re.match) is now closed. Without nesquena#1373's is_dir() guard, a name like 'valid\n' would have created a directory named 'valid\n' on Linux. Doesn't escape <HERMES_HOME>/profiles/ via Path joining, but unintended. NIT 2 (test_issue798.py): R19j coverage gaps — added trailing- newline tests, length-boundary tests (64-char valid, 65-char rejected), single-char minimum, and non-ASCII / Unicode-trick tests. New regression test (tests/test_pr1375_partial_tool_calls_sanitize.py): - test_partial_tool_calls_field_not_forwarded_to_llm: pins that sanitize-for-API strips _partial_tool_calls + reasoning + does NOT have tool_calls on a partial message - test_legitimate_tool_calls_are_preserved_for_completed_turns: pins that real OpenAI-shape tool_calls on completed turns survive sanitize unchanged Tests: 3486 passing (3484 → 3486, +2 sanitize tests).
## Release v0.51.254 — Release HV (stage-r2) Phase-2 medium wave 1 — 4 PRs (UI/mobile/cancel fixes + an un-held model dedup). ### Fixed | Issue/PR | Author | Fix | |----------|--------|-----| | #3528 | @franksong2702 | Render partial tool calls after cancel — interrupted turns keep their `_partial_tool_calls` rows in the transcript + fallback tool-cards. (Codex confirmed it stays render-only, not forwarded to the provider API.) | | #3550 | @lurebat | Android offline recovery soft-reattaches the live stream instead of hard-reloading the page on a transient background/disconnect. | | #3479 | @mvanhorn | iOS Safari no longer snaps the conversation to the top when a handoff/compression card is inserted mid-stream or on `refreshSession()`. | | #3478 | @JayC-L | **Un-held:** named custom providers (`@custom:name:model`) dedup against bare model IDs without regressing Ollama multi-colon tags (`qwen2.5:7b-instruct-q4`). Only `@custom:` IDs strip the two-segment prefix. | ### Hold-sweep note #3478/#3489 was held yesterday for an Ollama multi-colon-tag regression risk (a blanket `lastIndexOf` would lose the model). The author pushed a scoped fix (only `@custom:` IDs use `lastIndexOf`); I verified `_normId` in node against the regression cases — Ollama bare tags are preserved. Un-held + shipped. ### Gate - Full pytest suite: **7588 passed, 0 failed** - ESLint: CLEAN · ruff: CLEAN · browser-smoke: CLEAN - Codex (regression): **SAFE TO SHIP** — #3552 partial-tool-calls verified render-only (no `_API_SAFE_MSG_KEYS` leak / no 400-on-strict-provider, the v0.50.251 #1375 trap); #3551 no EventSource double-subscribe; #3541 no regression vs the #3525 scroll-follow shipped in v0.51.253; #3489 no over-dedup. Co-authored-by: franksong2702 <franksong2702@users.noreply.github.com> Co-authored-by: lurebat <lurebat@users.noreply.github.com> Co-authored-by: mvanhorn <mvanhorn@users.noreply.github.com> Co-authored-by: JayC-L <JayC-L@users.noreply.github.com>
Problem
Hitting Stop during a streaming response loses both the AI's already-streamed reasoning and tool output — paid tokens gone with no way to recover.
Three distinct data-loss paths:
§A — Reasoning text lost on cancel
_reasoning_textis accumulated in a thread-local variable inside_run_agent_streaming. On cancel,cancel_stream()never sees it — it goes out of scope when the thread is interrupted.§B — Tool calls lost on cancel
_live_tool_callsis also thread-local.cancel_stream()only capturesSTREAM_PARTIAL_TEXT— tool calls the user saw on screen disappear after refresh.§C — Reasoning-only streams produce no partial message
When the entire output is reasoning (no visible tokens), the regex strip of thinking blocks returns empty string. The
if _stripped:guard skips the append entirely — only*Task cancelled.*survives.Fix
Mirrors the same pattern as the existing
STREAM_PARTIAL_TEXTshared dict:STREAM_REASONING_TEXT— new shared dict inapi/config.py, populated inon_reasoning()and the reasoning branch ofon_tool(), read incancel_stream()STREAM_LIVE_TOOL_CALLS— new shared dict, populated ontool.startedandtool.completed, read incancel_stream()§C fix —
cancel_stream()now appends a partial assistant message when_strippedis empty but reasoning or tool calls exist, with the reasoning as areasoningfield and tool_calls as atool_callsfieldTests
8 new regression tests in
tests/test_issue1361_cancel_data_loss.py:Existing tests unaffected: #1298 (10/10) and #1195 (5/5) all pass.
Closes #1361