Skip to content

fix(#1361): preserve reasoning, tool calls, and partial output on Stop/Cancel#1375

Closed
bergeouss wants to merge 1 commit into
nesquena:masterfrom
bergeouss:fix/issue1361-cancel-data-loss
Closed

fix(#1361): preserve reasoning, tool calls, and partial output on Stop/Cancel#1375
bergeouss wants to merge 1 commit into
nesquena:masterfrom
bergeouss:fix/issue1361-cancel-data-loss

Conversation

@bergeouss

Copy link
Copy Markdown
Contributor

Problem

Hitting Stop during a streaming response loses both the AI's already-streamed reasoning and tool output — paid tokens gone with no way to recover.

Three distinct data-loss paths:

§A — Reasoning text lost on cancel
_reasoning_text is accumulated in a thread-local variable inside _run_agent_streaming. On cancel, cancel_stream() never sees it — it goes out of scope when the thread is interrupted.

§B — Tool calls lost on cancel
_live_tool_calls is also thread-local. cancel_stream() only captures STREAM_PARTIAL_TEXT — tool calls the user saw on screen disappear after refresh.

§C — Reasoning-only streams produce no partial message
When the entire output is reasoning (no visible tokens), the regex strip of thinking blocks returns empty string. The if _stripped: guard skips the append entirely — only *Task cancelled.* survives.

Fix

Mirrors the same pattern as the existing STREAM_PARTIAL_TEXT shared dict:

  1. STREAM_REASONING_TEXT — new shared dict in api/config.py, populated in on_reasoning() and the reasoning branch of on_tool(), read in cancel_stream()

  2. STREAM_LIVE_TOOL_CALLS — new shared dict, populated on tool.started and tool.completed, read in cancel_stream()

  3. §C fixcancel_stream() now appends a partial assistant message when _stripped is empty but reasoning or tool calls exist, with the reasoning as a reasoning field and tool_calls as a tool_calls field

Tests

8 new regression tests in tests/test_issue1361_cancel_data_loss.py:

  • ✅ Cancel with reasoning-only → reasoning preserved
  • ✅ Cancel with reasoning + partial tokens → both preserved
  • ✅ Cancel without reasoning dict → works as before (no regression)
  • ✅ Cancel with tool calls → tool_calls preserved
  • ✅ Cancel with tools + text → both preserved
  • ✅ Reasoning-only creates partial message (§C)
  • ✅ Tools-only creates partial message (§C)
  • ✅ No reasoning, no tools, no text → only cancel marker (no change)

Existing tests unaffected: #1298 (10/10) and #1195 (5/5) all pass.

Closes #1361

…t on cancel

Stop/Cancel was discarding AI's already-streamed reasoning and tool output —
paid tokens lost with no way to recover. Three sub-bugs fixed:

§A: Reasoning trace (on_reasoning callback) was accumulated in a thread-local
    variable invisible to cancel_stream(). Added STREAM_REASONING_TEXT shared
    dict (same pattern as STREAM_PARTIAL_TEXT) to make it accessible.

§B: Live tool calls (_live_tool_calls) were also thread-local. Added
    STREAM_LIVE_TOOL_CALLS shared dict to persist tool call info on cancel.

§C: When the entire streamed output was reasoning-only (no visible tokens),
    _stripped was empty after regex cleanup, so no partial assistant message
    was appended — only the *Task cancelled.* marker survived. Now appends
    a partial message with reasoning/tool_calls fields even when content is
    empty.

Tests: 8 new regression tests covering all three sub-bugs plus no-regression
cases. Existing nesquena#1298 and nesquena#1195 tests unaffected (15/15 pass).
nesquena-hermes pushed a commit that referenced this pull request Apr 30, 2026
…p/Cancel (#1375)

Three distinct data-loss paths fixed:

§A — Reasoning text was accumulated in a thread-local _reasoning_text
inside _run_agent_streaming. cancel_stream() never saw it because it
went out of scope when the thread was interrupted. Now mirrored to a
new shared dict STREAM_REASONING_TEXT keyed by stream_id, populated
in on_reasoning() and the reasoning branch of on_tool(), read in
cancel_stream().

§B — Live tool calls in thread-local _live_tool_calls were similarly
invisible to cancel_stream(). Now mirrored to STREAM_LIVE_TOOL_CALLS
on tool.started + tool.completed.

§C — Reasoning-only streams produced no partial message because the
thinking-block regex strip returned empty string and the `if _stripped:`
guard skipped the append. Now appends the partial message when EITHER
content text, reasoning trace, OR tool calls exist.

Mirrors the existing STREAM_PARTIAL_TEXT pattern from #893 exactly:
same dict creation in _run_agent_streaming, same _live_config fallback
in cancel_stream, same cleanup in _periodic_checkpoint.

8 regression tests in tests/test_issue1361_cancel_data_loss.py
covering all three sections plus tools+text combinations.

Co-authored-by: bergeouss <bergeouss@users.noreply.github.com>
nesquena-hermes added a commit that referenced this pull request Apr 30, 2026
…r change

Adds two more contributor PRs to the v0.50.251 batch per user
directive (per-PR review + Opus review for #1373; #1375 was clean
ship-on-sight).

#1375 (@bergeouss, +382 LOC, all CI green) — fixes #1361 paid-token
data loss on Stop/Cancel. Mirrors the existing STREAM_PARTIAL_TEXT
pattern from #893: adds STREAM_REASONING_TEXT and STREAM_LIVE_TOOL_CALLS
shared dicts populated during streaming and read by cancel_stream().
Also fixes the §C reasoning-only-creates-no-message gap where the
strip-thinking-blocks regex returned empty string and the if-guard
skipped the partial append. 8 regression tests covering all 3
sections plus tools+text combinations.

#1373 (@bergeouss, +105 LOC, had CI failures pre-fix) — fixes #1195
new-profile-routes-to-default. The is_dir() guard in
get_hermes_home_for_profile() caused new profiles (no session yet)
to silently route every session back to the default profile until
the directory existed on disk. Removed the guard; profile path is
now returned unconditionally.

Pre-release fix for #1373's CI failures: the change flipped two
behaviors pinned by tests in #798:
- R19c (test_get_hermes_home_for_profile_falls_back_for_missing_profile)
  asserted nonexistent → base. Renamed and updated to assert the
  new always-return-profile-path behavior.
- R19j (test_get_hermes_home_for_profile_rejects_path_traversal)
  asserted that valid-but-nonexistent profile names → base. Updated
  to assert profile-scoped path. Also updated docstring: the
  _PROFILE_ID_RE regex is now the SOLE defense against path
  traversal (previously is_dir() was a defense-in-depth layer);
  verified each known-bad shape still returns base.

Tests: 3484 passing (3471 → 3484, +13).
@nesquena nesquena mentioned this pull request Apr 30, 2026
5 tasks
nesquena-hermes added a commit that referenced this pull request Apr 30, 2026
…ST-FIX)

Opus pass-2 review of v0.50.251 caught a critical regression in PR
#1375:

The cancel-partial message stored captured tool calls under the
'tool_calls' key. That key is whitelisted by _API_SAFE_MSG_KEYS so
_sanitize_messages_for_api forwarded the entries to the next-turn
LLM call. But the captured entries use the WebUI internal shape
({name, args, done, duration, is_error}) — they don't have the
OpenAI/Anthropic id + function: {name, arguments} envelope. Strict
providers (OpenAI, Anthropic, Z.AI/GLM) would 400 on the malformed
entries. Net effect: the very cancel-then-continue scenario PR
#1375 aimed to improve becomes a hard fail.

Fix:
- Rename the persisted key to '_partial_tool_calls' (underscore-
  prefixed private key NOT in _API_SAFE_MSG_KEYS, so sanitize
  correctly strips it).
- Update static/messages.js hasMessageToolMetadata check to also
  recognize _partial_tool_calls for UI rendering.
- Update test_issue1361_cancel_data_loss.py assertion to check
  _partial_tool_calls (and tool_calls as legacy fallback).

Plus 2 NIT fixes from the same Opus review:

NIT 1 (api/profiles.py:153): re.match → re.fullmatch for consistency
with other _PROFILE_ID_RE callers in the codebase. The trailing-
newline footgun ($ matches before final \n in re.match) is now
closed. Without #1373's is_dir() guard, a name like 'valid\n' would
have created a directory named 'valid\n' on Linux. Doesn't escape
<HERMES_HOME>/profiles/ via Path joining, but unintended.

NIT 2 (test_issue798.py): R19j coverage gaps — added trailing-
newline tests, length-boundary tests (64-char valid, 65-char
rejected), single-char minimum, and non-ASCII / Unicode-trick tests.

New regression test (tests/test_pr1375_partial_tool_calls_sanitize.py):
- test_partial_tool_calls_field_not_forwarded_to_llm: pins that
  sanitize-for-API strips _partial_tool_calls + reasoning + does
  NOT have tool_calls on a partial message
- test_legitimate_tool_calls_are_preserved_for_completed_turns:
  pins that real OpenAI-shape tool_calls on completed turns survive
  sanitize unchanged

Tests: 3486 passing (3484 → 3486, +2 sanitize tests).
@nesquena-hermes

Copy link
Copy Markdown
Collaborator

Shipped in v0.50.251 (merge 031feda3, tag https://github.com/nesquena/hermes-webui/releases/tag/v0.50.251). Thanks @bergeouss — solid foundation work on a real correctness gap.

Live at https://github.com/nesquena/hermes-webui/releases/tag/v0.50.251.

What landed

Your fix for #1361 ships clean — three distinct data-loss paths on Stop/Cancel are now closed:

  • §A Reasoning text accumulated in thread-local _reasoning_text is now mirrored to STREAM_REASONING_TEXT shared dict, visible to cancel_stream()
  • §B Live tool calls similarly mirrored to STREAM_LIVE_TOOL_CALLS
  • §C Reasoning-only streams (no visible tokens) now produce a partial assistant message, not just a *Task cancelled.* marker

The pattern conformance to STREAM_PARTIAL_TEXT from #893 was excellent — same dict-creation, same _live_config fallback, same cleanup in _periodic_checkpoint.

Pre-release fix (Opus pass-2 MUST-FIX)

One critical correctness bug Opus caught:

Your fix stored captured tool calls under the message key tool_calls. That key is whitelisted by _API_SAFE_MSG_KEYS in api/streaming.py so the sanitize-for-API path forwards them to the next-turn LLM call. But the captured entries use the WebUI internal shape — they don't have OpenAI/Anthropic's id + function: arguments envelope. Strict providers would 400 on the malformed entries.

Concrete failure scenario: user starts a turn, agent does some tool calls, user hits Stop, then types a follow-up. The next API call would 400 because the partial message's tool_calls don't match the API contract. So the very cancel-then-continue scenario your PR aimed to improve becomes a hard fail.

Fix at f53556b: renamed the key to _partial_tool_calls (underscore-prefixed private key NOT in _API_SAFE_MSG_KEYS, so sanitize correctly strips it). Updated static/messages.js to also recognize _partial_tool_calls for UI rendering. Added tests/test_pr1375_partial_tool_calls_sanitize.py with 2 sanitize-roundtrip tests pinning the invariant.

This is a "make the structural shape itself enforce the invariant" fix — better than a comment because the field name being non-whitelisted means any future refactor that moves the partial-message logic still gets the right behavior automatically.

Tests

8 of your contributor tests + 2 sanitize-roundtrip tests = 10 total pinning the cancel-stream behavior end-to-end.

GeoffBao pushed a commit to GeoffBao/hermes-webui that referenced this pull request May 1, 2026
…t on Stop/Cancel (nesquena#1375)

Three distinct data-loss paths fixed:

§A — Reasoning text was accumulated in a thread-local _reasoning_text
inside _run_agent_streaming. cancel_stream() never saw it because it
went out of scope when the thread was interrupted. Now mirrored to a
new shared dict STREAM_REASONING_TEXT keyed by stream_id, populated
in on_reasoning() and the reasoning branch of on_tool(), read in
cancel_stream().

§B — Live tool calls in thread-local _live_tool_calls were similarly
invisible to cancel_stream(). Now mirrored to STREAM_LIVE_TOOL_CALLS
on tool.started + tool.completed.

§C — Reasoning-only streams produced no partial message because the
thinking-block regex strip returned empty string and the `if _stripped:`
guard skipped the append. Now appends the partial message when EITHER
content text, reasoning trace, OR tool calls exist.

Mirrors the existing STREAM_PARTIAL_TEXT pattern from nesquena#893 exactly:
same dict creation in _run_agent_streaming, same _live_config fallback
in cancel_stream, same cleanup in _periodic_checkpoint.

8 regression tests in tests/test_issue1361_cancel_data_loss.py
covering all three sections plus tools+text combinations.

Co-authored-by: bergeouss <bergeouss@users.noreply.github.com>
GeoffBao pushed a commit to GeoffBao/hermes-webui that referenced this pull request May 1, 2026
nesquena#1373 behavior change

Adds two more contributor PRs to the v0.50.251 batch per user
directive (per-PR review + Opus review for nesquena#1373; nesquena#1375 was clean
ship-on-sight).

nesquena#1375 (@bergeouss, +382 LOC, all CI green) — fixes nesquena#1361 paid-token
data loss on Stop/Cancel. Mirrors the existing STREAM_PARTIAL_TEXT
pattern from nesquena#893: adds STREAM_REASONING_TEXT and STREAM_LIVE_TOOL_CALLS
shared dicts populated during streaming and read by cancel_stream().
Also fixes the §C reasoning-only-creates-no-message gap where the
strip-thinking-blocks regex returned empty string and the if-guard
skipped the partial append. 8 regression tests covering all 3
sections plus tools+text combinations.

nesquena#1373 (@bergeouss, +105 LOC, had CI failures pre-fix) — fixes nesquena#1195
new-profile-routes-to-default. The is_dir() guard in
get_hermes_home_for_profile() caused new profiles (no session yet)
to silently route every session back to the default profile until
the directory existed on disk. Removed the guard; profile path is
now returned unconditionally.

Pre-release fix for nesquena#1373's CI failures: the change flipped two
behaviors pinned by tests in nesquena#798:
- R19c (test_get_hermes_home_for_profile_falls_back_for_missing_profile)
  asserted nonexistent → base. Renamed and updated to assert the
  new always-return-profile-path behavior.
- R19j (test_get_hermes_home_for_profile_rejects_path_traversal)
  asserted that valid-but-nonexistent profile names → base. Updated
  to assert profile-scoped path. Also updated docstring: the
  _PROFILE_ID_RE regex is now the SOLE defense against path
  traversal (previously is_dir() was a defense-in-depth layer);
  verified each known-bad shape still returns base.

Tests: 3484 passing (3471 → 3484, +13).
GeoffBao pushed a commit to GeoffBao/hermes-webui that referenced this pull request May 1, 2026
…ST-FIX)

Opus pass-2 review of v0.50.251 caught a critical regression in PR
nesquena#1375:

The cancel-partial message stored captured tool calls under the
'tool_calls' key. That key is whitelisted by _API_SAFE_MSG_KEYS so
_sanitize_messages_for_api forwarded the entries to the next-turn
LLM call. But the captured entries use the WebUI internal shape
({name, args, done, duration, is_error}) — they don't have the
OpenAI/Anthropic id + function: {name, arguments} envelope. Strict
providers (OpenAI, Anthropic, Z.AI/GLM) would 400 on the malformed
entries. Net effect: the very cancel-then-continue scenario PR
nesquena#1375 aimed to improve becomes a hard fail.

Fix:
- Rename the persisted key to '_partial_tool_calls' (underscore-
  prefixed private key NOT in _API_SAFE_MSG_KEYS, so sanitize
  correctly strips it).
- Update static/messages.js hasMessageToolMetadata check to also
  recognize _partial_tool_calls for UI rendering.
- Update test_issue1361_cancel_data_loss.py assertion to check
  _partial_tool_calls (and tool_calls as legacy fallback).

Plus 2 NIT fixes from the same Opus review:

NIT 1 (api/profiles.py:153): re.match → re.fullmatch for consistency
with other _PROFILE_ID_RE callers in the codebase. The trailing-
newline footgun ($ matches before final \n in re.match) is now
closed. Without nesquena#1373's is_dir() guard, a name like 'valid\n' would
have created a directory named 'valid\n' on Linux. Doesn't escape
<HERMES_HOME>/profiles/ via Path joining, but unintended.

NIT 2 (test_issue798.py): R19j coverage gaps — added trailing-
newline tests, length-boundary tests (64-char valid, 65-char
rejected), single-char minimum, and non-ASCII / Unicode-trick tests.

New regression test (tests/test_pr1375_partial_tool_calls_sanitize.py):
- test_partial_tool_calls_field_not_forwarded_to_llm: pins that
  sanitize-for-API strips _partial_tool_calls + reasoning + does
  NOT have tool_calls on a partial message
- test_legitimate_tool_calls_are_preserved_for_completed_turns:
  pins that real OpenAI-shape tool_calls on completed turns survive
  sanitize unchanged

Tests: 3486 passing (3484 → 3486, +2 sanitize tests).
nesquena-hermes added a commit that referenced this pull request Jun 4, 2026
## Release v0.51.254 — Release HV (stage-r2)

Phase-2 medium wave 1 — 4 PRs (UI/mobile/cancel fixes + an un-held model dedup).

### Fixed
| Issue/PR | Author | Fix |
|----------|--------|-----|
| #3528 | @franksong2702 | Render partial tool calls after cancel — interrupted turns keep their `_partial_tool_calls` rows in the transcript + fallback tool-cards. (Codex confirmed it stays render-only, not forwarded to the provider API.) |
| #3550 | @lurebat | Android offline recovery soft-reattaches the live stream instead of hard-reloading the page on a transient background/disconnect. |
| #3479 | @mvanhorn | iOS Safari no longer snaps the conversation to the top when a handoff/compression card is inserted mid-stream or on `refreshSession()`. |
| #3478 | @JayC-L | **Un-held:** named custom providers (`@custom:name:model`) dedup against bare model IDs without regressing Ollama multi-colon tags (`qwen2.5:7b-instruct-q4`). Only `@custom:` IDs strip the two-segment prefix. |

### Hold-sweep note
#3478/#3489 was held yesterday for an Ollama multi-colon-tag regression risk (a blanket `lastIndexOf` would lose the model). The author pushed a scoped fix (only `@custom:` IDs use `lastIndexOf`); I verified `_normId` in node against the regression cases — Ollama bare tags are preserved. Un-held + shipped.

### Gate
- Full pytest suite: **7588 passed, 0 failed**
- ESLint: CLEAN · ruff: CLEAN · browser-smoke: CLEAN
- Codex (regression): **SAFE TO SHIP** — #3552 partial-tool-calls verified render-only (no `_API_SAFE_MSG_KEYS` leak / no 400-on-strict-provider, the v0.50.251 #1375 trap); #3551 no EventSource double-subscribe; #3541 no regression vs the #3525 scroll-follow shipped in v0.51.253; #3489 no over-dedup.

Co-authored-by: franksong2702 <franksong2702@users.noreply.github.com>
Co-authored-by: lurebat <lurebat@users.noreply.github.com>
Co-authored-by: mvanhorn <mvanhorn@users.noreply.github.com>
Co-authored-by: JayC-L <JayC-L@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

bug(streaming): Stop/Cancel discards AI's already-streamed reasoning + tool output (and possibly user prompt) — paid tokens lost

2 participants