Fix pre-stream gap UX: thinking dots + stop button on send#2
Conversation
Per-request profile switches (process_wide=False, introduced in nesquena#1700) update os.environ['HERMES_HOME'] but skip _set_hermes_home(), which is responsible for monkeypatching module-level caches. Both tools/skills_tool.py and tools/skill_manager_tool.py set HERMES_HOME and SKILLS_DIR once at import time. When a non-default profile is active in the WebUI, os.environ['HERMES_HOME'] is correctly updated per-turn in the _ENV_LOCK block, but the module-level constants still point at the root profile. All agent-side skill operations — skills_list(), skill_view(), skill_manage() — read and write to the wrong directory. Add the same monkeypatching that _set_hermes_home() already performs (profiles.py line ~620) to the per-turn env setup block in streaming.py, covering both skills_tool and skill_manager_tool. The WebUI display half was already fixed in nesquena#1917 via _active_skills_dir() in routes.py. This patch fixes the agent-side half so the running agent resolves skills from the correct profile.
Update Chinese language translation
Add xiaomi to _PROVIDER_DISPLAY, _PROVIDER_MODELS, and _PROVIDER_ALIASES so the WebUI recognizes Xiaomi as a first-class provider. Models included: - mimo-v2.5-pro (MiMo V2.5 Pro) - mimo-v2.5 (MiMo V2.5) - mimo-v2-pro (MiMo V2 Pro) - mimo-v2-omni (MiMo V2 Omni) - mimo-v2-flash (MiMo V2 Flash) Aliases: mimo, xiaomi-mimo -> xiaomi The hermes-agent CLI already registers xiaomi as a provider (hermes_cli/models.py, hermes_cli/auth.py) but the WebUI was missing the corresponding entries, causing the model dropdown to fall back to OpenRouter and the provider list to show 'Unsupported'.
When context compression fires, the agent rotates to a new session_id.
The compression migration block correctly migrates the session lock,
SESSION_AGENT_CACHE, SESSIONS dict, and the session file rename, but
does not ensure s.profile is set on the continuation session.
On the next request, _run_agent_streaming resolves the profile via:
get_hermes_home_for_profile(getattr(s, 'profile', None))
With s.profile == None this falls back to the default profile's
HERMES_HOME. Memory tool calls then read and write the wrong profile's
MEMORY.md — confirmed by investigation: session 0dfefb (continuation
after compression from a troubleshooting profile session) read memory
at 16% / 1,184 chars with 4 entries, while the troubleshooting profile's
actual state was 72-77% / 5,000+ chars. That reading could only come
from the default profile's bank. Subsequent replace operations failed
because the target entries existed only in the troubleshooting profile.
There are two failure paths:
1. In-memory: if s.profile was None from the start (legacy session or
one created before this fix), the continuation session object carries
null through the current request.
2. Persistence: s.save() persists "profile": null to the continuation
session's JSON file (profile is in METADATA_FIELDS, models.py ~408).
On the next request, Session.load(new_sid) reads it back as null and
get_hermes_home_for_profile(None) falls back to the default profile.
Fix: capture _resolved_profile_name at request entry (~line 2019),
immediately after profile home resolution. This is the only point where
profile context is reliable: s.profile if already set, otherwise
get_active_profile_name() — which at that point reads thread-local
storage (_tls.profile) correctly set by the HTTP handler thread via
set_request_profile(). Calling get_active_profile_name() at compression
time instead would be unsafe: the streaming thread is a separate
threading.Thread, does not inherit TLS, and the call would fall back to
the process-global _active_profile which may belong to a different
concurrent tab.
Stamp s.profile in the compression migration block immediately after
s.session_id = new_sid. Guarded by `if not s.profile` so sessions that
already have a profile set are unaffected. A logger.info line records
when the stamp fires, making future investigation straightforward.
Fixes: memory writes bleeding into default profile after compression
Reproduces: reliably on any long non-default profile session that hits
the compression threshold (default: 0.80 context fill)
…hile in-flight by @franksong2702
…_count pluralization by @franksong2702
…or assignee select by @franksong2702
…ls + status hint by @franksong2702
… + extend locale parity test (Opus advisor SHIP-WITH-CAVEATS follow-up)
…uple entries + _LOGIN_LOCALE block nesquena#2142 (legeantbleu) added the fr locale to static/i18n.js but didn't update: 1. tests/test_issue1488_composer_voice_buttons.py: two TestComposerVoiceButtonI18n + TestVoiceModePreferenceGate LOCALES tuples needed 'fr' 2. api/routes.py: _LOGIN_LOCALE needed an 'fr' block so the login page localizes for French users (issue nesquena#1442 parity contract) 3. tests/test_login_locale_parity.py: the test asserting 'fr' falls-back-to-'en' is inverted — fr now resolves to fr, with sibling assertions for fr-FR and fr-CA Mirrors the stage-340 fix for the it locale (PR nesquena#2067 → maintainer adds tuple entries). 46/46 i18n tests pass after fix.
… contributor batch
… + stale-done re-emit
(1) compress/status no longer pops the job entry on first read of `done` payload.
Second open tab no longer sees `idle` and a stale-job toast.
(2) compress/start no longer short-circuits to a stale `done` payload when
re-invoked within the 10-minute TTL. Re-running /compress always starts
fresh, so closing-and-reopening a tab mid-compress works correctly.
Third SHOULD-FIX (nesquena#2135 cfg["model"] fallback tightening when no custom_providers
entry matches) deferred to follow-up — strictly no-worse-than-master behavior.
tests/test_sprint46.py 10/10 still passes.
stage-344: 16-PR contributor batch — i18n + insights + manual-compress async + workspace recovery + iOS PWA + Cloudflare login health + bash 3.2 (fr locale)
fix: guard stale stream writebacks (LumenYoung) Prevents stale WebUI stream workers from writing old results into a session after that session has already moved on to another stream. Adds new helper _stream_writeback_is_current() (a token equality check against the session's active_stream_id) and short-circuits the two finalize/cancel paths when the worker no longer owns the session writeback.
feat: add manual provider usage refresh (Jordan-SkyLF) Adds a 'Refresh usage' button on the Provider quota card in Settings → Providers, with cache: 'no-store' fetch + browser cache-bust query string. Pure browser-side cache-busting; the server-side /api/provider/quota endpoint has no cache layer yet (refresh=1 query param is currently a no-op server-side; the win is bypassing browser/proxy/SW caches).
stage-345: 2-PR low-risk batch — stream-ownership guard against stale writebacks + Refresh-usage button on provider quota card
…tore avatar - Hard-reset to upstream/master (stage-345, v0.51.51) to fix all broken functionality - Migrated Claude skin (full palette + typography + component affordances) - Migrated Nebula skin (accent-only cyan-blue-violet palette) - Skipped Sienna-specific affordances (already canonical in upstream stage-345) - Restored hermes-agent-avatar.png exactly (MD5: 6b4e80f8cd848bd4ef640e48030006e5) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Match both e.key==='k' and e.key==='K' so Cmd+K works regardless of Caps Lock state (upstream B handler already does this for 'b'/'B') - Wrap the newSession() call in try/catch in both the Cmd+K keydown handler and btnNewChat.onclick so any server-side failure shows a toast instead of silently disappearing into an unhandled promise rejection Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Two bugs caused by the window between setBusy(true) and S.activeStreamId being set (the /api/chat/start round-trip, which can take seconds on slow providers): 1. Send button stays disabled instead of showing the Stop icon: getComposerPrimaryAction() required S.activeStreamId to return 'stop', but S.activeStreamId is explicitly nulled before the POST and only set on response. Fix: check S.busy||S.activeStreamId so the button flips to Stop immediately. 2. Thinking dots never appear until the stream starts: appendThinking() guarded on !S.activeStreamId and returned early. Fix: relax guard to !S.busy&&!S.activeStreamId (allow when busy, even pre-stream). Also reorder messages.js: setBusy(true) now runs before appendThinking() so S.busy=true is set when the check runs. 3. Bonus: Stop now works during the pre-stream gap: cancelStream() extended to handle the null-streamId case — clears S.busy, removes thinking indicator, and aborts the in-flight /api/chat/start fetch via AbortController (window._abortPendingChatStart). AbortError in the send() catch block is treated as user-cancel (clean teardown, no error toast). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- messages.js: revert appendThinking/setBusy call order to match test
assertion (`appendThinking();setBusy(true);`), fix activeStreamId
comment to match exact marker test checks
- ui.js: revert appendThinking guard back to `!S.activeStreamId` only
(removes the S.busy relaxation that broke test ordering contract)
- boot.js: simplify Cmd+K key check back to `e.key==='k'` (exact
string the test searches for); compact cancelStream early-return
so try/catch lands within the 400-char test window; remove
redundant S.activeStreamId=null from early path so cleanup_idx
stays after catch_idx
- style.css: add space in skin-scoped `.send-btn {` rule so the
global `.send-btn{` rule is the first match for the CSS tests
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The test asserts updateSendBtn() is called within 200 chars of the S.activeStreamId null-reset marker. The AbortController comment was pushing it past that limit. Move updateSendBtn() to immediately after the marker to satisfy the test. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Remove the `!S.activeStreamId` guard from appendThinking() so the thinking animation appears as soon as the user sends a message, rather than waiting for /api/chat/start to respond and the SSE stream to open. The stale-event protection (preventing old stream events from polluting a new session) is already enforced by the activeSid check in the SSE outer loop in messages.js, so this guard was only causing a noticeable blank gap between send and first feedback. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Keep our version that removes the !S.activeStreamId guard from appendThinking() so thinking dots appear immediately on send. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
💡 Codex ReviewLine 3 in 58b6357 Handle the no- ℹ️ About Codex in GitHubYour team has set up Codex to review pull requests in this repo. Reviews are triggered when you
If Codex has suggestions, it will comment; otherwise it will react with 👍. Codex can also answer questions or update the PR. Try commenting "@codex address that feedback". |
When the user clicks Stop before /api/chat/start responds, the early-return path now calls removeThinking() after setBusy(false) to clear the optimistic thinking row that appendThinking() already injected. Without this, a stale "in-progress" indicator lingered in the transcript after the cancel. Also switches to optional-chaining for the abort call (window._abortPendingChatStart?.()) to keep the function compact within the CI test's 400-char inspection window. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
Fixed in 8e764d1: added |
…h.json fingerprint auth.json is rewritten by credential-pool/OAuth token refresh roughly every 14 minutes. _models_cache_source_fingerprint() hashed it via mtime/size (nesquena#1699 _models_cache_file_fingerprint), so every token refresh churned the fingerprint and the 24h /api/models cache was effectively dead -- the hot GET /api/session?resolve_model=1 path paid a cold ~11.5s rebuild every few minutes (RCA t_d127953d residual #2, t_16551f61). Add _auth_store_semantic_fingerprint(): content-hash auth.json with a DENY-list of known credential-rotation-only keys (access/refresh token, expiry, per-credential status/telemetry, request_count, save updated_at) stripped. Deny-list (not allow-list) is deliberate -- any unknown field, or a real provider/endpoint/model-set change (active_provider, a new credential_pool entry, base_url, source, label, auth_type, the providers{} block, ...) stays in the fingerprint and still correctly busts the cache. Conservative fallbacks: missing file -> marked; unreadable/corrupt -> stat-based fallback (never less safe than pre-fix). config.yaml keeps the cheap stat fingerprint (deliberate edits, no timer churn). Bidirectional invariant regression test (non-tautological -- the end-to-end churn test flips RED when the auth_json axis is reverted to stat-based): token-only churn keeps fingerprint byte-identical AND keeps a valid disk cache loadable; active_provider change / new credential_pool entry / changed base_url each flip the fingerprint AND reject the stale disk cache. Measured: 5/5 cold rebuilds per 5 refresh cycles -> 0/5. Tests: 9 new pass; 28 adjacent (nesquena#1699/nesquena#1633/display-resolver) pass; 54 models_cache/fingerprint suite pass.
…y-list (Codex review #2) Under a named profile, process HERMES_HOME is ~/.hermes/profiles/<name> but the allowlist still grants base ~/.hermes — so the prior deny (anchored only on the active-profile root + STATE_DIR) left ~/.hermes/state.db and sibling-profile secrets (~/.hermes/profiles/other/auth.json) reachable. Build deny roots from every Hermes state root the allowlist accepts: active HERMES_HOME, base ~/.hermes, api.profiles._DEFAULT_HERMES_HOME, and STATE_DIR; apply the state-subdir dir-denies under each. Widen the CSP-slice structural test window to match.
…ode) — toggle-mid-record safety (Codex review #2) Codex round-2: _stopMic and mediaRecorder.onstop read the CURRENT _rawAudioMode to choose backend/dispatch, but the recording was started on the OLD mode — so toggling Settings→Sound mid-recording could stop the wrong backend (orphaning the other) or dispatch raw-vs-transcribe wrongly. Pin _activeCaptureMode (speech | media-raw | media-transcribe) at start; _stopMic + onstop use it. Adds front-end source-invariant regression tests. Co-authored-by: lucasrc <lrclucas@gmail.com>
…dir dedup (Opus SHOULD-FIX) Opus review SHOULD-FIX on the upload surface: - Add _MAX_ARCHIVE_MEMBERS=10000 cap in extract_archive (both zip + tar loops): a tiny archive with millions of members slips under the byte cap but can exhaust inodes/fds. Trips before extraction, cleaned up via the existing rmtree-on-exception. Regression test added. - Bound the extraction-dir collision-suffix loop (was while-True) to 1000 tries. Other Opus SHOULD-FIX items (member-count #1 done; #2 done) filed as follow-up or N/A: same-field multi-file collapse doesn't apply (frontend sends one request per file); .tar.gz stem cosmetic.
…n (Codex round-2) Codex found two more once the config-guard bug was fixed (Opus concurred on #2): 1. panels.js _buildPluginCard built the Open button + enable toggle with inline onclick/onchange that interpolated tab.path / plugin.key into a JS-string-in- attribute context — HTML-escaping is insufficient there (quote breakout). Now rendered inert + bound via addEventListener with RAW closure values. 2. tab.path was unvalidated. Added _VALID_PLUGIN_TAB_PATH (^/[A-Za-z0-9._~/-]{0,255}$) in load_plugins() — absolute, no quotes/query/fragment/control chars. Also Opus nit: deep-merge now coerces dashboard_plugins values to bool + str keys. Regression tests added for both.
…oning nesquena#3455 + LLM Wiki last-writer nesquena#1257) (nesquena#3466) * Release v0.51.230 (stage-p14): extract <think> to m.reasoning nesquena#3455 + LLM Wiki last-writer (nesquena#1257) Salvage of nesquena#3455 (@gsurenull): dropped the stale api/config.py bits (MiniMax-M3 + SCHEMA_VERSION 3->4 — both already on master via nesquena#3374). Kept the two genuine fixes: (1) _splitThinkFromContent persist-path extraction of inline <think> blocks into m.reasoning (fixes 30-50% session bloat for reasoning-only providers like MiniMax-M3); (2) LLM Wiki status Last-writer 3-tier fallback (was always 'Not available' since nesquena#1257). Added 9 Node-driven think-split regression tests (data-loss guards: content-before/after preserved, unclosed blocks intact, lookalike tags not extracted). * fix(nesquena#3455): renderer-matching think extraction + wiki symlink/bounded-read guards (Codex review) Codex review of stage-p14 found 3 SILENT bugs, all fixed: (1) DATA-LOSS: _splitThinkFromContent's Pass-2 whole-body scan extracted a CLOSED literal <think>...</think> from visible prose/code (e.g. inside a fenced code block) into m.reasoning, emptying it — more aggressive than the renderer (which only strips LEADING blocks). Removed Pass 2; extraction now matches _streamDisplay semantics (leading-only, loop captures consecutive leading blocks). +fenced-code regression test. (2) PRIVACY: _llm_wiki_last_writer followed symlinked .md pages resolving OUTSIDE the wiki (is_file follows symlinks), leaking external frontmatter. Now requires resolved path under wiki_root. +symlink-containment regression test. (3) CONTRACT/PERF: replaced full read_text() with bounded line-by-line reads (frontmatter block only / capped log-heading scan), never page bodies. * fix(nesquena#3455): think-split is leading-single (renderer-matching) + fix 2 stale source-match tests Codex re-review finding #2: looping consecutive leading blocks diverged from the renderer (_streamDisplay/_parseStreamState strip ONE leading block). Now extracts exactly one leading block. Also updated 2 tests that asserted pre-split implementation strings: test_live_stream_tokens_persist (content:assistantText -> content:split.content, invariant preserved) and the consecutive-blocks test. NOTE: Codex finding #1 (client-only split doesn't persist server-side) is a separate architectural decision pending Nathan. * feat(nesquena#3455): split inline <think> server-side before s.save() so persisted file is compacted (Codex #1) Codex finding #1: the think-split was client-only, so the SAVED session file still carried inline <think> blocks (bloat) — the fix only compacted the browser copy. Added _split_thinking_from_content (api/streaming.py), a server-side twin of the JS helper with identical leading-only/single-block semantics, applied to the final assistant message before s.save() (extended the existing reasoning-persist block). Merges with on_reasoning-stream reasoning. +8 backend-parity regression tests covering the mid-body-code-block data-loss guard, unclosed-intact, single-leading, none-content. * test: update 3 save-path source-assertion tests for nesquena#3455 server-side think-split The backend think-split (api/streaming.py reasoning-persist block) changed the literal code shape + grew the pre-save block, breaking 8 source-assertion tests that anchor on it: - test_sprint42: assert _rm['reasoning']=_reasoning_text -> now _merged_reasoning/_existing_reasoning + _split_thinking_from_content present (intent preserved: reasoning persisted before save). - test_pr1318 (6) + test_pr1341: re-anchored the locator from the changed 'if _reasoning_text and s.messages:' line to the stable 'Persist reasoning trace in the session' comment marker; bumped the 1341 byte-distance limit 15000->16000 (the test self-documents bumping on legit pre-save growth). All behavioral invariants (reasoning persisted + context fields before save) unchanged. --------- Co-authored-by: nesquena-hermes <[email protected]>
…nders twice nesquena#3709) (nesquena#3715) * fix(nesquena#3709): thinking card no longer renders twice (in Activity + below answer) The nesquena#3592 inline-render branch (v0.51.258) emitted a thinking card for a thinking-only message even when a sibling tool-message in the same turn already built an Activity group carrying that turn's thinking — so the card showed twice, the second one stranded below the answer + 'Done in …' footer (insertAdjacentHTML 'beforeend' on a segment that already had body+footer). Fix (keeps nesquena#3592, does NOT revert it): - A1: precompute turnsWithActivityGroup (turns whose segments have tool cards); the inline branch only renders when the anchor turn is NOT in that set. - A2: when it does render inline, insert 'beforebegin' the .msg-body/.msg-foot so the card sits above the answer, not orphaned below the footer. - B: strip thinking against the TURN's combined visible answer (_turnVisibleTextByRawIdx), so a trailing thinking-only message that echoes the answer gets de-duped even though its own body is empty. Live-verified in browser: nesquena#3709 repro (tool+trailing-thinking) → exactly 1 card in Activity, above footer; nesquena#3592 repro (thinking-only) → exactly 1 inline card, not buried in a collapsed group. + regression test tests/test_issue3709_*. Supersedes nesquena#3708 (which deleted the inline branch outright, re-breaking nesquena#3592). * fix(nesquena#3709): merge suppressed sibling thinking into the Activity group (Codex re-gate) Codex caught a content-loss edge in the first cut: when A1 suppresses a thinking-only sibling's inline card (its turn has an Activity group), the group only rendered assistantThinking.get(aIdx) for the TOOL message — so a sibling with DISTINCT reasoning was neither inline nor in the group → dropped. Fix: aggregate all of a turn's thinking (turnThinkingParts, de-duped, index order) and render that merged text once per turn in the Activity group (_renderedTurnThinking guard). Live-verified: tool-thinking A + distinct sibling-thinking B → 1 merged node carrying both, no loss. + regression test. * fix(nesquena#3709): shared anchor resolver so inline-suppression & group placement agree (Codex re-gate #2) Codex caught a fallback-anchor mismatch: turnsWithActivityGroup was populated only from assistantSegments.get(tcIdx) (direct segment), but the group-render path falls back to a nearby earlier segment when a tool's assistant_msg_idx has no directly rendered segment (legacy/rebased). So a fallback-anchored group's turn wasn't in turnsWithActivityGroup → the sibling rendered inline AND the group rendered → dup again. Fix: one shared _anchorRowForActivityIdx(aIdx) helper (direct-or-fallback) used by the precompute, the inline branch, and the group render — they now agree. Live-verified all three repros still pass. * test(nesquena#3709): update test_compact_activity assertion to mergedThinking var The brittle source-scan asserted _thinkingActivityNode(thinkingText, false) — the nesquena#3709 fix renders the turn's MERGED thinking via _thinkingActivityNode(mergedThinking, false) into the same Activity body. Intent (settled thinking renders inside the Activity disclosure alongside tools) unchanged; only the source variable. Updated to assert the new variable, kept all intent assertions. --------- Co-authored-by: nesquena-hermes <[email protected]>
…esquena#3633) (nesquena#3853) * fix(streaming): normalize inline thinking extraction across live and persisted turns (nesquena#3599) # Conflicts: # api/streaming.py # static/messages.js # static/ui.js * fix(streaming): code-aware inline-thinking extraction + position-aware unclosed handling Codex deep-review caught two regressions in the leading-only -> full-scan rewrite (both silent data-mangling on the persist/reload path): 1. Code-span unawareness: the scanner only protected triple fences, so a literal <think> in an inline single-backtick code span or an indented (>=4-space/tab) code block got silently extracted into reasoning. Added _inline_thinking_indented_code_at + inline-backtick tracking (Python + the JS twin _thinkingIndentedCodeAt), so all three code contexts now keep thinking tags visible. 2. Unclosed-tag truncation: any unmatched open tag moved the trailing prose into reasoning. Now position-aware — a LEADING unclosed block (cut off mid-thought) is still reasoning (nesquena#3455 intent), but an unclosed tag AFTER visible content stays visible so literal typed tags don't truncate prose. Gated partial handling on the previously-unused options.streaming param (live streaming keeps 'still thinking' behavior; persist/reload does not). Updated 2 tests that pinned the buggy behavior + added 4 regression tests (inline-backtick, indented-code, mid-body-unclosed-visible, leading-unclosed- extracted). Updated the node driver harness to include the new helper. Co-authored-by: rodboev <rodboev@users.noreply.github.com> * fix(streaming): recognize fenced code blocks indented 1-3 spaces Codex round-3: a fence indented 1-3 spaces is valid Markdown but the fence detector only matched at column 0, so a literal think tag inside such a fence (not 4+-space indented code either) was still extracted. Both detectors (_inline_thinking_fence_marker_at / _thinkingFenceMarkerAt) now walk back over up to 3 leading spaces to a line start. Added backtick + tilde indented-fence regression tests. Co-authored-by: rodboev <rodboev@users.noreply.github.com> * fix(streaming): O(n) inline-thinking scan + merge separate reasoning on reload Round-4 Codex deep-review caught two real issues in my own fixes: 1. PERF (O(n^2)): the indented-code check (_inline_thinking_indented_code_at / _thinkingIndentedCodeAt) scanned to line boundaries at EVERY character index, plus the leading check sliced+stripped the whole prefix per unclosed tag. On long no-newline content this was quadratic (~8.4s @ 200k, called repeatedly on the streaming path). Replaced with incremental O(1)-per-iteration line state (_line_is_indented_code / _lineIsIndentedCode evaluated only at line starts) + a seen_nonspace flag. 200k now extracts in ~55-140ms. 2. RELOAD reasoning-drop: renderMessages() seeded the shared extractor with '' so a message with BOTH an inline <think> block AND a separate m.reasoning payload showed only the inline part — the separate payload was dropped because the !thinkingText worklog resolution was then skipped. Now seeds with the message's direct reasoning (m.reasoning_content||m.reasoning||...) so the two MERGE (deduped); separate-only reasoning is preserved without promoting it into visible prose. Python + JS twins kept line-for-line parity. Added merge + perf + reload regression tests; updated the reload structure test and the node driver harness for the renamed helper. Co-authored-by: rodboev <rodboev@users.noreply.github.com> * fix(streaming): revert reload reasoning-seed; keep O(n) perf fix Codex round-4 finding #2 (seed renderMessages' inline extractor with m.reasoning so a separate payload merges) turned out to VIOLATE a deliberate architectural invariant pinned by test_issue2565 + test_sprint42: the reload content-extraction path must NOT touch m.reasoning/m.reasoning_content — reasoning metadata is owned exclusively by the Worklog Thinking Card path (_worklogReasoningTextFromMessage / _assistantReasoningPayloadText), never conflated with inline-content extraction (which would risk promoting provider reasoning into final-answer prose). Reverted the ui.js seed to the PR's original `thinkingText` arg. The inline+separate merge is still a genuine extractor capability (exercised by the live streaming path via liveReasoningText) and is covered by a unit test, just not invoked from the reload render path by design. The O(n) perf fix (finding #1) and the code-awareness + position-aware unclosed handling (rounds 1-3) are all retained. Co-authored-by: rodboev <rodboev@users.noreply.github.com> * fix(streaming): only lstrip extracted content when a leading block was removed Codex round-5 catch: the extractor unconditionally lstripped the final content (.lstrip() / .replace(/^\s+/,'')) even when NO thinking block was extracted, so an assistant reply that legitimately starts with an indented code block or blank lines lost its leading whitespace on live display, reload, and persistence. This was a real regression vs master (master returned non-thinking content unchanged). Now track leading_removed (set only when a LEADING thinking block/prefix is actually extracted) and lstrip only in that case. Mid-body / no-thinking content keeps its exact leading whitespace. Python + JS twins kept in parity; added backend regression tests (indented-first preserved, leading-blank preserved, leading-think still strips). Co-authored-by: rodboev <rodboev@users.noreply.github.com> * fix(streaming): reconnect restore prefers raw inflight accumulator Codex round-6 CORE catch: on reconnect, the single-live-message restore used (_liveInflightAssistant.content || ''). Because the PR now splits a leading unclosed <think> into empty content, restoring from the split content dropped the open tag — so a later </think> token leaked into the visible reply and corrupted the live accumulator. Restore from (_fullInflightAssistant || _liveInflightAssistant.content || '') so the raw open tag survives reconnect and the accumulator stays correct. Added a reconnect-restore regression test. Co-authored-by: rodboev <rodboev@users.noreply.github.com> * Release v0.51.335 — Release KY (normalize inline thinking extraction, nesquena#3633) Unify inline-thinking (<think>/<|channel>/<|turn|>) extraction across live, reload, and persisted turns (nesquena#3599/nesquena#3633, @rodboev). Deep-reviewed: Opus + 6 Codex rounds; maintainer fixes resolved every Codex finding — code-awareness (inline-backtick/indented/1-3-space fences keep literal tags visible), position-aware unclosed handling, O(n) line scanning (was O(n^2) on long content), conditional lstrip (preserve leading whitespace when no leading block removed), and a reconnect-restore CORE fix (raw accumulator preferred so an open <think> tag survives reconnect). Python + JS twins in parity. Full suite 8330, Opus SHIP-SAFE, Codex SAFE-TO-SHIP, ESLint/scope-undef/ruff clean. Co-authored-by: rodboev <rodboev@users.noreply.github.com> --------- Co-authored-by: Rod Boev <rod.boev@gmail.com> Co-authored-by: Hermes Agent <hermes-agent@nesquena-hermes.local> Co-authored-by: rodboev <rodboev@users.noreply.github.com>
Summary
!S.activeStreamIdguard fromappendThinking()so thinking dots appear immediately when the user sends a message (previously blocked until the SSE stream opened, causing a blank gap)getComposerPrimaryActionreturning'stop'whenS.busy) was already merged in PR Backport upstream stage-345 + fix Cmd+K, send button, thinking dots #1; this adds the thinking animation to complete the feedback loopRoot cause
Three screenshots showed the pre-stream gap:
Both the stop button and the thinking dots were gated on
S.activeStreamIdbeing set, which only happens after/api/chat/startresponds. This could be 1–3 seconds on slow models.Fix
appendThinking()now only requiresS.sessionto exist. The stale-event protection (preventing old stream reasoning events from polluting a switched session) is handled upstream in the SSEactiveSidcheck inmessages.js.Test plan
/api/chat/start)🤖 Generated with Claude Code