Skip to content

Commit 066f727

Browse files
nesquena-hermesrodboevHermes Agent
authored
Release v0.51.335 — Release KY (normalize inline thinking extraction, nesquena#3633) (nesquena#3853)
* fix(streaming): normalize inline thinking extraction across live and persisted turns (nesquena#3599) # Conflicts: # api/streaming.py # static/messages.js # static/ui.js * fix(streaming): code-aware inline-thinking extraction + position-aware unclosed handling Codex deep-review caught two regressions in the leading-only -> full-scan rewrite (both silent data-mangling on the persist/reload path): 1. Code-span unawareness: the scanner only protected triple fences, so a literal <think> in an inline single-backtick code span or an indented (>=4-space/tab) code block got silently extracted into reasoning. Added _inline_thinking_indented_code_at + inline-backtick tracking (Python + the JS twin _thinkingIndentedCodeAt), so all three code contexts now keep thinking tags visible. 2. Unclosed-tag truncation: any unmatched open tag moved the trailing prose into reasoning. Now position-aware — a LEADING unclosed block (cut off mid-thought) is still reasoning (nesquena#3455 intent), but an unclosed tag AFTER visible content stays visible so literal typed tags don't truncate prose. Gated partial handling on the previously-unused options.streaming param (live streaming keeps 'still thinking' behavior; persist/reload does not). Updated 2 tests that pinned the buggy behavior + added 4 regression tests (inline-backtick, indented-code, mid-body-unclosed-visible, leading-unclosed- extracted). Updated the node driver harness to include the new helper. Co-authored-by: rodboev <rodboev@users.noreply.github.com> * fix(streaming): recognize fenced code blocks indented 1-3 spaces Codex round-3: a fence indented 1-3 spaces is valid Markdown but the fence detector only matched at column 0, so a literal think tag inside such a fence (not 4+-space indented code either) was still extracted. Both detectors (_inline_thinking_fence_marker_at / _thinkingFenceMarkerAt) now walk back over up to 3 leading spaces to a line start. Added backtick + tilde indented-fence regression tests. Co-authored-by: rodboev <rodboev@users.noreply.github.com> * fix(streaming): O(n) inline-thinking scan + merge separate reasoning on reload Round-4 Codex deep-review caught two real issues in my own fixes: 1. PERF (O(n^2)): the indented-code check (_inline_thinking_indented_code_at / _thinkingIndentedCodeAt) scanned to line boundaries at EVERY character index, plus the leading check sliced+stripped the whole prefix per unclosed tag. On long no-newline content this was quadratic (~8.4s @ 200k, called repeatedly on the streaming path). Replaced with incremental O(1)-per-iteration line state (_line_is_indented_code / _lineIsIndentedCode evaluated only at line starts) + a seen_nonspace flag. 200k now extracts in ~55-140ms. 2. RELOAD reasoning-drop: renderMessages() seeded the shared extractor with '' so a message with BOTH an inline <think> block AND a separate m.reasoning payload showed only the inline part — the separate payload was dropped because the !thinkingText worklog resolution was then skipped. Now seeds with the message's direct reasoning (m.reasoning_content||m.reasoning||...) so the two MERGE (deduped); separate-only reasoning is preserved without promoting it into visible prose. Python + JS twins kept line-for-line parity. Added merge + perf + reload regression tests; updated the reload structure test and the node driver harness for the renamed helper. Co-authored-by: rodboev <rodboev@users.noreply.github.com> * fix(streaming): revert reload reasoning-seed; keep O(n) perf fix Codex round-4 finding #2 (seed renderMessages' inline extractor with m.reasoning so a separate payload merges) turned out to VIOLATE a deliberate architectural invariant pinned by test_issue2565 + test_sprint42: the reload content-extraction path must NOT touch m.reasoning/m.reasoning_content — reasoning metadata is owned exclusively by the Worklog Thinking Card path (_worklogReasoningTextFromMessage / _assistantReasoningPayloadText), never conflated with inline-content extraction (which would risk promoting provider reasoning into final-answer prose). Reverted the ui.js seed to the PR's original `thinkingText` arg. The inline+separate merge is still a genuine extractor capability (exercised by the live streaming path via liveReasoningText) and is covered by a unit test, just not invoked from the reload render path by design. The O(n) perf fix (finding #1) and the code-awareness + position-aware unclosed handling (rounds 1-3) are all retained. Co-authored-by: rodboev <rodboev@users.noreply.github.com> * fix(streaming): only lstrip extracted content when a leading block was removed Codex round-5 catch: the extractor unconditionally lstripped the final content (.lstrip() / .replace(/^\s+/,'')) even when NO thinking block was extracted, so an assistant reply that legitimately starts with an indented code block or blank lines lost its leading whitespace on live display, reload, and persistence. This was a real regression vs master (master returned non-thinking content unchanged). Now track leading_removed (set only when a LEADING thinking block/prefix is actually extracted) and lstrip only in that case. Mid-body / no-thinking content keeps its exact leading whitespace. Python + JS twins kept in parity; added backend regression tests (indented-first preserved, leading-blank preserved, leading-think still strips). Co-authored-by: rodboev <rodboev@users.noreply.github.com> * fix(streaming): reconnect restore prefers raw inflight accumulator Codex round-6 CORE catch: on reconnect, the single-live-message restore used (_liveInflightAssistant.content || ''). Because the PR now splits a leading unclosed <think> into empty content, restoring from the split content dropped the open tag — so a later </think> token leaked into the visible reply and corrupted the live accumulator. Restore from (_fullInflightAssistant || _liveInflightAssistant.content || '') so the raw open tag survives reconnect and the accumulator stays correct. Added a reconnect-restore regression test. Co-authored-by: rodboev <rodboev@users.noreply.github.com> * Release v0.51.335 — Release KY (normalize inline thinking extraction, nesquena#3633) Unify inline-thinking (<think>/<|channel>/<|turn|>) extraction across live, reload, and persisted turns (nesquena#3599/nesquena#3633, @rodboev). Deep-reviewed: Opus + 6 Codex rounds; maintainer fixes resolved every Codex finding — code-awareness (inline-backtick/indented/1-3-space fences keep literal tags visible), position-aware unclosed handling, O(n) line scanning (was O(n^2) on long content), conditional lstrip (preserve leading whitespace when no leading block removed), and a reconnect-restore CORE fix (raw accumulator preferred so an open <think> tag survives reconnect). Python + JS twins in parity. Full suite 8330, Opus SHIP-SAFE, Codex SAFE-TO-SHIP, ESLint/scope-undef/ruff clean. Co-authored-by: rodboev <rodboev@users.noreply.github.com> --------- Co-authored-by: Rod Boev <rod.boev@gmail.com> Co-authored-by: Hermes Agent <hermes-agent@nesquena-hermes.local> Co-authored-by: rodboev <rodboev@users.noreply.github.com>
1 parent a71dbcd commit 066f727

9 files changed

Lines changed: 689 additions & 205 deletions

CHANGELOG.md

Lines changed: 5 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -3,6 +3,11 @@
33

44
## [Unreleased]
55

6+
## [v0.51.335] — 2026-06-08 — Release KY (normalize inline thinking extraction)
7+
8+
### Fixed
9+
- **Inline reasoning traces are extracted consistently across live, reload, and persisted turns.** Inline-thinking providers (MiniMax-M3, Gemma, OpenAI-compat, Ollama Cloud) that emit `<think>…</think>` (or `<|channel>`/`<|turn|>` variants) anywhere in the response now have those traces moved into the Thinking Card uniformly — live, on reload, and in the saved session file — instead of leaving them in the visible answer or bloating the persisted content. Literal thinking tags inside code (inline `` `<think>` ``, fenced blocks, or indented code) stay visible, leading whitespace is preserved when no thinking block is removed, and an unclosed tag only collapses into reasoning when it leads the message. (#3599, #3633, @rodboev)
10+
611
## [v0.51.334] — 2026-06-08 — Release KX (new-message cue when scrolled up)
712

813
### Added

api/streaming.py

Lines changed: 185 additions & 41 deletions
Original file line numberDiff line numberDiff line change
@@ -1489,47 +1489,194 @@ def _build_native_multimodal_message(workspace_ctx: str, msg_text: str, attachme
14891489
return parts if image_count else workspace_ctx + msg_text
14901490

14911491

1492-
def _split_thinking_from_content(raw_content, existing_reasoning=''):
1493-
"""Split a single LEADING <think> block out of assistant content.
1494-
1495-
Server-side twin of the JS ``_splitThinkFromContent`` (static/messages.js).
1496-
Inline-thinking providers (e.g. MiniMax-M3, OpenAI-compat) leave the thinking
1497-
trace inside the saved ``m['content']``, bloating session files 30-50% and
1498-
bypassing the ``m['reasoning']`` field the thinking card reads on reload
1499-
(#3455). This extracts exactly ONE leading block (after lstrip) — matching the
1500-
live renderer's _streamDisplay/_parseStreamState semantics — so a closed
1501-
``<think>...</think>`` that appears MID-BODY (e.g. a literal tag in a fenced
1502-
code block) stays visible content and is never moved into reasoning, and a
1503-
partial/unclosed block is left intact.
1504-
1505-
Returns ``(content, reasoning)``. ``reasoning`` merges ``existing_reasoning``
1506-
(e.g. from a separate on_reasoning stream) with the extracted block.
1492+
_INLINE_THINKING_TAG_PAIRS = (
1493+
('<think>', '</think>'),
1494+
('<|channel>thought\n', '<channel|>'),
1495+
('<|turn|>thinking\n', '<turn|>'),
1496+
)
1497+
1498+
1499+
def _inline_thinking_fence_marker_at(text, index):
1500+
# A fenced code block opener may be indented up to 3 spaces in Markdown
1501+
# (4+ spaces is an indented code block, handled separately). The marker is
1502+
# only a fence when it sits at the start of a line (after optional 1-3
1503+
# spaces of indentation).
1504+
if index > 0 and text[index - 1] != '\n':
1505+
# Allow up to 3 leading spaces: walk back over spaces to a line start.
1506+
back = index - 1
1507+
spaces = 0
1508+
while back >= 0 and text[back] == ' ' and spaces < 3:
1509+
back -= 1
1510+
spaces += 1
1511+
if not (back < 0 or text[back] == '\n'):
1512+
return ''
1513+
if text.startswith('```', index):
1514+
return '```'
1515+
if text.startswith('~~~', index):
1516+
return '~~~'
1517+
return ''
1518+
1519+
1520+
def _line_is_indented_code(text, line_start):
1521+
"""True when the line beginning at `line_start` is a markdown indented code
1522+
block line (>=4 leading spaces or a leading tab, and not blank). `line_start`
1523+
must be the index of the first character of the line. O(1)-ish: only inspects
1524+
the line's leading characters, not the whole document (the per-character
1525+
variant was O(n^2) on long no-newline content — #3633 Codex perf catch)."""
1526+
if line_start >= len(text):
1527+
return False
1528+
if text[line_start] == '\t':
1529+
# A leading tab is indented code only if the line isn't otherwise blank.
1530+
nl = text.find('\n', line_start)
1531+
seg = text[line_start:(nl if nl != -1 else len(text))]
1532+
return bool(seg.strip())
1533+
if text.startswith(' ', line_start):
1534+
nl = text.find('\n', line_start)
1535+
seg = text[line_start:(nl if nl != -1 else len(text))]
1536+
return bool(seg.strip())
1537+
return False
1538+
1539+
1540+
def _merge_inline_thinking_reasoning(existing_reasoning, extracted_parts):
1541+
out = str(existing_reasoning or '').strip()
1542+
for part in extracted_parts or ():
1543+
item = str(part or '').strip()
1544+
if not item:
1545+
continue
1546+
if not out:
1547+
out = item
1548+
continue
1549+
if out == item or any(existing.strip() == item for existing in out.split('\n\n')):
1550+
continue
1551+
out = out + '\n\n' + item
1552+
return out
1553+
1554+
1555+
def _extract_inline_thinking_from_content(raw_content, existing_reasoning='', *, streaming=False):
1556+
"""Split inline thinking blocks out of assistant content.
1557+
1558+
Code-aware: thinking tags inside a triple-fence (``` / ~~~), an inline
1559+
single-backtick code span, or an indented (>=4-space / tab) code block are
1560+
LEFT VISIBLE — they are literal text a user typed/pasted, not a real thinking
1561+
trace. (#3633 deep-review / Codex catch: the earlier full-scan version only
1562+
protected triple fences, so a literal `<think>` in an inline code span got
1563+
silently extracted.)
1564+
1565+
``streaming`` gates partial/unclosed-block handling: during live streaming an
1566+
unmatched open tag means "still thinking" and its tail is shown as reasoning;
1567+
on the persist/reload path (streaming=False) an unclosed tag is LEFT VISIBLE
1568+
so prose after a literal ``<think>`` is never silently truncated on save.
15071569
"""
15081570
text = '' if raw_content is None else str(raw_content)
15091571
if not text:
1510-
return text, (existing_reasoning or '')
1511-
# Leading-only, single block — same three tag pairs as the JS helper.
1512-
_pairs = (
1513-
('<think>', '</think>'),
1514-
('<|channel>thought\n', '<channel|>'),
1515-
('<|turn|>thinking\n', '<turn|>'),
1572+
return text, str(existing_reasoning or '').strip()
1573+
visible = []
1574+
extracted = []
1575+
cursor = 0
1576+
index = 0
1577+
fence = ''
1578+
in_backtick = False
1579+
length = len(text)
1580+
# Incremental, O(1)-per-iteration line state (the previous per-character line
1581+
# scan made the whole pass O(n^2) on long no-newline content — #3633 Codex
1582+
# perf catch). `line_is_indented_code` is recomputed only at a line start.
1583+
line_is_indented_code = _line_is_indented_code(text, 0)
1584+
# Whether any non-whitespace char appeared in text[:index] — the cheap
1585+
# equivalent of the old `text[:index].strip() != ''` leading check.
1586+
seen_nonspace = False
1587+
# Whether a LEADING thinking block/prefix was removed — only then do we
1588+
# lstrip the final content (so a reply that legitimately starts with
1589+
# indented code / whitespace and has NO leading thinking wrapper keeps its
1590+
# leading whitespace — #3633 Codex catch).
1591+
leading_removed = False
1592+
while index < length:
1593+
ch = text[index]
1594+
if index > 0 and text[index - 1] == '\n':
1595+
line_is_indented_code = _line_is_indented_code(text, index)
1596+
marker = _inline_thinking_fence_marker_at(text, index)
1597+
if marker:
1598+
fence = '' if fence == marker else (fence or marker)
1599+
# Inline single-backtick code span toggles on each lone backtick that is
1600+
# not part of a triple fence. Only tracked outside a triple fence.
1601+
if not fence and not marker and ch == '`':
1602+
in_backtick = not in_backtick
1603+
in_code = bool(fence) or in_backtick or line_is_indented_code
1604+
if not in_code:
1605+
pair = None
1606+
for open_tag, close_tag in _INLINE_THINKING_TAG_PAIRS:
1607+
if text.startswith(open_tag, index):
1608+
pair = (open_tag, close_tag)
1609+
break
1610+
if pair:
1611+
open_tag, close_tag = pair
1612+
close_index = text.find(close_tag, index + len(open_tag))
1613+
if close_index == -1:
1614+
# Unclosed open tag. A LEADING unclosed block (nothing
1615+
# visible before it) is a genuine thinking trace that got
1616+
# cut off / persisted mid-thought → reasoning (master #3455
1617+
# leading-only intent, and the live-stream "still thinking"
1618+
# case). An unclosed tag AFTER visible content on the persist
1619+
# path is almost always a literal typed tag — leave it (and
1620+
# the prose after it) visible so nothing is silently
1621+
# truncated (#3633 Codex catch). During live streaming any
1622+
# unmatched open tag is treated as in-progress thinking.
1623+
leading = not seen_nonspace
1624+
if not streaming and not leading:
1625+
break
1626+
if leading:
1627+
leading_removed = True
1628+
visible.append(text[cursor:index])
1629+
partial = text[index + len(open_tag):]
1630+
if partial:
1631+
extracted.append(partial)
1632+
cursor = length
1633+
index = length
1634+
break
1635+
visible.append(text[cursor:index])
1636+
extracted.append(text[index + len(open_tag):close_index])
1637+
if not seen_nonspace:
1638+
leading_removed = True
1639+
seen_nonspace = True # the extracted tag span is non-whitespace
1640+
index = close_index + len(close_tag)
1641+
cursor = index
1642+
continue
1643+
if streaming:
1644+
matched_partial = False
1645+
for open_tag, _close_tag in _INLINE_THINKING_TAG_PAIRS:
1646+
rest = text[index:]
1647+
if len(rest) < len(open_tag) and open_tag.startswith(rest):
1648+
if not seen_nonspace:
1649+
leading_removed = True
1650+
visible.append(text[cursor:index])
1651+
cursor = length
1652+
index = length
1653+
matched_partial = True
1654+
break
1655+
if matched_partial or index >= length:
1656+
break
1657+
if not ch.isspace():
1658+
seen_nonspace = True
1659+
index += 1
1660+
if cursor < length:
1661+
visible.append(text[cursor:])
1662+
content = ''.join(visible)
1663+
if leading_removed:
1664+
content = content.lstrip()
1665+
reasoning = _merge_inline_thinking_reasoning(existing_reasoning, extracted)
1666+
return content, reasoning
1667+
1668+
1669+
def _split_thinking_from_content(raw_content, existing_reasoning=''):
1670+
"""Split inline thinking blocks out of assistant content for persistence.
1671+
1672+
Persistence path: streaming=False, so an unclosed tag stays visible content
1673+
(a partial block only means "still thinking" during a live stream).
1674+
"""
1675+
return _extract_inline_thinking_from_content(
1676+
raw_content,
1677+
existing_reasoning=existing_reasoning,
1678+
streaming=False,
15161679
)
1517-
trimmed = text.lstrip()
1518-
extracted = ''
1519-
remaining = text
1520-
for open_tag, close_tag in _pairs:
1521-
if not trimmed.startswith(open_tag):
1522-
continue
1523-
ci = trimmed.find(close_tag, len(open_tag))
1524-
if ci == -1:
1525-
break # partial open — leave intact
1526-
extracted = trimmed[len(open_tag):ci]
1527-
remaining = trimmed[ci + len(close_tag):].lstrip()
1528-
break
1529-
if not extracted:
1530-
return raw_content, (existing_reasoning or '')
1531-
final_reasoning = (existing_reasoning + '\n\n' + extracted) if existing_reasoning else extracted
1532-
return remaining, final_reasoning
15331680

15341681

15351682
def _strip_thinking_markup(text: str) -> str:
@@ -6570,14 +6717,11 @@ def _periodic_checkpoint():
65706717
# memory until the next turn's save, and the last-turn thinking card
65716718
# is lost when the user reloads immediately after a response.
65726719
#
6573-
# #3455: also split any inline leading <think> block out of the saved
6720+
# #3455/#3599: split inline thinking blocks out of the saved
65746721
# assistant content into m['reasoning'] (server-side twin of the JS
65756722
# _splitThinkFromContent). Inline-thinking providers (e.g. MiniMax-M3)
65766723
# otherwise leave the thinking trace in m['content'], bloating the
65776724
# persisted session file 30-50% and bypassing the thinking card. The
6578-
# split is leading-only/single-block so mid-body literal tags (e.g. in
6579-
# a fenced code block) stay visible content.
6580-
#
65816725
# #3587: use per-message segments so intermediate assistant turns
65826726
# (before tool calls) each receive their own reasoning trace rather
65836727
# than all reasoning being written only to the last assistant message.

0 commit comments

Comments
 (0)