
perf: skip lineage sidecar hydration for session tails #4137

Open
ai-ag2026 wants to merge 1 commit into nesquena:master from ai-ag2026:perf/session-tail-state-db-fastpath

@ai-ag2026

Contributor

Thinking Path

Long compression-lineage sessions can make an ordinary /api/session?msg_limit=... load hydrate every parent sidecar before slicing the visible tail. On a real large local lineage session, that path spent ~20s in _webui_sidecar_lineage_messages_for_display(), while the current segment's state.db messages were available in milliseconds. This PR keeps the full-history path intact but avoids parent sidecar hydration for the initial tail-window load.

What Changed

  • Added a narrow _state_db_tail_fastpath_eligible(...) guard for /api/session tail-window requests.
  • For eligible WebUI sessions, the route uses metadata-only session data plus the current segment's state.db transcript and preserves the global message coordinate space via the sidecar metadata count.
  • Kept messaging sessions, full-history loads, msg_before older-history loads, active/pending sessions, truncation-watermark sessions, and ordinary non-lineage sessions with incomplete state.db coverage on the existing full reconciliation path.
  • Added regression tests proving the tail fastpath does not full-load sidecar transcripts and preserves global message_count / _messages_offset.
  • Added an Unreleased changelog note.

Why It Matters

Switching into a long compressed/continued conversation should not block the UI while the server parses and merges every historical parent sidecar just to return the newest visible tail. The explicit older-history/full-transcript paths remain available when the user asks for them.

Local real-corpus timing on a long lineage session:

  • Before/old behavior profile: /api/session?messages=1&msg_limit=80&expand_renderable=1 spent ~24.7s cumulatively in _webui_sidecar_lineage_messages_for_display().
  • After this patch, same direct route probe: median ~13.1ms for the initial tail (83 current-segment messages, message_count=4388, _messages_offset=4305).
  • Full-history load without msg_limit still uses the existing path and remains intentionally expensive (~20s on that corpus), because it returns the full merged history.

Contract Routing

Task type: runtime/session metadata performance fix.
Touched areas:

  • api/routes.py /api/session load path
  • session metadata / visible transcript coordinate space

Relevant public docs:

  • AGENTS.md
  • CONTRIBUTING.md
  • docs/CONTRACTS.md
  • docs/rfcs/webui-run-state-consistency-contract.md

Scope boundaries:

  • Does not change full-history, older-history msg_before, messaging/CLI, truncation, active/pending recovery, or sidecar persistence semantics.

Evidence needed before claiming done:

  • Regression tests for fastpath and fallback behavior.
  • Local timing evidence on a large lineage session.

Verification

  • python3 -m pytest tests/test_session_tail_state_db_fastpath.py tests/test_session_tail_payload.py tests/test_session_message_window_renderable_tail.py tests/test_webui_state_db_reconciliation.py tests/test_session_lineage_full_transcript.py -q → 39 passed in 2.55s
  • python3 -m py_compile api/routes.py tests/test_session_tail_state_db_fastpath.py
  • git diff --check
  • Added-line static risk scan → static scan findings: 0
  • Direct route timing probe on a large local lineage session:
    • limited tail: median 13.1ms, msgs=83, count=4388, offset=4305
    • full history unchanged: median 20327.5ms, msgs=5914, offset=0
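
A median like the ones reported above could be collected with a small probe along these lines. This is a sketch only, not the script used for this PR: `median_ms` and the stand-in workload are hypothetical, and a real probe would call the /api/session route (directly or through an HTTP client) instead.

```python
import statistics
import time
from typing import Callable


def median_ms(probe: Callable[[], object], runs: int = 5) -> float:
    """Call `probe` `runs` times and return the median wall-clock time in ms."""
    if runs < 1:
        raise ValueError("runs must be >= 1")
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        probe()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


if __name__ == "__main__":
    # Stand-in workload (assumption); replace with a call into the route under test.
    print(f"median: {median_ms(lambda: sum(range(100_000))):.1f}ms")
```

Reporting the median rather than the mean keeps a single cold-cache run from dominating the number.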

Risks / Follow-ups

  • The fastpath deliberately uses the current segment's state.db transcript for initial display and relies on metadata count for global offsets; if a very old lineage session lacks current-segment state.db rows, it falls back.
  • A deeper follow-up could add a true SQL window reader so even current-segment state.db loads do not materialize all current rows.
  • This is complementary to existing PRs around tool-heavy tail pagination and frontend virtualization; it avoids the backend lineage sidecar hydration before those layers matter.

Model Used

OpenAI GPT-5.5 via Hermes Agent WebUI, with terminal/file tooling and local pytest/timing probes.

ai-ag2026 force-pushed the perf/session-tail-state-db-fastpath branch from ef58fdd to b8cf6de on June 13, 2026 19:38
@nesquena

Owner

@ai-ag2026 Can you shoot me an email? I’d like to invite you to our dev chat discord as a contributor!

@greptile-apps

greptile-apps Bot commented Jun 13, 2026


Greptile Summary

This PR adds a narrow fast-path for /api/session tail-window loads that skips full sidecar hydration when metadata proves the current segment's state.db transcript is complete, there is no compression-lineage parent, and the session carries no session-level tool calls. Messaging sessions, lineage children, truncation-watermark sessions, and active/pending sessions continue through the existing full-reconciliation path.

  • api/models.py persists tool_call_count in the sidecar JSON (via compact()) and parses it back as _metadata_tool_call_count; old sidecars without the field default to None, which conservatively disables the fastpath.
  • api/routes.py now loads metadata_only=True first, calls _state_db_tail_fastpath_eligible, and only falls back to the full get_session(metadata_only=False) load when the guard rejects the request; the global _messages_offset is adjusted to preserve the full coordinate space.
  • New tests assert that the fastpath never calls _webui_sidecar_lineage_messages_for_display and that message_count / _messages_offset are correct.
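
The persist-and-parse round trip described in the first bullet can be sketched as follows. This is a minimal illustration, not the code in api/models.py: `compact_metadata` and `parse_tool_call_count` are hypothetical stand-ins, and only the `tool_call_count` and `message_count` field names come from this thread.

```python
import json
from typing import Optional


def compact_metadata(message_count: int, tool_calls: list) -> str:
    """Serialize sidecar metadata, including the session-level tool-call count."""
    return json.dumps({
        "message_count": message_count,
        "tool_call_count": len(tool_calls or []),
    })


def parse_tool_call_count(raw: str) -> Optional[int]:
    """Return the persisted count, or None when the field is absent or invalid."""
    value = json.loads(raw).get("tool_call_count")
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(value, bool) or not isinstance(value, int) or value < 0:
        return None
    return value
```

Returning None for an old sidecar, rather than 0, is what lets a caller tell "known to be empty" apart from "unknown" and take the full-load path in the second case.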

Confidence Score: 5/5

Safe to merge — the fastpath gate is deliberately conservative and falls back to full hydration for every legacy or ambiguous case.

The eligibility guard rejects any session that could produce an incorrect tail: lineage children, sessions with session-level tool calls (or unknown tool-call counts), truncation-watermark sessions, and active/pending sessions all fall through to the unmodified full-reconciliation path. Old sidecars without the new tool_call_count field default to None, which also forces the full path, so there is no silent regression on pre-existing data. The offset arithmetic is correct and verified by the new tests. The only finding is a dead _slice variable that has no runtime effect.

No files require special attention beyond the dead _slice variable in api/routes.py.

Important Files Changed

| Filename | Overview |
| --- | --- |
| api/routes.py | Adds fastpath eligibility guard and rewires the session-load flow; one dead _slice variable introduced in the new offset-adjustment block. |
| api/models.py | Adds tool_call_count metadata field; parsing is defensive and mirrors the existing message_count pattern; correctly defaults to None so old sidecars without the field fall back to the full-load path. |
| tests/test_session_tail_state_db_fastpath.py | New test file; covers fastpath eligibility guard and asserts full-sidecar hydration is never called; offset/count assertions are consistent with the implementation. |
| tests/test_stale_stream_cleanup.py | Updates source-position test to track the new metadata_only=True first load and adds a new assertion that the sidecar-hydration fallback also clears stale stream state. |
| CHANGELOG.md | Adds Unreleased entry describing the fastpath; entry text is accurate to the implementation. |

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[GET /api/session?msg_limit=N] --> B[get_session metadata_only=True]
    B --> C{is_messaging_session?}
    C -- Yes --> D[get_session metadata_only=False\n+ get_cli_session_messages]
    C -- No --> E{load_messages?}
    E -- No --> F[metadata-only path\n_metadata_only_message_summary]
    E -- Yes --> G[get_state_db_session_messages]
    G --> H{_state_db_tail_fastpath_eligible?\nmsg_limit set, msg_before None,\nno parent, tool_call_count=0,\nstate.db >= sidecar_count}
    H -- Yes --> I[FASTPATH\n_all_msgs = state_db_messages\nadjust _messages_offset\nwith historical delta]
    H -- No --> J[get_session metadata_only=False\nFull sidecar hydration]
    J --> K[_limited_webui_messages_for_display\nor full lineage merge]
    I --> L[_message_window_for_display\nslice tail window]
    K --> L
    D --> L
    L --> M[Build payload\nmessage_count = state_db_tail_total_count\n_messages_offset adjusted]

Reviews (8): Last reviewed commit: "perf: skip lineage sidecar hydration for..."

ai-ag2026 force-pushed the perf/session-tail-state-db-fastpath branch from b8cf6de to 8fdee6c on June 13, 2026 19:46
ai-ag2026 force-pushed the perf/session-tail-state-db-fastpath branch 3 times, most recently from f38e68a to 191b5f3, on June 13, 2026 21:33
@nesquena-hermes

Collaborator

Re-review — fast path is unsound for lineage children (changes requested)

Thanks @ai-ag2026 — the no-parent fast path is sound and a real win: len(state_db_messages) >= sidecar_count correctly proves the state.db segment is the full suffix, so the tail + count + offset all match full hydration. But both deep gates (Codex + Opus) independently reproduced that the lineage-child branch (return True for any parent_session_id) is unsound — and it's exactly the #4070 regression class:

CORE 1 — lineage child drops parent rows + reports wrong global count

With an isolated parent + child continuation, the fast path returned child-only rows with message_count=2, while full hydration returned parent+child with message_count=4. The offset math mixes stores (total − len(state_db_messages) across sidecar-count vs state.db-count), which only holds when the segment is a true suffix of the lineage-wide count — false for partial/continuation children.
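
The mismatch can be reproduced with a toy model. The sketch below is not the project's merge logic: both functions are hypothetical simplifications (plain list concatenation stands in for lineage merging, and `msg_limit` is assumed to be at least 1), chosen only to show why a child-local count cannot stand in for the lineage-wide count.

```python
def full_hydration(parent_rows: list, child_rows: list, msg_limit: int) -> dict:
    """Toy full path: merge parent and child, then slice the tail window."""
    merged = parent_rows + child_rows
    window = merged[-msg_limit:]
    return {
        "messages": window,
        "message_count": len(merged),
        "_messages_offset": len(merged) - len(window),
    }


def child_only_fastpath(child_sidecar_count: int, child_rows: list, msg_limit: int) -> dict:
    """Toy fast path: sees only the child's own rows and the child's own count."""
    total = max(child_sidecar_count, len(child_rows))
    window = child_rows[-msg_limit:]
    return {
        "messages": window,
        "message_count": total,
        "_messages_offset": total - len(window),
    }


parent, child = ["p1", "p2"], ["c1", "c2"]
print(full_hydration(parent, child, 3))   # window reaches into the parent, count is 4
print(child_only_fastpath(2, child, 3))   # child rows only, count is 2
```

With two parent rows and two child rows, the toy fast path reports `message_count` 2 where the toy full path reports 4, the same shape of divergence described above.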

CORE 2 — "Load earlier" never reaches parent history for fast-pathed lineage children

The UI's normal Load-earlier sends a larger msg_limit without msg_before, and that request stays fast-path eligible (routes.py:3588) → it returns the same child-only rows again (offset=2, truncated=true). Only an explicit msg_before=2 loads parent rows, which the frontend doesn't send first. So a lineage child can't load its parent history via the normal control at all.

SILENT — legacy session-level tool cards vanish

Because the fast-path s is metadata-only, s.tool_calls is [], so _tool_calls_for_message_window() returns no legacy JSON tool calls. A state.db-complete non-lineage session whose tool cards live in the session-level list loses them on cold load (reproduced: FAST tool_calls: [] vs FULL [{'name':'terminal',...}]).

Fix (cheapest way to close the holes)

  1. In the lineage branch (~routes.py:3600-3606), don't blanket return True. The release-safe fix is return False for parent_session_id sessions (keep only the sound no-parent fast path). If you want lineage children to fast-path, you must first prove a verifiable cumulative signal (the child sidecar's persisted count reflects the full lineage) and exclude forks/child_session.
  2. For the legacy tool-cards gap: either preserve/hydrate session-level tool_calls for the returned window, or make the fast path ineligible when legacy JSON tool calls may be needed.
  3. Add regression tests that compare the full fast-path payload (messages, message_count, _messages_offset, _messages_truncated) against full-hydration for: (a) a cumulative child, (b) a partial child where the parent holds rows the child lacks, (c) a child whose state.db segment count differs from the sidecar's, and (d) a legacy session with JSON-only tool calls. The current test mocks the comparison path away, so it can't catch these.
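
One way to structure the payload comparison from item 3 is a small diff helper that each scenario can share. This is a hedged sketch: `payload_parity_diff` is a hypothetical helper, and the two payloads would come from calling the route once on the fast path and once with full hydration forced, which is not shown here.

```python
PARITY_KEYS = ("messages", "message_count", "_messages_offset", "_messages_truncated")


def payload_parity_diff(fast: dict, full: dict) -> dict:
    """Return {key: (fast_value, full_value)} for every compared key that differs."""
    return {
        key: (fast.get(key), full.get(key))
        for key in PARITY_KEYS
        if fast.get(key) != full.get(key)
    }
```

A test would then assert `payload_parity_diff(fast, full) == {}` for scenarios (a) through (d); on failure the returned dict names the diverging fields directly.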

The perf goal is great and the no-parent path is ready — it's specifically the lineage-child return True (and the metadata-only tool-calls gap) that need closing. Heads-up: this is the same session-tail/lineage area as your #4070 (also bounced for a lineage-merge bypass) — worth reconciling both against the full-hydration payload. Ping me and I'll re-gate (Codex + Opus + full suite).

nesquena-hermes added the changes-requested label (Maintainer left detailed feedback requesting changes; PR is waiting on author to address) on Jun 13, 2026
ai-ag2026 force-pushed the perf/session-tail-state-db-fastpath branch from 191b5f3 to f18dc71 on June 14, 2026 15:49
@nesquena-hermes

Collaborator

Thanks for reworking the eligibility gate — the narrow _state_db_tail_fastpath_eligible() (no msg_before, no truncation watermark, no pending/active stream, tool_call_count == 0, no parent_session_id) correctly closes the original lineage-child return True unsoundness. But a deep re-review (rebased onto current master, full Codex regression gate) surfaced a separate CORE display-correctness issue that still blocks merge:

The fast path serves raw state.db rows as the display transcript, skipping the sidecar-aware merge/dedupe.

In handle_get (api/routes.py ~6222) the fast path sets _all_msgs = state_db_messages and bypasses the normal merge_session_messages_append_only + _merged_webui_lineage_messages_for_display reconciliation. The gate (api/routes.py ~3735) only proves len(state_db_messages) >= sidecar_count — a count check, not a display-equivalence check. Even for a no-parent, no-legacy-tool session, the raw state.db replay/projection rows are not guaranteed to be byte-equivalent to the clean WebUI sidecar tail the normal path produces (projection rows, formatting, and dedupe can differ). Result: /api/session?msg_limit=N can return a tail that differs from what the full-load path would render for the same session — a silent wrong-transcript, which the green suite doesn't catch because no test asserts fast-path-vs-full-path display equivalence.

To make this safe you'd need to prove the state.db rows are display-equivalent to the sidecar, not merely as numerous — e.g. a persisted equivalence fingerprint/marker written when the sidecar and state.db are known in sync, checked here before taking the fast path; or run the same merge/projection over the state.db rows before slicing (which reclaims much of the cost the fast path is trying to avoid, so the fingerprint route is likely the real design). A behavioral test that asserts the fast-path window == the full-load window for the same session would lock it in.
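
A persisted equivalence fingerprint of the kind suggested above might look like the following. Everything here is an assumption for illustration: the function names are hypothetical, and the choice of `role` and `content` as the display-relevant fields is a guess, since the thread does not specify which fields the merge/projection step actually normalizes.

```python
import hashlib
import json
from typing import Iterable, Optional


def display_fingerprint(messages: Iterable[dict], fields=("role", "content")) -> str:
    """Hash only the display-relevant fields, in message order."""
    digest = hashlib.sha256()
    for msg in messages:
        projected = {field: msg.get(field) for field in fields}
        encoded = json.dumps(projected, sort_keys=True, ensure_ascii=False, default=str)
        digest.update(encoded.encode("utf-8"))
        digest.update(b"\n")  # separator so message boundaries affect the hash
    return digest.hexdigest()


def fingerprints_match(persisted: Optional[str], state_db_messages: Iterable[dict]) -> bool:
    """Fail safe: a missing fingerprint never matches."""
    return bool(persisted) and persisted == display_fingerprint(state_db_messages)
```

The fingerprint would be written when the sidecar and state.db are known to be in sync. Note that checking it still reads every state.db row, so whether this preserves the fast path's win depends on how cheap that read is compared with sidecar hydration.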

This is genuinely good direction (sidecar hydration on tail loads IS a real cost) — but the equivalence proof is the crux and needs a design pass, so I'm returning it rather than inline-patching. Note this overlaps #4070 and #4138, which touch the same _message_window_for_display / session-tail display path; it may be worth converging the three on one shared, fingerprint-backed approach.

@nesquena-hermes

Collaborator

Re-review — all three blocking concerns are now closed; re-gating green

Thanks @ai-ag2026. The rewrite addresses every issue from the last round head-on, and the fast path is now narrow in exactly the right places. Reading the new _state_db_tail_fastpath_eligible() (api/routes.py:3705-3735) on the PR head against my prior findings:

CORE 1 + CORE 2 (lineage children) — fixed

The blanket return True for any parent_session_id is gone. The gate now bails outright for lineage children:

if str(getattr(session, "parent_session_id", "") or "").strip():
    return False

So both the wrong-global-count case and the "Load earlier never reaches parent history" case can't trigger — lineage children stay on the full-hydration / _merged_webui_lineage_messages_for_display path. test_session_tail_fastpath_falls_back_for_lineage_children pins this.

SILENT (legacy session-level tool cards) — fixed

This is the part I want to call out as well-designed. to_dict() now persists the count (api/models.py:768):

meta['tool_call_count'] = len(self.tool_calls or [])

and the gate refuses the fast path unless metadata proves there are zero legacy tool calls (routes.py:3727-3730):

tool_call_count = int(getattr(session, "_metadata_tool_call_count", -1))
...
if tool_call_count != 0:
    return False

The default to -1 on a missing attr is the right call: old sidecars written before this field existed parse to None → _metadata_tool_call_count stays None → int(... -1) path → not eligible → safe full load. New writes carry the real count. test_session_tail_fastpath_requires_known_empty_legacy_tool_calls covers both the None and 1 cases.
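
Putting the gate conditions named in this thread together, the overall shape might look like the sketch below. It is not the code at routes.py:3705-3735: `SessionMeta` is a stand-in class, the three fields marked hypothetical are invented names for state the real gate reads some other way, and the messaging-session check is omitted.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SessionMeta:
    """Stand-in for a metadata-only session; not the project's Session class."""
    parent_session_id: Optional[str] = None
    _metadata_tool_call_count: Optional[int] = None
    metadata_message_count: int = 0              # hypothetical name
    has_truncation_watermark: bool = False       # hypothetical name
    has_active_or_pending_stream: bool = False   # hypothetical name


def tail_fastpath_eligible(session: SessionMeta, state_db_len: int,
                           msg_limit: Optional[int], msg_before: Optional[int]) -> bool:
    """Return True only when every condition is proven; anything unknown is a no."""
    if msg_limit is None or msg_before is not None:
        return False  # full-history and older-history loads keep the full path
    if session.has_truncation_watermark or session.has_active_or_pending_stream:
        return False
    if str(session.parent_session_id or "").strip():
        return False  # lineage children always take full hydration
    try:
        tool_call_count = int(session._metadata_tool_call_count)
    except (TypeError, ValueError):
        tool_call_count = -1  # missing or invalid count means "unknown"
    if tool_call_count != 0:
        return False
    return state_db_len >= session.metadata_message_count
```

Every early return is a fallback to the existing path, so a mistake in any single check costs latency rather than correctness.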

Offset / count math

The metadata-vs-state.db count mixing I flagged is now bounded to the no-parent, tool-call-free case where len(state_db_messages) >= sidecar_count holds, and state_db_tail_total_count = max(_metadata_count, len(state_db_messages)) (routes.py:6179-6184) feeds both _merged_message_count and the _messages_truncated recompute consistently. test_session_tail_load_uses_state_db_without_full_sidecar_hydration asserts the full payload shape (messages == state_messages[-2:], message_count == 4, _messages_offset == 2, _messages_truncated is True) against a get_session(metadata_only=False) that raises if hit — so the win is verified behaviorally, not just structurally.
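
The count and offset arithmetic can be checked with a short worked example. The sketch below is an assumption-laden simplification: `tail_window` is a hypothetical function, the real route slices through _message_window_for_display, and treating `_messages_truncated` as "offset is greater than zero" is a guess at the recompute, not a quote of it.

```python
def tail_window(state_db_messages: list, metadata_count: int, msg_limit: int) -> dict:
    """Slice the newest msg_limit rows and report them in global coordinates."""
    total = max(metadata_count, len(state_db_messages))
    window = state_db_messages[-msg_limit:] if msg_limit > 0 else []
    offset = total - len(window)
    return {
        "messages": window,
        "message_count": total,
        "_messages_offset": offset,
        "_messages_truncated": offset > 0,  # assumed definition of truncation
    }


state_messages = ["m1", "m2", "m3", "m4"]
payload = tail_window(state_messages, metadata_count=4, msg_limit=2)
assert payload["messages"] == state_messages[-2:]
assert payload["message_count"] == 4
assert payload["_messages_offset"] == 2
assert payload["_messages_truncated"] is True
```

These are the same values the regression test above asserts: four messages in total, a two-message window, and an offset of 2.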

One small thing worth a glance (non-blocking)

test_stale_stream_cleanup.py now pins two ordering invariants: the metadata-only load clears stale stream state, and the fallback get_session(sid, metadata_only=False) re-clears it (test_session_full_load_reclears_stale_stream_after_sidecar_hydration). That matches the code — both the messaging branch and the non-eligible fallback re-run _clear_stale_stream_state(s) after the full load (routes.py:6160-6189). Good that the fallback path re-clears; a metadata-only s whose active_stream_id was stale would otherwise have been re-read fresh by the full load.

Re-gating now (Codex + Opus + full suite). No blockers from me on this revision — the perf goal is intact and the unsound branches are all excluded. Note this still pairs with #4138 on the merge-loop win, so worth a combined large-history benchmark before/after.

ai-ag2026 force-pushed the perf/session-tail-state-db-fastpath branch from 3dbbe8e to f4e07f1 on June 15, 2026 13:05
@nesquena-hermes

Collaborator

Re-gate after rebase — two blocking items before this can ship

Thanks @ai-ag2026 — picking this back up. I rebased the PR onto current master (it was 65 behind; resolved a handle_get conflict where master added the _session_visible_to_active_profile 404 guard right where this PR switches to metadata_only=True) and ran the full ship gate (Codex + Opus + suite). The core fast-path design is in good shape and the suite is green, but the two deep gates surfaced two items that block merge — one security, one display-correctness.

BLOCKER 1 (security) — re-validate the profile boundary after the full re-fetch

The fast path correctly inherits the active-profile visibility check from the initial metadata_only=True load. But on the ineligible branch (and the messaging branch) the code does a second get_session(sid, metadata_only=False) and recomputes _session_profile from that fresh load — without re-running the visibility guard. The metadata-only check doesn't carry over to a separate full load, so the boundary must be re-asserted.

Fix (mechanical — mirror the existing 404 shape) after each metadata_only=False re-fetch in handle_get:

s = get_session(sid, metadata_only=False)
original_stream_id = getattr(s, "active_stream_id", None)
_clear_stale_stream_state(s)
_session_profile = getattr(s, 'profile', None) or None
if not _session_visible_to_active_profile(_session_profile, handler):
    return bad(handler, "Session not found", 404)

(Applies to both the messaging branch ~routes.py:7045 and the fast-path-ineligible branch ~routes.py:7066.)

BLOCKER 2 (display correctness) — prove zero attachment / display-only sidecar fields before taking the fast path

_state_db_tail_fastpath_eligible() gates on tool_call_count == 0 (good — compact() now persists that), but it does not account for message-level attachment / display-only sidecar fields. Those live only in the sidecar JSON, not in state.db. When the fast path serves raw state.db rows it would silently drop a tail message's attachment / display-only content that full hydration would include.

Fix: extend the same metadata-persist-and-prove-zero pattern you used for tool_call_count:

  1. In api/models.py compact() (~768) persist a count (or boolean) of message-level attachment / display-only sidecar fields, and parse it back (~907) as _metadata_* like _metadata_tool_call_count.
  2. In _state_db_tail_fastpath_eligible() require that metadata to be present and zero — if it's absent (older sidecar) or nonzero, fall back to full hydration (fail-safe).
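
The two steps above could take roughly this shape. This is a sketch under stated assumptions: the `attachments` key, the `display_only_field_count` metadata key, and both function names are hypothetical, because the thread does not name the actual sidecar fields.

```python
from typing import Iterable

# Hypothetical key list; the real display-only field names live in the sidecar schema.
DISPLAY_ONLY_KEYS = ("attachments",)


def display_only_field_count(messages: Iterable[dict], keys=DISPLAY_ONLY_KEYS) -> int:
    """Count messages that carry at least one non-empty display-only field."""
    return sum(1 for msg in messages if any(msg.get(key) for key in keys))


def proves_no_display_only_fields(metadata: dict) -> bool:
    """True only when the persisted count is present and exactly zero (fail-safe)."""
    value = metadata.get("display_only_field_count")  # hypothetical metadata key
    return isinstance(value, int) and not isinstance(value, bool) and value == 0
```

The first function would run at write time inside compact(), and the second at read time inside the gate, so older sidecars without the key fall back to full hydration.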

Non-blocking (worth doing while you're in here)

  • Dead code routes.py:~7150: _slice is computed in both branches but never read — remove it.
  • Test matrix gap in tests/test_session_tail_state_db_fastpath.py: the lineage / incomplete-sidecar / tool-call bails are covered, but the truncation_watermark, pending/active_stream, and msg_before bails are not — three cheap asserts would close the matrix.

Why this is close

The no-parent fast path itself is sound (Opus re-confirmed: with tool_call_count==0 + no lineage parent + len(state_db) >= sidecar_count, the fast-path tail/count/offset provably equal full hydration), and the lineage-child unsoundness from the earlier rounds stays fixed. Land Blockers 1 + 2 and I'll re-gate (Codex + Opus + full suite) and ship. Keeping the changes-requested label until then.
