Skip to content

perf: reuse prefix duplicate match during message merge#4138

Open
ai-ag2026 wants to merge 1 commit into
nesquena:masterfrom
ai-ag2026:perf/merge-visible-duplicate-fastpath
Open

perf: reuse prefix duplicate match during message merge#4138
ai-ag2026 wants to merge 1 commit into
nesquena:masterfrom
ai-ag2026:perf/merge-visible-duplicate-fastpath

Conversation

@ai-ag2026

Copy link
Copy Markdown
Contributor

Thinking Path

merge_session_messages_append_only() already checks whether a state.db row replays the next expected sidecar-visible message. When that ordered prefix check matched fuzzily, the code immediately performed a second duplicate lookup against the entire sidecar visible-key set just to update skipped counts. On long replay prefixes this becomes an O(n²) hot path.

What Changed

  • Reuse the visible key found by the ordered prefix check instead of rescanning all sidecar visible keys for the same state row.
  • Preserve the existing global duplicate lookup for non-prefix rows.
  • Add a regression test that makes fuzzy ordered-prefix replay rows fail if they invoke broad sidecar duplicate lookup.
  • Add an Unreleased changelog note.

Why It Matters

Long sidecar/state.db reconciliation can happen during full session loads, recovery, and fallback paths. Avoiding the redundant global lookup keeps the merge linear for ordered replay prefixes while leaving the existing dedupe/truncation/tool-call semantics intact.

Local synthetic timing probe:

  • Before: 3,000 sidecar rows + 3,000 fuzzy ordered state replay rows → ~4,595ms, ~22.9M function calls.
  • After: same probe → ~106ms, ~321k function calls.

Contract Routing

Task type: runtime/session reconciliation performance fix.
Touched areas:

  • api/models.py merge_session_messages_append_only()
  • session sidecar/state.db reconciliation
    Relevant public docs:
  • AGENTS.md
  • CONTRIBUTING.md
  • docs/CONTRACTS.md
    Scope boundaries:
  • Does not change full transcript inputs, truncation watermark rules, message-id dedupe, tool-call keying, or non-prefix duplicate detection.
  • Only avoids repeating an already-known prefix duplicate lookup.

Verification

  • python3 -m pytest tests/test_merge_prefix_duplicate_fastpath.py tests/test_merge_key_tool_calls.py tests/test_webui_state_db_reconciliation.py tests/test_session_lineage_full_transcript.py -q38 passed in 2.53s
  • python3 -m py_compile api/models.py tests/test_merge_prefix_duplicate_fastpath.py
  • git diff --check
  • Added-line static risk scan → static scan findings: 0
  • Synthetic timing probe: 4594.6ms before vs 105.7ms after for 3k/3k fuzzy ordered replay rows.

Risks / Follow-ups

Model Used

OpenAI GPT-5.5 via Hermes Agent WebUI, with terminal/file tooling and local pytest/timing probes.

@greptile-apps

greptile-apps Bot commented Jun 13, 2026

Copy link
Copy Markdown

Greptile Summary

This PR eliminates an O(n²) hot path in merge_session_messages_append_only() by reusing the visible key already found during the ordered prefix check, rather than rescanning the entire sidecar visible-key set for each prefix-matched state row. A prefix_visible_lookup_cache is also introduced to avoid rebuilding single-element duplicate lookups when the same expected key recurs.

  • api/models.py: Splits the old combined exact+fuzzy prefix guard into two explicit branches; the fuzzy branch builds and caches a single-key lookup, captures matched_prefix_visible_key from the result, and skips the full-sidecar rescan entirely.
  • tests/test_merge_prefix_duplicate_fastpath.py: Two new tests — one monkeypatch-based check confirming no broad sidecar lookup fires on prefix rows, and one exercising the duplicate-budget counter.
  • CHANGELOG.md: Adds an Unreleased entry describing the O(n²) → linear improvement.

Confidence Score: 5/5

Safe to merge — the change only affects how the prefix-replay skip-count is attributed and avoids a redundant full-sidecar rescan; all non-prefix dedup logic is untouched.

The optimization is logically correct: the ordered prefix check already pins the only sidecar position the state row can match, so reusing expected_visible_key is both faster and more accurate than letting the full-sidecar fuzzy scan pick an arbitrary matching key. The old code had a latent mis-attribution bug that the new code inadvertently fixes. The cache is function-local and bounded.

No files require special attention. The test for duplicate-budget attribution is slightly under-specified, but this does not affect correctness of the production path.

Important Files Changed

Filename Overview
api/models.py Core optimization: splits exact/fuzzy prefix detection into two branches, adds a per-call prefix_visible_lookup_cache, and correctly attributes skipped_state_visible_counts to expected_visible_key. Non-prefix dedup path is unchanged.
tests/test_merge_prefix_duplicate_fastpath.py Two new tests covering the no-broad-scan guarantee and duplicate-budget attribution for fuzzy prefix replay rows.
CHANGELOG.md Adds a clear Unreleased entry under Fixed describing the O(n²) to linear improvement.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[state row: visible_key] --> B{state_replay_idx < len sidecar_visible_sequence?}
    B -- No --> G[Non-prefix dedup path: full sidecar scan]
    B -- Yes --> C{visible_key == expected_visible_key?}
    C -- Exact match --> D[replays_sidecar_prefix = True, matched_prefix_visible_key = expected_visible_key, state_replay_idx++]
    C -- Fuzzy check --> E{prefix_visible_lookup_cache hit?}
    E -- Miss --> F[_build_visible_duplicate_lookup, cache result]
    E -- Hit --> H[_matching_visible_duplicate against single-key set]
    F --> H
    H -- matched --> I[replays_sidecar_prefix = True, state_replay_idx++]
    H -- no match --> G
    D --> J[skipped_state_visible_counts expected_visible_key += 1, continue]
    I --> J
    G --> K{matched in full sidecar set?}
    K -- Yes, budget left --> L[skipped_state_visible_counts += 1, continue]
    K -- Exhausted or no match --> M[Append to merged_messages]
Loading

Reviews (8): Last reviewed commit: "perf: reuse prefix duplicate match durin..." | Re-trigger Greptile

Comment thread tests/test_merge_prefix_duplicate_fastpath.py
Comment thread api/models.py
@ai-ag2026 ai-ag2026 force-pushed the perf/merge-visible-duplicate-fastpath branch from ed79091 to 829c68d Compare June 13, 2026 20:12
@nesquena-hermes

Copy link
Copy Markdown
Collaborator

Review — the prefix-match reuse is sound, and arguably a touch more correct than before

Reading api/models.py:4469-4502 on the PR head against origin/master, the optimization holds up. The hot path it removes is real: on the old code, every ordered-prefix match still did a second _matching_visible_duplicate(visible_key, sidecar_visible_keys, …) against the entire sidecar key set purely to debit the skip counter, which is the O(n²) behavior on long replay prefixes.

What the diff actually does

The ordered-prefix branch now records the key it already matched and reuses it for the counter, instead of rescanning:

if visible_key == expected_visible_key:
    replays_sidecar_prefix = True
    matched_prefix_visible_key = expected_visible_key   # ← exact case, no lookup
    state_replay_idx += 1
else:
    expected_visible_keys = {expected_visible_key}
    ...                                                  # fuzzy case: lookup over a 1-element set
    matched_prefix_visible_key = _matching_visible_duplicate(visible_key, expected_visible_keys, expected_visible_lookup)

and the counter debits that key directly (models.py:4500-4502):

if replays_sidecar_prefix:
    if matched_prefix_visible_key is not None:
        skipped_state_visible_counts[matched_prefix_visible_key] = (
            skipped_state_visible_counts.get(matched_prefix_visible_key, 0) + 1

Two things I checked specifically

  1. Not a dead-variable refactor. sidecar_visible_lookup / sidecar_visible_keys are still consumed by the non-prefix duplicate path (models.py:4570-4573), and the skipped_count < sidecar_count budget there is untouched — so the broad lookup is preserved exactly where the PR claims (non-prefix rows only).

  2. The counter target is at least as correct as before. The old global _matching_visible_duplicate could debit some matching sidecar key, not necessarily the one at the current replay index; the new code debits expected_visible_key — the specific message the prefix cursor is consuming. For sessions with repeated visible payloads (your test_prefix_replay_counts_matched_sidecar_key_for_later_duplicate_skips covers exactly this), debiting the indexed key is the more defensible choice, and the duplicate-budget skip downstream still works because both paths feed the same skipped_state_visible_counts.

Tests

test_prefix_replay_reuses_expected_visible_key_without_global_lookup monkeypatches _matching_visible_duplicate and asserts no call ever arrives with len(visible_keys) > 1 during a 64-row fuzzy ordered prefix — a clean behavioral guard that the broad scan is gone rather than just an output check. The second test pins the duplicate-budget semantics. Both are the right shape. CI is green across 3.11/3.12/3.13.

Verdict

LGTM on correctness and scope. One small note for the maintainer: per the PR's own follow-up, the headline win here is partly gated behind #4137 (which removes the worst initial-tail path before this merge loop is reached), so the two are worth benchmarking together on a real large-history load rather than only the synthetic 3k/3k probe.

@ai-ag2026 ai-ag2026 force-pushed the perf/merge-visible-duplicate-fastpath branch 3 times, most recently from ec602a4 to c33d347 Compare June 14, 2026 15:51
@nesquena-hermes

Copy link
Copy Markdown
Collaborator

Thanks — the intent (avoiding the O(n²) duplicate rescan on long replay-prefix merges) is a real cost worth fixing. But a deep re-review (rebased onto current master, full regression gate) shows this is not a pure-perf change — it alters the merge output in a fuzzy-duplicate edge case and can append a duplicate transcript row.

Root cause. On origin/master, when a state row replays the sidecar prefix, the skipped-count is debited against the key found by matching the row against the entire sidecar visible-key set:

matched_visible_key = _matching_visible_duplicate(visible_key, sidecar_visible_keys, sidecar_visible_lookup)

This PR debits against matched_prefix_visible_key, which is matched only against the singleton {expected_visible_key} at the current state_replay_idx. When a state row fuzzy-matches the expected prefix position but the actual sidecar duplicate lives at a different position, the wrong key gets debited from skipped_state_visible_counts, so the later dedup guard (#3346) no longer suppresses the real duplicate.

Reproduction (Codex-verified, PYTHONHASHSEED=4):

  • sidecar visible = ["alpha beta", "alpha gamma"]
  • state = ["alpha", "alpha beta"]
  • origin/master: output stays the two sidecar rows.
  • this PR: output appends a second "alpha beta" row (duplicate).

Fix. For prefix-replay skips, keep debiting the full-sidecar matched key, i.e. restore:

matched_visible_key = _matching_visible_duplicate(visible_key, sidecar_visible_keys, sidecar_visible_lookup)
# ...debit skipped_state_visible_counts[matched_visible_key]

rather than matched_prefix_visible_key. Note this is in tension with the perf goal for the fuzzy branch (it reintroduces the full-set lookup there), so the honest options are: (a) keep the fast path only for the exact-== prefix case and fall back to the original full-set debit for the fuzzy branch, or (b) drop the "pure perf / no behavior change" framing and add explicit tests for ambiguous fuzzy duplicates that lock in the intended output. A behavioral test asserting the merge output for the repro above (and a few fuzzy-duplicate permutations) would make whichever choice safe to land.

Returning for that — happy to re-gate once the dedup bookkeeping is equivalence-proven.

@nesquena-hermes nesquena-hermes added the changes-requested Maintainer left detailed feedback requesting changes; PR is waiting on author to address label Jun 14, 2026
@ai-ag2026 ai-ag2026 force-pushed the perf/merge-visible-duplicate-fastpath branch 2 times, most recently from 1928a22 to 50bf776 Compare June 14, 2026 19:10
@ai-ag2026 ai-ag2026 force-pushed the perf/merge-visible-duplicate-fastpath branch from 50bf776 to 3bdbc1e Compare June 15, 2026 13:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

changes-requested Maintainer left detailed feedback requesting changes; PR is waiting on author to address

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants