fix(merge-batch-graphs): recover dropped cross-batch edges when source has file: prefix and target is bare #311

Open
haohung3010 wants to merge 1 commit into Lum1104:main from AgenticEvergreen:fix/merge-batch-cross-batch-edge-recovery

@haohung3010

Problem

When the file-analyzer agent emits an edge whose source carries the canonical file: prefix but whose target is a bare filename, merge-batch-graphs.py currently drops the edge as a missing-target, because the bare name never matches the prefixed node ID in the deduplicated set. The agent has been observed emitting this pattern when a referenced file lives in a sibling batch and the agent has only seen the bare name in its own batch's context.

Concrete reproduction on a 124-file Python project: 14 edges silently dropped, including a real tested_by link from test_cli_fallback.py → cli.py, which then surfaced as a false negative ("cli.py appears untested"). The drop is silent because the warning bucket for missing targets is grouped and capped at 50.
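To make the failure mode concrete, here is a minimal sketch of how a strict membership check drops the edge. It is a simplified model, not the actual code in merge-batch-graphs.py; the directory names (`src/`, `tests/`), the edge dictionary shape, and the `imports` edge type are assumptions made for illustration.

```python
# Hypothetical, simplified model of the "drop dangling" step.
# Paths and the edge dict shape are illustrative assumptions.
node_ids = {"file:src/cli.py", "file:tests/test_cli_fallback.py"}

edges = [
    # Both endpoints carry the canonical prefix: the edge resolves.
    {"source": "file:tests/test_cli_fallback.py",
     "target": "file:src/cli.py", "type": "imports"},
    # Prefixed source, bare target: the pattern described above.
    {"source": "file:tests/test_cli_fallback.py",
     "target": "cli.py", "type": "tested_by"},
]

# An edge survives only if both endpoints are exact node IDs.
kept = [e for e in edges
        if e["source"] in node_ids and e["target"] in node_ids]
dropped = [e for e in edges if e not in kept]

for e in dropped:
    print("dropped:", e["source"], "->", e["target"], f"({e['type']})")
```

The bare `cli.py` never equals `file:src/cli.py`, so the `tested_by` edge lands in `dropped` even though the node it refers to exists.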

Fix

Before the unfixable-drop in step 6 of the merge ("Deduplicate edges, drop dangling"), build a one-pass suffix index bare_to_prefixed: dict[str, set[str]] from the canonical node IDs. Then, when an edge has the bare-target / prefixed-source pattern, look the target up in the suffix index. If exactly one node matches (unique recovery), rewrite the edge's target to the prefixed form and increment a cross_batch_recovered counter. Ambiguous matches (multiple suffix hits) are left to drop as before so we don't silently pick a wrong target.
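The recovery step could look roughly like the sketch below. Only `bare_to_prefixed` and `cross_batch_recovered` are names taken from this description; the function names, the edge dictionary shape, the example paths, and the choice to index every path suffix (rather than only the basename) are assumptions, and the actual diff may differ.

```python
from collections import defaultdict

PREFIX = "file:"

def build_suffix_index(node_ids):
    """Map each path suffix of every file: node to the node IDs ending with it."""
    index = defaultdict(set)
    for node_id in node_ids:
        if not node_id.startswith(PREFIX):
            continue
        parts = node_id[len(PREFIX):].split("/")
        # "src/cli.py" is indexed under "src/cli.py" and "cli.py".
        for i in range(len(parts)):
            index["/".join(parts[i:])].add(node_id)
    return index

def recover_edges(edges, node_ids):
    """Keep resolvable edges, rewriting bare targets that match exactly one node."""
    bare_to_prefixed = build_suffix_index(node_ids)  # built once per merge pass
    kept, dropped, cross_batch_recovered = [], [], 0
    for edge in edges:
        source, target = edge["source"], edge["target"]
        if source in node_ids and target in node_ids:
            kept.append(edge)
            continue
        candidates = set()
        # Only the prefixed-source / bare-target pattern is eligible.
        if (source in node_ids and source.startswith(PREFIX)
                and not target.startswith(PREFIX)):
            candidates = bare_to_prefixed.get(target, set())
        if len(candidates) == 1:
            kept.append({**edge, "target": next(iter(candidates))})
            cross_batch_recovered += 1
        else:
            # No match or ambiguous match: drop as before.
            dropped.append(edge)
    return kept, dropped, cross_batch_recovered

node_ids = {
    "file:src/cli.py",
    "file:src/utils.py",
    "file:tests/utils.py",
    "file:tests/test_cli_fallback.py",
}
edges = [
    {"source": "file:tests/test_cli_fallback.py",
     "target": "cli.py", "type": "tested_by"},    # unique match
    {"source": "file:tests/test_cli_fallback.py",
     "target": "utils.py", "type": "imports"},    # two candidates
]
kept, dropped, cross_batch_recovered = recover_edges(edges, node_ids)
```

In this example the `cli.py` target is rewritten to `file:src/cli.py` and counted once, while `utils.py` matches two nodes and is dropped rather than guessed.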

The counter is surfaced in the fix-patterns report so users can see how many edges were auto-recovered.

Scope

  • 1 file: understand-anything-plugin/skills/understand/merge-batch-graphs.py (~25 LOC added)
  • Behavior change: edges that were previously dropped with a grouped "missing target" warning are now retained when a unique suffix match exists. Edges that already resolved are unaffected, and unmatched or ambiguous edges still drop as before.

Tests

3 new tests in tests/skill/understand/test_merge_batch_graphs.py (CrossBatchEdgeRecoveryTests):

  1. Unique suffix match is recovered (the canonical fix path)
  2. Basename-only suffix match is recovered (handles agents that strip directory prefixes)
  3. Ambiguous suffix match is left to drop (safety case — we don't pick when multiple candidates match)
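The three cases can be illustrated with a small standalone sketch. The fixtures, the class name `CrossBatchEdgeRecoverySketch`, and the brute-force `resolve_target` helper are all hypothetical; the real tests in test_merge_batch_graphs.py exercise the merge script itself and are not reproduced here.

```python
import unittest

def resolve_target(target, node_ids):
    """Hypothetical stand-in for the recovery step: return the single file: node
    whose path ends with `target` on a path-segment boundary, else None."""
    matches = [
        n for n in node_ids
        if n.startswith("file:")
        and (n[len("file:"):] == target or n.endswith("/" + target))
    ]
    return matches[0] if len(matches) == 1 else None

class CrossBatchEdgeRecoverySketch(unittest.TestCase):
    # Illustrative node IDs; not taken from the real fixtures.
    NODES = {"file:pkg/core/cli.py", "file:pkg/a/util.py", "file:pkg/b/util.py"}

    def test_unique_suffix_match_is_recovered(self):
        self.assertEqual(
            resolve_target("core/cli.py", self.NODES), "file:pkg/core/cli.py")

    def test_basename_only_match_is_recovered(self):
        self.assertEqual(
            resolve_target("cli.py", self.NODES), "file:pkg/core/cli.py")

    def test_ambiguous_match_is_left_to_drop(self):
        # Two nodes end in util.py, so no target is chosen.
        self.assertIsNone(resolve_target("util.py", self.NODES))
```

Saved as a module, the sketch runs with `python -m unittest`. The linear scan in `resolve_target` is only for readability in a test fixture; it is not the indexed lookup the fix uses.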

Full suite: 72/72 pass.

Performance

The suffix index is built once per merge pass in a single pass over the node-ID set, so the cost is O(n) in the number of nodes. Resolving a dangling edge is then a single dict lookup, which is expected constant time per edge. Overhead was negligible on the largest projects tested.

Backwards compatibility

The fix only affects edges that were previously dropped. Edges that resolved before continue to resolve identically. No schema changes; no public API surface change.

Verification notes

The unit tests exercise the recovery path directly with constructed batch fixtures that reproduce the original bare-target / prefixed-source pattern, including the unique-match and ambiguous-match branches. We additionally attempted an end-to-end rerun against the original 124-file project where the bug was first observed, but in that rerun the upstream batches did not contain cross-file edges to recover (the input had filesWithImports=0 because extract-import-map.mjs was invoked in a configuration that didn't produce import data for the project root), so the recovery counter did not fire. The unit-test coverage is what verifies the fix; the rerun was inconclusive on its own terms.

Anyone reproducing the original bug pattern (batches where source carries the file: prefix and target is a bare filename matching a sibling-batch node) will see the recovery counter increment and the previously-dropped edges retained.

…e has file: prefix and target is bare

When the file-analyzer agent emits an edge whose source carries the
canonical `file:` prefix but whose target is a bare filename, the
merge step currently drops the edge as a missing-target because the
bare name never matches the prefixed node ID. This happens across
analysis-batch boundaries where the agent has only seen the bare
target name in its own batch's context.

Reproduction: 124-file Python project produces 14 silently-dropped
edges, including a real tested_by link from test_cli_fallback.py to
cli.py that then surfaces as a false negative ("cli.py untested").
The drop is silent because the warning bucket is grouped and capped.

Fix: build a one-pass suffix index from canonical node IDs before
the unfixable-drop in step 6. When an edge has the bare-target /
prefixed-source pattern, look up the target in the suffix index; if
exactly one node matches (unique recovery), rewrite the edge's
target and increment a cross_batch_recovered counter. Ambiguous
matches are still dropped to avoid silently picking the wrong target.

The counter is surfaced in the fix-patterns report.

3 new tests in tests/skill/understand/test_merge_batch_graphs.py
(CrossBatchEdgeRecoveryTests):
  1. Unique suffix match is recovered
  2. Basename-only suffix match is recovered
  3. Ambiguous suffix match is left to drop

Full suite: 72/72 pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>