fix(merge-batch-graphs): recover dropped cross-batch edges when source has file: prefix and target is bare #311
Open
haohung3010 wants to merge 1 commit into
…e has file: prefix and target is bare
When the file-analyzer agent emits an edge whose source carries the
canonical `file:` prefix but whose target is a bare filename, the
merge step currently drops the edge as a missing-target because the
bare name never matches the prefixed node ID. This happens across
analysis-batch boundaries where the agent has only seen the bare
target name in its own batch's context.
Reproduction: 124-file Python project produces 14 silently-dropped
edges, including a real tested_by link from test_cli_fallback.py to
cli.py that then surfaces as a false negative ("cli.py untested").
The drop is silent because the warning bucket is grouped and capped.
Fix: build a one-pass suffix index from canonical node IDs before
the unfixable-drop in step 6. When an edge has the bare-target /
prefixed-source pattern, look up the target in the suffix index; if
exactly one node matches (unique recovery), rewrite the edge's
target and increment a cross_batch_recovered counter. Ambiguous
matches are still dropped to avoid silently picking the wrong target.
The counter is surfaced in the fix-patterns report.
3 new tests in tests/skill/understand/test_merge_batch_graphs.py
(CrossBatchEdgeRecoveryTests):
1. Unique suffix match is recovered
2. Basename-only suffix match is recovered
3. Ambiguous suffix match is left to drop
Full suite: 72/72 pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Problem
When the file-analyzer agent emits an edge whose `source` carries the canonical `file:` prefix but whose `target` is a bare filename, `merge-batch-graphs.py` currently drops the edge as a missing-target, because the bare name never matches the prefixed node ID in the deduplicated set. The agent has been observed emitting this pattern when a referenced file lives in a sibling batch and the agent has only seen the bare name in its own batch's context.
Concrete reproduction on a 124-file Python project: 14 edges silently dropped, including a real `tested_by` link from `test_cli_fallback.py → cli.py`, which then surfaced as a false negative ("`cli.py` appears untested"). The drop is silent because the warning bucket for missing targets is grouped and capped at 50.
Fix
Before the unfixable-drop in step 6 of the merge ("Deduplicate edges, drop dangling"), build a one-pass suffix index `bare_to_prefixed: dict[str, set[str]]` from the canonical node IDs. Then, when an edge has the bare-target / prefixed-source pattern, look the target up in the suffix index. If exactly one node matches (unique recovery), rewrite the edge's target to the prefixed form and increment a `cross_batch_recovered` counter. Ambiguous matches (multiple suffix hits) are left to drop as before so we don't silently pick a wrong target.
The counter is surfaced in the fix-patterns report so users can see how many edges were auto-recovered.
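The recovery step described above could look roughly like the following. This is a minimal sketch, not the code in `merge-batch-graphs.py`: the function names `build_suffix_index` and `recover_cross_batch_edges`, the dict-shaped edges, the `/` path separator, and the choice to index every trailing path suffix (not only the basename) are all assumptions made for illustration.

```python
from collections import defaultdict

PREFIX = "file:"  # canonical node-ID prefix described in this PR


def build_suffix_index(node_ids):
    """Map each trailing path suffix of a `file:` node ID to the IDs ending with it.

    Assumption: paths use "/" and every suffix is indexed, so
    "file:src/pkg/cli.py" is reachable via "src/pkg/cli.py", "pkg/cli.py", "cli.py".
    """
    bare_to_prefixed = defaultdict(set)
    for node_id in node_ids:
        if not node_id.startswith(PREFIX):
            continue
        parts = node_id[len(PREFIX):].split("/")
        for i in range(len(parts)):
            bare_to_prefixed["/".join(parts[i:])].add(node_id)
    return bare_to_prefixed


def recover_cross_batch_edges(edges, node_ids):
    """Return (kept_edges, cross_batch_recovered).

    Only the target is validated in this sketch; a real merge step would
    presumably check the source as well.
    """
    node_ids = set(node_ids)
    index = build_suffix_index(node_ids)
    kept, recovered = [], 0
    for edge in edges:
        source, target = edge["source"], edge["target"]
        if target in node_ids:
            kept.append(edge)  # already resolves: left untouched
            continue
        if source.startswith(PREFIX) and not target.startswith(PREFIX):
            matches = index.get(target, set())
            if len(matches) == 1:  # unique recovery only
                kept.append({**edge, "target": next(iter(matches))})
                recovered += 1
                continue
        # Unknown or ambiguous target: the edge is dropped, as before the fix.
    return kept, recovered


nodes = [
    "file:src/cli.py",
    "file:tests/test_cli_fallback.py",
    "file:a/util.py",
    "file:b/util.py",
]
edges = [
    {"source": "file:tests/test_cli_fallback.py", "target": "cli.py", "type": "tested_by"},
    {"source": "file:src/cli.py", "target": "util.py", "type": "imports"},  # ambiguous
]
kept, recovered = recover_cross_batch_edges(edges, nodes)
print(recovered, [e["target"] for e in kept])
```

In this example the `tested_by` edge is rewritten to point at `file:src/cli.py`, while the `util.py` edge has two candidates and is still dropped.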
Scope
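`understand-anything-plugin/skills/understand/merge-batch-graphs.py` (~25 LOC added)
Tests
3 new tests in `tests/skill/understand/test_merge_batch_graphs.py` (`CrossBatchEdgeRecoveryTests`):
1. Unique suffix match is recovered
2. Basename-only suffix match is recovered
3. Ambiguous suffix match is left to drop
Full suite: 72/72 pass.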
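For readers who want a feel for the three cases, they can be sketched as a small `unittest` class. This is not the PR's test code: `resolve_bare_target` is a hypothetical stand-in for the recovery lookup, the node IDs are invented fixtures, and the helper uses a linear scan for brevity where the actual fix uses a prebuilt index.

```python
import unittest

PREFIX = "file:"


def resolve_bare_target(target, node_ids):
    """Hypothetical helper: the single `file:` node ID ending in `target`, else None."""
    matches = {
        node_id
        for node_id in node_ids
        if node_id.startswith(PREFIX)
        and (node_id[len(PREFIX):] == target or node_id.endswith("/" + target))
    }
    return next(iter(matches)) if len(matches) == 1 else None


class CrossBatchEdgeRecoverySketch(unittest.TestCase):
    def test_unique_suffix_match_is_recovered(self):
        nodes = {"file:src/pkg/cli.py", "file:src/other/main.py"}
        self.assertEqual(resolve_bare_target("pkg/cli.py", nodes), "file:src/pkg/cli.py")

    def test_basename_only_suffix_match_is_recovered(self):
        nodes = {"file:src/pkg/cli.py", "file:src/other/main.py"}
        self.assertEqual(resolve_bare_target("cli.py", nodes), "file:src/pkg/cli.py")

    def test_ambiguous_suffix_match_is_left_to_drop(self):
        # Two nodes share the basename, so no target is chosen.
        nodes = {"file:a/util.py", "file:b/util.py"}
        self.assertIsNone(resolve_bare_target("util.py", nodes))
```

Saved to a file, the sketch can be run with `python -m unittest <file>`.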
Performance
The suffix index is built once per merge pass with O(n) work over the node-ID set. Resolving a dangling edge is then a single dict lookup, so the added cost is constant-time per edge on average. Overhead was negligible on the largest projects tested.
Backwards compatibility
The fix only affects edges that were previously dropped. Edges that resolved before continue to resolve identically. No schema changes; no public API surface change.
Verification notes
The unit tests exercise the recovery path directly with constructed batch fixtures that reproduce the original bare-target / prefixed-source pattern, including the unique-match and ambiguous-match branches. We additionally attempted an end-to-end rerun against the original 124-file project where the bug was first observed, but in that rerun the upstream batches did not contain cross-file edges to recover (the input had `filesWithImports=0` because `extract-import-map.mjs` was invoked in a configuration that didn't produce import data for the project root), so the recovery counter did not fire. The unit-test coverage is what verifies the fix; the rerun was inconclusive on its own terms.
Anyone reproducing the original bug pattern (batches where `source` carries the `file:` prefix and `target` is a bare filename matching a sibling-batch node) should see the recovery counter increment and the previously-dropped edges retained.