## What
During Phase 2 batch analysis, doc-batch file-analyzer subagents emit edges to nodes they don't own using the wrong type prefix. On a 12-batch run over a 69-file repo, 19 dangling edges were produced by doc-batches; 17 of 19 were recoverable via prefix remap in Phase 3's assemble-reviewer, but those should never have been dangling in the first place.
Examples from a real run (`Could not fix` output from `merge-batch-graphs.py`):
- Edge `document:docs/prds/2026-05-26-chart-overlays.md` → `file:COACH_PROMPT.md` (`documents`)
  REAL TARGET IS: `document:COACH_PROMPT.md`
- Edge `document:docs/prds/2026-05-26-chart-overlays.md` → `file:.mcp.json` (`related`)
  REAL TARGET IS: `config:.mcp.json`
- Edge `document:docs/plans/2026-05-26-five-overlay-roadmap.md` → `file:zone2_charts/chart_renderer.py` (`documents`)
  REAL TARGET IS: `file:zone2-charts/src/zone2_charts/chart_renderer.py`
- Edge `file:zone2-charts/tests/test_streams_fetcher.py` → `file:zone2-charts/tests/fixtures/known_good_run.json` (`depends_on`)
  REAL TARGET IS: `config:zone2-charts/tests/fixtures/known_good_run.json`
Pattern: doc-batches and test-batches that reference code/config/docs/fixtures owned by OTHER batches default to the `file:` prefix (and sometimes guess wrong path segments), because they have no way to know what type prefix other batches assigned.
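The dangling-edge check behind the examples above can be sketched as follows. This is a minimal illustration, not the actual logic of `merge-batch-graphs.py`: the edge dictionaries with `source`/`target`/`type` keys and the function name `find_dangling_edges` are assumptions made for the example, and only targets are checked.

```python
# Hypothetical, simplified graph shapes; the real merge script's data model may differ.

def find_dangling_edges(node_ids, edges):
    """Return the edges whose target is not a node ID present in the merged graph."""
    known = set(node_ids)
    return [edge for edge in edges if edge["target"] not in known]


# Node IDs as allocated by the batches that own each file (taken from the examples above).
node_ids = [
    "document:docs/prds/2026-05-26-chart-overlays.md",
    "document:COACH_PROMPT.md",
    "config:.mcp.json",
]

prd = "document:docs/prds/2026-05-26-chart-overlays.md"
edges = [
    # Emitted by the doc-batch with a guessed file: prefix.
    {"source": prd, "target": "file:COACH_PROMPT.md", "type": "documents"},
    {"source": prd, "target": "file:.mcp.json", "type": "related"},
    # Hypothetical edge that already uses the canonical ID, for contrast.
    {"source": prd, "target": "document:COACH_PROMPT.md", "type": "documents"},
]

dangling = find_dangling_edges(node_ids, edges)
dangling_targets = [edge["target"] for edge in dangling]
print(dangling_targets)
```

Both guessed `file:` targets are reported as dangling, while the edge that uses the canonical `document:` ID passes.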
## Why this happens
Each file-analyzer dispatch (`skills/understand/agents/file-analyzer.md`) receives:
- `batchImportData[]`: pre-resolved internal imports for files in this batch (correct, used directly)
- `neighborMap{}`: cross-batch neighbors with their exported symbols (for confidence-boosting cross-batch `calls` and `imports` edges)
But the dispatch does NOT include a map of "here are the canonical node IDs allocated by other batches." So when a doc-batch wants to emit `documents` or `related` to a sibling node it didn't analyze, it falls back to `file:<guess-path>` (matching its own file-tagging convention). That is usually wrong for documents (`document:`), configs (`config:`), or even paths (the `src/zone2_charts/` ↔ `zone2_charts/` slip above).
## Cost
- Phase 3 assemble-reviewer recovers most via prefix remap, but that's a post-hoc rescue dispatched on an LLM call. 17/19 fixed on my run, 2 truly missing (`.understandignore`-excluded targets, correct to drop). The reviewer adds latency + LLM cost.
- The 2 truly missing edges are also a side-effect of this: doc-batches reference nodes that were intentionally excluded by `.understandignore` (symlink targets, doc aliases). With a known-prefix map at dispatch, these would never be emitted, eliminating the "Could not fix" tail.
## Fix candidate
Phase 1.5 `compute-batches.mjs` already produces `batches.json` with a `neighborMap` per batch. Extend it to also emit a `knownCrossBatchNodeIds[]`: a flat array of `"<type>:<path>"` IDs allocated by other batches (everything except the current batch's own files). Inject this list into the file-analyzer dispatch prompt:
**Cross-batch known node IDs (use these verbatim when emitting edges to nodes outside this batch):**
```json
["document:CLAUDE.md", "document:COACH_PROMPT.md", "config:.mcp.json", "file:zone2-charts/src/zone2_charts/chart_renderer.py", ...]
```
DO NOT invent node IDs. If the file you want to reference is NOT in this list, it's not part of the graph (likely excluded via `.understandignore`); drop the edge.
Doc-batches and test-batches would then resolve cross-batch references correctly on the first pass, and the assemble-reviewer wouldn't need to do prefix recovery as a routine activity.
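A minimal sketch of the proposed computation is below. It is written in Python for brevity, whereas `compute-batches.mjs` is JavaScript, and it assumes each batch's node IDs (including their type prefixes) are already known at Phase 1.5, which the current pipeline may not guarantee. The batch names and the function name `known_cross_batch_node_ids` are hypothetical.

```python
import json


def known_cross_batch_node_ids(batches):
    """Map each batch ID to the sorted node IDs owned by every other batch."""
    all_ids = {node_id for ids in batches.values() for node_id in ids}
    return {
        batch_id: sorted(all_ids - set(own_ids))
        for batch_id, own_ids in batches.items()
    }


# Hypothetical batch layout; node IDs are taken from the examples in this issue.
batches = {
    "docs-1": ["document:docs/prds/2026-05-26-chart-overlays.md"],
    "docs-2": ["document:COACH_PROMPT.md"],
    "config-1": ["config:.mcp.json"],
    "code-1": ["file:zone2-charts/src/zone2_charts/chart_renderer.py"],
}

known = known_cross_batch_node_ids(batches)

# The list for one batch, serialized for injection into its dispatch prompt.
prompt_fragment = json.dumps(known["docs-1"])
print(prompt_fragment)
```

Each batch sees every ID except its own, so a doc-batch can copy `config:.mcp.json` verbatim instead of guessing `file:.mcp.json`. On large repos the list could grow long, so filtering it to likely neighbors may be worth considering.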
## Why not just have the assemble-reviewer always fix it
It does — 17/19 fix rate observed. But:
1. The reviewer is a separate LLM dispatch; it's not free.
2. The recovery is prefix-only; it can't fix path-segment errors like `zone2_charts/foo.py` → `zone2-charts/src/zone2_charts/foo.py`. The reviewer recovered those because I gave it the `[REAL TARGET IS: ...]` annotations explicitly in the prompt; without that, the per-segment guess would have stayed dangling.
3. Even if recovery is 100%, "dispatch produces correct output" beats "dispatch produces wrong output, then a downstream agent corrects it" for cost + clarity.
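To illustrate point 2, here is a rough model of a prefix-only remap. The reviewer is an LLM dispatch, so this is not its implementation; the function `remap_prefix` is a hypothetical stand-in that swaps only the type prefix and requires the path to match exactly.

```python
def remap_prefix(target, node_ids):
    """Recover a dangling target by changing only its type prefix.

    Returns the canonical node ID if exactly one node shares the same path,
    otherwise None (no match, or an ambiguous match).
    """
    _, path = target.split(":", 1)
    matches = [nid for nid in node_ids if nid.split(":", 1)[1] == path]
    return matches[0] if len(matches) == 1 else None


node_ids = [
    "document:COACH_PROMPT.md",
    "config:.mcp.json",
    "file:zone2-charts/src/zone2_charts/chart_renderer.py",
]

# Wrong prefix, right path: recoverable.
fixed = remap_prefix("file:COACH_PROMPT.md", node_ids)

# Right prefix, wrong path segments: a prefix swap cannot help.
unfixed = remap_prefix("file:zone2_charts/chart_renderer.py", node_ids)

print(fixed, unfixed)
```

The first call resolves to `document:COACH_PROMPT.md`; the second returns `None`, which is the case that needed the explicit `[REAL TARGET IS: ...]` annotations.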
## Repro
1. Take any repo with a mix of `code/` + `docs/` + `config/` that gets sorted into separate batches by `compute-batches.mjs` (typical for repos with >30 files).
2. Run `/understand .`.
3. After Phase 2 merge, inspect the stderr "Could not fix (N issues — needs agent review)" section.
4. The dangling edges that came from doc-batches will show the wrong-prefix pattern.
## Additional context
- Plugin v2.7.5, run on a 69-file repo with `compute-batches.mjs` producing 12 batches (2 code-only Python batches, 5 doc-only batches, 1 config batch, 1 test batch, 3 mixed misc batches).
- Pre-merge: 283 nodes / 403 edges. Post-merge: 283 nodes / 384 edges (-19 dangling). Post-assemble-review: 283 nodes / 401 edges (+17 recovered).
- The 4 production nodes' fixture references (`tests/fixtures/known_good_run.json` etc.) had a path-segment error in addition to the prefix slip. For the prefix slip, `test_streams_fetcher.py` emitted `file:zone2-charts/tests/fixtures/known_good_run.json` while the real ID is `config:zone2-charts/tests/fixtures/known_good_run.json`. Both the correct prefix (`file:` → `config:`) and the type assignment made by the fixture-batch (it chose `config` over `file`) were unknowable to the test-batch.