Fix #563: prevent rationale-node leakage and preserve calls direction#576
Fix #563: prevent rationale-node leakage and preserve calls direction#576opnsrcntrbtr wants to merge 3 commits intosafishamsi:v5from
Conversation
… direction
Two AST-extractor bugs were inflating god-node centrality:
1. Rationale leakage in cross-file resolvers
_resolve_cross_file_imports indexed any node whose label didn't end in
')' or '.py' as an importable entity, so rationale nodes (whose labels
are docstring text) ended up in both stem_to_entities and local_classes.
Result: phantom '<file>_rationale_N --uses--> ImportedThing' edges for
every imported entity in every file with docstrings.
The same leak existed in cross-file call resolution via
global_label_to_nid, where rationale labels could collide with callee
names.
Fix: skip nodes with file_type == 'rationale' in all three index loops.
2. Calls / rationale_for direction lost at export
to_json() called nx.json_graph.node_link_data() on a NetworkX undirected
Graph, which canonicalizes endpoint order. The build path already
stashes _src and _tgt on every edge for exactly this case (with a
comment to that effect at build.py:100), but the exporter never
consulted them, so 'calls' and 'rationale_for' edges were emitted with
arbitrary direction.
Fix: in to_json(), restore link['source'] / link['target'] from
_src / _tgt, then pop those internal fields before writing graph.json.
Repro on a minimal 3-file project (queue.py + contact_form.py + a test
file with multi-paragraph docstrings) showed:
Before fix: 31 edges, 12 incident on SQLiteQueue, 7 phantom uses-edges
from rationale nodes, 1 inverted calls edge.
After fix: 24 edges, 5 incident on SQLiteQueue, 0 phantom edges,
all calls edges go caller -> callee.
Adds tests/test_issue_563.py with 5 regression tests covering both bugs
plus a direction-pin on the existing sample_calls.py fixture. Full suite
passes (446/446).
There was a problem hiding this comment.
Pull request overview
Fixes issue #563 by preventing rationale nodes (docstring/comment fragments) from being treated as import/call participants, and by restoring directional edge endpoints during JSON export so calls/rationale_for directions are preserved.
Changes:
- Exclude
file_type == "rationale"nodes from cross-file import indexing and cross-file call label resolution. - In
to_json, restorelink["source"]/link["target"]from preserved_src/_tgtand strip those internal fields fromgraph.json. - Add a dedicated regression test suite covering rationale leakage, calls direction, rationale_for direction, and
_src/_tgtnon-leakage.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
tests/test_issue_563.py |
Adds regression tests validating rationale-node isolation, edge direction correctness, and _src/_tgt cleanup. |
graphify/extract.py |
Filters out rationale nodes from cross-file resolvers to prevent phantom uses/calls edges. |
graphify/export.py |
Restores directional endpoints from _src/_tgt during JSON serialization and removes internal metadata keys. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| e for e in repro_graph["links"] | ||
| if ("rationale" in e["source"] or "rationale" in e["target"]) | ||
| and e["relation"] != "rationale_for" | ||
| ] |
There was a problem hiding this comment.
The tests detect rationale nodes by checking whether the edge endpoint IDs contain the substring "rationale". This couples the regression tests to the current ID-generation scheme and can produce false negatives (if rationale IDs ever change) or false positives (if a legitimate entity ID contains "rationale"). Consider building an id→file_type map from repro_graph["nodes"] and using that to identify rationale endpoints instead.
| # Restore original edge direction. Undirected NetworkX storage may | ||
| # canonicalize endpoint order, flipping `calls` and other directional | ||
| # edges in graph.json. The build path stashes the true endpoints in | ||
| # _src/_tgt for exactly this purpose (#563). | ||
| true_src = link.pop("_src", None) | ||
| true_tgt = link.pop("_tgt", None) | ||
| if true_src is not None and true_tgt is not None: | ||
| link["source"] = true_src | ||
| link["target"] = true_tgt |
There was a problem hiding this comment.
Edge-direction is restored for graph.json here, but graph.html generation still iterates for u, v, data in G.edges(data=True) and uses from=u, to=v (export.py around the to_html edge loop). On an undirected Graph that can still flip directional relations in the HTML visualization (since arrows are enabled). Consider using the preserved _src/_tgt there as well (or disabling arrows for undirected relations) so calls/rationale_for direction is consistent across exports.
|
@copilot apply changes based on the comments in this thread |
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| bad = [ | ||
| e for e in repro_graph["links"] | ||
| if ("rationale" in e["source"] or "rationale" in e["target"]) | ||
| and e["relation"] != "rationale_for" | ||
| ] |
There was a problem hiding this comment.
The test identifies rationale nodes by checking whether endpoint IDs contain the substring "rationale". This couples the regression to the current ID scheme and can miss/false-trigger if IDs change. Prefer deriving an id→file_type map from repro_graph["nodes"] and checking endpoints’ file_type == "rationale" instead.
| rats = [e for e in repro_graph["links"] if e["relation"] == "rationale_for"] | ||
| assert rats, "expected some rationale_for edges in the repro" | ||
| bad = [e for e in rats if "rationale" not in e["source"]] | ||
| assert bad == [], ( |
There was a problem hiding this comment.
Same issue here: determining rationale endpoints via "rationale" substring in the ID is brittle. Use the node metadata (file_type == "rationale") from repro_graph["nodes"] to assert rationale_for direction instead of relying on naming conventions.
| # Only index real classes/functions (not file nodes, not method stubs, | ||
| # and not rationale nodes whose labels are free-form docstring text — #563) | ||
| if ( | ||
| label | ||
| and not label.endswith((")", ".py")) | ||
| and "_" not in label[:1] | ||
| and node.get("file_type") != "rationale" | ||
| ): |
There was a problem hiding this comment.
The comment says this indexes “classes/functions”, but the filter excludes any label ending with ")". In the generic extractor, function labels are formatted as name() (see graphify/extract.py around where nodes are created), so functions will never be indexed here. Either update the comment/docstring to reflect that only class-like nodes are indexed, or normalize labels (e.g., strip trailing "()") so imported functions can be resolved as intended.
| # contact_form() calls SQLiteQueue.check_idempotency() and SQLiteQueue.enqueue() | ||
| assert ( | ||
| "contact_form_contact_form", | ||
| "jobq_sqlitequeue_check_idempotency", | ||
| ) in calls, f"expected caller->callee edge missing. got: {calls}" | ||
|
|
||
| # The test function calls contact_form() | ||
| assert ( | ||
| "test_customer_dedup_test_odoo_tag_missing_still_chatter", | ||
| "contact_form_contact_form", | ||
| ) in calls, f"test->contact_form edge missing. got: {calls}" | ||
|
|
There was a problem hiding this comment.
This test claims contact_form() calls both check_idempotency() and enqueue(), but it only asserts the presence of the check_idempotency edge. If the enqueue call edge stops being emitted, the test would still pass as long as the inverted edge is also absent. Add an explicit assertion that (contact_form_contact_form -> jobq_sqlitequeue_enqueue) exists to fully pin the reported regression surface from #563.
- tests: identify rationale nodes by file_type metadata, not id substring (decouples regression from current `<stem>_rationale_<line>` naming) - tests: assert contact_form -> SQLiteQueue.enqueue calls edge directly (previously only inversions were guarded; a dropped enqueue edge would have passed silently) - extract.py: clarify cross-file import indexing comment — the filter intentionally indexes only class-level entities (function/method labels end in "()" and are excluded by design)
to_json was patched to consult _src/_tgt before serializing edge endpoints, but to_html still read endpoints directly from G.edges(), so calls and rationale_for arrows in the rendered graph.html still pointed in canonicalized (often inverted) order. Apply the same direction restore in to_html when building vis.js edges. Use data.get (not pop) since G.edges(data=True) yields the live attribute dict and other exporters may run after to_html. Adds test_to_html_preserves_calls_and_rationale_for_direction: parses RAW_EDGES out of the rendered HTML and pins both the caller->callee assertions from safishamsi#563 and the rationale->parent direction. Without the fix, the test reports 11 flipped arrows on the 3-file repro.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| (root / "jobq.py").write_text(textwrap.dedent(''' | ||
| """Job-queue module — provides SQLiteQueue for idempotent job tracking.""" | ||
|
|
There was a problem hiding this comment.
Path.write_text() defaults to the platform’s locale encoding. This fixture string includes a Unicode em dash (—), so on Windows this can write non‑UTF‑8 bytes that tree-sitter (expects UTF‑8) may parse incorrectly. Specify encoding="utf-8" (and ideally newline="\n") for the repro files to keep the test deterministic across platforms.
| (root / "contact_form.py").write_text(textwrap.dedent(''' | ||
| """HTTP router for the contact form endpoint.""" | ||
| from jobq import SQLiteQueue | ||
|
|
There was a problem hiding this comment.
Same portability issue here: use encoding="utf-8" on this write_text() call (the content includes Unicode characters). Otherwise the file may be written in a locale encoding and break tree-sitter parsing on Windows.
| (root / "test_customer_dedup.py").write_text(textwrap.dedent(''' | ||
| """Tests for customer dedup — exercises phone normalization and Odoo tagging.""" | ||
| from jobq import SQLiteQueue | ||
| from contact_form import contact_form |
There was a problem hiding this comment.
Same issue as above: add encoding="utf-8" to this write_text() call to avoid locale-dependent encodings (especially since the docstring contains an em dash).
| ) | ||
| graph_path = root / "graphify-out" / "graph.json" | ||
| assert graph_path.exists(), f"graph.json not produced. stderr: {result.stderr}" | ||
| return json.loads(graph_path.read_text()) |
There was a problem hiding this comment.
graph_path.read_text() uses the default locale encoding. Since graph.json is written as UTF‑8, this can fail or misdecode on non‑UTF‑8 locales. Read with encoding="utf-8" for cross-platform reliability.
| return json.loads(graph_path.read_text()) | |
| return json.loads(graph_path.read_text(encoding="utf-8")) |
Fix #563: prevent rationale-node leakage and preserve calls direction
Fixes #563.
Explain it like I'm 6 (and I just learned to code)
Imagine your code is a city, and graphify draws a map of it. Every function is a house, and an arrow from house A to house B means "A talks to B."
The map had two problems.
Problem 1 — The sticky-note houses.
Whenever you write a comment in your code (like
"""this function checks phone numbers"""), graphify also draws a little sticky-note house on the map for that comment. That part is fine.But there was a bug: the program that draws arrows treated those sticky-notes like real houses. So if your file had a sticky-note saying "phone numbers must start with +1" and your file also said
from queue import SQLiteQueueat the top, the map drew an arrow:That arrow is silly — a sticky-note can't "use" a house. But the bug drew thousands of these arrows. And then graphify would say "look how popular SQLiteQueue is, lots of things point to it!" — but most of the things pointing to it were sticky-notes, not real houses.
Fix: When picking which houses to draw arrows from, just check
is this a sticky-note? if yes, skip it.Three lines of code, three places.Problem 2 — The arrows pointed the wrong way.
When function A calls function B, the arrow should go A ➡️ B. graphify's drawing code knew this and got it right.
But then graphify uses a library called NetworkX to store the map, and the version of NetworkX storage being used is "two-way streets" — it doesn't remember which end of the arrow is the head. So when graphify saved the map to a file, the arrows came out pointing in random directions. Sometimes A ➡️ B, sometimes B ➡️ A.
The code already knew this would happen — it secretly wrote down the real direction on each arrow as little notes called
_src(real start) and_tgt(real end). But when it was time to save the map, it forgot to read its own notes.Fix: Before saving the map, read the secret notes and put the arrows back the right way. Then erase the notes (they were only for graphify's own use, no need to ship them in the file).
The result:
In a tiny test project I built (3 files, one class, a few functions calling each other):
And I added 5 new tests that will fail loudly if anyone accidentally re-introduces either bug later.
Summary
Two AST-extractor bugs were inflating god-node centrality and silently corrupting
graph.json. Both are observable in the AST-only path (graphify update, no LLM required), so the regression test runs in <1s with no API key.SQLiteQueuerationale --uses--> Xedgescallsedges in repro_src/_tgtkeys leaking intograph.jsonWhat was broken
Bug 1 — rationale leakage into cross-file resolvers
graphify/extract.py::_resolve_cross_file_importsindexed any node whose label didn't end in)or.pyas an "importable entity". Rationale nodes have IDs like<stem>_rationale_<line>and labels that are free-form docstring text, so they sailed through both filters and ended up in:stem_to_entities— making them resolvable as "imported names"local_classes— making them act as "importing classes" in the destination fileThe same leak existed in
extract()'sglobal_label_to_nidfor cross-file call resolution, where a rationale label could collide with a callee name and produce a spuriouscallsedge.Bug 2 —
callsandrationale_fordirection lost at exportgraphify/export.py::to_jsoncallsnx.json_graph.node_link_data()on a NetworkX undirectedGraph. NetworkX may store edge endpoints in either canonical order, which silently flipped directional relations (calls,rationale_for) on serialization.graphify/build.py::build_from_jsonalready stashes the true endpoints in_src/_tgtprecisely for this case — there's even a comment at line 100 explaining why — but the exporter never consulted them.The reporter on issue #563 attributed this to the AST visitor, but the AST visitor is correct (every
add_edge(func_nid, target_nid, "calls", ...)site at lines 1733, 2042, 2219, 2390, 2554, 2984, 3158 emits caller→callee). The corruption happens later, at JSON export.What changed
graphify/extract.pyThree additions of
node.get("file_type") != "rationale"to the indexing filters:_resolve_cross_file_importsPass 1 (line ~2616) — keep rationale nodes out ofstem_to_entities_resolve_cross_file_importsPass 2 (line ~2628) — keep rationale nodes out oflocal_classesextract()cross-file call resolution (line ~3354) — skip rationale nodes when buildingglobal_label_to_nidgraphify/export.pyIn
to_json(afternode_link_data), restore each link'ssource/targetfrom the preserved_src/_tgtand pop those internal fields so they don't leak into the publishedgraph.json.tests/test_issue_563.py(new, 240 lines)Five regression tests:
test_no_rationale_nodes_in_uses_or_calls_edges— rationale nodes must only participate inrationale_foredges.test_rationale_for_direction_is_rationale_to_parent— pins the direction the AST extractor actually emits, which the export bug was silently reversing.test_calls_direction_is_caller_to_callee— pins the three exact directions reported as inverted in Two extractor bugs systematically inflate god-node centrality (rationale fragments + calls direction inversion) #563.test_calls_direction_on_existing_sample_calls_fixture— adds a direction assertion on the in-treetests/fixtures/sample_calls.pyso the existing AST coverage gains direction safety.test_src_tgt_metadata_not_leaked_into_graph_json— guards the export-side cleanup.Reproduction (without this patch)
On
v0.5.0(default branch) this reports 7 phantomusesedges and 12 incident onSQLiteQueue. After this patch, both numbers drop to 0 and 5 respectively.Note on the issue's framing
The reporter in #563 described Bug 2 as an AST visitor inversion (e.g., "the
sourceshould be the function/method containing the call site"). The AST visitor is in fact already correct inmain— the inversion is at the JSON export layer. I've kept the reporter's example assertions (e.g.,contact_form() --calls--> queue.check_idempotency()) as the assertions in the regression test, since those are the user-visible directions they expected to hold and which now do.The reporter also described Bug 1 as
SQLiteQueue --uses--> rationale_node(rationale as target). On reproducing, the actual direction wasrationale_node --uses--> SQLiteQueue. The fix collapses the same root cause either way: rationale nodes were participating in cross-file resolution at all.Out of scope
The skill prompt files (
graphify/skill*.md) instruct LLM subagents to userationale_foralready; subagent compliance is a separate, harder problem (the reporter's audit shows0/350rationale-targeted edges usingrationale_forin their LLM-extracted graph). This PR focuses on the deterministic AST path, where the bugs are unambiguous and the fix is small. A follow-up PR can add a defensive sanitizer over LLM-emitted edges that rewritesuses → rationale_forand drops anything pointing at a node whosefile_type == "rationale".