Skip to content

Fix #563: prevent rationale-node leakage and preserve calls direction#576

Open
opnsrcntrbtr wants to merge 3 commits intosafishamsi:v5from
opnsrcntrbtr:fix/issue-563-rationale-leak
Open

Fix #563: prevent rationale-node leakage and preserve calls direction#576
opnsrcntrbtr wants to merge 3 commits intosafishamsi:v5from
opnsrcntrbtr:fix/issue-563-rationale-leak

Conversation

@opnsrcntrbtr
Copy link
Copy Markdown

Fix #563: prevent rationale-node leakage and preserve calls direction

Fixes #563.

Explain it like I'm 6 (and I just learned to code)

Imagine your code is a city, and graphify draws a map of it. Every function is a house, and an arrow from house A to house B means "A talks to B."

The map had two problems.

Problem 1 — The sticky-note houses.

Whenever you write a comment in your code (like """this function checks phone numbers"""), graphify also draws a little sticky-note house on the map for that comment. That part is fine.

But there was a bug: the program that draws arrows treated those sticky-notes like real houses. So if your file had a sticky-note saying "phone numbers must start with +1" and your file also said from queue import SQLiteQueue at the top, the map drew an arrow:

📝 sticky-note about phone numbers ➡️ 🏠 SQLiteQueue

That arrow is silly — a sticky-note can't "use" a house. But the bug drew thousands of these arrows. And then graphify would say "look how popular SQLiteQueue is, lots of things point to it!" — but most of the things pointing to it were sticky-notes, not real houses.

Fix: When picking which houses to draw arrows from, just check is this a sticky-note? if yes, skip it. Three lines of code, three places.

Problem 2 — The arrows pointed the wrong way.

When function A calls function B, the arrow should go A ➡️ B. graphify's drawing code knew this and got it right.

But then graphify uses a library called NetworkX to store the map, and the version of NetworkX storage being used is "two-way streets" — it doesn't remember which end of the arrow is the head. So when graphify saved the map to a file, the arrows came out pointing in random directions. Sometimes A ➡️ B, sometimes B ➡️ A.

The code already knew this would happen — it secretly wrote down the real direction on each arrow as little notes called _src (real start) and _tgt (real end). But when it was time to save the map, it forgot to read its own notes.

Fix: Before saving the map, read the secret notes and put the arrows back the right way. Then erase the notes (they were only for graphify's own use, no need to ship them in the file).

The result:

In a tiny test project I built (3 files, one class, a few functions calling each other):

  • The map went from 31 arrows to 24 arrows — the 7 silly sticky-note arrows are gone.
  • SQLiteQueue went from "12 arrows touching it" to "5 arrows touching it" — the rest were lies.
  • Every "A calls B" arrow now actually points from A to B.

And I added 5 new tests that will fail loudly if anyone accidentally re-introduces either bug later.

Summary

Two AST-extractor bugs were inflating god-node centrality and silently corrupting graph.json. Both are observable in the AST-only path (graphify update, no LLM required), so the regression test runs in <1s with no API key.

Before After
Edges in 3-file repro 31 24
Edges incident on SQLiteQueue 12 5
Phantom rationale --uses--> X edges 7 0
Inverted calls edges in repro 1 0
_src / _tgt keys leaking into graph.json yes no
Existing test suite 441 passed 446 passed (+5 new)

What was broken

Bug 1 — rationale leakage into cross-file resolvers

graphify/extract.py::_resolve_cross_file_imports indexed any node whose label didn't end in ) or .py as an "importable entity". Rationale nodes have IDs like <stem>_rationale_<line> and labels that are free-form docstring text, so they sailed through both filters and ended up in:

  • stem_to_entities — making them resolvable as "imported names"
  • local_classes — making them act as "importing classes" in the destination file

The same leak existed in extract()'s global_label_to_nid for cross-file call resolution, where a rationale label could collide with a callee name and produce a spurious calls edge.

Bug 2 — calls and rationale_for direction lost at export

graphify/export.py::to_json calls nx.json_graph.node_link_data() on a NetworkX undirected Graph. NetworkX may store edge endpoints in either canonical order, which silently flipped directional relations (calls, rationale_for) on serialization.

graphify/build.py::build_from_json already stashes the true endpoints in _src / _tgt precisely for this case — there's even a comment at line 100 explaining why — but the exporter never consulted them.

The reporter on issue #563 attributed this to the AST visitor, but the AST visitor is correct (every add_edge(func_nid, target_nid, "calls", ...) site at lines 1733, 2042, 2219, 2390, 2554, 2984, 3158 emits caller→callee). The corruption happens later, at JSON export.

What changed

graphify/extract.py

Three additions of node.get("file_type") != "rationale" to the indexing filters:

  • _resolve_cross_file_imports Pass 1 (line ~2616) — keep rationale nodes out of stem_to_entities
  • _resolve_cross_file_imports Pass 2 (line ~2628) — keep rationale nodes out of local_classes
  • extract() cross-file call resolution (line ~3354) — skip rationale nodes when building global_label_to_nid

graphify/export.py

In to_json (after node_link_data), restore each link's source / target from the preserved _src / _tgt and pop those internal fields so they don't leak into the published graph.json.

tests/test_issue_563.py (new, 240 lines)

Five regression tests:

  1. test_no_rationale_nodes_in_uses_or_calls_edges — rationale nodes must only participate in rationale_for edges.
  2. test_rationale_for_direction_is_rationale_to_parent — pins the direction the AST extractor actually emits, which the export bug was silently reversing.
  3. test_calls_direction_is_caller_to_callee — pins the three exact directions reported as inverted in Two extractor bugs systematically inflate god-node centrality (rationale fragments + calls direction inversion) #563.
  4. test_calls_direction_on_existing_sample_calls_fixture — adds a direction assertion on the in-tree tests/fixtures/sample_calls.py so the existing AST coverage gains direction safety.
  5. test_src_tgt_metadata_not_leaked_into_graph_json — guards the export-side cleanup.

Reproduction (without this patch)

mkdir repro && cd repro

cat > jobq.py <<'PY'
"""Job-queue module."""
class SQLiteQueue:
    """Lightweight SQLite-backed queue."""
    def check_idempotency(self, key):
        """Return existing lead for key, or None."""
        return None
    def enqueue(self, payload):
        """Persist payload to the queue."""
        return True
PY

cat > contact_form.py <<'PY'
"""HTTP router for the contact form endpoint."""
from jobq import SQLiteQueue
q = SQLiteQueue()
def contact_form(payload):
    """Handle a contact form submission."""
    if q.check_idempotency(payload):
        return None
    q.enqueue(payload)
PY

cat > test_dedup.py <<'PY'
"""Tests for dedup."""
from jobq import SQLiteQueue
from contact_form import contact_form
def test_us_phone():
    """US phone normalization."""
def test_uk_phone():
    """UK phone with +44 prefix."""
def test_odoo_tag():
    """Tag write failure logged but doesn't stop chatter note."""
    SQLiteQueue().enqueue({})
    contact_form(None)
PY

graphify update .
python3 -c "
import json
g = json.load(open('graphify-out/graph.json'))
print('rationale-source uses edges:',
      sum(1 for e in g['links'] if 'rationale' in e['source'] and e['relation'] == 'uses'))
print('SQLiteQueue degree:',
      sum(1 for e in g['links']
          if 'sqlitequeue' in e['source'] or 'sqlitequeue' in e['target']))
"

On v0.5.0 (default branch) this reports 7 phantom uses edges and 12 incident on SQLiteQueue. After this patch, both numbers drop to 0 and 5 respectively.

Note on the issue's framing

The reporter in #563 described Bug 2 as an AST visitor inversion (e.g., "the source should be the function/method containing the call site"). The AST visitor is in fact already correct in main — the inversion is at the JSON export layer. I've kept the reporter's example assertions (e.g., contact_form() --calls--> queue.check_idempotency()) as the assertions in the regression test, since those are the user-visible directions they expected to hold and which now do.

The reporter also described Bug 1 as SQLiteQueue --uses--> rationale_node (rationale as target). On reproducing, the actual direction was rationale_node --uses--> SQLiteQueue. The fix collapses the same root cause either way: rationale nodes were participating in cross-file resolution at all.

Out of scope

The skill prompt files (graphify/skill*.md) instruct LLM subagents to use rationale_for already; subagent compliance is a separate, harder problem (the reporter's audit shows 0/350 rationale-targeted edges using rationale_for in their LLM-extracted graph). This PR focuses on the deterministic AST path, where the bugs are unambiguous and the fix is small. A follow-up PR can add a defensive sanitizer over LLM-emitted edges that rewrites uses → rationale_for and drops anything pointing at a node whose file_type == "rationale".

… direction

Two AST-extractor bugs were inflating god-node centrality:

1. Rationale leakage in cross-file resolvers
   _resolve_cross_file_imports indexed any node whose label didn't end in
   ')' or '.py' as an importable entity, so rationale nodes (whose labels
   are docstring text) ended up in both stem_to_entities and local_classes.
   Result: phantom '<file>_rationale_N --uses--> ImportedThing' edges for
   every imported entity in every file with docstrings.

   The same leak existed in cross-file call resolution via
   global_label_to_nid, where rationale labels could collide with callee
   names.

   Fix: skip nodes with file_type == 'rationale' in all three index loops.

2. Calls / rationale_for direction lost at export
   to_json() called nx.json_graph.node_link_data() on a NetworkX undirected
   Graph, which canonicalizes endpoint order. The build path already
   stashes _src and _tgt on every edge for exactly this case (with a
   comment to that effect at build.py:100), but the exporter never
   consulted them, so 'calls' and 'rationale_for' edges were emitted with
   arbitrary direction.

   Fix: in to_json(), restore link['source'] / link['target'] from
   _src / _tgt, then pop those internal fields before writing graph.json.

Repro on a minimal 3-file project (queue.py + contact_form.py + a test
file with multi-paragraph docstrings) showed:

  Before fix: 31 edges, 12 incident on SQLiteQueue, 7 phantom uses-edges
              from rationale nodes, 1 inverted calls edge.
  After fix:  24 edges, 5 incident on SQLiteQueue, 0 phantom edges,
              all calls edges go caller -> callee.

Adds tests/test_issue_563.py with 5 regression tests covering both bugs
plus a direction-pin on the existing sample_calls.py fixture. Full suite
passes (446/446).
Copilot AI review requested due to automatic review settings April 27, 2026 18:20
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes issue #563 by preventing rationale nodes (docstring/comment fragments) from being treated as import/call participants, and by restoring directional edge endpoints during JSON export so calls/rationale_for directions are preserved.

Changes:

  • Exclude file_type == "rationale" nodes from cross-file import indexing and cross-file call label resolution.
  • In to_json, restore link["source"]/link["target"] from preserved _src/_tgt and strip those internal fields from graph.json.
  • Add a dedicated regression test suite covering rationale leakage, calls direction, rationale_for direction, and _src/_tgt non-leakage.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

File Description
tests/test_issue_563.py Adds regression tests validating rationale-node isolation, edge direction correctness, and _src/_tgt cleanup.
graphify/extract.py Filters out rationale nodes from cross-file resolvers to prevent phantom uses/calls edges.
graphify/export.py Restores directional endpoints from _src/_tgt during JSON serialization and removes internal metadata keys.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread tests/test_issue_563.py
Comment on lines +133 to +136
e for e in repro_graph["links"]
if ("rationale" in e["source"] or "rationale" in e["target"])
and e["relation"] != "rationale_for"
]
Copy link

Copilot AI Apr 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The tests detect rationale nodes by checking whether the edge endpoint IDs contain the substring "rationale". This couples the regression tests to the current ID-generation scheme and can produce false negatives (if rationale IDs ever change) or false positives (if a legitimate entity ID contains "rationale"). Consider building an id→file_type map from repro_graph["nodes"] and using that to identify rationale endpoints instead.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@copilot apply changes based on this feedback

Comment thread graphify/export.py
Comment on lines +315 to +323
# Restore original edge direction. Undirected NetworkX storage may
# canonicalize endpoint order, flipping `calls` and other directional
# edges in graph.json. The build path stashes the true endpoints in
# _src/_tgt for exactly this purpose (#563).
true_src = link.pop("_src", None)
true_tgt = link.pop("_tgt", None)
if true_src is not None and true_tgt is not None:
link["source"] = true_src
link["target"] = true_tgt
Copy link

Copilot AI Apr 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Edge-direction is restored for graph.json here, but graph.html generation still iterates for u, v, data in G.edges(data=True) and uses from=u, to=v (export.py around the to_html edge loop). On an undirected Graph that can still flip directional relations in the HTML visualization (since arrows are enabled). Consider using the preserved _src/_tgt there as well (or disabling arrows for undirected relations) so calls/rationale_for direction is consistent across exports.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@copilot apply changes based on this feedback

@opnsrcntrbtr
Copy link
Copy Markdown
Author

@copilot apply changes based on the comments in this thread

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread tests/test_issue_563.py
Comment on lines +132 to +136
bad = [
e for e in repro_graph["links"]
if ("rationale" in e["source"] or "rationale" in e["target"])
and e["relation"] != "rationale_for"
]
Copy link

Copilot AI Apr 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test identifies rationale nodes by checking whether endpoint IDs contain the substring "rationale". This couples the regression to the current ID scheme and can miss/false-trigger if IDs change. Prefer deriving an id→file_type map from repro_graph["nodes"] and checking endpoints’ file_type == "rationale" instead.

Copilot uses AI. Check for mistakes.
Comment thread tests/test_issue_563.py
Comment on lines +149 to +152
rats = [e for e in repro_graph["links"] if e["relation"] == "rationale_for"]
assert rats, "expected some rationale_for edges in the repro"
bad = [e for e in rats if "rationale" not in e["source"]]
assert bad == [], (
Copy link

Copilot AI Apr 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same issue here: determining rationale endpoints via "rationale" substring in the ID is brittle. Use the node metadata (file_type == "rationale") from repro_graph["nodes"] to assert rationale_for direction instead of relying on naming conventions.

Copilot uses AI. Check for mistakes.
Comment thread graphify/extract.py Outdated
Comment on lines +2615 to +2622
# Only index real classes/functions (not file nodes, not method stubs,
# and not rationale nodes whose labels are free-form docstring text — #563)
if (
label
and not label.endswith((")", ".py"))
and "_" not in label[:1]
and node.get("file_type") != "rationale"
):
Copy link

Copilot AI Apr 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment says this indexes “classes/functions”, but the filter excludes any label ending with ")". In the generic extractor, function labels are formatted as name() (see graphify/extract.py around where nodes are created), so functions will never be indexed here. Either update the comment/docstring to reflect that only class-like nodes are indexed, or normalize labels (e.g., strip trailing "()") so imported functions can be resolved as intended.

Copilot uses AI. Check for mistakes.
Comment thread tests/test_issue_563.py
Comment on lines +175 to +186
# contact_form() calls SQLiteQueue.check_idempotency() and SQLiteQueue.enqueue()
assert (
"contact_form_contact_form",
"jobq_sqlitequeue_check_idempotency",
) in calls, f"expected caller->callee edge missing. got: {calls}"

# The test function calls contact_form()
assert (
"test_customer_dedup_test_odoo_tag_missing_still_chatter",
"contact_form_contact_form",
) in calls, f"test->contact_form edge missing. got: {calls}"

Copy link

Copilot AI Apr 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test claims contact_form() calls both check_idempotency() and enqueue(), but it only asserts the presence of the check_idempotency edge. If the enqueue call edge stops being emitted, the test would still pass as long as the inverted edge is also absent. Add an explicit assertion that (contact_form_contact_form -> jobq_sqlitequeue_enqueue) exists to fully pin the reported regression surface from #563.

Copilot uses AI. Check for mistakes.
- tests: identify rationale nodes by file_type metadata, not id substring
  (decouples regression from current `<stem>_rationale_<line>` naming)
- tests: assert contact_form -> SQLiteQueue.enqueue calls edge directly
  (previously only inversions were guarded; a dropped enqueue edge would
  have passed silently)
- extract.py: clarify cross-file import indexing comment — the filter
  intentionally indexes only class-level entities (function/method labels
  end in "()" and are excluded by design)
to_json was patched to consult _src/_tgt before serializing edge
endpoints, but to_html still read endpoints directly from G.edges(),
so calls and rationale_for arrows in the rendered graph.html still
pointed in canonicalized (often inverted) order.

Apply the same direction restore in to_html when building vis.js
edges. Use data.get (not pop) since G.edges(data=True) yields the
live attribute dict and other exporters may run after to_html.

Adds test_to_html_preserves_calls_and_rationale_for_direction:
parses RAW_EDGES out of the rendered HTML and pins both the
caller->callee assertions from safishamsi#563 and the rationale->parent
direction. Without the fix, the test reports 11 flipped arrows
on the 3-file repro.
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread tests/test_issue_563.py
Comment on lines +34 to +36
(root / "jobq.py").write_text(textwrap.dedent('''
"""Job-queue module — provides SQLiteQueue for idempotent job tracking."""

Copy link

Copilot AI Apr 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Path.write_text() defaults to the platform’s locale encoding. This fixture string includes a Unicode em dash (—), so on Windows this can write non‑UTF‑8 bytes that tree-sitter (expects UTF‑8) may parse incorrectly. Specify encoding="utf-8" (and ideally newline="\n") for the repro files to keep the test deterministic across platforms.

Copilot uses AI. Check for mistakes.
Comment thread tests/test_issue_563.py
Comment on lines +54 to +57
(root / "contact_form.py").write_text(textwrap.dedent('''
"""HTTP router for the contact form endpoint."""
from jobq import SQLiteQueue

Copy link

Copilot AI Apr 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same portability issue here: use encoding="utf-8" on this write_text() call (the content includes Unicode characters). Otherwise the file may be written in a locale encoding and break tree-sitter parsing on Windows.

Copilot uses AI. Check for mistakes.
Comment thread tests/test_issue_563.py
Comment on lines +73 to +76
(root / "test_customer_dedup.py").write_text(textwrap.dedent('''
"""Tests for customer dedup — exercises phone normalization and Odoo tagging."""
from jobq import SQLiteQueue
from contact_form import contact_form
Copy link

Copilot AI Apr 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same issue as above: add encoding="utf-8" to this write_text() call to avoid locale-dependent encodings (especially since the docstring contains an em dash).

Copilot uses AI. Check for mistakes.
Comment thread tests/test_issue_563.py
)
graph_path = root / "graphify-out" / "graph.json"
assert graph_path.exists(), f"graph.json not produced. stderr: {result.stderr}"
return json.loads(graph_path.read_text())
Copy link

Copilot AI Apr 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

graph_path.read_text() uses the default locale encoding. Since graph.json is written as UTF‑8, this can fail or misdecode on non‑UTF‑8 locales. Read with encoding="utf-8" for cross-platform reliability.

Suggested change
return json.loads(graph_path.read_text())
return json.loads(graph_path.read_text(encoding="utf-8"))

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Two extractor bugs systematically inflate god-node centrality (rationale fragments + calls direction inversion)

2 participants