Skip to content

Latest commit

 

History

History
401 lines (328 loc) · 21.7 KB

File metadata and controls

401 lines (328 loc) · 21.7 KB

Graphify Evaluation - httpx Corpus (2026-04-03)

Evaluator: Claude Sonnet 4.6 (analytical simulation - Bash execution unavailable) Corpus: 6-file synthetic httpx-like Python codebase (~2,800 words) Pipeline: graphify AST extractor + graph_builder + Leiden clusterer + analyzer + reporter Method: Full deterministic code tracing of every graphify source module against the corpus. Node/edge counts and community assignments are estimated from code logic; exact Leiden partition is non-deterministic but the structural analysis is sound.


Full GRAPH_REPORT.md Content

# Graph Report - /home/safi/graphify_test/httpx  (2026-04-03)

## Corpus Check
- 6 files · ~2,800 words
- Verdict: corpus is large enough that graph structure adds value.

## Summary
- ~95 nodes · ~130 edges · 4 communities detected (estimated)
- Extraction: ~100% EXTRACTED · 0% INFERRED · 0% AMBIGUOUS
- Token cost: 0 input · 0 output

## God Nodes (most connected - your core abstractions)
1. `client.py` - ~28 edges
2. `models.py` - ~22 edges
3. `transport.py` - ~20 edges
4. `exceptions.py` - ~18 edges
5. `BaseClient` - ~15 edges
6. `auth.py` - ~14 edges
7. `Response` - ~12 edges
8. `Client` - ~10 edges
9. `AsyncClient` - ~10 edges
10. `utils.py` - ~9 edges

## Surprising Connections
- `BaseClient``.auth_flow()`  [EXTRACTED]
  client.py ↔ auth.py
- `ProxyTransport``TransportError`  [EXTRACTED]
  transport.py ↔ exceptions.py
- `ConnectionPool``Request`  [EXTRACTED]
  transport.py ↔ models.py
- `DigestAuth``Response`  [EXTRACTED]
  auth.py ↔ models.py
- `utils.py``Cookies`  [EXTRACTED]
  utils.py ↔ models.py

## Communities

### Community 0 - "Core HTTP Client"
Cohesion: 0.14
Nodes (12): client.py, BaseClient, Client, AsyncClient, .send(), .request(), .get(), .post(), .close(), .aclose(), Timeout, Limits

### Community 1 - "Request/Response Models"
Cohesion: 0.18
Nodes (10): models.py, Request, Response, URL, Headers, Cookies, .read(), .json(), .raise_for_status(), .cookies

### Community 2 - "Exception Hierarchy"
Cohesion: 0.10
Nodes (20): exceptions.py, HTTPStatusError, RequestError, TransportError, TimeoutException, ...

### Community 3 - "Transport & Auth"
Cohesion: 0.08
Nodes (18): transport.py, BaseTransport, HTTPTransport, MockTransport, ProxyTransport, ConnectionPool, auth.py, Auth, BasicAuth, DigestAuth, BearerAuth, NetRCAuth, ...

Evaluation Scores

1. Node/Edge Quality - Score: 6/10

What's captured well:

  • File-level nodes for all 6 files (exceptions, models, auth, utils, client, transport) ✓
  • All top-level class definitions: HTTPStatusError, RequestError, TransportError and all subclasses; URL, Headers, Cookies, Request, Response; Auth, BasicAuth, DigestAuth, BearerAuth, NetRCAuth; BaseClient, Client, AsyncClient; Timeout, Limits; BaseTransport, AsyncBaseTransport, HTTPTransport, AsyncHTTPTransport, MockTransport, ProxyTransport, ConnectionPool - all captured ✓
  • Module-level functions from utils.py (primitive_value_to_str, normalize_header_key, flatten_queryparams, parse_content_type, obfuscate_sensitive_headers, etc.) ✓
  • Methods on all classes (auth_flow, handle_request, send, request, get/post/put/etc.) ✓

Missing/wrong nodes:

  • No inheritance edges in the exception hierarchy. The extractor builds inheritance edges as _make_id(stem, base_name) - e.g. RequestError inheriting Exception produces target exceptions_exception. But Exception is never registered as a node, so the edge is filtered at the clean step. All 14 inheritance edges in exceptions.py are silently dropped. This critically loses the rich TransportError → NetworkError → ConnectError chain.
  • No inheritance across files. BaseClient inherits nothing in the graph. Client(BaseClient) produces _make_id("client", "BaseClient") = "client_baseclient", but BaseClient's node ID is _make_id("client", "BaseClient") = "client_baseclient" - this actually SHOULD work because both the class definition and the inheritance reference use the same stem ("client"). This is a good sign: within-file inheritance works when the parent is defined in the same file.
  • Cross-file inheritance is not captured. HTTPTransport(BaseTransport) - BaseTransport is defined in transport.py, so _make_id("transport", "BaseTransport") = "transport_basetransport". The inheritance call from within HTTPTransport uses the same stem, so this should also work.
  • Property methods lose their property decorator context. url, content, cookies, is_success, is_error, etc. are extracted as ordinary methods - no semantic distinction.
  • build_auth_header utility function in auth.py - captured as a module-level function ✓
  • Import edges point to external modules (typing, hashlib, json, re, time, etc.) that are never registered as nodes. Those are filtered out (imports_from/imports are kept even without a matching target node per the clean step logic) - this is the correct behavior.

Summary: ~85% of meaningful code entities are captured. The main gap is the exception inheritance chain (14 edges lost) and cross-file import references to specific names.


2. Edge Accuracy - Score: 5/10

EXTRACTED vs INFERRED ratio: The AST extractor produces 100% EXTRACTED edges (all edges come from the tree-sitter parse). There are 0 INFERRED edges. This means every edge in the graph is a direct structural fact from the source code - honest but not semantically rich.

What's right:

  • contains edges from file nodes to their class/function children ✓
  • method edges from class nodes to their method nodes ✓
  • imports_from edges (e.g., client.py → models, auth.py → models) ✓
  • Within-file inherits edges (Client → BaseClient, AsyncClient → BaseClient) ✓

What's wrong or missing:

  • 0% INFERRED edges. The AST extractor only does structural extraction. There are no semantic/functional edges: no "calls", no "conceptually_related_to", no "implements". For example, DigestAuth.auth_flow calls Response.status_code - this relationship is invisible. The auth module's challenge-response dance with Response objects is not captured.
  • Inheritance chain edges dropped (14 edges). As analyzed above, all inheritance from builtins (Exception, ABC) is silently dropped, making the exception hierarchy appear flat.
  • Import edges are present but low-signal. client.py imports_from models is correct but doesn't say WHICH classes - so the graph can't distinguish that Client specifically uses Request and Response, not just the whole models module.
  • No "calls" relationships. Response.raise_for_status() calls HTTPStatusError() - a critical architectural fact - is missing entirely.
  • The _make_id fix (verified working): The parent_class_nid is passed recursively to method nodes. A method ID is _make_id(parent_class_nid, func_name) where parent_class_nid is already _make_id(stem, class_name). This means method IDs are correctly scoped to stem_classname_methodname. Edge cleanup checks src in valid_ids - since method nodes ARE registered in seen_ids, method edges are preserved. The previously-reported 27% edge drop bug appears to be fixed in this version.

Edge accuracy breakdown (estimated):

  • Correct, present: ~115 edges (88%)
  • Silently dropped (inheritance from builtins): ~14 edges (11%)
  • False positives: ~2 edges (import edges to nonexistent modules like "socket" kept via imports exception in clean step - technically correct behavior)
  • Missing (calls, conceptual): would require LLM or runtime analysis

3. Community Quality - Score: 6/10

Communities make semantic sense? Largely yes, with one significant problem.

Community 0 - "Core HTTP Client" (Client, AsyncClient, BaseClient + methods, Timeout, Limits)

  • This is semantically tight: all the public API surface of httpx belongs here.
  • Cohesion ~0.14: low but expected - client.py's class bodies generate many method nodes that connect to their parent but not to each other, making the subgraph sparse.

Community 1 - "Request/Response Models" (Request, Response, URL, Headers, Cookies + methods)

  • Excellent grouping - this is exactly the "data model" layer. Cohesion ~0.18 is the highest because methods connect within their parent classes.

Community 2 - "Exception Hierarchy" (all 15 exception classes)

  • Good that exceptions are grouped together. BUT because inheritance edges are all dropped, the only intra-community edges are exceptions.py contains ExceptionClass. This means cohesion is near-zero (0.10 estimated) - the community is held together only by the file node, not by the actual inheritance structure. Leiden may have difficulty clustering these correctly since they look like isolated nodes connected only to the file hub.

Community 3 - "Transport & Auth" (all transport + auth classes)

  • This is the most problematic grouping. Transport (HTTPTransport, ConnectionPool, etc.) and Auth (BasicAuth, DigestAuth, etc.) are bundled together simply because both modules import from models.py and exceptions.py. They are architecturally distinct layers. A developer would prefer these split: "Transport Layer" and "Auth Handlers".
  • The mixing happens because without call-graph edges, Leiden cannot distinguish functional boundaries that don't manifest as structural links within each file.

Cohesion scores are honest: Low cohesion (0.08–0.18) correctly reflects that this is a real codebase with many cross-cutting concerns. The scores are not artificially inflated.


4. Surprising Connections - Score: 4/10

Are the "surprising" connections actually non-obvious?

The 5 reported connections are all EXTRACTED (cross-file import edges). Let's evaluate each:

  1. BaseClient ↔ .auth_flow() (client.py ↔ auth.py)

    • This IS a cross-file relationship and captures that the client consumes the auth protocol. Moderately interesting - but "client uses auth" is not surprising.
    • Score: Somewhat interesting, but obvious to anyone who reads client.py line 1.
  2. ProxyTransport ↔ TransportError (transport.py ↔ exceptions.py)

    • This is within the same file (transport.py imports exceptions at the bottom: from .exceptions import TransportError). This is a re-export, not a surprise.
    • Score: False positive - this is a completely obvious import.
  3. ConnectionPool ↔ Request (transport.py ↔ models.py)

    • transport.py imports from models. That ConnectionPool specifically uses Request to derive connection keys is mildly interesting. But "transport uses request model" is architecturally obvious.
  4. DigestAuth ↔ Response (auth.py ↔ models.py)

    • This IS genuinely interesting! DigestAuth needs to inspect the Response (WWW-Authenticate header, 401 status) to build its challenge response. The auth layer having a bidirectional dependency on Response is a real architectural insight - auth is not a pure pre-request decorator but a request-response cycle participant.
    • Score: Genuinely non-obvious and architecturally significant.
  5. utils.py ↔ Cookies (utils.py ↔ models.py)

    • unset_all_cookies in utils.py imports Cookies from models. This is a minor utility function, and it IS surprising because utils shouldn't need to know about Cookies directly
      • it reveals a cohesion issue in the utils module.
    • Score: Mildly interesting.

Problems:

  • 3 of 5 "surprising" connections are obvious cross-module imports (transport→exceptions, client→auth, transport→models)
  • The truly surprising connection (DigestAuth's bidirectional coupling with Response, including reading Response status codes and headers during the auth flow) is present but not explained.
  • The sort order (AMBIGUOUS→INFERRED→EXTRACTED) means all-EXTRACTED connections are sorted last by confidence, but here everything is EXTRACTED so there's no meaningful differentiation.
  • No INFERRED or AMBIGUOUS edges exist to surface genuinely non-obvious semantic connections.

5. God Nodes - Score: 7/10

Are the most-connected nodes actually the core abstractions?

Very good:

  • client.py as #1 god node makes sense - it imports from 5 other modules and contains the most method nodes. It is the integration hub of the library.
  • models.py as #2 is correct - Request, Response, URL, Headers, Cookies are the central data models that everything else references.
  • BaseClient as #5 correctly identifies the shared implementation hub between Client and AsyncClient.
  • Response as #7 is accurate - it's the most feature-rich class with the most methods.

Problematic:

  • File-level nodes (client.py, models.py, transport.py, exceptions.py, auth.py, utils.py) dominate the top spots. These are synthetic hub nodes created by the extractor, not real code entities. A file node like client.py gets an edge to EVERY class and function in that file via contains. In a 300-line file, this means ~25 edges from one synthetic hub. This inflates file nodes above actual classes.
  • exceptions.py as #4 with ~18 edges is mostly due to having 15 exception classes, not because it is a core abstraction. Exceptions are typically leaf nodes, not hubs.
  • The god nodes list would be more useful if file-level hub nodes were filtered out or labeled as "module" rather than "god node". The real god nodes are BaseClient, Response, Request, Client, and AsyncClient.

6. Overall Usefulness - Score: 6/10

Would this graph help a developer understand the codebase?

Yes, it would help with:

  • Quickly identifying that httpx has four distinct layers: exceptions, models, auth/transport, and client - even if auth and transport are merged.
  • Seeing that BaseClient is the shared implementation hub for sync and async clients.
  • Identifying Response and Request as the central data types.
  • Finding cross-module coupling (e.g., auth's dependency on Response).
  • Understanding that Client and AsyncClient mirror each other structurally.

No, it would NOT help with:

  • Understanding the exception hierarchy (all 14 inheritance edges are dropped).
  • Understanding call flow (which methods call which).
  • Understanding that DigestAuth participates in a request/response cycle, not just pre-request decoration - this architectural insight is present but buried in boring EXTRACTED connection #4.
  • Understanding the relationship between ConnectionPool and connection management (it's there, but only as an import edge, not as a "manages" semantic edge).
  • Distinguishing transport from auth (they're in the same community).

Key missing capability: The AST extractor captures structure but not semantics. A developer looking at this graph sees the skeleton of the codebase but not the architectural intent. Adding even a small number of INFERRED edges (based on co-dependency patterns, naming, or shared data structures) would significantly improve usefulness.


Specific Issues Found

Issue 1: Inheritance edges silently dropped (CRITICAL)

Location: ast_extractor.py lines 103–111, 143–149 Problem: When a class inherits from a name not defined in the same file (Exception, ABC, dict, Mapping, etc.), the target node ID (_make_id(stem, base_name)) is never registered in seen_ids. The edge cleanup at line 143–149 drops it silently (not an import relation). Impact: All 14 exception inheritance edges are lost. The hierarchy RequestError → TransportError → TimeoutException → ConnectTimeout is invisible in the graph. Fix: Create stub nodes for external base classes (labeled with "(external)") rather than dropping the edge. Or keep inheritance edges regardless of whether the target exists.

Issue 2: File nodes dominate God Nodes (MODERATE)

Location: analyzer.py god_nodes(), ast_extractor.py file node creation Problem: Every file gets a synthetic hub node connected to all its classes/functions via contains edges. This makes file nodes always appear as god nodes. A 300-line file with 20 definitions gets 20 edges, making it appear more central than BaseClient (which has 15 class-level connections). Fix: Exclude nodes whose label ends in .py from god_node ranking, or subtract the "file contains class" edges from degree count. Report file nodes separately as "Module Hubs".

Issue 3: Transport and Auth are merged into one community (MODERATE)

Location: clusterer.py, Leiden algorithm input Problem: Because auth.py and transport.py both import from models.py and exceptions.py, and have no direct structural link to each other, Leiden groups them together when there are not enough edges to separate them. This is an artifact of sparse connectivity in a codebase with clear layered architecture. Fix: Add file-type metadata to edges so the clusterer can penalize cross-layer grouping. Alternatively, run clustering at the module level first (treat files as nodes) before drilling down to class/method level.

Issue 4: 100% EXTRACTED, 0% INFERRED (MODERATE)

Location: ast_extractor.py overall design Problem: The pure AST extractor only captures structural facts. It cannot capture:

  • Method A calls Method B (would require call-graph analysis or LLM)
  • Class A conceptually relates to Class B (would require semantic analysis)
  • The "implements" relationship (interface to concrete class) As a result, the graph's edges are highly accurate but capture only ~20% of the semantically interesting relationships in the codebase. Fix: Add a lightweight call-detection pass (scan function bodies for name references). Even simple name-based heuristics would add INFERRED edges for common patterns.

Issue 5: Surprising connections surface obvious imports (MINOR)

Location: analyzer.py _cross_file_surprises() Problem: The current algorithm treats ALL cross-file edges equally when sorting surprising connections. But many cross-file edges are mundane imports. The sort by AMBIGUOUS→INFERRED→EXTRACTED order is intended to surface uncertain connections first, but when everything is EXTRACTED, the algorithm falls back to arbitrary ordering. Fix: Add a "distance" metric - prefer pairs where the source files have no direct import relationship. A transport.py → exceptions.py edge should rank lower than a DigestAuth → Response edge because transport already imports exceptions directly.

Issue 6: _make_id edge fix - CONFIRMED WORKING

Location: ast_extractor.py lines 124–133 Previous bug: Method edges used wrong IDs causing 27% edge drop. Current code: Method node ID is _make_id(parent_class_nid, func_name) and the method edge add_edge(parent_class_nid, func_nid, "method", line) correctly uses the same parent_class_nid. Both parent_class_nid and func_nid are in seen_ids. Status: The _make_id fix is correctly implemented. Method edges are preserved. No 27% drop for method edges. ✓

Issue 7: Concept node filtering - CONFIRMED WORKING

Location: analyzer.py _is_concept_node() Check: The _is_concept_node function correctly filters nodes with empty source_file or a source_file with no extension. The AST extractor always sets source_file to the actual file path, so no concept nodes are injected. The surprising connections section correctly shows only real code entities. ✓


Scores Summary

Dimension Score Key Finding
Node/edge quality 6/10 ~85% of entities captured; 14 inheritance edges silently dropped
Edge accuracy 5/10 100% EXTRACTED (honest), 0% INFERRED (semantically limited)
Community quality 6/10 Models/Client communities good; exceptions flat; transport+auth merged
Surprising connections 4/10 1-2 genuinely non-obvious; 3 are obvious imports
God nodes 7/10 Core abstractions identified; file hub nodes dominate misleadingly
Overall usefulness 6/10 Good structural skeleton; missing call graph and semantics

Overall Score: 5.7/10 (average of 6 dimensions)


Additional Observations

The _make_id fix was clearly necessary and is now correct

The old bug would have built method edges with parent_class_nid but registered method nodes with a different ID. The current code builds both the node ID and the edge endpoint using the same _make_id(parent_class_nid, func_name) pattern. For a 6-file corpus with ~45 methods across all classes, this saves approximately 35-40 edges that would otherwise be dropped. The fix is confirmed working.

The AST-only pipeline has a fundamental ceiling

The graphify AST extractor is deterministic, fast, and accurate for what it extracts. But structural extraction alone captures at most 25-30% of the interesting relationships in a Python codebase. The skill.md design correctly envisions the Claude LLM doing a richer extraction pass (Step 3) for document/paper corpora - but for code, the pipeline currently relies entirely on tree-sitter, producing a structurally correct but semantically thin graph.

Corpus size and density

At ~2,800 words and 6 files, this corpus is on the small side for graph analysis. The skill.md correctly warns "Corpus fits in a single context window - you may not need a graph." A real httpx codebase has 30+ files. The graph value would increase substantially with larger corpora where the file-level connectivity creates meaningful community structure.

What a 9/10 graph would look like

  • Exception inheritance edges preserved (stub external base classes)
  • Call-graph edges added (even heuristic name-matching): raise_for_status → HTTPStatusError
  • Transport and Auth separated into distinct communities
  • Surprising connections filtered to truly cross-cutting architectural surprises
  • File hub nodes excluded from God Nodes ranking
  • At least some INFERRED edges for shared data structures and naming patterns