Skip to content

perf(group/http): skip source parse for graph-covered route files (#2138 Part 2)#2265

Open
henry201605 wants to merge 9 commits into
abhigyanpatwari:mainfrom
henry201605:feat/handles-route-handler-symbol
Open

perf(group/http): skip source parse for graph-covered route files (#2138 Part 2)#2265
henry201605 wants to merge 9 commits into
abhigyanpatwari:mainfrom
henry201605:feat/handles-route-handler-symbol

Conversation

@henry201605

Copy link
Copy Markdown
Contributor

What & why

Part 2 of #2138 (Part 1 = #2234, persisting Route.method, merged).

HttpRouteExtractor parses every source file with tree-sitter up front, even
when the indexed graph already holds the file's routes. Now that ingestion
emits Route nodes with a resolved handler symbol (U0–U3a in this branch),
the graph is authoritative for a fully-covered file's providers — so for
those files we can skip the source scan and the parse. This is the
measurable parse reduction #2167 was asked for and couldn't demonstrate (it
skipped only the merge-layer emission, which saved ~0 because the parse still
ran).

How

  • HttpLanguagePlugin gains routeCoverage?: 'complete' | 'partial'
    (default 'partial'). Marked 'complete' for Java/Python/PHP, whose
    provider routes are fully emitted as Route nodes.
  • extract() runs the graph provider pass first to build a coveredFiles
    set (every HANDLES_ROUTE row for the file resolved a handler symbol +
    language is 'complete'), then excludes those files from
    collectProjectDetections and both source scans.
  • Fail-open per file: any unresolved row, a 'partial' language, or a
    graph-less run leaves the file in the scan set — no route is dropped.

Consumer safety (the subtle part)

routeCoverage: 'complete' asserts provider Route-node completeness only.
But the same languages' scan() also detects consumers (RestTemplate /
WebClient / OkHttp / Feign / Spring HTTP Interface @*Exchange, Guzzle /
Http::, requests / httpx), and ingestion's FETCHES edges are JS/TS-only
— the graph can't back up server-side consumers. A provider-covered controller
that also calls out would otherwise lose its consumer contract.

So skipping is gated by a cheap, parse-free text check:
HttpLanguagePlugin.hasConsumerSignals(content) returns false only when the
raw source provably has no outbound-HTTP call this plugin detects. A file stays
covered (parse-skipped) only when this is false; a positive signal, a missing
hook, or an unreadable file keeps it in the scan set. The orchestrator names no
languages — token knowledge lives in the plugins.

Net: pure-provider controllers skip the parse (the win); controllers that also
call out are still parsed (no consumer loss).

Testing

  • New route-parse-skip integration test spies the real parseSourceSafe to
    count parses over a temp repo of Spring controllers + a mock DB:
    baseline (every file parsed), fully-covered (0 parses), mixed
    (unresolved file falls back, resolved stays skipped), provider+consumer (a
    covered controller calling RestTemplate is parsed; its consumer contract
    survives).
  • tsc --noEmit clean; group/route/http-patterns/contract/csv/schema suites
    green (955 tests); emit-persistence --check PASS (no CSV column change).

Notes

henry added 5 commits June 21, 2026 14:18
…anpatwari#2138 part 2, WIP)

Part 2 groundwork for abhigyanpatwari#2138: give the graph-assisted HTTP provider path the
handler symbol directly, so it no longer re-parses source to recover the
handler name. (The remaining parse-skip in extract() + a call-count benchmark
land in a follow-up commit.)

- ExtractedDecoratorRoute gains `handlerName`; the Spring extractor captures
  the decorated method's name (the method_declaration node is in hand).
- New `resolveRouteHandlerSymbols` (call-processor) resolves each route's
  handler to a real symbol UID, keyed by normalized route URL — Laravel
  framework routes (controller + method) and decorator routes (Spring/FastAPI)
  both reduce to `(filePath, name) -> nodeId`. Threaded through the parse phase
  onto `ParseOutput.routeHandlerSymbols`.
- routes phase stamps `Route.handlerSymbolId`; persisted end-to-end (schema +
  Route CSV row + getCopyQuery COPY columns), mirroring Part 1's `method`.
- HttpRouteExtractor: `HANDLES_ROUTE_QUERY` returns `handlerSymbolId`;
  `extractProvidersGraph` uses it as the authoritative symbol and SKIPS
  `getDetections()` for resolved rows (CONTAINS is a cheap graph lookup for the
  display name only — no tree-sitter parse). Fully backward compatible: an
  unresolved/old-index route with no `handlerSymbolId` keeps the source-scan
  fallback.
- Extracted `normalizeExtractedRoutePath` to `route-extractors/route-path.ts`
  (shared by routes phase + resolver without an import cycle).
- SCHEMA_BUMP 6->7 (ParseWorkerResult gained `handlerName`); regenerated the
  emit-persistence byte-identity baseline (route.csv header gained two columns).
- Tests: Spring pipeline asserts the Route node carries a handlerSymbolId
  resolving to the handler method; extractor fast-path test proves the handler
  resolves with zero source detections.

Refs abhigyanpatwari#2138
…handler-symbol

# Conflicts:
#	gitnexus/bench/emit-persistence/baselines.json
…higyanpatwari#2138 Part 2)

Builds on the persisted Route.handlerSymbolId (U0–U3a). When a file's
HANDLES_ROUTE rows all resolve a handler symbol AND its language plugin
declares routeCoverage: 'complete' (Java/Python/PHP), the graph is
authoritative for that file's providers, so the source scan + tree-sitter
parse can be skipped — the scan would only re-discover routes the graph
already has. This is the measurable parse reduction abhigyanpatwari#2167 could not show.

Consumer safety: routeCoverage: 'complete' asserts *provider* Route-node
completeness only. The scan() of those same languages also emits consumer
detections (RestTemplate/WebClient/OkHttp/Feign, Guzzle/Http::,
requests/httpx), and ingestion's FETCHES edges are JS/TS-only — so the
graph cannot back up server-side consumers. A provider-covered controller
that also calls out would otherwise lose its consumer contract. Guarded by
a cheap, parse-free text gate.

- types: HttpLanguagePlugin gains
    - routeCoverage?: 'complete' | 'partial' (default 'partial')
    - hasConsumerSignals?(content): false only when the raw source provably
      has no outbound-HTTP call this plugin detects (conservative).
- java/python/php: mark routeCoverage 'complete' + implement
  hasConsumerSignals with a token regex over their consumer idioms.
- http-route-extractor: run the graph provider pass first to build a
  coveredFiles set; then keep a file covered only when
  hasConsumerSignals(content) === false (read via readSafe, no parse).
  scanFiles = files not covered → drives collectProjectDetections + both
  source scans. Fail-open per file: any unresolved row, a 'partial'
  language, a positive consumer signal, a missing hook, or an unreadable
  file leaves the file in the scan set. The orchestrator names no
  languages — token knowledge stays in the plugins.

Net: pure-provider controllers skip the parse (the win); controllers that
also call out are still parsed (no consumer loss); partial-coverage
languages and graph-less runs are unchanged.

- test: route-parse-skip integration test spies the real parseSourceSafe to
  COUNT parses over a temp repo of Spring controllers with a mock DB —
  baseline (every file parsed), fully-covered (0 parses), mixed (unresolved
  file falls back, resolved stays skipped), and provider+consumer (a covered
  controller that also calls restTemplate is parsed; its consumer contract
  survives).
…mer-signal gate

abhigyanpatwari#2254 (merged) added Spring 6 HTTP Interface `@(Get|...)Exchange` /
`@HttpExchange` as a new Java *consumer* idiom. The abhigyanpatwari#2138 parse-skip
consumer-safety gate must recognize it, or a provider-covered file carrying
an `@GetExchange` could be parse-skipped and lose that consumer contract.
Add `Exchange` to JAVA_HTTP_PLUGIN.hasConsumerSignals (conservative; also
matches `restTemplate.exchange(`).
@henry201605 henry201605 requested a review from magyargergo as a code owner June 21, 2026 16:34
@vercel

vercel Bot commented Jun 21, 2026

Copy link
Copy Markdown

Someone is attempting to deploy a commit to the NexusCore Team on Vercel.

A member of the Team first needs to authorize it.

@github-actions

github-actions Bot commented Jun 21, 2026

Copy link
Copy Markdown
Contributor

✨ PR Autofix

Found fixable formatting / unused-import issues across 22 changed lines. Comment /autofix on this PR to apply them, or run npm run lint:fix && npm run format locally.

{"schema":"gitnexus.pr-autofix/v2","state":"fixes-available","pr_number":2265,"changed_lines":22,"head_sha":"ec39085cb73e5dd81068371d0126a27ac3bf1354","run_id":"27910805150","apply_command":"/autofix"}

@github-actions

github-actions Bot commented Jun 21, 2026

Copy link
Copy Markdown
Contributor

CI Report

All checks passed

Pipeline Status

Stage Status Details
✅ Typecheck success tsc --noEmit
✅ Tests success unit tests, 3 platforms
✅ E2E success gitnexus-web changes only

Test Results

Tests Passed Failed Skipped Duration
13299 13240 0 54 783s

✅ All 13240 tests passed

54 test(s) skipped — expand for details

Code Coverage

Tests

Metric Coverage Covered Base Delta Status
Statements 79.29% 50435/63606 79.24% 📈 +0.1 🟢 ███████████████░░░░░
Branches 66.38% 30740/46308 66.3% 📈 +0.1 🟢 █████████████░░░░░░░
Functions 85.26% 5771/6768 85.22% 📈 +0.0 🟢 █████████████████░░░
Lines 82.77% 44946/54301 82.72% 📈 +0.0 🟢 ████████████████░░░░

📋 View full run · Generated by CI

@magyargergo magyargergo left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR #2265 review — perf(group/http): skip source parse for graph-covered route files (#2138 Part 2)

Automated two-engine tri-review digest. Verdict: not production-ready as written — one merge-blocking P1 data-loss class. The architecture is sound and fail-open; the blocker is a single over-claimed invariant (routeCoverage: 'complete' for Java) that the optimization trusts but ingestion does not actually satisfy.

Engines & how to read this

  • Engine A — Claude. GitNexus PR swarm (pr-facts, security-boundary complete; risk-architect ended mid-investigation, its domain fully covered by the ce lanes) + ce-code-review personas (correctness, adversarial, api-contract, testing, maintainability, performance, project-standards — all complete).
  • Engine B — Codex (gpt-5.5 / xhigh). All three legs ran: swarm Solo mode (B1, complete — HIGH risk, "not production-ready") + ce-code-review (B2, complete, verdict "Not ready", 6 findings; its correctness/adversarial/project-standards ce sub-lanes were degraded by stream disconnects, so only the main + api-contract + testing sub-lanes emitted) + native adversarial-review (B3, complete — verdict "needs-attention / No ship", 2 high findings).

This is two engines, not five independent reviews. The strongest signals below are where a Codex lane and a Claude lane independently reached the same finding. Where a lane's severity disagrees with the coordinator's own re-read of the code, this digest follows the coordinator calibration and says why.

CI: MERGEABLE / UNSTABLE — the only red is Vercel deploy-auth (team-authorization infra), the same pattern as merged #2254 / #2138-Part-1. All 28 substantive checks pass (coverage 13221/13280, macOS/Windows, CodeQL, gitleaks, typecheck/lint/format). Vercel is not a code blocker. Status is UNSTABLE because of infra, not code.

What's solid (credit where due)

  • Fail-open architecture is genuinely robust. Every degraded path (URL-key mismatch, unresolved handler, unreadable file, missing plugin, old-schema query) defaults to "run the source scan" — so the bugs below are silent under-emission on the optimized path, never crashes. The headline P1 is the one place that property is breached.
  • Security surface refuted across the board (security-boundary lane, code-tested):
    • ReDoS on all three hasConsumerSignals regexes (java/php/python): refuted — all linear, no nested quantifiers, sub-ms on 100k-char adversarial inputs.
    • Path traversal via readSafe(repoPath, f): refuted — path.resolve + path.relative + ../absolute rejection, null fail-open; f originates from graph rows, not user input.
    • Cypher injection: refuted — handlerSymbolId is never interpolated; CONTAINS_QUERY uses bound $fileId; HANDLES_ROUTE_QUERY is a constant string.
    • Unicode/bidi: clean (git diff --check clean, no bidi controls, only em-dash/arrow in prose).
  • @PatchMapping is already fixed (parse-worker.ts:1096, landed in #2078) — a stale prior-context lead, correctly retired by the facts lane.
  • Kotlin correctly left 'partial' (no routeCoverage key → never skipped → safe). Java \bwebClient\b lowercase is intentional (scan only detects the lowercase receiver). PHP and Java consumer regexes are co-extensive with their scans (adversarial lane refuted both). PHP ingestion (Laravel) is a superset of the group PHP scan → no PHP provider over-claim.
  • Schema/CSV/COPY column parity is correct (8 columns aligned across schema.ts / csv-generator.ts / lbug-adapter.ts; handlerSymbolId read only via the explicit projection, so the M0/M1 getNodeQuery exhaustiveness hazard does not apply). bench/baselines.json fingerprint change is the expected consequence of the new column. CHANGELOG correctly untouched. No any/as any introduced. RFC#909 language-naming rule respected (resolveRouteHandlerSymbols keys on controllerName/handlerName, not language names).
  • Performance: net positive, no regressions (all perf findings NEGLIGIBLE/N-A; the per-row CONTAINS_QUERY is pre-existing, not worsened).

P1 (MERGE-BLOCKING) — Java routeCoverage: 'complete' over-claims; parse-skip silently drops provider routes

File: gitnexus/src/core/group/extractors/http-patterns/java.ts:625 (routeCoverage: 'complete', — added in this diff)
Cross-engine: STRONG. Codex swarm (B1) Finding 1 + Codex ce (B2) findings #1 & #2 + Claude ce-adversarial F1/F2 (probe-confirmed) + Claude ce-correctness (verb-loss). Coordinator independently verified the array-form code asymmetry by reading spring.ts and java.ts. [code-read] (worktrees can't build; adversarial lane ran real tree-sitter probes on both producers).

Root cause. routeCoverage: 'complete' asserts that ingestion emits a graph Route node for every provider the group scan() detects. For Java this is false: the graph provider set is a strict subset of the group-scan provider set, and this PR removes the source scan that used to recover the difference. A file is parse-skipped iff (1) its plugin declares 'complete', (2) every HANDLES_ROUTE row for it resolved a handlerSymbolId, and (3) hasConsumerSignals returns false. When a file has one simple resolvable mapping plus a group-only route shape, condition (2) is satisfied by the simple route alone → file covered → skipped → the group-only routes vanish.

Three sub-cases, same root cause:

  1. Array-form @GetMapping({"/a","/b"}) (confidence 100, probe-confirmed). Ingestion's ROUTE_ANNOTATION_QUERY (spring.ts:51,59,64,72) matches only (string_literal) @value — it has no element_value_array_initializer branch. The group query (java.ts:124) matches [(string_literal) @value (element_value_array_initializer (string_literal) @value)]. Existing passing group test: http-route-extractor.test.ts:1850. → /a, /b get no graph Route node; a co-located simple route covers the file; file skipped; /a,/b are never emitted.
  2. Interface-inherited Spring routes (confidence 90). The group scanSpringInheritanceProject (spring-consumer-shared.ts:197-268) composes interface @RequestMappings onto implementing controllers; ingestion spring.ts only walks class-direct annotations. collectProjectDetections runs scanProject over scanFiles minus the covered controller, so the inheritance pass never receives the controller and can't emit the inherited provider. Existing passing group test: http-route-extractor.test.ts:863.
  3. Same-URL multi-verb (GET + POST /orders) (confidence 100, Codex ce #2 + Codex adversarial B3 finding 1). The Route node id is generateId('Route', routeURL) — keyed by URL only; method is a property. GET+POST collapse to one node / one HANDLES_ROUTE row. The file reads "all resolved" on that single row → covered → skipped. Pre-PR the source scan emitted both verbs and the merge recovered the dropped one; with the scan skipped, the non-surviving verb's provider contract is lost. (Not Java-specific — it's the URL-keyed Route identity, but Java is the live trigger.) Both Codex legs (ce + adversarial) independently surfaced this one.

Concrete trigger (sub-case 1, cleanest). A pure-provider @RestController with @GetMapping("/covered") + @GetMapping({"/a","/b"}) and no consumer (RestTemplate/WebClient) calls: /covered resolves a handler → file all-resolved + Java 'complete' → covered; hasConsumerSignals false → file removed from scanFiles/a,/b never emitted.

Fix (both engines converge). Keep Java at routeCoverage: 'partial' until ingestion provider extraction matches the group scan (array-form branch in ROUTE_ANNOTATION_QUERY + interface inheritance + per-verb Route identity), OR run scanProject over the FULL file set and only suppress per-file provider emission for covered files (so cross-file inheritance metadata is retained). Add regression tests for the array-form, interface-inherited, and multi-verb skip cases.


P2 — handlerSymbolId mis-attribution to the wrong handler (fail-open keeps it from being data loss)

File: gitnexus/src/core/ingestion/call-processor.ts:303 (lookupExactAll(...)[0]) and :294; stamp at gitnexus/src/core/ingestion/pipeline-phases/routes.ts:288.
Cross-engine: STRONG. Codex swarm (B1) Finding 2 + Codex ce (B2) #6 + Claude ce-correctness F1/F2 + Claude ce-adversarial F4. [code-read]. Coordinator-confirmed.

Two distinct mechanisms produce a wrong-but-plausible handler UID:

  • First-writer asymmetry. The routes phase addRoute (routes.ts:241-247) is first-writer-wins per URL: the first route registered for a URL wins the Route node, later same-URL routes are counted as duplicates. resolveRouteHandlerSymbols.put (call-processor.ts:276-307) is also first-writer per URL but skips unresolvable URLs and iterates in an independent order (Laravel routes before decorator routes). On a same-URL collision where the node-winning route is unresolvable (e.g. ambiguous controller short-name) but a later same-URL route resolves, routeHandlerSymbols[url] holds the later route's handler, which is then stamped onto the winning route's node. Also reachable cross-source (a filesystem/Next route, registered first, winning a URL a framework route also normalizes to — the resolver never sees filesystem routes).
  • lookupExactAll(file, name)[0] takes the first def of a same-named symbol → an overloaded/same-named handler resolves to an arbitrary (possibly wrong) UID. spring.ts captures the exact decorated node but resolution discards that identity and re-looks-up by name only.

Effect. A wrong-but-plausible handlerSymbolId flows into the emitted contract and impact analysis points at the wrong handler. Fail-open (URL-key mismatch → no stamp) prevents outright route loss, so this is P2, not P1.

Fix. Have put reserve the URL slot even when unresolved (mirror addRoute's first-writer-wins exactly), and key on the winning routes-phase entry identity; when lookupExactAll returns >1, disambiguate by the captured handler node's line/position or skip resolution (fail-open) rather than guessing [0].

P2-LATENT (dormant) — Python hasConsumerSignals is not a superset of Python scan()

File: gitnexus/src/core/group/extractors/http-patterns/python.ts:929.
Cross-engine: Codex swarm F3 (rated P3) + Codex ce #3 (rated P1) + Claude ce-adversarial F3 (P2, flags dormancy) + facts lane. [code-read]. Codex rated this P1 but both Codex legs missed that it is currently unreachable — coordinator calibrates to P2-latent.

hasConsumerSignals (/\brequests\s*\.|\bhttpx\b/) only matches requests/httpx, but scan() also emits python-http-wrapper consumers for uri=/url= keyword/variable calls (python.ts:1095-1150) on any object. Per the types.ts:111-116 contract, a 'complete' plugin's hasConsumerSignals must cover every consumer shape scan() emits — it does not.

Why dormant (and why not a blocker): for a Python file to be skipped it must first resolve a handlerSymbolId, but FastAPI decorator routes set no handlerName (generic worker path, parse-worker.ts:1450-1463) and Django sets methodName: null (django.ts:181). So no Python file is parse-skipped today — the Python optimization is effectively a no-op now. The gap arms the moment a follow-up gives FastAPI routes a handlerName (a natural Part-3 change). Fix now (broaden the regex, or derive it from the scan's identifier set) or document the precondition. (Same class, pre-existing & not this PR: the Python consumer regex also misses aiohttp/urllib.)


Lower-priority / release notes

  • Existing indexes need a re-analyze (NOT merge-blocking, NOT #2265-unique). The new Route.handlerSymbolId column + 8-col COPY are applied via CREATE NODE TABLE, which swallows "already exists", so an existing .lbug DB keeps its old 7-col Route table. The read path is fail-open: HANDLES_ROUTE_QUERY against the old schema returns NULL → ?? '' → all files "not resolved" → full scan; only graph-only providers (Django) transiently degrade until re-analyze. However, a re-analyze that re-COPYs into the old table will hit a column-not-found COPY error (api-contract F1) — users must clean + analyze (or --force) to pick up handlerSymbolId.
    • Both Codex legs (ce #5 + adversarial B3 finding 2) + Claude ce-api-contract rate the missing INCREMENTAL_SCHEMA_VERSION bump (repo-manager.ts:248, still 4) as high/P1; this digest demotes it — but it deserves the maintainer's explicit attention. The premise is verified: an upgraded repo stays incremental-eligible against the old Route table, and an incremental writeback that re-COPYs Route could hit the missing column. The reason it is not treated as a #2265-introduced blocker: #2234 — which added Route.method one Part earlier via the identical schema-only pattern — also did not bump it and shipped two days ago, so this is a pre-existing pattern spanning both Parts, and the lbug DB is a derived cache (the project's model is "re-analyze on schema change"). Because I can't reproduce the incremental-COPY failure in a non-building worktree, I won't post it as a maintainer-signed assertion. Recommended action: confirm the incremental Route writeback is actually safe across a column addition; if not, bump INCREMENTAL_SCHEMA_VERSION to cover both new Route columns (method + handlerSymbolId) and add an old-schema upgrade regression test.
  • Maintainability (minor, P2/P3): extractProvidersGraph has dual responsibility (returns contracts and mutates a coveredFiles out-param Set) with a duplicated CONTAINS_QUERY across fast/legacy paths; the routeCoverage:'complete'hasConsumerSignals invariant is unenforced in the types (a silent data-loss trap for future plugin authors — could be a discriminated union); dead alias const pathNorm = pathNormEarly (~http-route-extractor.ts:480); positional row[N]/hit[N] fallbacks on Record<string,unknown>.
  • Test gaps (none would catch the P1):
    • resolveRouteHandlerSymbols (new exported fn, ~5 branches) has zero direct unit tests.
    • No handlerSymbolId CSV→COPY→query roundtrip test (Route.method has route-method-roundtrip.test.ts; this column has no analogue).
    • Consumer-safety gate fail-open branches (null content, missing plugin, absent hasConsumerSignals) untested.
    • No regression test for any of the three P1 data-loss cases (array-form / interface-inherited / multi-verb parse-skip).
    • Positive note: http-route-graph-method.test.ts is a real test, not a worker-dependent silent no-op (confirmed by ce-testing + project-standards).

Validated / refuted (verification credit)

Claim Outcome Lane
ReDoS on the 3 hasConsumerSignals regexes Refuted (linear, sub-ms on 100k inputs) security-boundary
Path traversal via readSafe Refuted (resolve/relative/.. guard, null fail-open) security-boundary
Cypher injection via handlerSymbolId Refuted (never interpolated; $fileId bound) security-boundary
Unicode / bidi controls Clean security-boundary
@PatchMapping dropped Already fixed (#2078, parse-worker.ts:1096) facts
PHP / Java consumer regexes miss a scan shape Refuted (co-extensive) ce-adversarial
PHP provider over-claim Refuted (Laravel ingestion ⊇ group scan) ce-adversarial
Kotlin skip path Refuted (defaults 'partial', never skipped) ce-adversarial
Old/mixed DB schema corruption Fail-open (try/catch → [] → full scan) ce-adversarial, api-contract
bench/baselines.json fingerprint change Expected (new CSV column) performance
CHANGELOG touched / any introduced / RFC#909 broken None project-standards

This is an automated two-engine digest (Claude: GitNexus swarm + ce-code-review; Codex gpt-5.5/xhigh: swarm + ce-code-review + adversarial-review). Verify the P1 fix direction and the inline anchors against the diff before acting; the recommendation is to hold merge until Java is either downgraded to 'partial' or ingestion is made provider-complete.

// Spring @(Get|Post|...)Mapping providers are emitted as Route nodes by the
// ingestion routes phase (#2078), so the graph is authoritative for Java
// providers — see HttpRouteExtractor's parse-skip (#2138 Part 2).
routeCoverage: 'complete',

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[P1 — merge-blocking] routeCoverage: 'complete' over-claims for Java → parse-skip silently drops provider routes.

This asserts ingestion emits a Route node for every provider the group scan() detects, but the graph set is a strict subset. Three skip-induced data-loss cases:

  1. Array-form @GetMapping({"/a","/b"}) — the group query (java.ts:124) matches element_value_array_initializer; ingestion's spring.ts route queries (:51,59,64,72) match (string_literal) only.
  2. Interface-inherited Spring routes — scanSpringInheritanceProject (spring-consumer-shared.ts:197-268) composes them, but ingestion only walks class-direct annotations.
  3. Same-URL multi-verb (GET+POST /orders) — the Route node is keyed by URL only, so the verbs collapse to one row.

Trigger: a pure-provider @RestController with one simple resolvable @GetMapping("/x") plus any of the above and no consumer calls — the file is marked covered (all rows resolved) and removed from the scan set, so the group-only routes vanish. Existing passing group tests prove these shapes are expected (http-route-extractor.test.ts:1850 array, :863 inherited).

Fix: keep Java routeCoverage: 'partial' until ingestion matches the group scan (array-form + interface inheritance + per-verb Route identity), or run scanProject over the full file set and only suppress per-file provider emission for covered files. Add regression tests for all three skip cases.

Cross-engine: Codex swarm + Codex ce + Codex adversarial + Claude ce-adversarial/correctness. Coordinator-verified the array-form code asymmetry directly. [code-read]

// the route's own file.
for (const dr of decoratorRoutes) {
if (!dr.handlerName) continue;
const handlerId = model.symbols.lookupExactAll(dr.filePath, dr.handlerName)[0]?.nodeId;

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[P2] handlerSymbolId can resolve to the wrong handler.

lookupExactAll(dr.filePath, dr.handlerName)[0] (and the Laravel path at :294) takes the first def of a same-named symbol — an overloaded/same-named handler resolves to an arbitrary, possibly wrong UID, even though spring.ts captured the exact decorated node and then discarded that identity.

Compounding it: resolveRouteHandlerSymbols.put is first-writer-per-URL but skips unresolvable URLs, while the routes phase addRoute is first-writer-wins for the Route node. On a same-URL collision where the node-winning route is unresolvable but a later same-URL route resolves, the winning node (routes.ts:288) gets stamped with the other route's handler.

Fail-open prevents outright route loss, so this is P2 — but impact analysis / contract handlers then point at the wrong symbol. Fix: disambiguate by the captured handler node's line/position (or skip to fail open) instead of [0].

Cross-engine: Codex swarm + Codex ce + Claude ce-correctness/adversarial. [code-read]

@magyargergo

Copy link
Copy Markdown
Collaborator

@henry201605 what do you think about this #2269?

@henry201605

Copy link
Copy Markdown
Contributor Author

Thanks for the detailed review — the P1 is correct. I confirmed all three sub-cases against the code:

  • array-form: spring.ts route queries match (string_literal) only; the group query (java.ts:124) also matches element_value_array_initializer.
  • interface-inherited: scanSpringInheritanceProject composes them; ingestion walks class-direct annotations only.
  • same-URL multi-verb: the Route node is URL-keyed, so addRoute drops the second verb as a duplicate.

So routeCoverage: 'complete' for Java over-claims — agreed it's merge-blocking. Plan for this PR:

  • Java → 'partial' (your first option). Removes the data-loss path; Java is always source-scanned until ingestion matches the group scan.
  • Python → 'partial' as well. It's a no-op today anyway (FastAPI/Django set no handlerName, so no Python file is ever skipped), and leaving it 'complete' is a latent trap the moment a handlerName lands. I'll also widen the Python hasConsumerSignals regex to a true superset of scan() (the uri=/url= wrapper, aiohttp, urllib) so the precondition holds when it does get enabled.
  • PHP stays 'complete' — your adversarial lane confirmed Laravel ingestion ⊇ the group scan, and it's the one language the skip actually engages for. If you'd rather be conservative about the URL-keyed multi-verb identity for PHP too, I'll downgrade it — your call.
  • P2: disambiguate handlerSymbolId by the captured node's position instead of [0], and make put reserve the URL slot like addRoute so a later same-URL route can't stamp the winning node; fail open when it can't disambiguate.
  • Regression tests for the array-form / interface-inherited / multi-verb skip cases, plus direct tests for resolveRouteHandlerSymbols.

Net: the PR keeps the foundation (resolved handler symbols persisted on Route nodes + the parse-skip mechanism) and the PHP win, with no data loss. I'd like to do the ingestion-completeness work as a separate follow-up rather than grow this PR — array-form branch in ROUTE_ANNOTATION_QUERY, interface-inherited provider emission, and per-verb Route identity.

On #2269 — the cross-repo trace stitching is where the handler-symbol resolution in this PR pays off concretely. Today HTTP provider contracts carry symbolUid: '', so the bridge anchors by file; the graph fast-path in this PR stamps the ingestion-resolved handler UID onto a covered provider's symbolUid, so the bridge can anchor on the symbol instead of falling back to the file. That benefit is independent of the parse-skip — it lives in the graph provider path, so downgrading Java to 'partial' keeps it.

The follow-up (ingestion provider-completeness) reads naturally as "give the currently-empty cases a real symbolUid": array-form and interface-inherited routes get a Route node at all, and per-verb Route identity gives the second verb of a same-URL pair its own resolved provider instead of an empty-uid source-scan one. None of that needs a change on the trace side — contractId already carries the method — it just widens how often your bridge anchors on a symbol rather than a file. If you'd rather I sequence that follow-up to land before leaning on it in trace, happy to.

@magyargergo

magyargergo commented Jun 22, 2026

Copy link
Copy Markdown
Collaborator

Please continue working on this one 🙏 I only shared my work with you because I thought you might be interested.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants