Add AST-based route harvesting for Node.js variable paths #2512

Open
Vedanshu7 wants to merge 1 commit into open-telemetry:main from Vedanshu7:nodejs-ast-route-harvesting

@Vedanshu7


Summary

The existing regex-based route harvester only matched hardcoded string paths like app.get('/users', handler). It missed routes where the path came from a variable, a template literal, or string concatenation.

This PR adds an AST-based fallback pass using Goja's parser that handles those cases:

  • variable: const p = '/users'; app.get(p, handler)
  • template literal: app.get(`/api/${version}/users`, handler)
  • string concatenation: app.get('/api' + '/users', handler)
  • object property access and destructuring
  • loop variables from for...of
  • cross-file paths via require()

Link to tracking issue

Fixes #929

Testing

Added unit tests in pkg/internal/transform/route/harvest/js_ast_test.go covering variable paths, template literals, string concatenation, object property access, destructuring, loop variables, and cross-file require(). All tests pass with go test ./pkg/internal/transform/route/harvest/....

Authorship

  • I, a human, wrote this pull request description myself.

@Vedanshu7 Vedanshu7 requested a review from a team as a code owner June 26, 2026 01:36

@mariomac mariomac left a comment

Contributor

Review

Solid feature and good test coverage. I left detailed inline comments anchored to the specific lines — a quick map of what they cover:

Correctness

  • Interpolated template literals → duplicate/conflicting routes (the regex pass already matches backtick strings). Highest priority, and currently untested.
  • Possible slice-aliasing mutation in the ConditionalExpression branch of stringValues.
  • Flat global symbol table (no scoping) → false positives / dropped routes.
  • routeMethod does not constrain the receiver object.

Dead / over-engineered

  • visited/inner recursion plumbing in requireExports is never used (cross-file is one level deep).
  • Hand-rolled 177-line AST walker — check whether goja exposes ast.Walk.

Performance

  • Each file is read and parsed twice.
  • .ts/.tsx files are fully read and parse-attempted for nothing.
  • Unbounded cartesian-product expansion with no cap.
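
As a rough illustration of the "no cap" point, a bounded variant of the concatenation product might look like the sketch below. The name concatProductCapped and the limit of 64 are assumptions made for this example; the PR defines neither.

```go
package main

// maxCombos is a hypothetical cap; the PR does not define one.
const maxCombos = 64

// concatProductCapped returns left+right concatenations in order and stops
// once maxCombos results have been produced. The second return value reports
// whether the result was truncated.
func concatProductCapped(left, right []string) ([]string, bool) {
	var out []string
	for _, l := range left {
		for _, r := range right {
			if len(out) == maxCombos {
				// More combinations remain, but the cap has been reached.
				return out, true
			}
			out = append(out, l+r)
		}
	}
	return out, false
}
```

Returning a truncation flag lets the caller decide whether to log, drop the route, or fall back to a wildcard segment.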

Dependency

  • New heavyweight dep (dop251/goja) pinned to an untagged pseudo-version; unrelated Masterminds/semver bump in go.sum.

The single most impactful fix is the interpolated-template duplication.


🤖 Posted by Claude (via Claude Code) on behalf of @mariomac.

@mariomac mariomac left a comment

Contributor

Inline observations below. The most impactful is the interpolated-template duplication on js_ast.go — it's a user-visible correctness regression and currently untested.

if _, isLit := first.(*ast.StringLiteral); isLit {
	return
}
if tmpl, isTmpl := first.(*ast.TemplateLiteral); isTmpl && len(tmpl.Expressions) == 0 {


Interpolated template literals produce duplicate/conflicting routes (highest priority). The existing Typical regex (js.go:116) includes backtick (\x60) in its char class, so it already matches app.get(`/api/${ver}/users`) and emits the raw /api/${ver}/users, which CleanupRegexPath turns into /api/:id/users. This AST pass emits /api/v1/users for the same call. Both survive dedup, so one registration yields two routes. The same happens with ${req.params.id}: the regex pass yields :id while the AST pass yields :param. This guard skips plain literals and non-interpolated templates, but not the interpolated case, which is exactly the new one. The tests don't catch it because assertHasRoute only checks presence; consider adding a "no unexpected routes" assertion.
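
One possible shape for the "no unexpected routes" check is a small set-difference helper, sketched below. It assumes harvested routes can be compared as plain path strings; unexpectedRoutes is a hypothetical name and not part of the PR's test helpers.

```go
package main

import "sort"

// unexpectedRoutes returns every route in got that is not listed in want,
// sorted so that failure messages are stable. Hypothetical helper.
func unexpectedRoutes(got, want []string) []string {
	allowed := make(map[string]struct{}, len(want))
	for _, w := range want {
		allowed[w] = struct{}{}
	}
	var extra []string
	for _, g := range got {
		if _, ok := allowed[g]; !ok {
			extra = append(extra, g)
		}
	}
	sort.Strings(extra)
	return extra
}
```

A test could fail whenever the returned slice is non-empty, which would surface a second route emitted for a single registration.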

		return concatProduct(stringValues(v.Left, vars), stringValues(v.Right, vars))
	}
case *ast.ConditionalExpression:
	return append(stringValues(v.Consequent, vars), stringValues(v.Alternate, vars)...)


Possible slice-aliasing mutation. When the consequent is an identifier, stringValues returns vars[name] directly (see the *ast.Identifier case), so append(...) here can mutate the map's backing array when there's spare capacity, corrupting that variable's stored values for later lookups. Safer to allocate a fresh slice before appending both branches.
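
A minimal sketch of the hazard and of the suggested fix follows. The names mergeAliased and mergeFresh are illustrative only; the first mirrors the append shape in the snippet above, the second allocates a fresh slice before appending both branches.

```go
package main

// mergeAliased appends onto the first slice directly. If a has spare
// capacity, append writes into a's backing array instead of allocating.
func mergeAliased(a, b []string) []string {
	return append(a, b...)
}

// mergeFresh copies both inputs into a newly allocated slice, so neither
// input's backing array is written to.
func mergeFresh(a, b []string) []string {
	out := make([]string, 0, len(a)+len(b))
	out = append(out, a...)
	return append(out, b...)
}
```

With a slice such as backing[:1] taken from a three-element array, mergeAliased overwrites backing[1], while mergeFresh leaves the array untouched.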

// Repeated until stable to resolve chained dependencies.
// strVars stores both simple names ("x" → values) and flattened object
// property access ("obj.key" → values) for dot-notation and destructuring.
strVars := map[string][]string{}


No variable scoping — flat global symbol table. strVars is keyed by bare name across the whole file with first-wins semantics. Two functions each declaring const p = '/a' / '/b' silently drop one route, and an unrelated p can resolve to the wrong path. Probably acceptable for a best-effort harvester, but worth acknowledging as a known false-positive/missed-route source.


// routeMethod reports whether callee is a method call like `app.get` /
// `router.post` and, if so, returns the corresponding HTTP method.
func routeMethod(callee ast.Expression) (string, bool) {


routeMethod doesn't constrain the receiver object. Any X.get/.post/.delete/.head/.options/.all(stringVar, …) whose variable resolves to a string starting with / becomes a route — e.g. cache.get(...), map.get(...). The regex pass shares the shape but is bounded to literals physically present; variable resolution broadens the false-positive surface. The HasPrefix(path, "/") check is the only filter.
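
One way to narrow this, sketched under stated assumptions, is an allow-list of receiver names applied in addition to the prefix check. The set of names and the function looksLikeRoute are hypothetical; a real implementation might instead track identifiers assigned from express() or express.Router().

```go
package main

import "strings"

// routerReceivers is a hypothetical allow-list of receiver identifiers.
var routerReceivers = map[string]bool{
	"app":    true,
	"router": true,
	"server": true,
}

// looksLikeRoute reports whether receiver.method(path) should be treated as a
// route registration: the receiver must be allow-listed and the resolved path
// must start with "/".
func looksLikeRoute(receiver, path string) bool {
	return routerReceivers[receiver] && strings.HasPrefix(path, "/")
}
```

The trade-off is that routers bound to unusual names would be missed, so this swaps some false positives for some false negatives.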

}

// Extend visited to prevent circular requires in deeper recursion.
inner := make(map[string]bool, len(visited)+1)


Dead recursion machinery. This inner copy of visited is never used — requireExports only walks for module.exports/exports.x and does no nested require resolution, so cross-file is strictly one level deep and the circular-require protection is non-functional. Either complete the recursion or drop the inner/visited plumbing.

if !strings.HasPrefix(reqPath, "./") && !strings.HasPrefix(reqPath, "../") {
	return nil
}
base := filepath.Join(dir, reqPath)


Minor: requireExports joins arbitrary ./ and ../ paths. openJSFileForScan bounds it (regular file, ≤10MB, O_NOFOLLOW), so traversal risk is low, but the bare extension-less base candidate can read non-JS files before the parse fails.

// walk performs a depth-first traversal of the AST rooted at n, invoking fn for
// every node visited (including n itself).
func walk(n ast.Node, fn func(ast.Node)) {
	if n == nil {


Hand-rolled AST walker (177 lines). Does goja's ast package already expose ast.Walk/Visitor? If so this is redundant and a maintenance hazard — any node type the manual children switch misses silently drops a subtree (class bodies, spread, optional chaining, etc. are not handled).


// Fallback pass: resolve routes whose path comes from a variable or a
// template literal, which the line-based regex harvesters cannot match.
src, ok, err := readJSFileForScan(filePath)


Every file is read and parsed twice. The regex pass above streams line-by-line; this AST pass then reads the whole file into a string (up to MaxJSFileScanBytes = 10MB) and full-parses it, without reusing the content already read.

	return err
}
if ok {
	e.routes = append(e.routes, e.resolveASTRoutes(filePath, src)...)


TypeScript files pay full cost for nothing. WalkJSFiles includes .ts/.tsx, goja can't parse TS, so every TS file is read fully + parse-attempted + discarded (resolveASTRoutes returns nil on parse error). Consider gating the AST pass by extension or syntax.
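
Gating by extension could be as small as the sketch below. The function name shouldRunASTPass and the list of accepted extensions are assumptions for illustration; they are not taken from the PR.

```go
package main

import (
	"path/filepath"
	"strings"
)

// shouldRunASTPass reports whether a file is worth handing to a
// JavaScript-only parser. The extension list is an assumption.
func shouldRunASTPass(path string) bool {
	switch strings.ToLower(filepath.Ext(path)) {
	case ".js", ".mjs", ".cjs":
		return true
	default:
		// .ts, .tsx and anything else skip the read and the parse attempt.
		return false
	}
}
```

Calling this before readJSFileForScan would avoid both the full read and the failed parse for TypeScript sources.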

Comment thread go.mod
github.com/caarlos0/env/v11 v11.4.1
github.com/cilium/ebpf v0.20.0
github.com/containers/common v0.64.2
github.com/dop251/goja v0.0.0-20260618133527-c9b2ea77db59


New heavyweight dependency. dop251/goja (a full JS interpreter, plus transitive go-sourcemap/sourcemap) is pulled in only for its parser/ast, and pinned to an untagged pseudo-version. Worth a sentence justifying the weight vs. a lighter JS parser. Separately, go.sum bumps Masterminds/semver/v3 3.4.0 → 3.5.0 — looks like an unrelated go mod tidy side effect; confirm it's intended here.

Contributor

Yeah, this is a good point. If implementing this means we have to bring in a full JS interpreter, perhaps we should wait until a customer reports an issue with our existing route parsing. So far there haven't been any requests.

@mariomac

Contributor

Thanks a lot for your contribution, @Vedanshu7! I've asked Claude for a first review and it left some inline comments.

As is to be expected with AI, many observations might lack context and be incorrect. Feel free not to address all of them, but please reply with a justification for each item you won't address.

@codecov

codecov Bot commented Jun 26, 2026


Codecov Report

❌ Patch coverage is 81.70974% with 92 lines in your changes missing coverage. Please review.
✅ Project coverage is 63.48%. Comparing base (5f99818) to head (22ed2b7).
⚠️ Report is 29 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| pkg/internal/transform/route/harvest/js_ast.go | 80.16% | 57 Missing and 14 partials ⚠️ |
| ...kg/internal/transform/route/harvest/js_ast_walk.go | 86.95% | 17 Missing and 1 partial ⚠️ |
| pkg/internal/transform/route/harvest/js.go | 57.14% | 2 Missing and 1 partial ⚠️ |

❗ There is a different number of reports uploaded between BASE (5f99818) and HEAD (22ed2b7).

HEAD has 28 uploads less than BASE

| Flag | BASE (5f99818) | HEAD (22ed2b7) |
|---|---|---|
| unittests | 4 | 3 |
| integration-test-vm-rhel8.10 | 2 | 0 |
| integration-test-vm-5.15-lts | 4 | 0 |
| integration-test-vm-6.1-lts | 2 | 0 |
| integration-test-vm-rhel9.6 | 2 | 0 |
| integration-test-vm-rhel8.9 | 2 | 0 |
| integration-test-vm-6.18-lts | 4 | 0 |
| integration-test-vm-bpf | 2 | 0 |
| integration-test-vm-6.12-lts | 2 | 0 |
| integration-test-vm-6.6-lts | 2 | 0 |
| integration-test-vm-5.10-lts | 2 | 0 |
| integration-test | 10 | 9 |
| integration-test-vm-bpf-next | 2 | 0 |
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2512      +/-   ##
==========================================
- Coverage   69.33%   63.48%   -5.85%     
==========================================
  Files         340      338       -2     
  Lines       45778    44876     -902     
==========================================
- Hits        31738    28488    -3250     
- Misses      12057    14355    +2298     
- Partials     1983     2033      +50     
| Flag | Coverage Δ |
|---|---|
| integration-test | 50.46% <57.14%> (-1.34%) ⬇️ |
| integration-test-arm | 26.62% <43.93%> (-0.67%) ⬇️ |
| integration-test-vm-5.10-lts | ? |
| integration-test-vm-5.15-lts | ? |
| integration-test-vm-6.1-lts | ? |
| integration-test-vm-6.12-lts | ? |
| integration-test-vm-6.18-lts | ? |
| integration-test-vm-6.6-lts | ? |
| integration-test-vm-bpf | ? |
| integration-test-vm-bpf-next | ? |
| integration-test-vm-rhel8.10 | ? |
| integration-test-vm-rhel8.9 | ? |
| integration-test-vm-rhel9.6 | ? |
| k8s-integration-test | 35.33% <0.00%> (-1.10%) ⬇️ |
| oats-test | 34.66% <0.00%> (-0.95%) ⬇️ |
| unittests | 44.25% <87.11%> (-18.74%) ⬇️ |


@rafaelroquetto

Contributor

@mariomac's review covered the technical issues; I want to step back to the approach itself.

Before going deeper on the details: is this the best approach, and what alternatives did you consider? IMHO #929 is really variables + interpolation: interpolation is a ${x} -> :x rewrite on the backtick paths the scanner already matches, and variables are a small symbol table on top of the current line scan. Neither needs an AST or a new dependency.

  • why goja + a hand-rolled walker over extending the existing scanner?
  • were lighter options weighed, and what ruled them out?

I ask because whatever lands here becomes ours to maintain, and a generic AST visitor with reflect-based nil checks plus a fixpoint resolver is a lot to keep correct over time for what it adds to a best-effort heuristic. If it genuinely needs to be like this, then so be it, but I'd appreciate it if you could expand on the reasons; otherwise perhaps a smaller, scanner-based version could suit us better here.
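
To make the ${x} -> :x idea concrete, here is a rough sketch using only the standard regexp package. It assumes the scanner has already extracted the raw backtick path as a string, and it only handles plain identifiers and dotted member access; rewriteInterpolations is a hypothetical name, not an existing function in the harvester.

```go
package main

import "regexp"

// interpolation matches ${ident} and ${a.b.ident}, capturing the final
// identifier. Other expressions (calls, arithmetic) are left untouched.
var interpolation = regexp.MustCompile(
	`\$\{\s*(?:[A-Za-z_$][\w$]*\.)*([A-Za-z_$][\w$]*)\s*\}`)

// rewriteInterpolations turns each simple interpolation into a :name
// placeholder, e.g. /users/${req.params.id} becomes /users/:id.
func rewriteInterpolations(path string) string {
	return interpolation.ReplaceAllString(path, ":${1}")
}
```

Paths without interpolations pass through unchanged, so the rewrite could run on every backtick path the line scanner already matches.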

@grcevski

Contributor

Given the complexity of this approach, I think we need to close this PR and figure out another approach. I'm concerned not only that we now have a full-blown JS interpreter as a dependency, but also about how much memory and CPU it will use while parsing some sort of minified JavaScript file.

Development

Successfully merging this pull request may close these issues.

Extend nodejs route harvesting with variable matching

4 participants