Add AST-based route harvesting for Node.js variable paths#2512
Add AST-based route harvesting for Node.js variable paths#2512Vedanshu7 wants to merge 1 commit into
Conversation
There was a problem hiding this comment.
Review
Solid feature and good test coverage. I left detailed inline comments anchored to the specific lines — a quick map of what they cover:
Correctness
- Interpolated template literals → duplicate/conflicting routes (the regex pass already matches backtick strings). Highest priority, and currently untested.
- Possible slice-aliasing mutation in the
ConditionalExpressionbranch ofstringValues. - Flat global symbol table (no scoping) → false positives / dropped routes.
routeMethoddoes not constrain the receiver object.
Dead / over-engineered
visited/innerrecursion plumbing inrequireExportsis never used (cross-file is one level deep).- Hand-rolled 177-line AST walker — check whether goja exposes
ast.Walk.
Performance
- Each file is read and parsed twice.
.ts/.tsxfiles are fully read and parse-attempted for nothing.- Unbounded cartesian-product expansion with no cap.
Dependency
- New heavyweight dep (
dop251/goja) pinned to an untagged pseudo-version; unrelatedMasterminds/semverbump ingo.sum.
The single most impactful fix is the interpolated-template duplication.
🤖 Posted by Claude (via Claude Code) on behalf of @mariomac.
mariomac
left a comment
There was a problem hiding this comment.
Inline observations below. The most impactful is the interpolated-template duplication on js_ast.go — it's a user-visible correctness regression and currently untested.
| if _, isLit := first.(*ast.StringLiteral); isLit { | ||
| return | ||
| } | ||
| if tmpl, isTmpl := first.(*ast.TemplateLiteral); isTmpl && len(tmpl.Expressions) == 0 { |
There was a problem hiding this comment.
Interpolated template literals produce duplicate/conflicting routes (highest priority). The existing Typical regex (js.go:116) includes backtick (\x60) in its char class, so it already matches app.get(/api/${ver}/users) and emits the raw /api/${ver}/users, which CleanupRegexPath turns into /api/:id/users. This AST pass emits /api/v1/users for the same call — both survive dedup, so one registration yields two routes. Same with ${req.params.id} → regex :id vs AST :param. This guard skips plain literals and non-interpolated templates, but not the interpolated case, which is exactly the new one. The tests don't catch it because assertHasRoute only checks presence; consider adding a "no unexpected routes" assertion.
| return concatProduct(stringValues(v.Left, vars), stringValues(v.Right, vars)) | ||
| } | ||
| case *ast.ConditionalExpression: | ||
| return append(stringValues(v.Consequent, vars), stringValues(v.Alternate, vars)...) |
There was a problem hiding this comment.
Possible slice-aliasing mutation. When the consequent is an identifier, stringValues returns vars[name] directly (see the *ast.Identifier case), so append(...) here can mutate the map's backing array when there's spare capacity, corrupting that variable's stored values for later lookups. Safer to allocate a fresh slice before appending both branches.
| // Repeated until stable to resolve chained dependencies. | ||
| // strVars stores both simple names ("x" → values) and flattened object | ||
| // property access ("obj.key" → values) for dot-notation and destructuring. | ||
| strVars := map[string][]string{} |
There was a problem hiding this comment.
No variable scoping — flat global symbol table. strVars is keyed by bare name across the whole file with first-wins semantics. Two functions each declaring const p = '/a' / '/b' silently drop one route, and an unrelated p can resolve to the wrong path. Probably acceptable for a best-effort harvester, but worth acknowledging as a known false-positive/missed-route source.
|
|
||
| // routeMethod reports whether callee is a method call like `app.get` / | ||
| // `router.post` and, if so, returns the corresponding HTTP method. | ||
| func routeMethod(callee ast.Expression) (string, bool) { |
There was a problem hiding this comment.
routeMethod doesn't constrain the receiver object. Any X.get/.post/.delete/.head/.options/.all(stringVar, …) whose variable resolves to a string starting with / becomes a route — e.g. cache.get(...), map.get(...). The regex pass shares the shape but is bounded to literals physically present; variable resolution broadens the false-positive surface. The HasPrefix(path, "/") check is the only filter.
| } | ||
|
|
||
| // Extend visited to prevent circular requires in deeper recursion. | ||
| inner := make(map[string]bool, len(visited)+1) |
There was a problem hiding this comment.
Dead recursion machinery. This inner copy of visited is never used — requireExports only walks for module.exports/exports.x and does no nested require resolution, so cross-file is strictly one level deep and the circular-require protection is non-functional. Either complete the recursion or drop the inner/visited plumbing.
| if !strings.HasPrefix(reqPath, "./") && !strings.HasPrefix(reqPath, "../") { | ||
| return nil | ||
| } | ||
| base := filepath.Join(dir, reqPath) |
There was a problem hiding this comment.
Minor: requireExports joins arbitrary .//../ paths. openJSFileForScan bounds it (regular file, ≤10MB, O_NOFOLLOW), so traversal risk is low, but the bare extension-less base candidate can read non-JS files before the parse fails.
| // walk performs a depth-first traversal of the AST rooted at n, invoking fn for | ||
| // every node visited (including n itself). | ||
| func walk(n ast.Node, fn func(ast.Node)) { | ||
| if n == nil { |
There was a problem hiding this comment.
Hand-rolled AST walker (177 lines). Does goja's ast package already expose ast.Walk/Visitor? If so this is redundant and a maintenance hazard — any node type the manual children switch misses silently drops a subtree (class bodies, spread, optional chaining, etc. are not handled).
|
|
||
| // Fallback pass: resolve routes whose path comes from a variable or a | ||
| // template literal, which the line-based regex harvesters cannot match. | ||
| src, ok, err := readJSFileForScan(filePath) |
There was a problem hiding this comment.
Every file is read and parsed twice. The regex pass above streams line-by-line; this AST pass then reads the whole file into a string (up to MaxJSFileScanBytes = 10MB) and full-parses it, without reusing the content already read.
| return err | ||
| } | ||
| if ok { | ||
| e.routes = append(e.routes, e.resolveASTRoutes(filePath, src)...) |
There was a problem hiding this comment.
TypeScript files pay full cost for nothing. WalkJSFiles includes .ts/.tsx, goja can't parse TS, so every TS file is read fully + parse-attempted + discarded (resolveASTRoutes returns nil on parse error). Consider gating the AST pass by extension or syntax.
| github.com/caarlos0/env/v11 v11.4.1 | ||
| github.com/cilium/ebpf v0.20.0 | ||
| github.com/containers/common v0.64.2 | ||
| github.com/dop251/goja v0.0.0-20260618133527-c9b2ea77db59 |
There was a problem hiding this comment.
New heavyweight dependency. dop251/goja (a full JS interpreter, plus transitive go-sourcemap/sourcemap) is pulled in only for its parser/ast, and pinned to an untagged pseudo-version. Worth a sentence justifying the weight vs. a lighter JS parser. Separately, go.sum bumps Masterminds/semver/v3 3.4.0 → 3.5.0 — looks like an unrelated go mod tidy side effect; confirm it's intended here.
There was a problem hiding this comment.
Yeah this is a good point, if implementing this means we have to bring in a full JS interpreter, perhaps we should wait until a customer reports an issue with our existing route parsing. So far there hasn't been any requests.
|
Thank you a lot for your contribution @Vedanshu7 ! I've asked Claude for a first review and it placed some inline comments. As to be expected with AI, many observations might lack context and be incorrect. Feel free to not address all of them but please then reply with a justification to each item you won't address. |
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #2512 +/- ##
==========================================
- Coverage 69.33% 63.48% -5.85%
==========================================
Files 340 338 -2
Lines 45778 44876 -902
==========================================
- Hits 31738 28488 -3250
- Misses 12057 14355 +2298
- Partials 1983 2033 +50
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
|
@mariomac review covered the technical issues, I want to step back to the approach itself. Before going deeper on the details: is this the best approach, and what alternatives did you consider? IMHO #929 is really variables + interpolation, e.g. a
I ask because whatever lands here becomes ours to maintain, and a generic AST visitor with reflect-based nil checks plus a fixpoint resolver is a lot to keep correct over time for what it adds to a best-effort heuristic. If it genuinely needs to be like this, than so be it, but I'd appreciate if you could expand on the reasins, otherwise perhaps a smaller, scanner-based version could suit us better here. |
|
Given the complexity of this approach, I think we need to close this PR and figure out another approach. I'm not only concerned that we now have full blown JS interpreter as a dependency, but also about how much memory and CPU will it use while parsing some sort of minified JavaScript file. |
Summary
The existing regex-based route harvester only matched hardcoded string paths like
app.get('/users', handler). It missed routes where the path came from a variable, a template literal, or string concatenation.This PR adds an AST-based fallback pass using Goja's parser that handles those cases:
const p = '/users'; app.get(p, handler)app.get(/api/${version}/users, handler)app.get('/api' + '/users', handler)for...ofrequire()Link to tracking issue
Fixes #929
Testing
Added unit tests in
pkg/internal/transform/route/harvest/js_ast_test.gocovering variable paths, template literals, string concatenation, object property access, destructuring, loop variables, and cross-file require(). All tests pass with go test./pkg/internal/transform/route/harvest/....Authorship