module: add module.detectSyntax #57731

marco-ippolito · 2025-04-02T14:49:51Z

Fixes some issue that libraries have to determine the syntax of certain files
#57298 (comment)
since they rely on executing them and checking the error message

nodejs-github-bot · 2025-04-02T14:50:22Z

Review requested:

@nodejs/loaders

ljharb · 2025-04-02T15:36:13Z

doc/api/module.md

+import { detectSyntax } from 'node:module';
+// This could be CommonJS if we interpret < and > as greater and lesser,
+// or it could be TypeScript if we interpret them as type parameters.
+// In this case, it will be interpreted as TypeScript.


this seems potentially problematic, since TS isn't a syntactic superset of JS. Perhaps in this case it could return an array of two choices?

it's an edge case, how do you detect it? 😄 I guess it has to be clear its heuristic, it trades some correctness for convenience

if it can be parsed as both, then it's the edge case.

since this function is intended to be used by libraries and not by most users, it seems like correctness is the better tradeoff to make here. Doing result?.[0] ?? result isn't that inconvenient, but allows the choice and visibility into the edge cases.

the only way I see to disambiguate the syntax is to compileScript and see if it works, and attempt to strip types and if the output is different, it means its an edge case.

Sure, that seems fine to me.

GeoffreyBooth · 2025-04-02T16:21:23Z

The internal function is named containsModuleSyntax, because while it’s possible to affirmatively detect whether a string contains only-valid-in-ESM syntax, it’s not possible to detect whether a string contains syntax that only works in CommonJS. For example, console.log(3) is valid CommonJS or ESM; require('./file.js') is also valid CommonJS or ESM, as there could be a globalThis.require = createRequire(...) added somewhere earlier than this file.

Especially as the world transitions to ESM, it’s not a safe assumption to conclude that any file lacking ESM syntax is therefore CommonJS. That’s the assumption that Node’s module syntax detection makes for the purposes of running an ambiguous file, where CommonJS is the longstanding default behavior; but a new API shouldn’t assume that code that isn’t unambiguously ESM is therefore CommonJS (or anything else).

If you want to provide something useful for libraries, I would suggest exposing the internal containsModuleSyntax function and leave it up to the library to decide how to handle the “it doesn’t contain module syntax” case. You could create a similar containsTypes helper that detects if a string is only parseable as TypeScript. That version would have a similar “it doesn’t contain types” case, as code like console.log(3) and so on is valid TypeScript even if it’s not unambiguously TypeScript.

marco-ippolito · 2025-04-02T16:31:25Z

That is partially relevant.
Unless cross matching the syntax information with, file extension, pjson module type and loader chain, you dont know if its going to be executed as ESM or not.
Even if something has ESM syntax could be executed as cjs (faux ESM), or typescript ESM that gets compiled to cjs.
So this just attempts to guess the syntax indipendently.
If you use this function + getNearestPackageJSON + #55756 only then you know that is going to be executed as.
This function has its value indipendently.
And its a very thin wrapper around containsModuleSyntax

doc/api/module.md

Co-authored-by: Geoffrey Booth <[email protected]>

GeoffreyBooth · 2025-04-02T16:40:44Z

If you use this function + getNearestPackageJSON + #55756 only then you know that is going to be executed as.
This function has its value independently.

What is the use case that this function solves? The comment you cite, #57298 (comment), has a need to know how Node will run a particular file, including based on all those other indicators like file extension and type field and registered hooks etc. So this PR wouldn’t solve that library’s need; so what problems does it solve?

marco-ippolito · 2025-04-02T16:56:44Z

If you use this function + getNearestPackageJSON + #55756 only then you know that is going to be executed as.
This function has its value independently.

What is the use case that this function solves? The comment you cite, #57298 (comment), has a need to know how Node will run a particular file, including based on all those other indicators like file extension and type field and registered hooks etc. So this PR wouldn’t solve that library’s need; so what problems does it solve?

Other steps to detect the syntax are already covered by other APIs. This api will eventually group all those functionality in one. But this is something that is needed, especially for .ts files

GeoffreyBooth · 2025-04-02T18:09:51Z

Other steps to detect the syntax are already covered by other APIs. This api will eventually group all those functionality in one. But this is something that is needed, especially for .ts files

I’m not seeing what exactly the use case is here.

Also if the lower-level APIs like containsModuleSyntax are exposed, the higher-level analysis that this PR does can presumably be implemented in a userland library, right? So that’s the other question, of why this should live in core.

marco-ippolito · 2025-04-02T18:32:44Z

Other steps to detect the syntax are already covered by other APIs. This api will eventually group all those functionality in one. But this is something that is needed, especially for .ts files

I’m not seeing what exactly the use case is here.

Also if the lower-level APIs like containsModuleSyntax are exposed, the higher-level analysis that this PR does can presumably be implemented in a userland library, right? So that’s the other question, of why this should live in core.

This is litteraly a wrapper around containsModuleSyntax so idk what else to tell you... You dont want the typescript syntax to be detected?

GeoffreyBooth · 2025-04-02T20:47:09Z

This is litteraly a wrapper around containsModuleSyntax so idk what else to tell you... You dont want the typescript syntax to be detected?

I can see use cases for exposing containsModuleSyntax and likewise creating a similar containsTypes. Those are APIs that would have clear definitions, where users shouldn’t be confused by when they should get which output.

I am hesitant about the API in this PR, since it returns strings that look like the module hook formats or the module conditions, but are determined via different rules than how Node internally determines module formats or conditions. That strikes me as having great potential for user confusion and frustration, and it wouldn’t solve the problem that prompted this PR. If we’re going to provide an API for “how would Node handle a particular path?” or “how would Node evaluate a particular string?” then these APIs should return the same answers as Node internals would.

legendecas · 2025-04-06T11:41:18Z

For mochajs/mocha#5314, should mocha read the file and use this new module.detectSyntax api to determine import or require it?

marco-ippolito · 2025-04-06T13:44:54Z

For mochajs/mocha#5314, should mocha read the file and use this new module.detectSyntax api to determine import or require it?

Yes that is the idea

legendecas · 2025-04-06T21:35:40Z

I think for mochajs/mocha#5314, we should not rely on TypeScript module syntax to determine its module format. TypeScript module syntax is not reliable, e.g. as tsconfig.json module: node16 specifies, tsc determines output format by package.json#type, and can transpile ESM to CJS.

For frameworks like mocha, it should defer to node's built-in module loader to determine module format. Not TS syntax.

This may be useful, when strip-types is the dominant flavour though (?).

dsherret · 2025-04-07T02:03:02Z

doc/api/module.md

+detectSyntax('export const a: number = 1;'); // 'module-typescript'
+detectSyntax('const a: number = 1;'); // 'commonjs-typescript'
+detectSyntax('const foo;'); // Invalid syntax, returns undefined
+```


Maybe this should instead return two booleans where one indicates if the code contained module specific syntax and the other whether it contained typescript syntax? Stating const a = 1; is commonjs isn't exactly accurate because that can't be determined solely from the text, as alluded to in the description.

nodejs-github-bot added module Issues and PRs related to the module subsystem. needs-ci PRs that need a full CI run. labels Apr 2, 2025

marco-ippolito added the strip-types Issues or PRs related to strip-types support label Apr 2, 2025

marco-ippolito requested a review from legendecas April 2, 2025 14:55

module: add module.detectSyntax

30c7248

marco-ippolito force-pushed the expose-syntax-detection branch from 0179219 to 30c7248 Compare April 2, 2025 14:58

ljharb reviewed Apr 2, 2025

View reviewed changes

ljharb added the semver-minor PRs that contain new features and should be released in the next minor version. label Apr 2, 2025

GeoffreyBooth reviewed Apr 2, 2025

View reviewed changes

doc/api/module.md Outdated Show resolved Hide resolved

Update doc/api/module.md

942bd54

Co-authored-by: Geoffrey Booth <[email protected]>

marco-ippolito mentioned this pull request Apr 6, 2025

Consider removing unstable "node:module".stripTypeScriptTypes to reduce baseline SEA binary size #57744

Open

dsherret reviewed Apr 7, 2025

View reviewed changes

marco-ippolito closed this Apr 7, 2025

Uh oh!

module: add module.detectSyntax #57731

module: add module.detectSyntax #57731

Uh oh!

Conversation

marco-ippolito commented Apr 2, 2025

Uh oh!

nodejs-github-bot commented Apr 2, 2025

Uh oh!

ljharb Apr 2, 2025

Choose a reason for hiding this comment

Uh oh!

marco-ippolito Apr 2, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

ljharb Apr 2, 2025

Choose a reason for hiding this comment

Uh oh!

marco-ippolito Apr 2, 2025

Choose a reason for hiding this comment

Uh oh!

ljharb Apr 2, 2025

Choose a reason for hiding this comment

Uh oh!

GeoffreyBooth commented Apr 2, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

marco-ippolito commented Apr 2, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

GeoffreyBooth commented Apr 2, 2025

Uh oh!

marco-ippolito commented Apr 2, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

GeoffreyBooth commented Apr 2, 2025

Uh oh!

marco-ippolito commented Apr 2, 2025

Uh oh!

GeoffreyBooth commented Apr 2, 2025

Uh oh!

legendecas commented Apr 6, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

marco-ippolito commented Apr 6, 2025

Uh oh!

legendecas commented Apr 6, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

dsherret Apr 7, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

marco-ippolito Apr 2, 2025 •

edited

Loading

GeoffreyBooth commented Apr 2, 2025 •

edited

Loading

marco-ippolito commented Apr 2, 2025 •

edited

Loading

marco-ippolito commented Apr 2, 2025 •

edited

Loading

legendecas commented Apr 6, 2025 •

edited

Loading

legendecas commented Apr 6, 2025 •

edited

Loading