Skip to content

module: add module.detectSyntax #57731

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

marco-ippolito
Copy link
Member

Fixes some issue that libraries have to determine the syntax of certain files
#57298 (comment)
since they rely on executing them and checking the error message

@nodejs-github-bot
Copy link
Collaborator

Review requested:

  • @nodejs/loaders

@nodejs-github-bot nodejs-github-bot added module Issues and PRs related to the module subsystem. needs-ci PRs that need a full CI run. labels Apr 2, 2025
@marco-ippolito marco-ippolito added the strip-types Issues or PRs related to strip-types support label Apr 2, 2025
@marco-ippolito marco-ippolito force-pushed the expose-syntax-detection branch from 0179219 to 30c7248 Compare April 2, 2025 14:58
import { detectSyntax } from 'node:module';
// This could be CommonJS if we interpret < and > as greater and lesser,
// or it could be TypeScript if we interpret them as type parameters.
// In this case, it will be interpreted as TypeScript.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this seems potentially problematic, since TS isn't a syntactic superset of JS. Perhaps in this case it could return an array of two choices?

Copy link
Member Author

@marco-ippolito marco-ippolito Apr 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's an edge case, how do you detect it? 😄 I guess it has to be clear its heuristic, it trades some correctness for convenience

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if it can be parsed as both, then it's the edge case.

since this function is intended to be used by libraries and not by most users, it seems like correctness is the better tradeoff to make here. Doing result?.[0] ?? result isn't that inconvenient, but allows the choice and visibility into the edge cases.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the only way I see to disambiguate the syntax is to compileScript and see if it works, and attempt to strip types and if the output is different, it means its an edge case.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, that seems fine to me.

@ljharb ljharb added the semver-minor PRs that contain new features and should be released in the next minor version. label Apr 2, 2025
@GeoffreyBooth
Copy link
Member

GeoffreyBooth commented Apr 2, 2025

The internal function is named containsModuleSyntax, because while it’s possible to affirmatively detect whether a string contains only-valid-in-ESM syntax, it’s not possible to detect whether a string contains syntax that only works in CommonJS. For example, console.log(3) is valid CommonJS or ESM; require('./file.js') is also valid CommonJS or ESM, as there could be a globalThis.require = createRequire(...) added somewhere earlier than this file.

Especially as the world transitions to ESM, it’s not a safe assumption to conclude that any file lacking ESM syntax is therefore CommonJS. That’s the assumption that Node’s module syntax detection makes for the purposes of running an ambiguous file, where CommonJS is the longstanding default behavior; but a new API shouldn’t assume that code that isn’t unambiguously ESM is therefore CommonJS (or anything else).

If you want to provide something useful for libraries, I would suggest exposing the internal containsModuleSyntax function and leave it up to the library to decide how to handle the “it doesn’t contain module syntax” case. You could create a similar containsTypes helper that detects if a string is only parseable as TypeScript. That version would have a similar “it doesn’t contain types” case, as code like console.log(3) and so on is valid TypeScript even if it’s not unambiguously TypeScript.

@marco-ippolito
Copy link
Member Author

marco-ippolito commented Apr 2, 2025

That is partially relevant.
Unless cross matching the syntax information with, file extension, pjson module type and loader chain, you dont know if its going to be executed as ESM or not.
Even if something has ESM syntax could be executed as cjs (faux ESM), or typescript ESM that gets compiled to cjs.
So this just attempts to guess the syntax indipendently.
If you use this function + getNearestPackageJSON + #55756 only then you know that is going to be executed as.
This function has its value indipendently.
And its a very thin wrapper around containsModuleSyntax

Co-authored-by: Geoffrey Booth <[email protected]>
@GeoffreyBooth
Copy link
Member

If you use this function + getNearestPackageJSON + #55756 only then you know that is going to be executed as.
This function has its value independently.

What is the use case that this function solves? The comment you cite, #57298 (comment), has a need to know how Node will run a particular file, including based on all those other indicators like file extension and type field and registered hooks etc. So this PR wouldn’t solve that library’s need; so what problems does it solve?

@marco-ippolito
Copy link
Member Author

marco-ippolito commented Apr 2, 2025

If you use this function + getNearestPackageJSON + #55756 only then you know that is going to be executed as.
This function has its value independently.

What is the use case that this function solves? The comment you cite, #57298 (comment), has a need to know how Node will run a particular file, including based on all those other indicators like file extension and type field and registered hooks etc. So this PR wouldn’t solve that library’s need; so what problems does it solve?

Other steps to detect the syntax are already covered by other APIs. This api will eventually group all those functionality in one. But this is something that is needed, especially for .ts files

@GeoffreyBooth
Copy link
Member

Other steps to detect the syntax are already covered by other APIs. This api will eventually group all those functionality in one. But this is something that is needed, especially for .ts files

I’m not seeing what exactly the use case is here.

Also if the lower-level APIs like containsModuleSyntax are exposed, the higher-level analysis that this PR does can presumably be implemented in a userland library, right? So that’s the other question, of why this should live in core.

@marco-ippolito
Copy link
Member Author

Other steps to detect the syntax are already covered by other APIs. This api will eventually group all those functionality in one. But this is something that is needed, especially for .ts files

I’m not seeing what exactly the use case is here.

Also if the lower-level APIs like containsModuleSyntax are exposed, the higher-level analysis that this PR does can presumably be implemented in a userland library, right? So that’s the other question, of why this should live in core.

This is litteraly a wrapper around containsModuleSyntax so idk what else to tell you... You dont want the typescript syntax to be detected?

@GeoffreyBooth
Copy link
Member

This is litteraly a wrapper around containsModuleSyntax so idk what else to tell you... You dont want the typescript syntax to be detected?

I can see use cases for exposing containsModuleSyntax and likewise creating a similar containsTypes. Those are APIs that would have clear definitions, where users shouldn’t be confused by when they should get which output.

I am hesitant about the API in this PR, since it returns strings that look like the module hook formats or the module conditions, but are determined via different rules than how Node internally determines module formats or conditions. That strikes me as having great potential for user confusion and frustration, and it wouldn’t solve the problem that prompted this PR. If we’re going to provide an API for “how would Node handle a particular path?” or “how would Node evaluate a particular string?” then these APIs should return the same answers as Node internals would.

@legendecas
Copy link
Member

legendecas commented Apr 6, 2025

For mochajs/mocha#5314, should mocha read the file and use this new module.detectSyntax api to determine import or require it?

@marco-ippolito
Copy link
Member Author

For mochajs/mocha#5314, should mocha read the file and use this new module.detectSyntax api to determine import or require it?

Yes that is the idea

@legendecas
Copy link
Member

legendecas commented Apr 6, 2025

I think for mochajs/mocha#5314, we should not rely on TypeScript module syntax to determine its module format. TypeScript module syntax is not reliable, e.g. as tsconfig.json module: node16 specifies, tsc determines output format by package.json#type, and can transpile ESM to CJS.

For frameworks like mocha, it should defer to node's built-in module loader to determine module format. Not TS syntax.

This may be useful, when strip-types is the dominant flavour though (?).

detectSyntax('export const a: number = 1;'); // 'module-typescript'
detectSyntax('const a: number = 1;'); // 'commonjs-typescript'
detectSyntax('const foo;'); // Invalid syntax, returns undefined
```
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe this should instead return two booleans where one indicates if the code contained module specific syntax and the other whether it contained typescript syntax? Stating const a = 1; is commonjs isn't exactly accurate because that can't be determined solely from the text, as alluded to in the description.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
module Issues and PRs related to the module subsystem. needs-ci PRs that need a full CI run. semver-minor PRs that contain new features and should be released in the next minor version. strip-types Issues or PRs related to strip-types support
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants