feat(evasive-transform): add meaning-preserving evasive transform for imports in strings#3026
```diff
@@ -0,0 +1,171 @@
+const evadeRegexp = /import\s*\(|<!--|-->/g;
```
Note that this now provides multiple evasions in one go.
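Judging by the snapshot output further down (`"...im"+"port('some-module')..."` and `"<!"+"-- should be evaded --"+">"`), each match of `evadeRegexp` appears to be cut two characters in, and the pieces are rejoined with `+`. The sketch below reproduces that on plain strings. `splitForEvasion` and `toConcatenation` are hypothetical names; the actual transform builds Babel AST nodes rather than source text, so treat this as an illustration only.

```js
const evadeRegexp = /import\s*\(|<!--|-->/g;

/**
 * Hypothetical helper: cut `value` two characters into every match
 * ("im|port(", "<!|--", "--|>") so that no piece contains a censored
 * sequence, while joining the pieces restores `value` exactly.
 * @param {string} value
 * @returns {string[]}
 */
const splitForEvasion = value => {
  const parts = [];
  let lastIndex = 0;
  // matchAll clones the regexp, so evadeRegexp.lastIndex is never mutated.
  for (const match of value.matchAll(evadeRegexp)) {
    const cut = match.index + 2;
    parts.push(value.slice(lastIndex, cut));
    lastIndex = cut;
  }
  parts.push(value.slice(lastIndex));
  return parts;
};

// Render the pieces as source text: string literals joined by `+`.
const toConcatenation = parts => parts.map(p => JSON.stringify(p)).join('+');

console.log(toConcatenation(splitForEvasion('<!-- should be evaded -->')));
// "<!"+"-- should be evaded --"+">"
```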
```
const multilineimport =\`␊
console.log(${a})␊
await im${""}port('some-module');␊
console.log(${b})␊
\`;␊
␊
const taggedtemplate = String.dedent\`␊
await import('some-module');␊
\`;␊
```
Why is there this difference between tagged vs. untagged template literal contents?
And as an aside, I think this test would be better if the input and expected output appeared in the same file (i.e., not using t.snapshot). It's valuable for high volume, but this use seems to come down on the wrong side of the tradeoff.
Because the tag function would see an extra split and an extra empty-string value, whereas without a tag, the effect of an extra hole filled with an empty string is unobservable.
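The asymmetry can be checked directly: untagged, the extra `${""}` hole changes nothing, but a tag function receives the literal's pieces separately. `inspect` is a hypothetical tag used only for this demonstration.

```js
// Untagged template: the extra `${""}` hole is invisible in the result.
const untagged = `await im${""}port('some-module');`;

// A tag function receives the pieces separately, so the same extra hole
// becomes an observable additional split plus an extra "" value.
const inspect = (strings, ...values) => ({ strings: [...strings], values });
const original = inspect`await import('some-module');`;
const evaded = inspect`await im${""}port('some-module');`;

console.log(untagged === "await import('some-module');"); // true
console.log(original.strings.length, evaded.strings.length); // 1 2
```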
I know it's more code to get right, but if we're making this change, we should be thorough about it, e.g.:
```js
const taggedtemplate = makeEvadedTemplateApplier(String.dedent)`
await ${IMPORT}('some-module');
`;
```
where `IMPORT` is a unique global sentinel such as `Symbol('import')` and `makeEvadedTemplateApplier` is a global function like
```js
const makeEvadedTemplateApplier =
  fn =>
  (T0, ...A0) => {
    const argCount = A0.length;
    const T1 = getTemplateMapping(T0) || [];
    const args = [T1];
    if (!T1.length) {
      const raw0 = T0.raw;
      const raw1 = [];
      for (let i = 0, j = 0; i <= argCount; ++i, ++j) {
        T1[j] = T0[i];
        raw1[j] = raw0[i];
        while (i < argCount && A0[i] === IMPORT) {
          i++;
          T1[j] += `import${T0[i]}`;
          raw1[j] += `import${raw0[i]}`;
        }
      }
      defineProperty(T1, 'raw', {
        ...{ writable: false, enumerable: false, configurable: false },
        value: freeze(raw1),
      });
      setTemplateMapping(T0, freeze(T1));
    }
    for (let i = 0, j = 0; i < argCount; ++i) {
      if (A0[i] !== IMPORT) args[++j] = A0[i];
    }
    return apply(fn, undefined, args);
  };
```
(which AFAIK is unobservable, until/unless https://github.com/tc39/proposal-array-is-template-object advances to Stage 4)
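To check that the suggested applier behaves as described, here is a self-contained version. The suggestion does not define `IMPORT`, `getTemplateMapping`, `setTemplateMapping`, `defineProperty`, `freeze`, or `apply`; the definitions below (a `WeakMap` cache keyed by the original strings array, plus the standard `Object`/`Reflect` methods) are assumptions, and `show` is a hypothetical tag used only for comparison.

```js
// Assumed stand-ins for names the suggestion uses but does not define.
const { defineProperty, freeze } = Object;
const { apply } = Reflect;
const IMPORT = Symbol('import'); // hypothetical sentinel for a spliced-out "import"
const templateMappings = new WeakMap(); // original strings array -> rebuilt one
const getTemplateMapping = T => templateMappings.get(T);
const setTemplateMapping = (T, mapped) => templateMappings.set(T, mapped);

// The suggested applier, unchanged apart from indentation and comments.
const makeEvadedTemplateApplier =
  fn =>
  (T0, ...A0) => {
    const argCount = A0.length;
    const T1 = getTemplateMapping(T0) || [];
    const args = [T1];
    if (!T1.length) {
      // First call from this site: merge each IMPORT hole back into the text.
      const raw0 = T0.raw;
      const raw1 = [];
      for (let i = 0, j = 0; i <= argCount; ++i, ++j) {
        T1[j] = T0[i];
        raw1[j] = raw0[i];
        while (i < argCount && A0[i] === IMPORT) {
          i++;
          T1[j] += `import${T0[i]}`;
          raw1[j] += `import${raw0[i]}`;
        }
      }
      defineProperty(T1, 'raw', {
        ...{ writable: false, enumerable: false, configurable: false },
        value: freeze(raw1),
      });
      setTemplateMapping(T0, freeze(T1));
    }
    // Forward every substitution except the IMPORT sentinels.
    for (let i = 0, j = 0; i < argCount; ++i) {
      if (A0[i] !== IMPORT) args[++j] = A0[i];
    }
    return apply(fn, undefined, args);
  };

// Hypothetical tag that just reports what it receives.
const show = (strings, ...values) => ({
  strings: [...strings],
  raw: [...strings.raw],
  values,
});
const viaApplier = makeEvadedTemplateApplier(show)`a ${1} ${IMPORT}('m') ${2}`;
const direct = show`a ${1} import('m') ${2}`;
console.log(JSON.stringify(viaApplier) === JSON.stringify(direct)); // true
```

Because a template call site always yields the same strings array, the `WeakMap` makes the rebuilt array stable across calls from one site, which is what keeps the substitution unobservable to tags that cache by identity.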
We would have to put all that runtime in a single line of code, and it’s a lot. Might have to do some hard trade-off calculus. I’m in favor of making monotonic progress until we hit a cost cliff.
I don't know to what extent minification is on the table for injected helpers like the above, but if we are comfortable with it then the cost need be no more than 482 bytes, and only necessary when the source text being transformed has one or more tagged template literals with static contents that would be affected.
It's about potential issues with source maps, or having to find a place to put it. Also, it's a level of complexity I'm not comfortable introducing.
It's a complex fix for a very unlikely problem. And if a module contains a tagged template string with an import statement in it, it's likely problematic in many other ways.
HTML in tagged template strings is somewhat common, but as virtual DOM components, where HTML comments don't make sense.
I couldn't find an example of tagged template strings containing hard-coded JavaScript with imports in it.
I'd like to avoid shipping a tagged template string transform for now.
I sure would like to avoid that complexity as well.
I tentatively volunteered to co-champion it. Feed me your concerns.
The primary concern is taking away the ability to locally virtualize tagged template literals, which is a steep price to pay for what seems to be negligible benefit.
Force-pushed from 8609251 to 6939389.
Force-pushed from 55d4269 to 5466fc4.
gibson042 left a comment
This PR should update the evasive-transform README to mention that it also affects the representation of strings, regular expressions, and template literals (with caveats for the last of those, particularly describing the gaps around cooked vs. raw values and the total disregard of tagged template literals).
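On the cooked vs. raw caveat: a template literal's cooked value and its raw source text can differ, so a check against one does not cover the other. As a small illustration of the language behavior (not of how the transform itself handles it, which is an assumption left open here), an escape sequence can spell `import(` in the cooked value without those characters appearing in the source:

```js
// The cooked value decodes the escape; the raw text keeps it verbatim.
const cooked = `\u0069mport(`;
const raw = String.raw`\u0069mport(`;

console.log(cooked === 'import('); // true
console.log(raw.includes('import(')); // false
```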
```
/* import comment ...IMPORT('some-module');*/␊
const result = eval("...im"+"port('some-module'); await im"+"port(\\"other\\");");␊
const result2 = eval("...im"+"port('some-module'); await im"+"port(\\"other\\");");␊
const multilineimport =\`␊
```
There's a strange loss of the space between `=` and the backtick here.
I think it's just Babel. Even in whitespace-preserving mode, this space is in some sense at the end of a line, so it's probably not represented anywhere.
Are we worried?
Also, impressive that you noticed it.
Turns out that this lost space stems from generation against our locationless synthetic AST nodes, and it's fairly easy to fix:
```diff
diff --git i/packages/evasive-transform/src/transform-code.js w/packages/evasive-transform/src/transform-code.js
index 1ef096d9d..f186e7ee5 100644
--- i/packages/evasive-transform/src/transform-code.js
+++ w/packages/evasive-transform/src/transform-code.js
@@ -1,6 +1,32 @@
 const evadeRegexp = /import\s*\(|<!--|-->/g;
 const importRegexp = /import(\s*\()/g;
+/**
+ * Copy the location from one AST node to another (round-tripping through JSON
+ * to sever references), updating the target's end position as if it had zero
+ * length.
+ *
+ * @param {import('@babel/types').Node} target
+ * @param {import('@babel/types').Node} src
+ */
+const adoptStartFrom = (target, src) => {
+ try {
+ const srcLoc = src.loc;
+ if (!srcLoc) return;
+ const loc = /** @type {typeof srcLoc} */ (
+ JSON.parse(JSON.stringify(srcLoc))
+ );
+ const start = loc?.start;
+ target.loc = loc;
+ if (!start) return;
+ target.loc.end = /** @type {typeof start} */ (
+ JSON.parse(JSON.stringify(start))
+ );
+ } catch (_err) {
+ // Ignore errors; this is purely opportunistic.
+ }
+};
+
 /**
 * Creates a BinaryExpression adding two expressions
 *
@@ -36,6 +62,7 @@ export const evadeStrings = p => {
 expr = !expr
 ? { type: 'StringLiteral', value: part }
 : addStringToExpressions(expr, part);
+ if (lastIndex === 0) adoptStartFrom(expr, p.node);
 lastIndex = index;
 }
 if (expr) {
@@ -128,11 +155,14 @@ export const evadeTemplates = p => {
 newQuasis[newQuasis.length - 1].tail = true;
 }
- p.replaceWith({
+ /** @type {import('@babel/types').Node} */
+ const replacement = {
 type: 'TemplateLiteral',
 quasis: newQuasis,
 expressions: newExpressions,
- });
+ };
+ adoptStartFrom(replacement, p.node);
+ p.replaceWith(replacement);
 };
 /**
```
I'd like to put this in a separate PR
I'd like it in this PR so there's never a master state where code transformation loses whitespace, but if you do defer it then make sure to open an issue.
I'm not sure I understand why target.loc.end is set to start, but I need this merged, so I'll take it.
The first JSON pass should be enough to flatten everything, so I'm changing the second one into a shallow copy with a spread operator.
Applied and updated the tests.
> I'm not sure I understand why target.loc.end is set to start, but I need this merged, so I'll take it.
The short answer is "future-proofing"—I don't want generation to skip forward too far if it starts paying attention to end. Suggested explanatory comment text at #3026 (comment)
> The first JSON pass should be enough to flatten everything, so I'm changing the second one into a shallow copy with a spread operator.
Sure, that was me being overcautious about possible future introduction of new non-primitive properties.
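Combining the diff above with the two follow-ups in this exchange (a shallow spread copy replacing the second JSON round trip, and `end` pinned to `start` as future-proofing), the helper might look like the sketch below. Plain objects stand in for Babel nodes, and the merged code may differ in detail.

```js
/**
 * Hypothetical sketch of adoptStartFrom after the follow-ups: the JSON round
 * trip severs references to the source node's location, and a shallow spread
 * copy of `start` becomes a zero-length `end`.
 * @param {{ loc?: object | null }} target
 * @param {{ loc?: object | null }} src
 */
const adoptStartFrom = (target, src) => {
  try {
    const srcLoc = src.loc;
    if (!srcLoc) return;
    const loc = JSON.parse(JSON.stringify(srcLoc));
    const start = loc?.start;
    target.loc = loc;
    if (!start) return;
    // Pin end to start so generation never skips forward past the node.
    target.loc.end = { ...start };
  } catch (_err) {
    // Ignore errors; this is purely opportunistic.
  }
};

// Plain objects standing in for Babel nodes.
const src = {
  loc: { start: { line: 1, column: 23, index: 23 }, end: { line: 5, column: 2, index: 95 } },
};
const target = { type: 'TemplateLiteral' };
adoptStartFrom(target, src);
console.log(target.loc.end.line, target.loc.end.column); // 1 23
```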
🦋 Changeset detected
Latest commit: 2a93c75
The changes in this PR will be included in the next version bump. This PR includes changesets to release 3 packages.
Force-pushed from 50d68ae to f2fa1de.
… imports in strings
Co-authored-by: Richard Gibson <richard.gibson@gmail.com>
…ansform prior to refactor
… compartment-mapper with a test
Force-pushed from f2fa1de to 79884e3.
All finished.
gibson042 left a comment
I can live with this, but I'd really like to see more complete evasion in regular expression literals and preceding whitespace preservation. Thanks for the detailed explanations along the way!
Co-authored-by: Richard Gibson <richard.gibson@gmail.com>
…ement operations ever again
Co-authored-by: Richard Gibson <richard.gibson@gmail.com>
|--------|------|-------------|
| `sourceUrl` | `string` | The URL or filename of the source file. Used for source map generation and error messages. |
| `sourceMap` | `string \| object` | Optional. An existing source map (as JSON string or object) to be updated with the transform's mappings. |
| `sourceType` | `'script' \| 'module'` | Optional. Specifies whether the source is a CommonJS script (`'script'`) or an ES module (`'module'`). When provided, it helps the parser handle the code correctly. |
Which does this default to if omitted?
That's a very good question.
I didn't touch that part, just documented the existence of the option.
The option is being used to decide whether `return` statements should be allowed outside functions:
```js
allowReturnOutsideFunction: opts.sourceType === 'script',
```
The way it's written here, it defaults to "not script".
But the `sourceType` itself is being passed to Babel, where it's defined as defaulting to "script":
```ts
/**
 * Indicate the mode the code should be parsed in.
 * Can be one of "script", "commonjs", "module", or "unambiguous". Defaults to "script".
 * "unambiguous" will make @babel/parser attempt to guess, based on the presence
 * of ES6 import or export statements.
 * Files with ES6 imports and exports are considered "module" and are otherwise "script".
 *
 * Use "commonjs" to parse code that is intended to be run in a CommonJS environment such as Node.js.
 */
sourceType?: SourceType;
```
Let's create an issue about that and fix it separately. We might need to talk about how to avoid it being a breaking change.
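The discrepancy can be reduced to two one-line functions (both names hypothetical): with `sourceType` omitted, Babel parses the input as a script, yet `allowReturnOutsideFunction` is still `false`, so omitting the option and passing `'script'` explicitly configure the parser differently.

```js
// evasive-transform (as quoted above): top-level `return` is allowed only
// when sourceType is explicitly 'script'.
const allowsReturnOutsideFunction = opts => opts.sourceType === 'script';
// @babel/parser (as documented above): an omitted sourceType means 'script'.
const babelSourceType = opts => opts.sourceType ?? 'script';

for (const opts of [{}, { sourceType: 'script' }]) {
  console.log(babelSourceType(opts), allowsReturnOutsideFunction(opts));
}
// script false
// script true
```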
| `elideComments` | `boolean` | Optional. If `true`, removes comment contents while preserving newlines. Defaults to `false`. |
I hope it also preserves columns. Yes?
```diff
-| `elideComments` | `boolean` | Optional. If `true`, removes comment contents while preserving newlines. Defaults to `false`. |
+| `elideComments` | `boolean` | Optional. If `true`, removes comment contents while preserving newlines and columns. Defaults to `false`. |
```
The way it's been documented in the code (which I copied from), there doesn't seem to be a promise of preserving columns.
That makes sense when `// a line of comment` turns into `//`, and also when internal newlines in a multiline comment are kept but the content is gone.
But it doesn't seem to replace inline comments with whitespace to match.
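The two behaviors being contrasted can be shown on a comment's body text. Both functions are hypothetical sketches, not the package's implementation: `elideKeepingNewlines` mirrors what is described for the current option, and `elideKeepingColumns` is the column-preserving variant asked about.

```js
// Drop everything in the comment body except line breaks.
const elideKeepingNewlines = body => body.replace(/[^\r\n]/g, '');
// Blank out everything except line breaks, so any code after an inline
// comment stays in its original column.
const elideKeepingColumns = body => body.replace(/[^\r\n]/g, ' ');

console.log(`//${elideKeepingNewlines(' a line of comment')}`); // //
console.log(`/*${elideKeepingColumns(' note ')}*/ x`); // /*      */ x
```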
| `onlyComments` | `boolean` | Optional. If `true`, limits transformation to comment contents only, leaving code unchanged. Defaults to `false`. |
What's the motivation for this option?
It's an opt-in to the behavior from before this PR.
This package used to only transform the contents of the comments and none of the code.
```
/* HTML comment <!=- should be evaded -=>*/␊
var HTMLstring ="<!"+"-- should be evaded --"+">";␊
var HTMLtString =\`<!${""}-- should be evaded --${""}>\`;␊
/* import comment ...IMPORT('some-module');*/␊
```
@erights @michaelfig @kriskowal `//…` comments are converted to `/*…*/` comments, but the reason seems to be a mystery. @boneskull asked in #1812, and the answer was "I do not remember being aware that we changed comments this way, so I also do not remember why". It doesn't exactly cause problems because the start sequences are of equal length, although forcing the conversion at all does run against our efforts to preserve input as much as possible. Should we contemplate improving the behavior, or is it best to let this lie?
We should improve this behavior. But it is low urgency and low importance. Still, we should indeed minimize gratuitous changes.
If the question is whether we should do it in this PR, that depends on the level of effort. If it's easy and quick, sure.
This PR did not touch the comment transforms, and I have a slight preference to keep its scope that way. I only added more coverage and more visible test results, which surfaced the issue.
Happy to start a separate branch to work on fixes for the comment and whitespace issues. They're also low priority to me, but if we find a fix that's not costly in performance or contributor confusion, I'm all for it.
…ve changeset description
Add meaning-preserving transformation for expressions and literals in evasive-transform. Previously, only comments were transformed; use `onlyComments` option to opt-out.
gibson042 left a comment
Final suggestions, still with an approval.
Co-authored-by: Richard Gibson <richard.gibson@gmail.com>
Description
With compartment-mapper introducing support for dynamic imports, the old regexp-based ways in which we evaded import censorship in SES are no longer viable, and only an AST-based transform can be meaning-preserving.
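To make the distinction concrete: comment text can be rewritten freely (the snapshots show `IMPORT(`, `<!=-`, and `-=>`), but the same rewrite applied to a string literal changes the program's data, whereas splitting and concatenating leaves the value intact. The replacement patterns below are inferred from those snapshots, and `evadeCommentText` is a hypothetical name.

```js
// Regexp rewrite: fine for comments, whose text has no runtime meaning.
const evadeCommentText = text =>
  text
    .replace(/import(\s*\()/g, 'IMPORT$1')
    .replace(/<!--/g, '<!=-')
    .replace(/-->/g, '-=>');

const value = '<!-- should be evaded -->';
// Applying the comment rewrite to a string literal would change its value...
const rewritten = evadeCommentText(value);
// ...while splitting into concatenated literals preserves it.
const concatenated = '<!' + '-- should be evaded --' + '>';

console.log(rewritten === value, concatenated === value); // false true
```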
Open questions
- `import (` with whitespace? We could add complexity to the transform, or assume such cases are rare enough and let them hit censorship.
- `split` by regexp, unless somebody stops me.
- `import(` that are left need not be preserved? Could we use a simple regexp replace there for a cheaper way to get the same result?
- `eval('var a = import("somemodule")')` or intend to ?

Security Considerations
Scaling Considerations
An AST transform is slower than a regexp, but for cases where `import(` doesn't exist in a string, it could be faster than a regexp search of the entire string.

Documentation Considerations
Testing Considerations
Compatibility Considerations
With the transform being meaning-preserving, this should not be breaking for anyone. Also, considering opt-in