feat(evasive-transform): add meaning-preserving evasive transform for imports in strings #3026

Open
naugtur wants to merge 29 commits into master from naugtur/more-evasions

Member

@naugtur naugtur commented Dec 17, 2025

Description

With compartment-mapper introducing support for dynamic imports, the old regexp-based ways in which we evaded import censorship in SES are no longer viable, and an AST-based transform is the only meaning-preserving option.

Open questions

  • What about import ( with whitespace? We could add complexity to the transform, or assume such cases are rare enough and let them hit censorship.
    • will implement as split by regexp unless somebody stops me
  • Should it be opt-in or opt-out?
    • opt-out by unanimous opinion from people present in Endo meeting
    • implement
  • Is there a step between compartment-mapper and SES where the supported dynamic import call has already been replaced by its implementation, so that the only remaining occurrences of import( need not be preserved? If so, could a simple regexp replace there achieve the same result more cheaply?
    • not a welcome option: it creates the potential for invalid programs in the ModuleSource constructor
  • Do we even support eval('var a = import("somemodule")') or intend to?

Security Considerations

  • Could the new transform be targeted to unlock malicious meaning in otherwise harmless code? (It's meaning-preserving and AST-based, so it seems unlikely.)

Scaling Considerations

An AST transform is slower than a regexp, but for cases where import( doesn't exist in a string, it could be faster than a regexp search of the entire string.

Documentation Considerations

  • needs documenting after we decide opt-in or out
  • naming things is hard

Testing Considerations

  • added a test case.
  • once we figure out opt-in/out, a more end-to-end test in compartment-mapper would be nice.
  • more template string tests could help; the transform for them is more complex than I'd like, with room for off-by-one errors.

Compatibility Considerations

With the transform being meaning-preserving, this should not be breaking for anyone. Also, considering opt-in.

@@ -0,0 +1,171 @@
const evadeRegexp = /import\s*\(|<!--|-->/g;
Member Author

@naugtur naugtur Dec 18, 2025

Note this now provides multiple evasions in one go
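
As a rough illustration of what "multiple evasions in one go" means for a single string value, the sketch below splits a string so that no piece contains any sequence matched by evadeRegexp. It works on plain string values rather than Babel AST nodes, and splitForEvasion and toConcatenation are hypothetical names, not functions from this PR. The fixed cut offset of 2 is an assumption read off the snapshot output (im|port(, <!|--, --|>).

```javascript
const evadeRegexp = /import\s*\(|<!--|-->/g;

// Hypothetical helper: split a string value at every censored sequence.
const splitForEvasion = value => {
  // Fresh regexp so the shared `lastIndex` state is never reused.
  const re = new RegExp(evadeRegexp.source, 'g');
  const parts = [];
  let last = 0;
  let match = re.exec(value);
  while (match !== null) {
    // Assumed cut point: two characters into the match.
    const cut = match.index + 2;
    parts.push(value.slice(last, cut));
    last = cut;
    // Rescan from the cut so overlapping sequences such as "<!-->" are caught.
    re.lastIndex = cut;
    match = re.exec(value);
  }
  parts.push(value.slice(last));
  return parts;
};

// Hypothetical helper: render the parts as a string concatenation expression.
const toConcatenation = parts =>
  parts.map(part => JSON.stringify(part)).join('+');

console.log(toConcatenation(splitForEvasion('<!-- x -->')));
// "<!"+"-- x --"+">"
```

Joining the parts always yields the original value, which is what makes the rewrite meaning-preserving for plain strings.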

@naugtur naugtur marked this pull request as ready for review December 18, 2025 14:08
Comment on lines 185 to 193
const multilineimport =\`␊
console.log(${a})␊
await im${""}port('some-module');␊
console.log(${b})␊
\`;␊
const taggedtemplate = String.dedent\`␊
await import('some-module');␊
\`;␊
Member

Why is there this difference between tagged vs. untagged template literal contents?

And as an aside, I think this test would be better if the input and expected output appeared in the same file (i.e., not using t.snapshot). It's valuable for high volume, but this use seems to come down on the wrong side of the tradeoff.

Contributor

Because the tag function would see an extra split and an extra empty-string value, whereas without a tag the effect of an extra hole filled with an empty string is unobservable.
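
A small sketch of that difference, using a hypothetical record tag that just reports its arguments. Nothing here comes from the PR's code; it only demonstrates standard template literal semantics.

```javascript
// A tag function that simply records what it was called with.
const record = (strings, ...values) => ({ strings: [...strings], values });

// Untagged: the extra `${''}` hole disappears when the template is evaluated.
const plainOriginal = `await import('some-module');`;
const plainEvaded = `await im${''}port('some-module');`;
console.log(plainOriginal === plainEvaded); // true

// Tagged: the tag receives one more string segment and one more value.
const taggedOriginal = record`await import('some-module');`;
const taggedEvaded = record`await im${''}port('some-module');`;
console.log(taggedOriginal.strings.length); // 1
console.log(taggedEvaded.strings.length); // 2
console.log(taggedEvaded.values.length); // 1
```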

Member

@gibson042 gibson042 Dec 18, 2025

I know it's more code to get right, but if we're making this change, we should be thorough about it, e.g.:

        const taggedtemplate = makeEvadedTemplateApplier(String.dedent)`
            await ${IMPORT}('some-module');
        `;

where IMPORT is a unique global sentinel such as Symbol('import') and makeEvadedTemplateApplier is a global function like

const makeEvadedTemplateApplier =
  fn =>
  (T0, ...A0) => {
    const argCount = A0.length;
    const T1 = getTemplateMapping(T0) || [];
    const args = [T1];
    if (!T1.length) {
      const raw0 = T0.raw;
      const raw1 = [];
      for (let i = 0, j = 0; i <= argCount; ++i, ++j) {
        T1[j] = T0[i];
        raw1[j] = raw0[i];
        while (i < argCount && A0[i] === IMPORT) {
          i++;
          T1[j] += `import${T0[i]}`;
          raw1[j] += `import${raw0[i]}`;
        }
      }
      defineProperty(T1, 'raw', {
        ...{ writable: false, enumerable: false, configurable: false },
        value: freeze(raw1),
      });
      setTemplateMapping(T0, freeze(T1));
    }
    for (let i = 0, j = 0; i < argCount; ++i) {
      if (A0[i] !== IMPORT) args[++j] = A0[i];
    }
    return apply(fn, undefined, args);
  };

(which AFAIK is unobservable, until/unless https://github.com/tc39/proposal-array-is-template-object advances to Stage 4)
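
For anyone wanting to try the idea outside SES, here is a self-contained variant of the helper above. It is a sketch under assumptions: the template cache is a WeakMap called templateMappings, the intrinsics are taken directly from Object and Reflect, and the case of an undefined cooked string (invalid escape in a tagged template) is not handled. The echo tag is hypothetical and exists only to make the result visible.

```javascript
const IMPORT = Symbol('import');

// Assumed cache: original template strings array -> rebuilt strings array.
const templateMappings = new WeakMap();

const makeEvadedTemplateApplier = fn => (strings, ...subs) => {
  let rebuilt = templateMappings.get(strings);
  if (!rebuilt) {
    const cooked = [];
    const raw = [];
    for (let i = 0, j = 0; i <= subs.length; i += 1, j += 1) {
      cooked[j] = strings[i];
      raw[j] = strings.raw[i];
      // Fold every IMPORT sentinel back into the surrounding text.
      while (i < subs.length && subs[i] === IMPORT) {
        i += 1;
        cooked[j] += `import${strings[i]}`;
        raw[j] += `import${strings.raw[i]}`;
      }
    }
    Object.defineProperty(cooked, 'raw', { value: Object.freeze(raw) });
    rebuilt = Object.freeze(cooked);
    templateMappings.set(strings, rebuilt);
  }
  // Pass through only the substitutions that are not sentinels.
  const args = subs.filter(sub => sub !== IMPORT);
  return Reflect.apply(fn, undefined, [rebuilt, ...args]);
};

// Hypothetical tag used only for demonstration.
const echo = (strings, ...values) => ({
  cooked: [...strings],
  raw: [...strings.raw],
  values,
});

const out = makeEvadedTemplateApplier(echo)`await ${IMPORT}('some-module'); ${42}`;
console.log(out.cooked); // [ "await import('some-module'); ", '' ]
console.log(out.values); // [ 42 ]
```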

Member

We would have to put all that runtime in a single line of code, and it’s a lot. Might have to do some hard trade-off calculus. I’m in favor of making monotonic progress until we hit a cost cliff.

Member

@gibson042 gibson042 Dec 20, 2025

I don't know to what extent minification is on the table for injected helpers like the above, but if we are comfortable with it then the cost need be no more than 482 bytes, and only necessary when the source text being transformed has one or more tagged template literals with static contents that would be affected.

Member Author

It's about the potential issues with sourcemaps and having to find a place to put it. It's also a level of complexity I'm not comfortable introducing.
It's a complex fix for a very unlikely problem. And if a module contains a tagged template string with an import statement in it, it's likely problematic in many other ways.

HTML in tagged template strings is somewhat common, but as a virtual dom component, where HTML comments don't make sense.
I fail to find an example of tagged template strings containing hardcoded javascript with imports in it.

I'd like to avoid shipping a tagged template string transform for now.

Contributor

I sure would like to avoid that complexity as well.

Member Author

I tentatively volunteered to co-champion it. Feed me your concerns.

Member

The primary concern is taking away the ability to locally virtualize tagged template literals, which is a steep price to pay for what seems to be negligible benefit.

@naugtur

This comment was marked as outdated.

boneskull

This comment was marked as resolved.

@naugtur naugtur force-pushed the naugtur/more-evasions branch from 8609251 to 6939389 Compare January 21, 2026 12:50
@naugtur naugtur requested review from gibson042 and mhofman January 21, 2026 12:56
Member

@gibson042 gibson042 left a comment

This PR should update the evasive-transform README to mention that it also affects the representation of strings, regular expressions, and template literals (with caveats for the last of those, particularly describing the gaps around cooked vs. raw and the total disregard of tagged template literals).

/* import comment ...IMPORT('some-module');*/␊
const result = eval("...im"+"port('some-module'); await im"+"port(\\"other\\");");␊
const result2 = eval("...im"+"port('some-module'); await im"+"port(\\"other\\");");␊
const multilineimport =\`␊
Member

@gibson042 gibson042 Jan 27, 2026

There's a strange loss of the space between = and \` here.

Member Author

I think it's just babel. Even in whitespace-preserving mode this space is, in some sense, at the end of a line, so it's probably not represented anywhere.
Are we worried?

Also, impressive you noticed it

Member

Not worried, just confused.

Member

Turns out that this lost space stems from generation against our locationless synthetic AST nodes, and it's fairly easy to fix:

diff --git i/packages/evasive-transform/src/transform-code.js w/packages/evasive-transform/src/transform-code.js
index 1ef096d9d..f186e7ee5 100644
--- i/packages/evasive-transform/src/transform-code.js
+++ w/packages/evasive-transform/src/transform-code.js
@@ -1,6 +1,32 @@
 const evadeRegexp = /import\s*\(|<!--|-->/g;
 const importRegexp = /import(\s*\()/g;
 
+/**
+ * Copy the location from one AST node to another (round-tripping through JSON
+ * to sever references), updating the target's end position as if it had zero
+ * length.
+ *
+ * @param {import('@babel/types').Node} target
+ * @param {import('@babel/types').Node} src
+ */
+const adoptStartFrom = (target, src) => {
+  try {
+    const srcLoc = src.loc;
+    if (!srcLoc) return;
+    const loc = /** @type {typeof srcLoc} */ (
+      JSON.parse(JSON.stringify(srcLoc))
+    );
+    const start = loc?.start;
+    target.loc = loc;
+    if (!start) return;
+    target.loc.end = /** @type {typeof start} */ (
+      JSON.parse(JSON.stringify(start))
+    );
+  } catch (_err) {
+    // Ignore errors; this is purely opportunistic.
+  }
+};
+
 /**
  * Creates a BinaryExpression adding two expressions
  *
@@ -36,6 +62,7 @@ export const evadeStrings = p => {
     expr = !expr
       ? { type: 'StringLiteral', value: part }
       : addStringToExpressions(expr, part);
+    if (lastIndex === 0) adoptStartFrom(expr, p.node);
     lastIndex = index;
   }
   if (expr) {
@@ -128,11 +155,14 @@ export const evadeTemplates = p => {
     newQuasis[newQuasis.length - 1].tail = true;
   }
 
-  p.replaceWith({
+  /** @type {import('@babel/types').Node} */
+  const replacement = {
     type: 'TemplateLiteral',
     quasis: newQuasis,
     expressions: newExpressions,
-  });
+  };
+  adoptStartFrom(replacement, p.node);
+  p.replaceWith(replacement);
 };
 
 /**

Member Author

I'd like to put this in a separate PR

Member

I'd like it in this PR so there's never a master state where code transformation loses whitespace, but if you do defer it then make sure to open an issue.

Member Author

I'm not sure I understand why target.loc.end is set to start, but I need this merged, so I'll take it.

The first JSON pass should be enough to flatten everything, so I'm changing the second one into a shallow copy with a spread operator.
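
A sketch of what that change could look like, based on the adoptStartFrom helper from the suggested diff. This is an assumption about the shape of the revision, not the code as merged: the first JSON round trip stays, and the second is replaced with a spread of the already-detached start position.

```javascript
// Copy `src.loc` onto `target`, collapsing the range to zero length at `start`.
const adoptStartFrom = (target, src) => {
  try {
    const srcLoc = src.loc;
    if (!srcLoc) return;
    // One JSON round trip severs every reference to the source node.
    const loc = JSON.parse(JSON.stringify(srcLoc));
    target.loc = loc;
    if (!loc.start) return;
    // `loc.start` is already a fresh plain object, so a shallow copy suffices.
    target.loc.end = { ...loc.start };
  } catch (_err) {
    // Ignore errors; this is purely opportunistic.
  }
};

// Usage with hand-written stand-ins for Babel nodes.
const srcNode = {
  type: 'StringLiteral',
  loc: { start: { line: 1, column: 15 }, end: { line: 1, column: 40 } },
};
const synthetic = { type: 'BinaryExpression' };
adoptStartFrom(synthetic, srcNode);
console.log(synthetic.loc.end.column); // 15
```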

Member Author

Applied and updated the tests.

Member

> I'm not sure I understand why target.loc.end is set to start, but I need this merged, so I'll take it.

The short answer is "future-proofing"—I don't want generation to skip forward too far if it starts paying attention to end. Suggested explanatory comment text at #3026 (comment)

> The first JSON pass should be enough to flatten everything, so I'm changing the second one into a shallow copy with a spread operator.

Sure, that was me being overcautious about possible future introduction of new non-primitive properties.

changeset-bot bot commented Feb 3, 2026

🦋 Changeset detected

Latest commit: 2a93c75

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 3 packages

| Name | Type |
|------|------|
| @endo/evasive-transform | Minor |
| @endo/bundle-source | Patch |
| @endo/cli | Patch |

@naugtur naugtur force-pushed the naugtur/more-evasions branch 2 times, most recently from 50d68ae to f2fa1de Compare February 12, 2026 13:39
@naugtur naugtur force-pushed the naugtur/more-evasions branch from f2fa1de to 79884e3 Compare February 12, 2026 13:40
Member Author

naugtur commented Feb 12, 2026

All finished.

@gibson042 @kriskowal @boneskull

Member

@gibson042 gibson042 left a comment

I can live with this, but I'd really like to see more complete evasion in regular expression literals and preceding whitespace preservation. Thanks for the detailed explanations along the way!

|--------|------|-------------|
| `sourceUrl` | `string` | The URL or filename of the source file. Used for source map generation and error messages. |
| `sourceMap` | `string \| object` | Optional. An existing source map (as JSON string or object) to be updated with the transform's mappings. |
| `sourceType` | `'script' \| 'module'` | Optional. Specifies whether the source is a CommonJS script (`'script'`) or an ES module (`'module'`). When provided, it helps the parser handle the code correctly. |
Contributor

Which does this default to if omitted?

Member Author

@naugtur naugtur Feb 17, 2026

That's a very good question.
I didn't touch that part, just documented the existence of the option.

The option is being used to decide whether return statements should be allowed

allowReturnOutsideFunction: opts.sourceType === 'script',

And the way it's written here, it defaults to "not script"

But the sourceType itself is being passed to babel, where it's defined as defaulting to "script"

/**
     * Indicate the mode the code should be parsed in.
     * Can be one of "script", "commonjs", "module", or "unambiguous". Defaults to "script".
     * "unambiguous" will make @babel/parser attempt to guess, based on the presence
     * of ES6 import or export statements.
     * Files with ES6 imports and exports are considered "module" and are otherwise "script".
     *
     * Use "commonjs" to parse code that is intended to be run in a CommonJS environment such as Node.js.
     */
    sourceType?: SourceType;

Let's create an issue about that and fix it separately. We might need to talk about how to avoid it being a breaking change.
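
To make the mismatch concrete, here is a reduced sketch of the two defaults side by side. toParserOptions and effectiveSourceType are hypothetical names: the first mirrors the quoted allowReturnOutsideFunction line, the second encodes the "Defaults to script" statement from the @babel/parser typings quoted above. No Babel code is executed here.

```javascript
// Hypothetical reduction of how the option is forwarded to the parser.
const toParserOptions = (opts = {}) => ({
  sourceType: opts.sourceType,
  allowReturnOutsideFunction: opts.sourceType === 'script',
});

// The parser-side default, per the typings comment quoted above.
const effectiveSourceType = parserOpts => parserOpts.sourceType ?? 'script';

const omitted = toParserOptions({});
console.log(effectiveSourceType(omitted)); // script
console.log(omitted.allowReturnOutsideFunction); // false

const explicit = toParserOptions({ sourceType: 'script' });
console.log(explicit.allowReturnOutsideFunction); // true
```

So with the option omitted, the source would be parsed as a script while top-level return stays disallowed, which is the inconsistency a follow-up issue would need to resolve.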

| `elideComments` | `boolean` | Optional. If `true`, removes comment contents while preserving newlines. Defaults to `false`. |
Contributor

I hope it also preserves columns. Yes?

Suggested change
- | `elideComments` | `boolean` | Optional. If `true`, removes comment contents while preserving newlines. Defaults to `false`. |
+ | `elideComments` | `boolean` | Optional. If `true`, removes comment contents while preserving newlines and columns. Defaults to `false`. |

Member Author

The way it's documented in code (which is where I took the wording from), there doesn't seem to be any promise of preserving columns.
Which makes sense when

// a line of comment

turns into

// 

and also when internal newlines in a multiline comment are kept but the content is gone.

But it seems like it's not replacing inline comments with whitespace to match.
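
The difference between the two guarantees can be sketched on the comment text alone. Both helpers are hypothetical and operate on the content of a comment, not on the package's actual implementation: one keeps only newlines, the other also pads with spaces so that anything following the comment keeps its column.

```javascript
// Keeps line count only: every non-newline character is dropped.
const elidePreservingNewlines = text => text.replace(/[^\n]/g, '');

// Keeps line count and columns: every non-newline character becomes a space.
const blankPreservingColumns = text => text.replace(/[^\n]/g, ' ');

const commentText = ' a line\n more';
console.log(JSON.stringify(elidePreservingNewlines(commentText))); // "\n"
console.log(blankPreservingColumns(commentText).length === commentText.length); // true
```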

| `onlyComments` | `boolean` | Optional. If `true`, limits transformation to comment contents only, leaving code unchanged. Defaults to `false`. |
Contributor

What's the motivation of this option?

Member Author

It's an opt-in to the behavior from before this PR.
This package used to only transform the contents of the comments and none of the code.

Comment on lines 179 to 182
/* HTML comment <!=- should be evaded -=>*/␊
var HTMLstring ="<!"+"-- should be evaded --"+">";␊
var HTMLtString =\`<!${""}-- should be evaded --${""}>\`;␊
/* import comment ...IMPORT('some-module');*/␊
Member

@erights @michaelfig @kriskowal //… comments are converted to /*…*/ comments, but the reason seems to be a mystery. @boneskull asked in #1812, and that answer was "I do not remember being aware that we changed comments this way, so I also do not remember why". It doesn't exactly cause problems because the start sequences are equal-length, although forcing the conversion at all does run against our efforts to preserve input as much as possible. Should we contemplate improving the behavior, or is it best to let this lie?

Contributor

We should improve this behavior. But it is low urgency and low importance. Still, we should indeed minimize gratuitous changes.

Contributor

If the question is whether we should do it in this PR, that depends on the level of effort. If easy and quick, sure.

Member Author

This PR did not touch the comments transforms and I have a slight preference to keep the scope of it as such. I only added more coverage and more visible test results, which surfaced the issue.

Happy to start a separate branch to work on the fixes for comments and white space issues. They're also low priority to me, but if we find a fix that's not costly in performance or contributor confusion, I'm all for it.

Member

@gibson042 gibson042 left a comment

Final suggestions, still with an approval.

naugtur and others added 2 commits February 23, 2026 13:00
6 participants