
@fredrikekelund
Contributor

Related issues

Proposed Changes

In #1264, we implemented some logic in DefaultExporter to exclude .git and node_modules folders from export archives. The logic looked for those substrings anywhere in the path. This could be considered a risky approach, but since those terms are pretty esoteric, it likely didn't cause much trouble.

In #1940, we added the term cache to this list, which is much more generic. When combined with the simplistic path-checking logic, this change would actually break exports containing files or directories with "cache" in the name that weren't meant to be excluded.
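To show why "cache" was risky, here is a minimal TypeScript sketch of a substring-based check. It is a hypothetical reconstruction that assumes the check simply tested whether each term appeared anywhere in the path. The function name isExcludedBySubstring and the sample paths are illustrative only; they are not taken from DefaultExporter or from any real plugin.

```typescript
// Hypothetical reconstruction of a substring-based exclusion check; not the actual DefaultExporter code.
const EXCLUDED_TERMS = [ '.git', 'node_modules', 'cache' ];

function isExcludedBySubstring( pathToCheck: string ): boolean {
	// Excludes the path if any term appears anywhere in it, even inside a longer name.
	return EXCLUDED_TERMS.some( ( term ) => pathToCheck.includes( term ) );
}

// Illustrative, hypothetical paths.
const samples = [
	'wp-content/plugins/example-plugin/node_modules/lib/index.js', // intended match
	'wp-content/plugins/example-plugin/src/cache-helper.php', // false positive: "cache" inside a file name
	'wp-content/plugins/my-cache/plugin.php', // false positive: "cache" inside a directory name
	'wp-content/themes/example-theme/.gitignore', // false positive: ".gitignore" starts with ".git"
	'wp-content/themes/example-theme/style.css', // kept
];

for ( const sample of samples ) {
	console.log( `${ isExcludedBySubstring( sample ) ? 'excluded' : 'kept    ' }  ${ sample }` );
}
```

Any plugin file whose path merely contains one of the terms is silently dropped, which is how required files can go missing from an archive.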

STU-1189 documents one such example, where the Yoast SEO plugin would be broken by the export (since certain required files were excluded from the archive). It's worth noting that sync pushes containing the Yoast SEO plugin wouldn't have broken, since plugins aren't restored directly from the push archive on the back-end.

In any case, this PR addresses the issue by:

  1. Moving this path exclusion logic to a method, DefaultExporter::isPathExcludedByPattern
  2. Only excluding a path if it's a directory, AND
  3. only if .git, node_modules, or cache is the full name of that directory. Directories such as hellonode_modules or my-cache are not excluded.

Testing Instructions

  1. Run npm start
  2. Create a new site
  3. Install Yoast SEO on that site
  4. Do a full site export for that site
  5. Create a new site from the backup you just created
  6. Ensure that the site is created successfully and starts successfully

Pre-merge Checklist

  • Have you checked for TypeScript, React or other console errors?

@fredrikekelund fredrikekelund requested a review from a team January 5, 2026 14:01
@fredrikekelund fredrikekelund self-assigned this Jan 5, 2026
@wpmobilebot

wpmobilebot commented Jan 5, 2026

📊 Performance Test Results

Comparing 05e5958 vs trunk

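site-editor

| Metric | trunk | 05e5958 | Diff | Change |
| --- | --- | --- | --- | --- |
| load | 6698.00 ms | 7197.00 ms | +499.00 ms | 🔴 7.4% |

site-startup

| Metric | trunk | 05e5958 | Diff | Change |
| --- | --- | --- | --- | --- |
| siteCreation | 9073.00 ms | 9092.00 ms | +19.00 ms | 🔴 0.2% |
| siteStartup | 3950.00 ms | 3954.00 ms | +4.00 ms | 🔴 0.1% |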

Results are median values from multiple test runs.

Legend: 🟢 Improvement (faster) | 🔴 Regression (slower) | ⚪ No change

Contributor

@ivan-ottinger ivan-ottinger left a comment


Thanks for the fix and for sharing details in the PR description, Fredrik! I can confirm the changes address the issue, which I can no longer reproduce. The site starts correctly without any missing-file errors. 👍🏼

However, I noticed that the logic no longer filters out node_modules and cache directories during export.

Related Slack discussion with example site export: p1767625004464189/1767176661.251759-slack-C06DRMD6VPZ

@fredrikekelund fredrikekelund mentioned this pull request Jan 5, 2026
Contributor

@ivan-ottinger ivan-ottinger left a comment


Thank you for the update, Fredrik. The changes look good and work as expected on macOS. I did not observe any more regressions there, and the files that should be excluded are excluded.

On Windows, though, the regex fails:

Server started for '00000'
(node:6028) UnhandledPromiseRejectionWarning: SyntaxError: Invalid regular expression: /\(.git|node_modules|cache)(\|$)/g: Unmatched ')'
    at new RegExp (<anonymous>)
    at DefaultExporter.isPathExcludedByPattern (C:\Users\ivanottinger\Documents\GitHub\studio\dist\main\index.js:4537:27)
    at C:\Users\ivanottinger\Documents\GitHub\studio\dist\main\index.js:4675:82
    at Archiver.onGlobMatch (C:\Users\ivanottinger\Documents\GitHub\studio\node_modules\archiver\lib\core.js:659:21)
    at ReaddirGlob.emit (node:events:519:28)
    at C:\Users\ivanottinger\Documents\GitHub\studio\node_modules\readdir-glob\index.js:200:20
(node:6028) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). To terminate the node process on unhandled promise rejection, use the CLI flag `--unhandled-rejections=strict` (see https://nodejs.org/api/cli.html#cli_unhandled_rejections_mode). (rejection id: 11)

Looks like we may need to escape the backslash in the regex pattern.
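For reference, the failure and an escaping fix can be sketched outside the app. This minimal TypeScript sketch assumes the pattern was built by interpolating path.sep and the joined directory names into a RegExp source string; that is inferred from the error message above, not copied from the PR. escapeRegExp, buildNaivePattern, and buildEscapedPattern are hypothetical helpers. Using path.win32.sep and path.posix.sep lets both platforms be exercised on any OS.

```typescript
import * as path from 'path';

const DIRECTORY_NAMES_TO_EXCLUDE = [ '.git', 'node_modules', 'cache' ];

// Escapes every character that has special meaning in a RegExp source string.
function escapeRegExp( text: string ): string {
	return text.replace( /[.*+?^${}()|[\]\\]/g, '\\$&' );
}

// Hypothetical reconstruction: interpolates the separator and names without escaping.
// With a backslash separator, the leading "\(" becomes a literal "(", leaving ")" unmatched.
function buildNaivePattern( sep: string ): RegExp {
	return new RegExp( `${ sep }(${ DIRECTORY_NAMES_TO_EXCLUDE.join( '|' ) })(${ sep }|$)` );
}

// Escaping both the separator and the names keeps the pattern valid on Windows
// and also stops the "." in ".git" from matching any character.
// No "g" flag: with it, RegExp.prototype.test would carry lastIndex between calls.
function buildEscapedPattern( sep: string ): RegExp {
	const escapedSep = escapeRegExp( sep );
	const names = DIRECTORY_NAMES_TO_EXCLUDE.map( escapeRegExp ).join( '|' );
	return new RegExp( `${ escapedSep }(${ names })(${ escapedSep }|$)` );
}

try {
	buildNaivePattern( path.win32.sep );
} catch ( error ) {
	console.log( ( error as Error ).message );
}

const windowsPattern = buildEscapedPattern( path.win32.sep );
console.log( windowsPattern.test( 'C:\\site\\wp-content\\cache\\page.html' ) );
console.log( windowsPattern.test( 'C:\\site\\wp-content\\my-cache\\page.html' ) );
```

Escaping only the separator would fix the SyntaxError but leave the unescaped "." in ".git" matching any character, so both pieces need escaping if a regex is kept.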

I left one additional (but minor) suggestion.

);
} );

it( 'should exclude .git, node_modules, and cache directories', async () => {
Contributor


Thanks for adding dedicated test cases.

// a directory or not.
private isPathExcludedByPattern( pathToCheck: string ) {
const DIRECTORY_NAMES_TO_EXCLUDE = [ '.git', 'node_modules', 'cache' ];
const offenderRegex = new RegExp(
Contributor


I wonder if it would be nicer to rename offenderRegex to something like matchRegex and offenderPath to matchedPath (or something similar).

Offender sounds a bit negative to me. 🙂

Contributor Author


We could, but a negative connotation seems pretty appropriate to me in this context. We're naming something that matches a disallowed pattern and may be excluded from the export. "Match" also works, but I would opt for the added clarity of "offender", since there's no sensitive real-world association with that term.

@fredrikekelund
Contributor Author

Thanks for the alert reviews, @ivan-ottinger, and sorry about the sloppiness on my part 🤦‍♂️

I realized this morning that there was another problem with my regex: in a regular expression, the unescaped . in .git matches any character, so the pattern also matched names like xgit. I've reimplemented isPathExcludedByPattern to avoid regular expressions. Instead, it splits the path using the platform's path separator, then searches the resulting array for any of the disallowed directory names.
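A minimal TypeScript sketch of the split-based approach described above, for illustration only; it is not the PR's code. The standalone function, the Set of names, and the optional isDirectory and sep parameters are assumptions: isDirectory says whether the last segment is itself a directory, and sep defaults to path.sep but can be overridden so Windows-style paths can be checked on any OS.

```typescript
import * as path from 'path';

const DIRECTORY_NAMES_TO_EXCLUDE = new Set( [ '.git', 'node_modules', 'cache' ] );

// Sketch only: compares whole path segments instead of substrings or regexes.
function isPathExcludedByPattern(
	pathToCheck: string,
	isDirectory = false,
	sep: string = path.sep
): boolean {
	// Drop empty segments produced by leading, trailing, or doubled separators.
	const segments = pathToCheck.split( sep ).filter( ( segment ) => segment !== '' );
	// Every segment except the last is a parent directory; the last one only counts
	// when the caller says the path itself is a directory.
	const directorySegments = isDirectory ? segments : segments.slice( 0, -1 );
	return directorySegments.some( ( segment ) => DIRECTORY_NAMES_TO_EXCLUDE.has( segment ) );
}

console.log( isPathExcludedByPattern( 'wp-content/cache/page.html', false, '/' ) );
console.log( isPathExcludedByPattern( 'wp-content/plugins/my-cache/plugin.php', false, '/' ) );
```

Because segments are compared with Set.has, .git only matches a segment that is exactly .git, and names like my-cache or hellonode_modules never match. One caveat of splitting on path.sep is that a Windows path written with forward slashes would not be split; running it through path.normalize first is one option.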

I also realized that the test was plain wrong. This comes down to my over-trusting AI-written tests. I've mentioned this elsewhere, too, but I'll be trying to flip my current workflow by writing the test first and then asking the AI to help with the implementation. It might not be as effective, but my own complacency with AI-generated tests is beginning to irk me.

Anyway, I've added new tests that exercise the DefaultExporter::isExactPathExcluded and DefaultExporter::isPathExcludedByPattern methods directly. Simpler and more effective.

Contributor

@katinthehatsite katinthehatsite left a comment


The changes look good to me. I compared them to trunk and can see the same set of files exported as before. I was also able to create a site from the exported file, and I no longer see the issues Ivan mentioned.

@fredrikekelund fredrikekelund merged commit 4b7add0 into trunk Jan 6, 2026
10 checks passed
@fredrikekelund fredrikekelund deleted the f26d/more-refined-export-exclusion-rules branch January 6, 2026 09:01

Labels

None yet

Projects

None yet


5 participants