fix(scripts): restore markdown aliases for PDF downloads#2698
marcklingen wants to merge 1 commit into main
Conversation
LGTM — the core fix is correct and the nits found are cosmetic/performance-only with no functional impact.
Extended reasoning
Overview
This PR modifies scripts/copy_md_sources.js (a build-time script) and adds a test file. The change refactors mapDestination → mapDestinations to emit multiple output paths per source, adds dependency injection for testability, and gates execution with require.main === module. The functional goal is to restore root-level .md aliases for marketing pages so the PDF download feature works.
Security Risks
None. This is a build-time file copy script with no auth, user input, or external network calls.
Level of Scrutiny
Low. This is a build tooling script, not production application code. The change is mechanical and the core logic is straightforward. The test validates the key behavioral requirement.
Bug Assessment
The three bugs flagged are all nits with zero functional impact: the copied counter double-counts writes (cosmetic log message), non-markdown files are read before filtering (minor performance waste on build, no correctness issue), and the test temp dir leaks on assertion failure (minor CI hygiene, no false positives/negatives). None of these affect the actual file output that fixes the PDF download feature.
```javascript
for (const dest of destinations) {
  const dir = path.dirname(dest);
  ensureDir(dir);
  fs.writeFileSync(dest, processed, 'utf8');
  copied += 1;
}
```
🟡 The copied counter increments inside the inner for-loop over destinations, so a single marketing page source file increments it by 2 (once per destination). The log message "Copied N markdown source files into public/md-src" is misleading -- N counts file writes, not source files. Moving copied += 1 outside the inner loop would restore accurate semantics.
Extended reasoning
What the bug is: In copyAll() at line 90, copied += 1 is placed inside the for (const dest of destinations) loop. This means the counter increments once per destination write, not once per source file processed.
The specific code path: For a marketing page like content/marketing/terms.mdx, mapDestinations() returns two entries: public/md-src/marketing/terms.md and public/md-src/terms.md. The inner loop iterates twice, incrementing copied by 2 for a single source file.
Why existing code does not prevent it: The refactoring that introduced multiple destinations per source (to fix the PDF download feature) moved copied += 1 inside the new inner loop without adjusting for the change in loop semantics. The original code had a single destination per source and incremented once per file.
Concrete step-by-step proof: Suppose the repo has 5 non-marketing docs and 3 marketing pages. Non-marketing docs each produce 1 destination, contributing +5. Marketing pages each produce 2 destinations, contributing +6. Total copied = 11, but the log prints "Copied 11 markdown source files into public/md-src" when only 8 source files were actually processed.
Impact: This is purely a cosmetic logging inaccuracy. No files are missing, duplicated incorrectly, or corrupted. The actual copy behavior is completely correct -- both scoped and root-level aliases are written as intended. Only the human-readable build log is misleading.
Fix: Move copied += 1 outside the inner loop (but still inside the outer for-of-allFiles loop), and only increment when at least one destination was written (i.e., destinations.length > 0). This restores the original semantics of counting source files processed.
```diff
  }
- const allFiles = walkDir(SOURCE_DIR);
+ const allFiles = walkDir(sourceDir);
  let copied = 0;
  for (const file of allFiles) {
-   const dest = mapDestination(file);
-   if (!dest) continue;
-   const dir = path.dirname(dest);
-   ensureDir(dir);
    const originalContent = fs.readFileSync(file, 'utf8');
    const processed = inlineComponentsMdx(originalContent, file);
-   fs.writeFileSync(dest, processed, 'utf8');
-   copied += 1;
+   const destinations = mapDestinations(file, { sourceDir, outputDir });
+   for (const dest of destinations) {
+     const dir = path.dirname(dest);
+     ensureDir(dir);
+     fs.writeFileSync(dest, processed, 'utf8');
+     copied += 1;
+   }
  }
```
🟡 The refactored copyAll reads every file with fs.readFileSync and runs inlineComponentsMdx on it before calling mapDestinations to determine whether to skip the file, reversing the original filter-then-read pattern. This causes unnecessary I/O and CPU work on non-markdown files (meta.json, _-prefixed files, and any other assets in content/) that are ultimately discarded.
Extended reasoning
What the bug is
In the original copyAll implementation, mapDestination was called first and an early continue skipped any file that should not be copied. The new implementation inverts this order: it unconditionally calls fs.readFileSync(file, "utf8") and inlineComponentsMdx(originalContent, file) on every file returned by walkDir, and only afterwards calls mapDestinations to determine if there are any output locations.
The specific code path
```javascript
// NEW (buggy ordering)
for (const file of allFiles) {
  const originalContent = fs.readFileSync(file, 'utf8');        // reads EVERY file
  const processed = inlineComponentsMdx(originalContent, file); // processes EVERY file
  const destinations = mapDestinations(file, { sourceDir, outputDir }); // [] for non-md
  for (const dest of destinations) { /* skipped when empty */ }
}
```

The original pattern was:

```javascript
// OLD (correct ordering)
const dest = mapDestination(file);
if (!dest) continue; // early exit before any I/O
const originalContent = fs.readFileSync(file, 'utf8');
```

Why existing code does not prevent it
walkDir returns all files recursively, including meta.json files and any _-prefixed files. mapDestinations returns [] for these, causing the inner write loop to be a no-op, but the read and processing have already occurred. inlineComponentsMdx is wrapped in a try-catch so it swallows errors from binary/non-markdown content silently.
Impact
The impact is a performance regression, not a correctness failure. fs.readFileSync(binaryFile, "utf8") in Node.js does not throw; it returns a garbled string. inlineComponentsMdx handles errors via try-catch. Since destinations is empty for non-markdown files, no corrupted files are written. However, every non-markdown file in content/ (at minimum every meta.json and _-prefixed file) is read and parsed on every build run unnecessarily.
Step-by-step proof
1. walkDir(sourceDir) collects all files, including content/foo/meta.json.
2. The loop reaches content/foo/meta.json.
3. fs.readFileSync("content/foo/meta.json", "utf8") succeeds, reading JSON content.
4. inlineComponentsMdx(jsonContent, file) runs the regex import scanner on the JSON string.
5. mapDestinations("content/foo/meta.json", ...) returns [] because base === "meta.json".
6. The inner for (const dest of destinations) loop executes zero times.
7. Steps 3–4 were wasted work.
Fix
Move the mapDestinations call before readFileSync and restore the early-continue pattern:
```javascript
for (const file of allFiles) {
  const destinations = mapDestinations(file, { sourceDir, outputDir });
  if (destinations.length === 0) continue;
  const originalContent = fs.readFileSync(file, 'utf8');
  const processed = inlineComponentsMdx(originalContent, file);
  for (const dest of destinations) { /* write as before */ }
}
```

```javascript
test('creates root-level aliases for marketing pages and keeps security paths scoped', () => {
  const tempRoot = fs.mkdtempSync(path.join(os.tmpdir(), 'copy-md-sources-'));
  const sourceDir = path.join(tempRoot, 'content');
  const outputDir = path.join(tempRoot, 'public', 'md-src');

  writeFile(path.join(sourceDir, 'marketing', 'terms.mdx'), '# Terms');
  writeFile(path.join(sourceDir, 'security', 'toms.mdx'), '# TOMS');

  copyAll({ sourceDir, outputDir });

  assert.equal(fs.readFileSync(path.join(outputDir, 'marketing', 'terms.md'), 'utf8'), '# Terms');
  assert.equal(fs.readFileSync(path.join(outputDir, 'terms.md'), 'utf8'), '# Terms');
  assert.equal(fs.readFileSync(path.join(outputDir, 'security', 'toms.md'), 'utf8'), '# TOMS');
  assert.equal(fs.existsSync(path.join(outputDir, 'toms.md')), false);

  fs.rmSync(tempRoot, { recursive: true, force: true });
});
```
🟡 The temp directory cleanup at the end of the test will not run if any assert.equal call throws an AssertionError, leaving directories under /tmp on test failures. Wrapping the assertions in a try/finally block (or using after() from node:test) ensures cleanup always runs.
Extended reasoning
What the bug is and how it manifests
In scripts/copy_md_sources.test.js, the test body creates a temp directory via fs.mkdtempSync, runs assertions, and then calls fs.rmSync(tempRoot, { recursive: true, force: true }) as the last statement. JavaScript does not guarantee that the final statement runs if a preceding statement throws, so if any assertion fails, the cleanup is skipped and a directory like /tmp/copy-md-sources-XXXXXX is left behind.
The specific code path that triggers it
Lines 24-27 call assert.equal(...) four times. assert.equal (from node:assert/strict) throws AssertionError on mismatch. Since there is no try/finally guarding the cleanup at line 29, any thrown error causes the JavaScript call stack to unwind past that statement entirely.
Why existing code does not prevent it
The test uses a plain synchronous test body with no lifecycle hooks (after(), afterEach()) and no try/finally. The node:test framework does call an after() hook after the test regardless of pass/fail, but that mechanism is not used here. There is nothing to intercept the thrown AssertionError and redirect execution to the cleanup.
Impact
On test failure, the /tmp directory accumulates copy-md-sources-XXXXXX directories. In CI environments these are typically cleaned at job end, and most OS configurations periodically sweep /tmp, so the practical impact is low. This does not affect production behavior or cause false positive or negative test results.
How to fix it
Wrap assertions in a try/finally block to guarantee cleanup:
```javascript
const tempRoot = fs.mkdtempSync(path.join(os.tmpdir(), 'copy-md-sources-'));
try {
  // setup and assertions
} finally {
  fs.rmSync(tempRoot, { recursive: true, force: true });
}
```

Alternatively, use node:test's after() hook registered before assertions run.
Step-by-step proof
- Test begins: fs.mkdtempSync creates /tmp/copy-md-sources-abc123.
- copyAll runs and writes files to the temp output dir.
- assert.equal at line 24 throws AssertionError due to a content mismatch (hypothetical failure).
- JavaScript unwinds the call stack; line 29 (fs.rmSync(tempRoot, ...)) is never reached.
- /tmp/copy-md-sources-abc123 and all its contents remain on disk until the OS cleans /tmp.
Motivation
The "Download as PDF" feature broke because the site no longer exposed root-level .md source URLs like /terms.md that the PDF endpoint expects. Marketing sources were emitted only under public/md-src/marketing/*, so legacy direct markdown URLs were missing and PDF requests failed.

Description
- Updated scripts/copy_md_sources.js to return multiple output destinations per source, renaming mapDestination to mapDestinations to support this behavior.
- Added root-level aliases for content/marketing/* pages (for example, producing both public/md-src/marketing/terms.md and public/md-src/terms.md).
- Refactored for testability by parameterizing sourceDir/outputDir, exporting helpers (mapDestinations, copyAll, cleanOutputDir), and guarding execution with require.main === module to avoid side effects on import.
- Added scripts/copy_md_sources.test.js to validate that marketing pages get root-level aliases while non-marketing docs (e.g., security/toms) remain scoped.

Testing
- Ran node --test scripts/copy_md_sources.test.js, which verifies the root-level alias for marketing pages and scoped behavior for security pages; the test passed.