perf(core): Automated performance tuning by Claude #1633
⚡ Performance Benchmark
History
a008d75 perf(file): Fast-path the ignore-file predicate without per-path resolution
b053bbe perf(file): Fast-path the ignore-file predicate without per-path resolution
632bf8f perf(file): Replace globby's gitignore machinery with an in-repo synchronous ignore-file filter
a890579 perf(output): Defer line-count and markdown-delimiter scans with lazy render-context getters
0fe40b4 perf(file): Sample-scan base64 run detection in truncateBase64
2d0a45a perf(file): Sample-scan base64 run detection in truncateBase64
1f2621e perf(file): Cache readdir results across globby's double traversal
7eeca34 perf(security): Stream security check batches during file collection
21fa845 perf(security): Pre-warm security worker pool to overlap spawn with file collection
5101545 perf(file): Answer globby's gitignore stat calls from readdir dirent types
e722492 perf(file): Answer globby's gitignore stat calls from readdir dirent types
Code Review
This pull request introduces a custom fs adapter for globby to cache directory entry types during traversal, significantly reducing the number of stat syscalls when applying gitignore rules. It also adds a comprehensive test suite to verify this optimization. The review feedback points out a cross-platform issue on Windows where backslash path separators cause cache misses. To resolve this, it is recommended to normalize paths to POSIX style using toPosixPath in both the caching and lookup steps.
for (const entry of entries) {
  if (entry.isDirectory() || entry.isFile()) {
    isDirectoryByPath.set(path.join(dirPath, entry.name), entry.isDirectory());
  }
}
On Windows, path.join produces paths with backslashes (\\), whereas globby/fast-glob standardizes on forward slashes (/) internally. This causes the cache lookups in stat to always miss on Windows, silently bypassing the performance optimization. Normalizing the path to POSIX style using the existing toPosixPath helper ensures the cache works correctly across all platforms.
Suggested change:
  for (const entry of entries) {
    if (entry.isDirectory() || entry.isFile()) {
-     isDirectoryByPath.set(path.join(dirPath, entry.name), entry.isDirectory());
+     isDirectoryByPath.set(toPosixPath(path.join(dirPath, entry.name)), entry.isDirectory());
    }
  }
Investigated — the cache keys actually do match on Windows, so this suggestion isn't applied.
The stat lookup path doesn't come from fast-glob's slash-separated strings: globby's filter runs every result through path.normalize() and then path.resolve(cwd, …) (globby createFilterFunctionAsync), which produces native-separator absolute paths on Windows. Our cache key is built with path.join(dirPath, entry.name), which applies the same native normalization (it converts the walker's / joins to \). Verified with path.win32:
const p = require('path').win32;
const cwd = 'C:\\Users\\x\\repo';
const walkerDir = p.resolve(cwd, '.') + '/src/core'; // fast-glob walker joins with '/'
p.join(walkerDir, 'file.ts') // cache key
// => 'C:\\Users\\x\\repo\\src\\core\\file.ts'
p.resolve(cwd, p.normalize('src/core/file.ts')) // globby stat path
// => 'C:\\Users\\x\\repo\\src\\core\\file.ts' (identical)
Applying toPosixPath would also be a small correctness risk on POSIX, where \ is a legal filename character: a blanket \ → / rewrite could collide two distinct paths and serve a wrong isDirectory, whereas a separator mismatch in the current code merely falls through to a real stat (perf-neutral, never wrong). Added a docstring note in createGlobbyFsAdapter documenting this invariant.
Generated by Claude Code
const stat = (statPath: string, callback: (error: NodeJS.ErrnoException | null, stats: Stats) => void): void => {
  const isDirectory = isDirectoryByPath.get(statPath);
  if (isDirectory === undefined) {
To match the POSIX-normalized keys stored in isDirectoryByPath, the lookup path in stat should also be normalized using toPosixPath.
Suggested change:
  const stat = (statPath: string, callback: (error: NodeJS.ErrnoException | null, stats: Stats) => void): void => {
-   const isDirectory = isDirectoryByPath.get(statPath);
+   const isDirectory = isDirectoryByPath.get(toPosixPath(statPath));
    if (isDirectory === undefined) {
Not applied for the same reason as the sibling comment: globby's stat path goes through path.normalize() + path.resolve() before reaching the adapter, so it carries native separators that already match the path.join-built cache keys on Windows (verified with path.win32). A toPosixPath rewrite on lookups would additionally risk key collisions on POSIX filenames containing \. The invariant is now documented in the adapter's docstring.
Generated by Claude Code
…types
intent(file-search): automated perf tuning pass; single highest-impact, behavior-preserving change against a ~1.25s default pack run
learned(globby): with gitignore enabled, globby's ignore filter calls fs.stat on every matched path (~1100 syscalls on this repo) only to decide whether trailing-slash rules apply; the traversal's readdir(withFileTypes) already carried each entry's type
decision(fs-adapter): pass a per-call fs adapter to globby that records dirent types during readdir and serves the stat calls from memory; symlinks/special entries and unseen paths fall through to a real stat since stat follows links while dirents do not
rejected(secretlint-prefilter): trigger-regex pre-filter before lintSource (~88ms); a hand-maintained trigger list can silently miss preset rules and produce security false negatives
rejected(gitlog-token-cache): caching the git log token count (~10-16ms); below the 2% improvement threshold and off the critical path
constraint(fs-adapter): statSync must be forwarded so globby's cwd-is-directory validation keeps running; ignore.js readFile falls back to real fs on its own
constraint(cache-keys): keys are path.join-normalized to native separators, matching globby's normalize+resolve chain on every platform; deliberately NOT posix-normalized because a blanket backslash rewrite could collide distinct POSIX paths (review feedback on PR #1633 investigated and declined; verified via path.win32 that join/resolve produce identical keys)
Benchmark (repomix repo itself, ~1100 files, 8 interleaved runs each, warm):
- end-to-end: median 1242ms -> 1182ms (-60ms, -4.8%)
- globby phase: 254ms -> 206ms (-48ms)
- output byte-identical vs base build (cmp) for default run, --include-empty-directories, symlink/trailing-slash edge cases, and non-git directories
- npm run lint clean (3 pre-existing warnings), npm run test 1357/1357 pass
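The fs-adapter decision in the commit message above can be illustrated with a minimal sketch. It is not the code in src/core/file/fileSearch.ts: the structural types, the `createDirentCachingAdapter` name, and the injected `joinPath` parameter (standing in for path.join) are assumptions made so the example is self-contained.

```typescript
// Minimal structural types (hypothetical; the real adapter uses node:fs types).
interface DirentLike {
  name: string;
  isDirectory(): boolean;
  isFile(): boolean;
}
interface StatsLike {
  isDirectory(): boolean;
  isFile(): boolean;
}
type ReaddirFn = (dir: string, cb: (err: Error | null, entries: DirentLike[]) => void) => void;
type StatFn = (path: string, cb: (err: Error | null, stats?: StatsLike) => void) => void;

function createDirentCachingAdapter(
  realReaddir: ReaddirFn,
  realStat: StatFn,
  joinPath: (dir: string, name: string) => string, // stands in for path.join
) {
  const isDirectoryByPath = new Map<string, boolean>();

  const readdir: ReaddirFn = (dir, cb) => {
    realReaddir(dir, (err, entries) => {
      if (!err) {
        for (const entry of entries) {
          // Only plain files and directories are recorded. Symlinks and special
          // entries are skipped so that stat() on them reaches the real file system.
          if (entry.isDirectory() || entry.isFile()) {
            isDirectoryByPath.set(joinPath(dir, entry.name), entry.isDirectory());
          }
        }
      }
      cb(err, entries);
    });
  };

  const stat: StatFn = (path, cb) => {
    const isDirectory = isDirectoryByPath.get(path);
    if (isDirectory === undefined) {
      realStat(path, cb); // unseen path: fall through to a real stat
      return;
    }
    // Served from memory (invoked synchronously here for brevity).
    cb(null, { isDirectory: () => isDirectory, isFile: () => !isDirectory });
  };

  return { readdir, stat };
}
```

Because a cache miss only ever falls through to the real stat, a key mismatch costs time but cannot return a wrong answer, which is the property the cache-keys constraint relies on.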
e722492 to 5101545
Deploying repomix with
| Latest commit: | 9276d79 |
| Status: | ✅ Deploy successful! |
| Preview URL: | https://861b5050.repomix.pages.dev |
| Branch Preview URL: | https://perf-auto-perf-tuning.repomix.pages.dev |
…ile collection
intent(security): automated perf tuning pass; single highest-impact, behavior-preserving change against a ~860ms default pack run
learned(security-workers): spawning the 2 secretlint worker threads costs ~50-100ms each (thread creation + 7MB preset bundle import) and previously happened inside runSecurityCheck, i.e. after file collection, squarely on the critical path; the security leg gates the security∥processFiles phase (~208ms of a ~242ms phase)
decision(security-pool): create the pool in pack() right after createMetricsTaskRunner and post one empty-items task per worker, mirroring the already-merged metrics prewarm pattern; the spawn then overlaps the ~165ms collect+git phase instead of starting after it
decision(pool-teardown): start taskRunner.cleanup() right after the security phase resolves so the worker destroy overlaps output generation and metrics; the finally block awaits the same promise (starting it if an earlier stage threw) so no path leaks threads
constraint(warmup-count): warm-up count mirrors Tinypool's own sizing (min(2, concurrency, ceil(numOfTasks/100))) so no thread is spawned that the real workload would not have created anyway
constraint(secretlint-rules): same rules run on exactly the same content; only worker spawn timing changes; an empty-items warmup batch returns [] without linting anything
rejected(metrics-prewarm-zero): skipping metrics warmup on warm-likely runs (~41ms); exposes a ~150ms BPE init to the common "one file changed" incremental run, the exact case the merged cache-aware prewarm hedges
rejected(base64-sampling): sparse-sampling hasLongBase64Run (~38ms CPU); runs inside processFiles, which is parallel to and shorter than the security leg, so the CPU saving does not move wall time
rejected(metrics-cleanup-noawait): fire-and-forget metrics pool destroy (~14-17ms); borderline at the 2% threshold, kept as a future candidate
Benchmark (repomix repo itself, ~1100 files, 20 interleaved warm pairs, quiet 4-core host):
- end-to-end: median 859ms -> 815.5ms (-43.5ms, -5.1%)
- security check phase (trace log): 207.8ms -> 161.9ms (-46ms)
- output byte-identical vs base build (cmp) for default run and --no-security-check
- npm run lint clean (3 pre-existing warnings), npm run test 1365/1365 pass (8 new tests)
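The warm-up sizing rule quoted in constraint(warmup-count) is plain arithmetic, so a small worked sketch may help. The function name `warmupWorkerCount` is hypothetical; only the formula comes from the commit message.

```typescript
// Number of workers to pre-warm, mirroring min(2, concurrency, ceil(numOfTasks/100)).
// `warmupWorkerCount` is an illustrative name, not an identifier from the codebase.
function warmupWorkerCount(numOfTasks: number, concurrency: number): number {
  return Math.min(2, concurrency, Math.ceil(numOfTasks / 100));
}

// With 4 available cores:
//   100 items  -> ceil(100/100)  = 1 worker
//   101 items  -> ceil(101/100)  = 2 workers
//   1100 items -> min(2, 4, 11)  = 2 workers
```

Under this rule the second worker is only warmed from 101 items upward, so a small pack never spawns a thread the real workload would not have used.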
The security check previously dispatched all its worker batches only after file collection had fully completed, putting the entire lint wall time (~160ms on this repo) on the critical path between collection and output generation. The security workers sit idle during collection (which is I/O-bound on the main thread), so the lint work can overlap it almost entirely.
Changes:
- New createSecurityCheckStream (securityCheckStreaming.ts): buffers collected files and dispatches each full BATCH_SIZE batch to the worker pool immediately. finalize() flushes the remainder plus git diff/log items, enqueues any raw file that never arrived via addFile (so custom collectFiles implementations that ignore the callback cannot skip the check), awaits all batches, and re-orders suspicious file results back to canonical rawFiles order. Batch failures are captured (not rejected) until finalize so abandoned sessions on error paths cannot surface unhandled rejections.
- collectFiles gains an optional onFileCollected callback, invoked in completion order for every file that ends up in rawFiles.
- pack() wires the callback to the stream using final display paths (multi-root labels applied), and forwards the stream to validateFileSafety, which uses finalize() instead of runSecurityCheck.
- createSecurityCheckTaskRunner now runs before file search and warms up in two stages: one worker immediately (its spawn + secretlint preset import overlap the ~155ms search phase, so it is ready when the first batch arrives), and the second via completeWarmup() once the file count is known, preserving the existing sizing rule (second worker only from 101 items).
Behavior is unchanged: the same items are linted with the same batch size and rules, and outputs were verified byte-identical (cmp) against the base build for the default run, --no-security-check, --include-empty-directories, multi-root, and a planted-secrets case (same 2 files flagged and excluded). runSecurityCheck keeps its original behavior for non-streamed callers (MCP, lib API).
Benchmark (this repo, warm, 39 interleaved pairs, 4-core Linux, default pack): end-to-end median 869ms -> 832ms (-37ms, -4.3%), paired mean delta -34.8ms (-4.0%), paired t = 5.51. Trace: security-check tail after collection ~160ms -> ~60-90ms.
Variants measured and rejected: holding dispatch until the metrics warm-up settles (warm-up spans nearly the whole collection window, nullifying the overlap) and capping pre-finalize dispatch to one batch in flight (paired delta ~2ms, no effect).
npm run test: 1378/1378 pass (12 new). npm run lint: clean.
https://claude.ai/code/session_015sBq63cfQRHYkmnvrokGF2
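The streaming behavior described in this commit (dispatch full batches early, capture failures until finalize, restore canonical order) could be sketched roughly as follows. This is an illustration under assumptions, not securityCheckStreaming.ts: `createBatchStream`, `Item`, and the `runBatch` signature are hypothetical, and the git diff/log items are left out.

```typescript
interface Item {
  path: string;
  content: string;
}
// Hypothetical worker call: resolves to the paths flagged as suspicious in one batch.
type RunBatch = (batch: Item[]) => Promise<string[]>;
type Settled = { ok: true; paths: string[] } | { ok: false; error: unknown };

function createBatchStream(runBatch: RunBatch, batchSize: number) {
  let buffer: Item[] = [];
  const seen = new Set<string>();
  const pending: Promise<Settled>[] = [];

  const dispatch = (batch: Item[]): void => {
    // Failures are captured as values, so an abandoned stream never produces
    // an unhandled rejection.
    pending.push(
      runBatch(batch).then(
        (paths) => ({ ok: true as const, paths }),
        (error) => ({ ok: false as const, error }),
      ),
    );
  };

  return {
    addFile(item: Item): void {
      seen.add(item.path);
      buffer.push(item);
      if (buffer.length >= batchSize) {
        dispatch(buffer); // full batch: start linting while collection continues
        buffer = [];
      }
    },
    async finalize(allFiles: Item[]): Promise<string[]> {
      // Safety net: enqueue files that never arrived through addFile.
      for (const file of allFiles) {
        if (!seen.has(file.path)) buffer.push(file);
      }
      if (buffer.length > 0) {
        dispatch(buffer);
        buffer = [];
      }
      const suspicious = new Set<string>();
      for (const result of await Promise.all(pending)) {
        if (!result.ok) throw result.error; // surface the first captured failure
        for (const p of result.paths) suspicious.add(p);
      }
      // Batches complete in collection order; report in canonical file order.
      return allFiles.map((f) => f.path).filter((p) => suspicious.has(p));
    },
  };
}
```

Re-ordering against the canonical file list at the end is what lets the streamed result match a non-streamed run even though files arrive in completion order.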
intent(file-search): automated perf tuning pass; single highest-impact, behavior-preserving change against a ~810ms default pack run
learned(globby-traversal): a single globby() call walks the tree twice: globIgnoreFiles discovers .gitignore/.repomixignore/.ignore files, then the main fast-glob scan re-walks the same tree, issuing 582 readdir(withFileTypes) calls for 291 directories on this repo; both walks share the per-call options.fs adapter, so it is the natural cache point
decision(fs-adapter): record successful withFileTypes readdir results in the existing createGlobbyFsAdapter and replay the second walk's calls via process.nextTick (preserves the async callback contract fast-glob expects); errors are never cached so a transiently failing directory is retried
constraint(dirent-sharing): both traversals receive the same Dirent[] array; safe because fast-glob wraps dirents in its own entry objects and never mutates the source array
constraint(snapshot-semantics): both walks now see one consistent directory snapshot per search; previously a directory changing between the two walks could be listed differently by each (noted in the adapter docstring)
rejected(security-batch-size): SECURITY_CHECK_BATCH_SIZE 50 -> 200 measured -58ms under heavy machine load, but 25 quiet interleaved pairs show identical medians (793ms vs 793ms); the IPC overhead it removes only matters on contended hosts; 400 regresses (~+20ms, worker imbalance)
rejected(lazy-render-context): lazy fileLineCounts/markdownCodeBlockDelimiter getters in createRenderContext (~14-15ms on the default XML path, 1.8%); below the 2% threshold
rejected(lazy-handlebars-import): deferring the handlebars import cuts ~17ms of startup but the deferred import() evaluation blocks in-flight collection I/O callbacks (a 0.24ms readFile stretched to 94ms in the probe), netting only ~4ms end to end
rejected(metrics-cleanup-noawait): fire-and-forget metrics pool destroy (~15ms median); still borderline below threshold, unchanged from the previous pass
Benchmark (repomix repo itself, ~1100 files, 25 interleaved warm pairs, quiet 4-core Linux, default pack):
- end-to-end median 808ms -> 780ms (-28ms, -3.5%), paired delta median -30ms, paired mean -28.8ms (t = 4.6), 20/25 pairs improved
- [globby] search phase 166-177ms -> 139-149ms
- readdir(withFileTypes) during search: one call per directory; a new regression test asserts no directory is listed twice
- output byte-identical (cmp) vs the base build for the default run, --include-empty-directories, multi-root (src website), and --no-gitignore (with the A/B-swapped lib file excluded from the pack)
npm run test: 1379/1379 pass (1 new). npm run lint: clean (3 pre-existing warnings in unrelated files).
https://claude.ai/code/session_011DHBuMqYeyMgJuYRSeJxSa
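A rough sketch of the readdir cache described in this commit, under stated assumptions: `cacheReaddir` and its types are hypothetical, and the replay is deferred with a resolved-promise microtask rather than the process.nextTick the commit names, purely to keep the example runtime-agnostic.

```typescript
type ReaddirCallback<T> = (err: Error | null, entries?: T[]) => void;
type Readdir<T> = (dir: string, cb: ReaddirCallback<T>) => void;

// Wraps a callback-style readdir so each directory is listed at most once
// per adapter instance.
function cacheReaddir<T>(realReaddir: Readdir<T>): Readdir<T> {
  const cache = new Map<string, T[]>();
  return (dir, cb) => {
    const hit = cache.get(dir);
    if (hit !== undefined) {
      // Replay asynchronously so callers still observe an async callback.
      // (Substitute for process.nextTick; an assumption of this sketch.)
      Promise.resolve().then(() => cb(null, hit));
      return;
    }
    realReaddir(dir, (err, entries) => {
      // Only successful listings are cached, so a failing directory is retried.
      if (!err && entries !== undefined) cache.set(dir, entries);
      cb(err, entries);
    });
  };
}
```

Both callers receive the same array instance, which is why the commit's dirent-sharing constraint (no mutation of the source array) matters.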
intent(file-process): automated perf tuning pass — single highest-impact, behavior-preserving change against a ~865ms default pack run; truncateBase64 is enabled in this repo's own config so its precondition scan runs on every packed file in the benchmark workload
learned(base64-scan): hasLongBase64Run walked every character of every file (~5.5MB per pack, 23ms main-thread self time in CPU profiles, 35ms isolated) even though it almost always returns false — the per-character loop was itself the previous optimization over the regex it gates
decision(sampled-scan): sample one character every MIN_BASE64_LENGTH_STANDALONE (256) positions — any qualifying run occupies 256 consecutive indices, so it must contain a sample point; only a sampled base64-class hit triggers a bounded outward expansion to measure the surrounding run, and the sampling phase resets cleanly after each short-run skip (next possible run from hi+1 always covers sample hi+256)
constraint(equivalence): differential-tested against the per-character reference on the full repo corpus (1096 files, 0 mismatches) plus 20k randomized fuzz cases; a deterministic-LCG differential test now pins both false-positive and false-negative directions in the suite
rejected(regex-precheck): /[A-Za-z0-9+/]{256}/.test() measured 4.5x SLOWER than the per-character loop (155ms vs 35ms on the corpus) — bounded-repetition re-scanning at each start position, not a viable replacement
rejected(early-git-token-dispatch): pre-dispatching git diff/log token counts from the packager — with a warm token cache they resolve while calculateMetrics awaits outputPromise (Promise.all resolves in ~0ms; the 63-67ms wall time is main-thread-busy completion latency, not queue wait), e2e median +15ms under noise, unproven
rejected(collect-concurrency): FILE_COLLECT_CONCURRENCY 50 -> 128/256 — identical medians over 40 quiet interleaved runs; libuv's 4-thread pool is saturated at depth 50, queue depth adds nothing
rejected(startup-lazy-imports): module-level import() prefetches of tinypool/fast-glob/handlebars all measure 0 to -3ms — ESM already fetches/compiles the static graph in parallel; the budget is sequential module evaluation (~255 modules), only bundling would cut it
rejected(lazy-render-context): skipping fileLineCounts + markdownCodeBlockDelimiter on the XML path re-measured at ~11ms p50 quiet (6.2 + 4.7) — still below the 2% threshold, matching the previous pass's rejection
Benchmark (repomix repo itself, ~1100 files, 20 interleaved warm pairs,
quiet 4-core Linux, default pack, pristine HEAD worktree build vs
patched build):
- end-to-end median 865ms -> 820.5ms (paired delta median -26.5ms,
-3.1%), paired mean -37.5ms (t = 5.14), 18/20 pairs improved
- isolated scan cost over the packed corpus: 35.6ms -> 1.6ms p50 (~22x)
- output byte-identical (cmp) vs the base build on the same tree
- 6 new tests: stride alignments 0-511, run ending at EOF,
whole-content run, phase reset after short-run skips, near-threshold
non-matches, and the seeded differential fuzz
npm run test: 1385/1385 pass. npm run lint: clean (3 pre-existing
warnings in unrelated files).
https://claude.ai/code/session_01Ea6eConhLEQFKZsVkJz1zE
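The sampled scan described in decision(sampled-scan) can be sketched next to a per-character reference. Both functions below are illustrative reconstructions from the commit message, not the repository's hasLongBase64Run; the names, the `minRun` parameter, and the character-class helper are assumptions.

```typescript
const MIN_RUN = 256; // stands in for MIN_BASE64_LENGTH_STANDALONE

function isBase64Char(code: number): boolean {
  return (
    (code >= 65 && code <= 90) || // A-Z
    (code >= 97 && code <= 122) || // a-z
    (code >= 48 && code <= 57) || // 0-9
    code === 43 || // +
    code === 47 // /
  );
}

// Reference: visits every character.
function hasLongRunReference(text: string, minRun: number = MIN_RUN): boolean {
  let run = 0;
  for (let i = 0; i < text.length; i++) {
    run = isBase64Char(text.charCodeAt(i)) ? run + 1 : 0;
    if (run >= minRun) return true;
  }
  return false;
}

// Sampled: any run of minRun characters must cover one sample point.
function hasLongRunSampled(text: string, minRun: number = MIN_RUN): boolean {
  let i = minRun - 1; // earliest index at which a full run could end
  while (i < text.length) {
    if (!isBase64Char(text.charCodeAt(i))) {
      i += minRun; // no run can span this index, jump one stride
      continue;
    }
    // Sample hit: expand outward, bounded by minRun, to measure the run.
    let lo = i;
    while (lo > 0 && i - lo + 1 < minRun && isBase64Char(text.charCodeAt(lo - 1))) lo--;
    let hi = i;
    while (hi + 1 < text.length && hi - lo + 1 < minRun && isBase64Char(text.charCodeAt(hi + 1))) hi++;
    if (hi - lo + 1 >= minRun) return true;
    // Short run ending at hi: the next possible run starts after hi + 1,
    // so it necessarily covers index hi + minRun.
    i = hi + minRun;
  }
  return false;
}
```

On text with few base64-class characters the sampled version inspects roughly one character in every `minRun`, which is the kind of saving the commit's isolated measurement reports.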
2d0a45a to 0fe40b4
Perf-tuning run 2026-06-12: no qualifying candidate found
This automated pass investigated 5 scopes in parallel (startup/module evaluation, packager orchestration, file collection, file search, output/metrics tail) against the current branch tip (
Investigated and rejected (this run)
Where the remaining time is (for future reference)
~150ms module evaluation (mitigated by compile cache; only architectural changes would cut it), ~150-170ms globby (76ms already saved by the readdir cache on this branch; the residual union-filter cost is dominated by per-result async ignore matching), ~230-260ms collection I/O (kernel-bound, overlapped with security), ~100ms output render+write (overlapped with git token workers), ~75ms git diff/log token counting (the slower parallel branch at the tail). The five changes already on this branch remain the full set of measurable wins; the pipeline now has no serial segment whose removal clears the 2% bar on a warm run.
Generated by Claude Code
… render-context getters
createRenderContext eagerly ran calculateFileLineCounts and calculateMarkdownDelimiter (two full scans over every packed file's content, ~5.5MB) on every run, even though fileLineCounts is only consumed by skill generation and markdownCodeBlockDelimiter only by the markdown/skill templates. Memoized getters defer both scans, so the default XML (and plain/JSON) path never pays for them. Output is byte-identical (verified with cmp for xml and markdown styles); when a template or packSkill touches the property, the same computation runs once and is cached.
Why this clears the bar now when earlier rounds rejected it: this round's tail profiling showed that on warm runs (token cache hit) the metrics workers resolve instantly and git token tasks complete ~28ms before produceOutput, so generateOutput is the sole main-thread bottleneck at the tail; the scans are wall-visible, not hidden behind the parallel metrics branch as previously assumed.
Benchmark (packing this repo, warm, quiet 4-core Linux, interleaved ABBA pairs, node bin/repomix.cjs --quiet):
- batch 1 (20 pairs): median delta -21.7ms, mean -20.1ms, t=-3.51, 15/20 improved
- batch 2 (30 pairs): median delta -25.6ms, mean -18.0ms, t=-2.49, 19/30 improved
- batch 3 (40 pairs): median delta -40.3ms, mean -40.9ms, t=-6.71, 37/40 improved
- pooled (90 pairs): mean -28.6ms on a ~1045ms baseline = -2.7%
Isolated scan cost is ~11-16ms (warm); the larger e2e delta is consistent with reduced allocation/GC pressure from dropping the per-line match(/\n/g) array allocations at peak heap.
intent(perf-tuning): automated round targeting >=2% end-to-end CLI improvement with behavior preserved
decision(render-context): memoized getters over style-conditional skips; RenderContext shape and all call sites unchanged, and any consumer that does read the properties still gets identical values
rejected(file-search): gitignore:false + prebuilt globby predicate measured -5.2% e2e but unshippable; it depends on globby's unexported ignore.js internals (exports-map patch won't exist for npm consumers) and loses globby's gitignore-pattern directory pruning, risking large traversal regressions on repos with big ignored dirs and no negation patterns
learned(metrics-tail): on warm runs the tail critical path is produceOutput -> write -> wrapper extraction; git diff/log token tasks finish ~28ms earlier, so render-side CPU cuts are wall-visible up to that slack
https://claude.ai/code/session_01RD8vNvv1qtYV8BgdxMU7js
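The memoized-getter idea from decision(render-context) might look like the sketch below. It assumes a simplified context with a single lazy property; `memoize`, `createLazyRenderContext`, `PackedFile`, and the injected `scanLineCounts` parameter are hypothetical names, not the actual createRenderContext code.

```typescript
interface PackedFile {
  path: string;
  content: string;
}

// Runs `compute` on first call only, then returns the cached value.
function memoize<T>(compute: () => T): () => T {
  let cached: { value: T } | undefined;
  return () => {
    if (cached === undefined) cached = { value: compute() };
    return cached.value;
  };
}

function createLazyRenderContext(
  files: PackedFile[],
  scanLineCounts: (files: PackedFile[]) => Record<string, number>, // the expensive scan
) {
  const lineCounts = memoize(() => scanLineCounts(files));
  return {
    files,
    // Same property name and value as an eager field, but the scan only runs
    // if some consumer actually reads it.
    get fileLineCounts(): Record<string, number> {
      return lineCounts();
    },
  };
}
```

Because the property shape is unchanged, consumers that read it see identical values, while paths that never touch it skip the scan entirely.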
…hronous ignore-file filter
With gitignore:true / ignoreFiles, globby filters every matched path through an async predicate: one Promise per path via Promise.all plus a promisified stat each, ~1,100 microtask round-trips per search on this repo. searchFiles now discovers and parses .gitignore / .repomixignore / .ignore itself (new gitignoreParse.ts + gitignoreFilter.ts, using the `ignore` package that globby itself uses), passes gitignore:false to globby, and applies the predicate as a synchronous Array.filter over the results. The fast-glob pruning-pattern injection (no-negations fast path) and parent-.gitignore collection up to the git root are replicated exactly, and the discovery walk shares the readdir-caching fs adapter with the main scan so the tree is still read only once per search.
Equivalence verification:
- byte-identical output (cmp) vs HEAD build: default pack, --no-gitignore, multi-root (src website), subdir pack (website/client)
- searchFiles/listFiles/listDirectories parity on synthetic repos: negations, nested .gitignore (CRLF), anchored and trailing-slash patterns, .repomixignore/.ignore, 300-dir ignored tree (readdir counts equal: 301 = 301), subdir-of-git-root with parent gitignores and nested negation override
- differential fuzz vs globby's internal predicate: 9,600 checks across 60 random ignore-file trees, 0 mismatches; pruning patterns equal as multisets across 120 configurations
Benchmark (packing this repo, warm, quiet 4-core Linux, 30 interleaved ABBA pairs, node bin/repomix.cjs --quiet):
- e2e median 816ms -> 796.5ms, paired delta median -29ms (-3.6%), mean -25.7ms (-3.1%), t = -3.40, 24/30 pairs improved
- search phase ([globby] trace) 149-189ms -> 125-145ms
intent(perf): round-7 automated pass; the previous round measured this approach at -5.2% but rejected it as unshippable via globby internals
decision(gitignore-filter): keep globby for traversal and include handling; replace only the ignore-file machinery, which preserves expandDirectories and negative-include behavior that a raw fast-glob swap would silently lose
rejected(globby-public-api): isIgnoredByIgnoreFiles alone; it hardcodes includeParentIgnoreFiles=false (drops parent .gitignore files when packing a subdirectory) and does not expose patterns for traversal pruning
rejected(candidates): skipping the redundant sortOutputFiles call in generateOutput (-22ms) and a direct XML string builder replacing the Handlebars render (-19ms); both clear the 2% bar but are smaller than this change; viable for future rounds
constraint(equivalence): ignore-file discovery order is nondeterministic (async fast-glob) in globby today as well; order only affects cross-file negation interplay, so the behavior class is unchanged
learned(globby): gitignore:true injects converted gitignore patterns into fast-glob's ignore option for directory pruning only when cwd is the git root and no negations exist anywhere; replicated in convertPatternsForFastGlob
https://claude.ai/code/session_014MsDPw1ZUnHVU4giu48JA7
…lution
The isIgnored() predicate returned by buildIgnoreFileFilter routed every
tested path through createIgnoreMatcher, which performs ~5 path operations
per call (resolve, normalize, relative, isInsidePath check, slash
conversion) before reaching the `ignore` matcher — applied to every file
and directory the scans emit (~1,400 paths per search on this repo, plus
the empty-directory scans), this dominated the post-scan filter cost.
fast-glob only ever emits clean, slash-separated paths relative to the
scan root. For those, the baseDir-relative form the `ignore` package
expects is a constant prefix (the scan root's path below the git root;
empty when they coincide) plus the input string, so the fast path is now
one string concatenation + ig.ignores(). A single regex routes every
other input shape ('', '.'/'..' segments, backslashes, doubled or
trailing slashes, absolute paths) to the unchanged legacy matcher, so
edge-case semantics stay exactly as before.
decision(gitignore-filter): keep createIgnoreMatcher as the fallback for
non-fast-glob input shapes instead of replicating its normalization
inline — equivalence by construction, and the fallback never runs on the
hot path
rejected(double-sort): skipping the second sortOutputFiles inside
generateOutput — re-measured at ~0.12ms isolated; the change-7 round's
22ms figure was the git-log subprocess cost already eliminated by
prefetchSortData
rejected(xml-direct-builder): direct string builder replacing the
Handlebars xml render — noise-level on the current tip (t=-0.14 over 20
interleaved pairs); earlier ~19ms estimates came from builds without the
landed lazy render-context getters
rejected(md5-precompute): computing contentCacheKey during processFiles
to clear it from the tail — +7.6ms (noise, t=0.86) over 20 interleaved
pairs on the current tip
learned(bench): this container runs ~1.6-1.7x slower than the previous
rounds' quiet host (e2e baseline ~1615ms vs ~800-950ms); relative deltas
from interleaved pairs are the comparable metric
Benchmark (32 interleaved ABBA pairs, warm, default pack of this repo,
4-core Linux): e2e median 1615ms -> 1540ms, paired mean delta -82.0ms
(-5.1%), median delta -74ms (-4.6%), t = -7.68, 29/32 pairs improved.
Search phase ([globby] trace, --verbose): 749-778ms -> 643-684ms with
identical results (1099 files, 255 directories).
Output byte-identical (cmp) vs the previous build for: default pack,
subdirectory pack (website/client — exercises the git-root prefix
branch), multi-root (src website), and --no-gitignore. 1416/1416 tests
pass; lint clean (3 pre-existing warnings in unrelated files).
https://claude.ai/code/session_01N3uqykUShsrDKkyvjuKi13
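The fast path described in this commit (constant prefix plus a direct ignores() call, with one regex routing every other input shape to the legacy matcher) could be sketched as below. The regex, `createFastIsIgnored`, and the `Matcher` interface are assumptions written for illustration; only the `ignores(path)` method corresponds to the `ignore` package, and the real routing pattern may differ.

```typescript
// Shape of the matcher method used here (the `ignore` package exposes ignores(path)).
interface Matcher {
  ignores(relativePath: string): boolean;
}

// Hypothetical routing regex: matches anything that is NOT a clean, relative,
// slash-separated path: '', leading '/', drive prefix, backslash, '//',
// trailing '/', or a '.' / '..' segment.
const NEEDS_LEGACY = /^$|^\/|^[A-Za-z]:|\\|\/\/|\/$|(^|\/)\.\.?(\/|$)/;

function createFastIsIgnored(
  matcher: Matcher,
  prefix: string, // scan root's path below the git root, '' when they coincide
  legacy: (path: string) => boolean, // unchanged slow matcher for unusual inputs
): (path: string) => boolean {
  return (path) =>
    NEEDS_LEGACY.test(path)
      ? legacy(path) // edge cases keep their existing semantics
      : matcher.ignores(prefix + path); // hot path: one concatenation + one lookup
}
```

Keeping the legacy matcher as the fallback means the fast path only has to be correct for the one input shape fast-glob emits, which matches the commit's "equivalence by construction" argument.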
b053bbe to a008d75
# Conflicts:
#	package-lock.json
#	package.json
Perf-tuning run 2026-06-13: no qualifying candidate found
This automated pass first merged the latest
It then investigated 5 non-overlapping scopes in parallel (token/metrics pipeline, file processing, output generation, git operations + the MD5 cache-key loop, and a whole-process CPU/GC profile) against the current branch tip (
Strongest candidate: measured, but below the bar
Investigated and rejected this run
Verdict
The MD5→SHA-1+charLen cache-key change is a correct, safe, measurable ~1.2–1.6% win but falls short of the strict 2% threshold, so it was reverted. No other behavior-preserving single change in
Generated by Claude Code
Summary
Eight behavior-preserving optimizations from automated perf-tuning passes, each the single highest-impact candidate of its run:
- searchFiles: fs adapter answering globby's gitignore stat() calls from dirent types (commit 1)
- packager / securityCheck: security worker pool pre-warm (commit 2)
- securityCheck: stream security batches during file collection (commit 3)
- searchFiles: cache readdir results across globby's internal double traversal (commit 4)
- truncateBase64: sampled base64-run detection replacing the per-character precondition scan (commit 5)
- createRenderContext: lazy memoized getters for fileLineCounts / markdownCodeBlockDelimiter (commit 6)
- searchFiles: in-repo synchronous gitignore filter replacing globby's async per-path predicate (commit 7)
- searchFiles: fast-path the ignore-file predicate with constant-prefix keys + direct ignore.ignores() (commit 8)
Each change was individually verified against its own baseline on its own machine: −4.8%, −5.1%, −4.3%, −3.5%, −3.1%, −2.7%, −3.6% and −4.6% end-to-end respectively (details below; runs were measured on different machines/days, so per-change deltas are the reliable numbers).
Change 1: globby fs adapter (searchFiles)
With gitignore: true, globby's ignore filter calls fs.stat on every matched path (~1,100 on this repo per run) only to decide whether trailing-slash rules (dir/) apply. The information it needs (is this path a directory?) was already produced by the traversal's readdir(withFileTypes) calls.
createGlobbyFsAdapter() in src/core/file/fileSearch.ts:
- Wraps readdir to record each dirent's type while delegating to the real node:fs readdir.
- Answers stat() calls from that map; falls through to a real stat for symlinks/special entries and unseen paths, so behavior is preserved exactly.
- Forwards statSync so globby's cwd-is-directory validation keeps running; a fresh adapter per call means the cache cannot go stale.
Regression test: tests/core/file/fileSearchFsAdapter.test.ts (real directory tree with symlinks, trailing-slash rules, nested .gitignore; asserts per-path stats are eliminated).
Benchmark (8 interleaved warm runs): end-to-end median 1242ms → 1182ms (−60ms, −4.8%); [globby] phase 254ms → 206ms.
Review feedback addressed: gemini-code-assist suggested posix-normalizing cache keys for Windows; this was investigated and declined with path.win32 verification (globby's stat path goes through path.normalize() + path.resolve(), producing the same native-separator keys; a blanket \ → / rewrite would risk collisions on POSIX). Documented in the adapter's docstring; see thread replies.
Change 2: security worker pool pre-warm (packager / securityCheck)
Spawning the 2 secretlint worker threads costs ~50–100ms each (thread creation + importing the 7MB @secretlint/secretlint-rule-preset-recommend bundle) and previously happened inside runSecurityCheck, i.e. after file collection finished, squarely on the critical path. The security leg gates the security ∥ processFiles phase (~208ms of a ~242ms phase).
createSecurityCheckTaskRunner() in src/core/security/securityCheck.ts creates the pool early in pack() (mirroring the already-merged metrics prewarm pattern) and posts one empty-items task per worker; the spawn then overlaps the ~165ms collect+git phase.
- The warm-up count follows the pool's own sizing rule (min(2, concurrency, ceil(numOfTasks/100))), so no thread is spawned that the real workload would not have created.
- Empty-items warm-up tasks return [] without linting anything: the same rules run on exactly the same content; only spawn timing changes.
- The finally block awaits the same memoized promise on every path (including errors and the skill-generate early return), so no path leaks threads.
- runSecurityCheck / validateFileSafety accept the pre-created runner via their deps objects (Partial-merge, backward compatible; existing callers unaffected).
8 new tests: warmup sizing/failure tolerance, runner reuse without pool creation or premature cleanup, forwarding through validateFileSafety, and packager lifecycle (cleanup on success / on error / disabled security). securityScanSpec.test.ts now routes its inline runner through the new production forwarding path, keeping the end-to-end regression net intact.
Benchmark (20 interleaved warm pairs, quiet 4-core host): end-to-end median 859ms → 815.5ms (−43.5ms, −5.1%); security check phase (trace log) 207.8ms → 161.9ms.
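For readers skimming change 2, here is a minimal sketch of two ideas it relies on: the warm-up sizing rule quoted above and a memoized cleanup that every exit path can await. The names warmupWorkerCount and memoizeCleanup are hypothetical and chosen for illustration; the real code in securityCheck.ts may be structured differently.

```typescript
// Hypothetical helper: number of workers to pre-warm, following the
// min(2, concurrency, ceil(numOfTasks/100)) rule quoted in the description.
function warmupWorkerCount(numOfTasks: number, concurrency: number): number {
  return Math.max(0, Math.min(2, concurrency, Math.ceil(numOfTasks / 100)));
}

// Hypothetical helper: wrap an async cleanup so that repeated calls
// (success path, error path, early return) all await the same promise
// and the underlying cleanup runs at most once.
function memoizeCleanup(cleanup: () => Promise<void>): () => Promise<void> {
  let pending: Promise<void> | undefined;
  return () => {
    if (pending === undefined) {
      pending = cleanup();
    }
    return pending;
  };
}
```

With this rule a second worker is only warmed from 101 tasks upward, which is why the warm-up never creates a thread the real workload would not have needed.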
Change 3: stream security batches during file collection
Even with pre-warmed workers (change 2), all lint batches were dispatched only after file collection had fully completed, leaving the entire lint wall time (~160ms on this repo) on the critical path while the workers sat idle during the I/O-bound collection phase.
createSecurityCheckStream() (new src/core/security/securityCheckStreaming.ts):
- collectFiles now reports each file via an optional onFileCollected callback as soon as its content is read; the stream buffers them and dispatches every full BATCH_SIZE batch to the worker pool immediately, so lint work overlaps collection instead of following it.
- finalize() returns exactly what runSecurityCheck would: it flushes the remainder plus git diff/log items (same construction, same trailing position), enqueues any raw file that never arrived via the callback (a safety net so custom collectFiles implementations cannot skip the check), and re-orders suspicious file results back to canonical rawFiles order, since streamed batches complete in nondeterministic collection order.
- Batch failures are captured until finalize awaits them, so error paths that abandon the session can never surface unhandled rejections; finalize re-throws the first failure like Promise.all would.
- createSecurityCheckTaskRunner() now runs before file search, so the first worker's spawn + secretlint preset import (~100ms) overlaps the ~155ms search phase and the worker is ready when the first batch arrives. The second warm-up task is posted via completeWarmup() once the file count is known, preserving the existing sizing rule (second worker only from 101 items).
- runSecurityCheck keeps its original behavior for non-streamed callers (MCP, lib API); multi-root display-path labeling is applied before items enter the stream, so streamed paths match rawFiles exactly.
12 new tests: batch sizing identical to runSecurityCheck (full batches + remainder), eager dispatch before finalize, the unstreamed-file safety net, dedup of already-streamed files, git item construction/ordering, canonical result re-ordering, error propagation (incl. pre-finalize failures), progress totals, two-stage warm-up sizing/idempotency/failure tolerance, and the onFileCollected contract in collectFiles.
Benchmark (39 interleaved warm pairs, 4-core host, default pack of this repo): end-to-end median 869ms → 832ms (−37ms, −4.3%), paired mean delta −34.8ms (−4.0%), paired t = 5.51. Trace: security-check tail after collection ~160ms → ~60–90ms.
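The streaming behavior described for change 3 can be sketched roughly as follows. This is an illustrative reduction, not the code in securityCheckStreaming.ts: createBatchStream, Item and Dispatch are hypothetical names, and the git diff/log items and the unstreamed-file safety net are left out.

```typescript
type Item = { path: string; content: string };
// Hypothetical dispatcher: resolves to the paths flagged as suspicious.
type Dispatch = (batch: Item[]) => Promise<string[]>;

function createBatchStream(batchSize: number, dispatch: Dispatch) {
  let buffer: Item[] = [];
  const inFlight: Promise<string[]>[] = [];

  const flush = (): void => {
    if (buffer.length === 0) return;
    const task = dispatch(buffer);
    buffer = [];
    // Attach a no-op handler so an abandoned session cannot produce an
    // unhandled rejection; finalize still observes the failure via `task`.
    task.catch(() => {});
    inFlight.push(task);
  };

  return {
    // Called as each file is collected; full batches are dispatched eagerly.
    push(item: Item): void {
      buffer.push(item);
      if (buffer.length >= batchSize) flush();
    },
    // Flushes the remainder, waits for all batches (re-throwing the first
    // failure, as Promise.all does), and restores canonical order.
    async finalize(canonicalOrder: string[]): Promise<string[]> {
      flush();
      const results = await Promise.all(inFlight);
      const flagged = new Set<string>();
      for (const paths of results) {
        for (const p of paths) flagged.add(p);
      }
      return canonicalOrder.filter((p) => flagged.has(p));
    },
  };
}
```

Filtering the canonical path list against a set of flagged paths is one simple way to undo the nondeterministic completion order of streamed batches.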
Behavior verification (change 3)
- Output is byte-identical (cmp) between base and patched builds for the default run, --no-security-check, --include-empty-directories, and multi-root (src website).
Change 4: readdir cache across globby's double traversal (commit 4)
This was flagged as the "future candidate" in the change-3 run, now measured and implemented. A single globby() call walks the tree twice (globIgnoreFiles first discovers .gitignore/.repomixignore/.ignore files, then the main fast-glob scan re-walks the same tree), issuing 582 readdir(withFileTypes) calls for 291 directories on this repo. Both walks share the per-call options.fs adapter, so it is the natural cache point.
- createGlobbyFsAdapter() now records each successful readdir(withFileTypes) result and replays the second walk's calls from memory via process.nextTick (preserving the async callback contract fast-glob expects). One readdir syscall per directory per search.
- Sharing Dirent[] across walks is safe: fast-glob wraps dirents in its own entry objects and never mutates the source array (the only mutation site in @nodelib/fs.scandir is behind followSymbolicLinks: true, which repomix never sets).
Benchmark (25 interleaved warm pairs, quiet 4-core Linux, default pack of this repo): end-to-end median 808ms → 780ms (−28ms, −3.5%), paired delta median −30ms, paired t ≈ 4.6, 20/25 pairs improved. [globby] phase 166–177ms → 139–149ms.
Behavior verification (change 4)
- Output is byte-identical (cmp) between base and patched builds for the default run, --include-empty-directories, multi-root (src website), and --no-gitignore (with the A/B-swapped lib file excluded from the pack, since --no-gitignore packs lib/ itself).
- Review should-fix (withFileTypes === true filter guard) applied before push.
Alternatives investigated and rejected (change 4 run, 5 parallel investigation passes)
- SECURITY_CHECK_BATCH_SIZE 50 → 200: measured −58ms under heavy machine load, but 25 quiet interleaved pairs show identical medians (793ms vs 793ms); the IPC overhead it removes only matters on contended hosts. 400 regresses (~+20ms, worker imbalance).
- fileLineCounts / markdownCodeBlockDelimiter getters in createRenderContext (~14–15ms on the default XML path; they are only consumed by skill generation / the markdown style): below the 2% threshold. (Superseded, see change 6: re-measured at −2.7% e2e on the current tip, where the tail overlap structure has changed.)
- import() evaluation blocks in-flight collection I/O callbacks (a 0.24ms readFile stretched to 94ms in the probe), netting only ~4ms end-to-end. Module evaluation is single-threaded; this constraint applies to all lazy-import ideas in this codebase.
- stat() calls): <1ms, OS handles it efficiently.
Change 5: sampled base64-run scan in truncateBase64 (commit 5)
This repo's own repomix.config.json enables truncateBase64: true (the CI benchmark workload), so hasLongBase64Run (the cheap precondition that gates the expensive standalone-base64 regex) runs over every packed file's content (~5.5 MB per pack). It walked every character (23ms main-thread self time in CPU profiles, 35.6ms isolated on the corpus) even though it almost always returns false.
- The scan now samples one character every MIN_BASE64_LENGTH_STANDALONE (256) positions: a qualifying run occupies 256 consecutive indices, so it must contain a sample point; no run can slip between samples (the last sample is clamped to len-1 to cover the trailing partial window). Only a sampled base64-class hit triggers a bounded outward expansion measuring the surrounding run; after a too-short run the sampling phase resets past it (a qualifying run starting at hi+1 always contains sample hi+256).
- charCodeAt semantics unchanged. Worst case stays O(n) (bounded ~2× character inspections on pathological alternating runs).
Benchmark (20 interleaved warm pairs, quiet 4-core Linux, default pack of this repo, pristine HEAD worktree build vs patched build): end-to-end median 865ms → 820.5ms (paired delta median −26.5ms, −3.1%), paired mean −37.5ms, t = 5.14, 18/20 pairs improved. Isolated scan cost over the packed corpus: 35.6ms → 1.6ms p50 (~22×).
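The sampling argument above can be illustrated with a small standalone sketch. It is a hypothetical re-implementation written for this description, not the repo's hasLongBase64Run: the name hasLongBase64RunSketch, the character class (taken from the [A-Za-z0-9+/] regex mentioned among the rejected alternatives) and the sample grid starting at index minLen − 1 are all assumptions.

```typescript
// Character class assumed from the regex quoted in this description.
const isBase64Code = (c: number): boolean =>
  (c >= 65 && c <= 90) || // A-Z
  (c >= 97 && c <= 122) || // a-z
  (c >= 48 && c <= 57) || // 0-9
  c === 43 || // +
  c === 47; // /

function hasLongBase64RunSketch(text: string, minLen = 256): boolean {
  const n = text.length;
  // Any run of minLen consecutive indices contains exactly one sample point.
  let i = minLen - 1;
  while (i < n) {
    if (!isBase64Code(text.charCodeAt(i))) {
      i += minLen; // no qualifying run can contain this index
      continue;
    }
    // Bounded outward expansion around the sampled hit.
    let lo = i;
    let hi = i;
    while (lo > 0 && hi - lo + 1 < minLen && isBase64Code(text.charCodeAt(lo - 1))) lo--;
    while (hi + 1 < n && hi - lo + 1 < minLen && isBase64Code(text.charCodeAt(hi + 1))) hi++;
    if (hi - lo + 1 >= minLen) return true;
    // The run [lo, hi] is maximal and too short; a later qualifying run
    // starts after hi and therefore contains index hi + minLen.
    i = hi + minLen;
  }
  return false;
}
```

On text that contains little base64, this inspects roughly one character in every minLen, which is where the isolated speedup reported above would come from.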
Behavior verification (change 5)
- Output is byte-identical (cmp) between base and patched builds on the same tree.
Alternatives investigated and rejected (change 5 run, 5 parallel investigation scopes)
- /[A-Za-z0-9+/]{256}/.test(): measured 4.5× slower than the per-character loop (155ms vs 35ms on the corpus) because of bounded-repetition re-scanning at each start position.
- calculateMetrics awaits outputPromise (final Promise.all resolves in ~0ms; the 63–67ms wall figure is main-thread-busy completion latency, not queue wait): e2e median +15ms under noise, unproven.
- FILE_COLLECT_CONCURRENCY 50 → 128/256: identical medians over 40 quiet interleaved runs; libuv's 4-thread pool is saturated at queue depth 50.
- import()): 0 to −3ms. ESM already fetches/compiles the static graph in parallel; the startup budget is sequential evaluation of ~255 modules, which only bundling would cut.
- fileLineCounts + markdownCodeBlockDelimiter on the XML path: re-measured at ~11ms p50 on a quiet machine (6.2 + 4.7), still below the 2% threshold, matching the change-4 run's rejection. (Superseded, see change 6.)
Round 6 (2026-06-12): no qualifying change, all candidates measured below the 2% threshold
This run investigated with 5 parallel scopes (startup, file search, file collection, output/metrics tail, cross-cutting profile), prototyped the two strongest candidates, and measured both below the ≥2% bar on a quiet machine; nothing was committed. Recording the negatives so future passes skip them:
- ignoreFiles paths: fully prototyped (exact fast-glob deep/entry-filter replication via micromatch, git-root gating, fallback paths). An isolated investigation measured −45–53ms, but that machine was running 5 agents concurrently; on a quiet machine, interleaved searchFiles medians are identical (106–109ms both) and 10 CLI pairs average −1.6ms. With the change-4 readdir cache already landed, the discovery walk's marginal cost over a raw BFS is only ~10–15ms of fast-glob matching CPU; the 45–53ms figure was contention inflation. Reverted.
- handlebars/runtime (drops the compiler, neo-async, source-map from the module graph; ~35–50ms isolated module-load saving): prototyped via lib patching, output byte-identical, but 20 interleaved CLI pairs show only ~7ms mean. Module-graph savings compress e2e exactly like the change-5 run's lazy-import findings predicted.
- tokenCountCache: the worker-side encode is only ~5ms per item (trace: Counted tokens ... Took: 4.96ms); the 69/85ms "Git diff/log token calculation" wall figures are main-thread-busy completion latency, not compute, the same artifact the change-5 run documented for early dispatch. No e2e win available.
- defaultAction lazy-loads); full-CLI bundle has hard blockers (commander's dynamic require, import.meta.url in processConcurrency) and would be a build-system change beyond an automated perf pass.
- fs.readFile of the 1,095-file corpus is ~139ms of the ~162ms pipeline; remaining CPU is TextDecoder UTF-8 validation (~15ms, required). Sub-threshold micro-items only (shared TextDecoder ~0.5ms, path.resolve ~1ms).
- hasLongBase64Run 2KB-prescreen ~5ms; cold-run MD5 fast-reject via byteLength index ~5.5ms (cold only); sortPaths/reportTopFiles/lineCounts all <3ms real.
Change 6: lazy render-context getters for fileLineCounts / markdownCodeBlockDelimiter (commit 6)
createRenderContext eagerly ran two full scans over every packed file's content (~5.5MB per pack) on every run: calculateFileLineCounts (consumed only by --skill-generate) and calculateMarkdownDelimiter (consumed only by the markdown/skill templates). Both are now memoized getters, so the default XML (and plain/JSON) path never executes them; markdown/skill consumers compute them once on first access with identical values. RenderContext shape and all call sites are unchanged.
Why this clears the bar now when the change-4/5/round-6 passes rejected it (~11–15ms function cost; "<1ms e2e, hidden behind the parallel metrics branch"): this round's tail tracing established that on warm runs the metrics workers are pure cache hits (zero dispatches) and the git diff/log token tasks complete ~28ms before produceOutput resolves. generateOutput is the sole main-thread bottleneck at the tail, so render-side cuts are wall-visible up to that slack. The earlier "fully hidden" rationale does not hold on the current tip's overlap structure.
Benchmark (90 interleaved ABBA pairs in 3 batches, quiet 4-core Linux, warm, default pack of this repo):
Isolated scan cost is ~11–16ms warm; the larger e2e delta is consistent with reduced allocation/GC pressure: match(/\n/g) allocates one match string per line (~200k allocations across the corpus) at peak heap, right before the 5MB render.
Behavior verification (change 6)
- Output is byte-identical (cmp) between base and patched builds for the xml and markdown styles (markdown exercises the getter path).
- lookupProperty triggers own-property getters identically to data properties (verified against the runtime source), object-literal getters satisfy the readonly interface, processedFiles is not mutated between context creation and render, and no code spreads/serializes a RenderContext outside the skill path (where memoization keeps values identical).
Alternatives investigated and rejected (change 6 run, 5 parallel scopes)
- searchFiles: gitignore: false + pre-built predicate from globby's internal getIgnorePatternsAndPredicate, applied as a sync Array.filter: measured −5.2% e2e (median −56.5ms, t = −4.71, 19/20 pairs) with byte-identical output on this repo, but unshippable as prototyped. globby/ignore.js is not in globby's exports map (the prototype patched node_modules, which would not exist for npm consumers), and disabling globby's gitignore machinery silently drops its gitignore→fast-glob pattern injection that prunes ignored directories during traversal on repos whose collected gitignore patterns contain no negations (repomix does not feed .gitignore contents into its own ignore option): a large traversal-regression risk on repos with big ignored directories (.venv/, target/, build/). The discovery walk's ignore set would also differ from globby's (which inherits the full repomix ignore array). Future path: an upstream globby PR exporting ignore.js (or the predicate helper) plus replication of convertPatternsForFastGlob would unlock this ~5% safely.
- produceOutput); the "63–85ms git-token latency" is the synchronous MD5 cache-key loop (22–68ms) delaying dispatch: harmless here, and partial MD5 stays rejected (cache invalidation). Decoupling metrics from writeOutputToDisk via an onContentReady callback saves only ~8.5ms (~0.8%); overlapping pool destroy with saveTokenCountCache saves only the 1–7ms cache-save window. The unmerged perf/output-token-ipc-optimization branch is superseded by the wrapper fast path (warm XML runs never reach calculateOutputMetrics).
- produceOutput were the two scans removed by this change. Removing the second sortOutputFiles call inside generateOutput re-confirmed rejected (0ms wall on a quiet machine per the round-6 measurement, and generateOutput is a public API whose direct callers may pass unsorted files).
Change 7: in-repo synchronous gitignore filter (commit 7)
This ships the candidate the change-6 round measured at −5.2% e2e but rejected as unshippable: with gitignore: true / ignoreFiles, globby filters every matched path through an async predicate (Promise.all over one promise per path plus a promisified stat each, ~1,100 microtask round-trips per search on this repo). The two original blockers are both resolved without touching node_modules:
- gitignoreParse.ts + gitignoreFilter.ts discover and parse .gitignore/.repomixignore/.ignore in-repo (using the ignore package, the same matcher globby uses internally, promoted to a direct dependency along with fast-glob), build a synchronous predicate, and hand searchFiles the same fast-glob pruning-pattern injection globby derives on its no-negations fast path. The main scan runs with gitignore: false, ignoreFiles: [] and the predicate is applied as a plain Array.filter.
- expandDirectories, negative include patterns, and all other pattern preprocessing keep globby's exact behavior; the earlier prototype's raw fast-glob swap would have silently changed --include "src"-style semantics.
- .gitignore collection up to the git root (including worktree .git files), per-ignore-file relative anchoring (gitignore spec §2.22.1), directory trailing-slash rules, and the usingGitRoot pruning bail-out are ported line-by-line from globby v16's ignore.js / utilities.js. The discovery walk shares the readdir-caching fs adapter (change 4) with the main scan, so the tree is still read once per search.
Behavior verification (change 7)
- Output is byte-identical (cmp) vs the previous build: default pack, --no-gitignore, multi-root (src website), and a subdirectory pack (website/client, exercising parent-gitignore collection).
- searchFiles / listFiles / listDirectories parity on synthetic repos: negations, nested .gitignore (CRLF), anchored and trailing-slash patterns, .repomixignore/.ignore, a 300-directory ignored tree (readdir counts equal: 301 = 301, traversal pruning preserved), and a subdir-of-git-root with parent gitignores and a nested negation override.
- .code in the wrapped error) is byte-for-byte globby's own pre-existing behavior.
Benchmark (30 interleaved ABBA pairs, quiet 4-core Linux, warm, default pack of this repo): e2e median 816ms → 796.5ms, paired delta median −29ms (−3.6%), mean −25.7ms (−3.1%), t = −3.40, 24/30 pairs improved. Search phase ([globby] trace) 149–189ms → 125–145ms.
Alternatives investigated and rejected (change 7 round, 5 parallel scopes)
- sortOutputFiles inside generateOutput via an alreadySorted flag: re-measured at −22ms (the round-6 "0ms" figure was wrong; the comparator's two dictionary-mode object lookups per comparison cost ~22ms for 1,072 files with sortByChanges enabled). Above threshold but smaller than change 7; strong candidate for a future round.
- {{{triple-brace}}} everywhere so direct concatenation is byte-identical, md5-verified). Above threshold but smaller than change 7 and adds template/builder dual maintenance; future candidate.
- contentCacheKey (~11ms isolated): below threshold and weakens the cache key; stays rejected.
Change 8: fast-path the ignore-file predicate (commit 8)
The change-7 module kept globby's createIgnoreMatcher as the per-path entry point of isIgnored(): every tested path went through several path operations (resolve, normalize, relative, inside-path check, slash conversion) before reaching the ignore matcher (~1,400 file/directory paths per search on this repo, with the whole pipeline run twice for directories). That resolution work dominated the post-scan filter cost.
- buildIgnoreFileFilter: fast-glob only ever emits clean, slash-separated scan-root-relative strings, and for those the baseDir-relative form the ignore package expects is a precomputed constant prefix (the scan root's path below the git root; empty when they coincide) plus the input: one string concatenation + ig.ignores(), zero per-path node:path calls.
- Routes every other input shape ('', ./.. segments, backslashes, doubled or trailing slashes, absolute paths) to the unchanged legacy createIgnoreMatcher; edge-case semantics are preserved by construction rather than re-implemented (the ignore package would throw on several of these shapes).
- ignores() ≡ test().ignored in the ignore package source (including under negation patterns and its per-method caches), routing completeness against the package's throw conditions, Windows separator handling, and all fileSearch.ts call-site formats (absolute: false, no markDirectories): no blockers. Tests/conventions review: no blockers; two comment-accuracy should-fixes applied before push.
Behavior verification (change 8)
- Output is byte-identical (cmp) vs the previous build: default pack, subdirectory pack (website/client, which exercises the git-root prefix branch), multi-root (src website), and --no-gitignore.
Benchmark (32 interleaved ABBA pairs, warm, default pack of this repo, 4-core Linux; note this round's container is ~1.6–1.7× slower than earlier rounds' hosts, so absolute ms are not comparable across rounds): e2e median 1615ms → 1540ms, paired mean Δ −82.0ms (−5.1%), median Δ −74ms (−4.6%), t = −7.68, 29/32 pairs improved. Search phase trace: 749–778ms → 643–684ms with identical results (1099 files, 255 directories).
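The fast-path routing described for change 8 can be sketched as below. This is a hedged illustration only: Matcher stands in for the ignore package's ignores() method, and isCleanRelativePath, buildIgnoreFilterSketch and the slowPath callback (standing in for the legacy createIgnoreMatcher route) are hypothetical names, not the identifiers in gitignoreFilter.ts.

```typescript
// Stand-in for the matcher interface; only ignores() is used here.
interface Matcher {
  ignores(relativePath: string): boolean;
}

// "Clean" means: non-empty, slash-separated, relative, and free of
// empty, "." or ".." segments (so no leading, trailing or doubled slashes).
function isCleanRelativePath(p: string): boolean {
  if (p === "" || p.includes("\\")) return false;
  if (/^[A-Za-z]:/.test(p)) return false; // drive-letter absolute path
  return p.split("/").every((seg) => seg !== "" && seg !== "." && seg !== "..");
}

// prefix is the scan root's path below the git root, with a trailing
// slash, or "" when the two coincide.
function buildIgnoreFilterSketch(
  matcher: Matcher,
  prefix: string,
  slowPath: (p: string) => boolean,
): (p: string) => boolean {
  return (p) =>
    isCleanRelativePath(p)
      ? matcher.ignores(prefix + p) // one concatenation, no node:path calls
      : slowPath(p); // unusual shapes keep the legacy semantics
}
```

Keeping the slow path as an opaque callback mirrors the stated design choice: edge cases are delegated rather than re-implemented.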
Alternatives investigated and rejected (change 8 round, 5 parallel scopes)
- sortOutputFiles inside generateOutput (the change-7 round's −22ms "strong candidate"): re-measured at ~0.12ms isolated on the real corpus; the 22ms figure was the git-log subprocess cost that prefetchSortData's cache already eliminated. Permanently dead; the only remaining value would be structural cleanliness.
- contentCacheKey during processFiles: prototyped (optional ProcessedFile.contentCacheKey field, hash after all transforms, fallback for other callers), byte-identical, but +7.6ms (noise, t = 0.86) over 20 interleaved pairs on the tip. The warm-run MD5 loop (~19ms) does execute before the Handlebars render, but moving it into the processFiles∥security window does not shorten the wall on the current overlap structure.
- pack() is a library entry point (MCP server); dropping the awaited cleanup risks leaked worker threads in long-running hosts.
- main instead of the branch tip and "re-discovered" the already-landed change 6; their absolute findings were discarded and only tip-verified measurements were used. Verify the worktree base commit before trusting agent measurements.
Checklist
- npm run test: 1416/1416 pass
- npm run lint: clean (3 pre-existing warnings in unrelated files)
https://claude.ai/code/session_015jxJ9Nx3ncjkTTPHtLJqq7
https://claude.ai/code/session_015sBq63cfQRHYkmnvrokGF2
https://claude.ai/code/session_011DHBuMqYeyMgJuYRSeJxSa
https://claude.ai/code/session_01Ea6eConhLEQFKZsVkJz1zE
https://claude.ai/code/session_016akbidec8cut61QAGRKb99
https://claude.ai/code/session_01RD8vNvv1qtYV8BgdxMU7js
https://claude.ai/code/session_014MsDPw1ZUnHVU4giu48JA7
https://claude.ai/code/session_01N3uqykUShsrDKkyvjuKi13
Generated by Claude Code