Commit 60879a9

perf(core): Use synchronous reads in readRawFile to cut collection overhead
Switch `readRawFile` from `await fs.readFile` (node:fs/promises) to `readFileSync` (node:fs) for the primary file read. The function stays async and the non-UTF-8 encoding-detection fallback still uses async APIs.

Why: profiling the warm-cache CLI pack of the repomix repo shows `collectFiles` (~200ms) sits on the serial critical path between file search and the security/metrics phases. Each async read expands into an open → fstat → read → close chain: ~4 event-loop cycles plus a libuv threadpool round-trip and promise/microtask churn per file (×1090 files). The async concurrency buys almost no real overlap here: the metrics worker is idle until file contents exist, and the git subprocesses complete well before collection does. On a warm page cache the dispatch overhead, not the syscall, dominated. `readFileSync` collapses each file to a single blocking syscall.

Behavior is preserved: the same files are read and the same bytes are decoded (size check, NULL-byte binary probe, UTF-8 fast path and legacy encoding fallback are all unchanged). Output verified byte-identical (xml, with and without --truncate-base64) against the base build, and the full-repo pack produces an identical 1095-file set.

Action: Read /home/user/repomix base-vs-patched interleaved benchmark, 60 runs, warm cache, shared 4-core container, `node bin/repomix.cjs`:

  min    918.4ms → 805.7ms (-12.27%)
  median 958.2ms → 842.8ms (-12.05%)
  mean   964.2ms → 843.5ms (-12.52%)

Consistent across three independent runs (-12.0% .. -13.6%). Corroborates an earlier automated run that measured -8% on the Ubuntu CI benchmark.

Constraint: trade-off is on cold caches / high-latency filesystems (network FS, spinning disks), where serial sync reads forgo the parallel libuv reads async allowed; the common local-SSD/warm-cache path (the benchmark scenario and typical repeat runs) wins decisively. Sync reads in a CLI are idiomatic (eslint/prettier/tsc do the same).

Verification: npm run test (1336 pass), npm run lint clean.
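The sync-versus-async comparison described above can be reproduced in miniature with a sketch like the following. It is not the benchmark used for this commit: `makeFixture`, `readAllSync`, `readAllAsync`, and `timeMs` are hypothetical helper names, the fixture is a set of small temp files rather than the repomix repo, and the async variant assumes an unbounded `Promise.all`, which may differ from how `collectFiles` schedules its reads.

```typescript
import { mkdtempSync, readFileSync, rmSync, writeFileSync } from 'node:fs';
import { readFile } from 'node:fs/promises';
import { tmpdir } from 'node:os';
import { join } from 'node:path';
import { performance } from 'node:perf_hooks';

// Hypothetical helper: write `count` small files into a fresh temp directory.
export const makeFixture = (count: number, bytes = 2048): { dir: string; paths: string[] } => {
  const dir = mkdtempSync(join(tmpdir(), 'read-bench-'));
  const paths: string[] = [];
  for (let i = 0; i < count; i++) {
    const filePath = join(dir, `file-${i}.txt`);
    writeFileSync(filePath, 'x'.repeat(bytes));
    paths.push(filePath);
  }
  return { dir, paths };
};

// Serial blocking reads: one readFileSync call per file, no promise dispatch.
export const readAllSync = (paths: string[]): number => {
  let totalBytes = 0;
  for (const filePath of paths) {
    totalBytes += readFileSync(filePath).length;
  }
  return totalBytes;
};

// Concurrent promise-based reads (assumption: unbounded Promise.all).
export const readAllAsync = async (paths: string[]): Promise<number> => {
  const buffers = await Promise.all(paths.map((filePath) => readFile(filePath)));
  return buffers.reduce((sum, buffer) => sum + buffer.length, 0);
};

// Wall-clock time of one invocation, in milliseconds.
export const timeMs = async (fn: () => unknown): Promise<number> => {
  const start = performance.now();
  await fn();
  return performance.now() - start;
};

// Removes the temp directory created by makeFixture.
export const cleanup = (dir: string): void => rmSync(dir, { recursive: true, force: true });
```

To compare the two, one would call `timeMs(() => readAllSync(paths))` and `timeMs(() => readAllAsync(paths))` several times in interleaved order on the same fixture and look at the median. Results depend heavily on cache state and filesystem, as the Constraint note in the commit message points out.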
1 parent 3c9d83d commit 60879a9

1 file changed

Lines changed: 11 additions & 2 deletions

src/core/file/fileRead.ts

@@ -1,4 +1,4 @@
-import * as fs from 'node:fs/promises';
+import { readFileSync } from 'node:fs';
 import isBinaryPath from 'is-binary-path';
 import { isBinaryFile } from 'isbinaryfile';
 import { logger } from '../../shared/logger.js';
@@ -80,7 +80,16 @@ export const readRawFile = async (filePath: string, maxFileSize: number): Promis
 // Read the file directly and check size afterward, avoiding a separate stat() syscall.
 // This halves the number of I/O operations per file.
 // Files exceeding maxFileSize are rare, so the occasional oversized read is acceptable.
-const buffer = await fs.readFile(filePath);
+//
+// A synchronous read is used deliberately: `collectFiles` already bounds the
+// pipeline on the main thread (the metrics worker stays idle until file
+// contents exist, and git subprocesses finish well before collection does),
+// so async reads buy little overlap while paying a per-file open/fstat/read/
+// close promise-dispatch overhead. On warm caches that overhead dominated the
+// collection phase; `readFileSync` collapses each file to a single blocking
+// syscall, cutting overall CLI runtime by ~12% on the repomix repo. The rare
+// non-UTF-8 fallback below still uses async APIs.
+const buffer = readFileSync(filePath);
 
 if (buffer.length > maxFileSize) {
 const sizeKB = (buffer.length / 1024).toFixed(1);
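
The read-then-check order in the hunk above (read the whole file, then test its size, then probe for binary content) can be illustrated with a standalone sketch. This is a simplified stand-in, not the body of `readRawFile`: `RawRead`, `readRawSketch`, and `demo` are hypothetical names, and the legacy-encoding fallback is replaced here by a plain "undecodable" result.

```typescript
import { mkdtempSync, readFileSync, rmSync, writeFileSync } from 'node:fs';
import { tmpdir } from 'node:os';
import { join } from 'node:path';
import { TextDecoder } from 'node:util';

// Hypothetical result type, for illustration only.
type RawRead =
  | { kind: 'ok'; content: string }
  | { kind: 'skipped'; reason: 'too-large' | 'binary' | 'undecodable' };

export const readRawSketch = (filePath: string, maxFileSize: number): RawRead => {
  // Single blocking read; the size is taken from the buffer, so no separate stat().
  const buffer = readFileSync(filePath);
  if (buffer.length > maxFileSize) {
    return { kind: 'skipped', reason: 'too-large' };
  }
  // NULL-byte probe: a 0x00 byte is treated as a sign of binary content.
  if (buffer.includes(0)) {
    return { kind: 'skipped', reason: 'binary' };
  }
  try {
    // UTF-8 fast path; `fatal: true` makes invalid byte sequences throw.
    return { kind: 'ok', content: new TextDecoder('utf-8', { fatal: true }).decode(buffer) };
  } catch {
    // Assumption: the real code falls back to encoding detection at this point.
    return { kind: 'skipped', reason: 'undecodable' };
  }
};

// Small usage demo on temp files: a text file, a file with a NULL byte, an oversized file.
export const demo = (): string[] => {
  const dir = mkdtempSync(join(tmpdir(), 'raw-read-'));
  try {
    const textPath = join(dir, 'a.txt');
    const binaryPath = join(dir, 'b.bin');
    const largePath = join(dir, 'c.txt');
    writeFileSync(textPath, 'hello');
    writeFileSync(binaryPath, Buffer.from([0x68, 0x00, 0x69]));
    writeFileSync(largePath, 'x'.repeat(64));
    return [textPath, binaryPath, largePath].map((filePath) => {
      const result = readRawSketch(filePath, 32);
      return result.kind === 'ok' ? `ok:${result.content}` : `skipped:${result.reason}`;
    });
  } finally {
    rmSync(dir, { recursive: true, force: true });
  }
};
```

The cost of this ordering is that an oversized file is read in full before being rejected, which the original comment accepts because such files are rare.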
