Commit 2d0a45a
perf(file): Sample-scan base64 run detection in truncateBase64
intent(file-process): automated perf tuning pass — single highest-impact, behavior-preserving change against a ~865ms default pack run; truncateBase64 is enabled in this repo's own config so its precondition scan runs on every packed file in the benchmark workload
learned(base64-scan): hasLongBase64Run walked every character of every file (~5.5MB per pack, 23ms main-thread self time in CPU profiles, 35ms isolated) even though it almost always returns false — the per-character loop was itself the previous optimization over the regex it gates
decision(sampled-scan): sample one character every MIN_BASE64_LENGTH_STANDALONE (256) positions — any qualifying run occupies 256 consecutive indices, so it must contain a sample point; only a sampled base64-class hit triggers a bounded outward expansion to measure the surrounding run, and the sampling phase resets cleanly after each short-run skip (next possible run from hi+1 always covers sample hi+256)
constraint(equivalence): differential-tested against the per-character reference on the full repo corpus (1096 files, 0 mismatches) plus 20k randomized fuzz cases; a deterministic-LCG differential test now pins both false-positive and false-negative directions in the suite
rejected(regex-precheck): /[A-Za-z0-9+/]{256}/.test() measured 4.5x SLOWER than the per-character loop (155ms vs 35ms on the corpus) — bounded-repetition re-scanning at each start position, not a viable replacement
rejected(early-git-token-dispatch): pre-dispatching git diff/log token counts from the packager — with a warm token cache they resolve while calculateMetrics awaits outputPromise (Promise.all resolves in ~0ms; the 63-67ms wall time is main-thread-busy completion latency, not queue wait), e2e median +15ms under noise, unproven
rejected(collect-concurrency): FILE_COLLECT_CONCURRENCY 50 -> 128/256 — identical medians over 40 quiet interleaved runs; libuv's 4-thread pool is saturated at depth 50, queue depth adds nothing
rejected(startup-lazy-imports): module-level import() prefetches of tinypool/fast-glob/handlebars all measure 0 to -3ms — ESM already fetches/compiles the static graph in parallel; the budget is sequential module evaluation (~255 modules), only bundling would cut it
rejected(lazy-render-context): skipping fileLineCounts + markdownCodeBlockDelimiter on the XML path re-measured at ~11ms p50 quiet (6.2 + 4.7) — still below the 2% threshold, matching the previous pass's rejection
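The sampled-scan decision above can be sketched roughly as follows. This is a minimal illustration of the stride argument, not the code from the commit: `hasLongRunReference`, `hasLongRunSampled`, `isBase64Code`, and `MIN_RUN` are hypothetical names, and the alphabet is assumed to be the `[A-Za-z0-9+/]` class quoted in the regex-precheck note.

```typescript
// Hypothetical stand-in for MIN_BASE64_LENGTH_STANDALONE (256 per the commit message).
const MIN_RUN = 256;

// Assumed alphabet: the [A-Za-z0-9+/] class from the rejected regex.
function isBase64Code(c: number): boolean {
  return (
    (c >= 65 && c <= 90) || // A-Z
    (c >= 97 && c <= 122) || // a-z
    (c >= 48 && c <= 57) || // 0-9
    c === 43 || // '+'
    c === 47 // '/'
  );
}

// Per-character reference: visits every index and tracks the current run length.
function hasLongRunReference(s: string, minRun: number = MIN_RUN): boolean {
  let run = 0;
  for (let i = 0; i < s.length; i++) {
    if (isBase64Code(s.charCodeAt(i))) {
      if (++run >= minRun) return true;
    } else {
      run = 0;
    }
  }
  return false;
}

// Sampled scan: a run of minRun characters starting at index 0 must cover
// index minRun - 1, so that is the first sample point.
function hasLongRunSampled(s: string, minRun: number = MIN_RUN): boolean {
  const n = s.length;
  let p = minRun - 1;
  while (p < n) {
    if (!isBase64Code(s.charCodeAt(p))) {
      // Any later run starts at p + 1 or beyond, so it must cover p + minRun.
      p += minRun;
      continue;
    }
    // Bounded expansion to the left of the sampled hit.
    let lo = p;
    while (lo > 0 && p - lo + 1 < minRun && isBase64Code(s.charCodeAt(lo - 1))) lo--;
    // Bounded expansion to the right; hi is exclusive.
    let hi = p + 1;
    while (hi < n && hi - lo < minRun && isBase64Code(s.charCodeAt(hi))) hi++;
    if (hi - lo >= minRun) return true;
    // Short run: s[hi] is a non-base64 character (or end of input), so the next
    // possible run starts at hi + 1 and must cover hi + minRun. Reset the phase.
    p = hi + minRun;
  }
  return false;
}
```

On input with no base64-class characters at the sample points, the sampled version reads roughly one character in every 256, which is where the speedup would come from. A quick check of the threshold: `hasLongRunSampled("A".repeat(256))` returns `true`, while `hasLongRunSampled("A".repeat(255) + "-" + "A".repeat(255))` returns `false`.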
Benchmark (repomix repo itself, ~1100 files, 20 interleaved warm pairs,
quiet 4-core Linux, default pack, pristine HEAD worktree build vs
patched build):
- end-to-end median 865ms -> 820.5ms (paired delta median -26.5ms,
-3.1%), paired mean -37.5ms (t = 5.14), 18/20 pairs improved
- isolated scan cost over the packed corpus: 35.6ms -> 1.6ms p50 (~22x)
- output byte-identical (cmp) vs the base build on the same tree
- 6 new tests: stride alignments 0-511, run ending at EOF,
whole-content run, phase reset after short-run skips, near-threshold
non-matches, and the seeded differential fuzz
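A seeded differential fuzz like the one listed above might look roughly like this. It is a sketch under stated assumptions, not the test from the commit: all names are hypothetical, the LCG constants are the common Numerical Recipes pair, and the threshold is shrunk to 8 so that random inputs land on the boundary often.

```typescript
// Hypothetical small threshold so generated runs straddle the boundary.
const FUZZ_MIN_RUN = 8;
const ALPHABET = "ABCxyz019+/";
const SEPARATORS = " \n-=";

const isB64 = (c: number): boolean =>
  (c >= 65 && c <= 90) || (c >= 97 && c <= 122) || (c >= 48 && c <= 57) || c === 43 || c === 47;

// Deterministic 32-bit LCG; the high 16 bits are returned because the low
// bits of a power-of-two-modulus LCG have short periods.
function makeLcg(seed: number): () => number {
  let state = seed >>> 0;
  return () => {
    state = (Math.imul(state, 1664525) + 1013904223) >>> 0;
    return state >>> 16;
  };
}

// Per-character oracle.
function referenceScan(s: string, minRun: number): boolean {
  let run = 0;
  for (let i = 0; i < s.length; i++) {
    if (isB64(s.charCodeAt(i))) {
      if (++run >= minRun) return true;
    } else run = 0;
  }
  return false;
}

// Sampled scan under test: one probe per minRun positions, bounded expansion on a hit.
function sampledScan(s: string, minRun: number): boolean {
  let p = minRun - 1;
  while (p < s.length) {
    if (!isB64(s.charCodeAt(p))) { p += minRun; continue; }
    let lo = p;
    while (lo > 0 && p - lo + 1 < minRun && isB64(s.charCodeAt(lo - 1))) lo--;
    let hi = p + 1;
    while (hi < s.length && hi - lo < minRun && isB64(s.charCodeAt(hi))) hi++;
    if (hi - lo >= minRun) return true;
    p = hi + minRun; // phase reset after a short run
  }
  return false;
}

// Builds 0-5 segments: a run of length minRun-3 .. minRun+2, then 0-2 separators.
// A zero-length gap lets adjacent runs merge into a qualifying one.
function randomCase(next: () => number, minRun: number): string {
  let out = "";
  const segments = next() % 6;
  for (let k = 0; k < segments; k++) {
    const runLen = Math.max(0, minRun - 3 + (next() % 6));
    for (let j = 0; j < runLen; j++) out += ALPHABET.charAt(next() % ALPHABET.length);
    const gap = next() % 3;
    for (let j = 0; j < gap; j++) out += SEPARATORS.charAt(next() % SEPARATORS.length);
  }
  return out;
}

// Counts disagreements, plus positives so both directions are visibly exercised.
function runFuzz(cases: number, seed: number): { mismatches: number; positives: number } {
  const next = makeLcg(seed);
  let mismatches = 0;
  let positives = 0;
  for (let i = 0; i < cases; i++) {
    const s = randomCase(next, FUZZ_MIN_RUN);
    const expected = referenceScan(s, FUZZ_MIN_RUN);
    if (expected) positives++;
    if (sampledScan(s, FUZZ_MIN_RUN) !== expected) mismatches++;
  }
  return { mismatches, positives };
}
```

Because the generator is seeded, a failing case reproduces on every run. Tracking `positives` alongside `mismatches` guards against a generator that only ever produces non-matching inputs, which would leave the false-negative direction untested.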
npm run test: 1385/1385 pass. npm run lint: clean (3 pre-existing
warnings in unrelated files).
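For readers unfamiliar with the paired figures in the benchmark above, the following sketch shows one common way to compute a paired delta median, paired mean, and t statistic from interleaved runs. The timings in the usage note are made up for illustration and are not the commit's measurements; `pairedSummary` and `median` are hypothetical helpers. Note that the median of per-pair differences (-26.5ms in the commit) generally differs from the difference of the two medians (865ms - 820.5ms = 44.5ms), and that the sign of t depends on the subtraction order.

```typescript
function median(xs: number[]): number {
  const s = [...xs].sort((a, b) => a - b);
  const mid = Math.floor(s.length / 2);
  return s.length % 2 === 1 ? s[mid] : (s[mid - 1] + s[mid]) / 2;
}

interface PairedSummary {
  medianDelta: number; // median of (patched - base) per pair
  meanDelta: number; // mean of (patched - base) per pair
  t: number; // paired t statistic: mean / (sd / sqrt(n))
  improved: number; // pairs where the patched run was faster
}

// base[i] and patched[i] are assumed to be timings from the same interleaved pair.
function pairedSummary(base: number[], patched: number[]): PairedSummary {
  if (base.length !== patched.length || base.length < 2) {
    throw new Error("need at least two pairs of equal-length samples");
  }
  const deltas = patched.map((p, i) => p - base[i]);
  const n = deltas.length;
  const meanDelta = deltas.reduce((a, b) => a + b, 0) / n;
  // Sample variance with Bessel's correction (n - 1).
  const variance = deltas.reduce((a, d) => a + (d - meanDelta) ** 2, 0) / (n - 1);
  const t = meanDelta / (Math.sqrt(variance) / Math.sqrt(n));
  const improved = deltas.filter((d) => d < 0).length;
  return { medianDelta: median(deltas), meanDelta, t, improved };
}
```

With the illustrative inputs `pairedSummary([870, 860, 865, 880], [840, 835, 845, 850])`, the per-pair deltas are -30, -25, -20, and -30, giving a median delta of -27.5 and a mean delta of -26.25, with all four pairs improved.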
https://claude.ai/code/session_01Ea6eConhLEQFKZsVkJz1zE1
Parent: 1f2621e. Commit: 2d0a45a.
2 files changed
Lines changed: 139 additions & 15 deletions
[Diff bodies were not preserved in this capture; only line numbering survives. File 1: hunk covering original lines 12-41, 37 additions and 15 deletions. File 2: hunk covering original lines 135-138, 102 additions and 0 deletions.]