Commit e3ca2bb
perf(frequency): hint UTF-8 failure as cold in ignore-case hot loop
In `ftables_weighted_internal` and `ftables_unweighted`, the per-cell
`process_field` closures branch on `if let Ok(s) = simdutf8::basic::from_utf8(field)`.
On real data the `Ok` arm dominates and the `else` (invalid UTF-8) arm is rare.
Mark the four `else` arms with `core::hint::cold_path()` so LLVM keeps the hot
UTF-8-success path contiguous in the instruction cache.
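The branch shape described above can be sketched as follows. This is a hypothetical, simplified stand-in for a trim + ignore-case closure, not qsv's code: `process_field`, `count_raw` and `tally_ignore_case` are illustrative names, `std::str::from_utf8` replaces `simdutf8::basic::from_utf8`, and the rare arm is marked cold with a `#[cold]` helper function (a long-stable way to give LLVM the same "unlikely" hint) instead of a bare `core::hint::cold_path()` call, so the sketch also builds on older toolchains.

```rust
use std::collections::HashMap;

/// Rare branch, kept out of line. `#[cold]` tells LLVM that calls to this
/// function are unlikely, so the caller's `Ok` arm becomes the fall-through
/// path. (Assumption: stands in for a bare `core::hint::cold_path();` call.)
#[cold]
#[inline(never)]
fn count_raw(field: &[u8], counts: &mut HashMap<Vec<u8>, u64>) {
    // Hypothetical fallback: invalid UTF-8 cannot be case-folded, so count raw bytes.
    *counts.entry(field.to_vec()).or_insert(0) += 1;
}

/// Hypothetical per-cell tally for a trim + ignore-case frequency table.
fn process_field(field: &[u8], counts: &mut HashMap<Vec<u8>, u64>) {
    // std::str::from_utf8 stands in for simdutf8::basic::from_utf8.
    if let Ok(s) = std::str::from_utf8(field) {
        // Hot path: trim, lowercase, and count under the folded key.
        let key = s.trim().to_lowercase().into_bytes();
        *counts.entry(key).or_insert(0) += 1;
    } else {
        // Cold path. Where `core::hint::cold_path` is available, the commit's
        // pattern places `core::hint::cold_path();` at the top of this arm.
        count_raw(field, counts);
    }
}

/// Tallies one column of fields, folding case on valid UTF-8.
fn tally_ignore_case(fields: &[&[u8]]) -> HashMap<Vec<u8>, u64> {
    let mut counts = HashMap::new();
    for &field in fields {
        process_field(field, &mut counts);
    }
    counts
}
```

Feeding it `b" Noise "`, `b"noise"`, `b"NOISE"` and the invalid bytes `b"\xff\xfe"` yields two keys: `noise` with a count of 3 and the raw bytes with a count of 1. The hint changes only code layout, never results, which matches the observation below that outputs are identical between builds.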
Benchmark on a 1M-row NYC 311 CSV (514 MB, 41 columns), hyperfine, 10 runs (mean ± σ):

| Command | baseline | coldpath | Result |
|---|---|---|---|
| `qsv frequency --ignore-case` | 4.399 ± 0.045 s | 4.139 ± 0.093 s | 1.06× faster |
| `qsv frequency --ignore-case --no-trim` | 4.089 ± 0.090 s | 4.053 ± 0.036 s | noise |
| `qsv frequency` (default, cache short-circuit) | 1.880 ± 0.028 s | 1.864 ± 0.015 s | noise (paths not exercised) |
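The verdicts in the results above reduce to simple arithmetic on the reported means and standard deviations. The sketch below is one way to reproduce them; the "noise" rule (gap between means smaller than the sum of the two σ values) is an assumed rule of thumb, not hyperfine's own statistical test, and the function names are hypothetical.

```rust
/// Speedup of the new build over the baseline, as a ratio of mean times.
fn speedup(base_mean: f64, new_mean: f64) -> f64 {
    base_mean / new_mean
}

/// Relative time saved, in percent of the baseline mean.
fn percent_faster(base_mean: f64, new_mean: f64) -> f64 {
    (base_mean - new_mean) / base_mean * 100.0
}

/// Assumed rule of thumb (not hyperfine's method): call a change noise when
/// the gap between means is smaller than the sum of the standard deviations.
fn within_noise(base_mean: f64, base_sd: f64, new_mean: f64, new_sd: f64) -> bool {
    (base_mean - new_mean).abs() < base_sd + new_sd
}
```

For the `--ignore-case` row, 4.399 / 4.139 ≈ 1.063 (about 5.9% less time), and the 0.260 s gap exceeds 0.045 + 0.093 = 0.138 s. The other two rows have gaps (0.036 s and 0.016 s) well inside their combined σ, consistent with the "noise" labels.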
Outputs are identical between builds. The ~6% gain is concentrated on the trim
+ ignore-case path because its hot body (lowercase + extend_from_slice +
add_borrowed) is the largest of the closure variants, so keeping that body
contiguous in the icache has the most leverage.
The MSRV (1.95) is at or above the release that stabilized `cold_path` (1.92).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

1 parent da1202c
1 file changed
Lines changed: 5 additions & 0 deletions
Diff (line contents not captured): five single-line additions, at new-file lines 243, 2715, 2728, 2864 and 2877.