v4.5: 50x Faster Case-Insensitive Search for All of Unicode 17 in AVX-512
Below are the throughput numbers for a case-insensitive substring search that respects all Unicode 17.0 case-folding rules, measured on unique "word" tokens from the Leipzig Wikipedia Corpora in various languages. Besides PCRE2, this is arguably the only library offering fully Unicode-compliant search. PCRE2 is often one or more orders of magnitude slower than even our serial baseline, because it has to combine a complete RegEx engine with Unicode compliance.
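The semantics being accelerated can be pinned down with a slow "fold+find" reference: fully case-fold both strings, then run an ordinary find. The sketch below is a minimal Python version of such a reference, assuming Python's built-in `str.casefold` (full, non-Turkic case folding from the Unicode database bundled with the interpreter, which may be older than 17.0). The function names and the rule that a match must start and end on whole source characters are hypothetical illustrations, not StringZilla's API or its exact policy.

```python
from typing import List, Optional, Tuple


def fold_with_origins(text: str) -> Tuple[str, List[int]]:
    """Fully case-fold `text`, recording which source index produced each folded code point."""
    pieces: List[str] = []
    origins: List[int] = []
    for index, char in enumerate(text):
        folded = char.casefold()  # may expand, e.g. "ß" -> "ss"
        pieces.append(folded)
        origins.extend([index] * len(folded))
    return "".join(pieces), origins


def find_case_insensitive(haystack: str, needle: str) -> Optional[Tuple[int, int]]:
    """Return the (start, end) code-point span of the first match in `haystack`, or None.

    Assumed (hypothetical) policy: a match must start and end on whole source
    characters, so "s" does not match half of the "ss" that "ß" folds into.
    """
    folded_hay, origins = fold_with_origins(haystack)
    folded_needle = needle.casefold()
    if not folded_needle:
        return (0, 0)
    search_from = 0
    while True:
        pos = folded_hay.find(folded_needle, search_from)
        if pos < 0:
            return None
        end = pos + len(folded_needle)
        starts_clean = pos == 0 or origins[pos - 1] != origins[pos]
        ends_clean = end == len(folded_hay) or origins[end - 1] != origins[end]
        if starts_clean and ends_clean:
            return origins[pos], origins[end - 1] + 1
        search_from = pos + 1
```

For example, `find_case_insensitive("Die STRASSE", "straße")` returns `(4, 11)`, while `find_case_insensitive("Maße", "s")` returns `None` under the assumed boundary rule. Such a reference is far too slow for production, but it gives fuzzers and unit tests a ground truth to compare vectorized kernels against.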
| Corpora Language | Script | Serial Baseline, GB/s | AVX-512 for Ice Lake+, GB/s | Speedup |
|---|---|---|---|---|
| Latin (Basic) | ||||
| 🇬🇧 English | Latin | 1.15 | 10.93 | 11.9× |
| 🇮🇹 Italian | Latin | 0.81 | 10.63 | 14.7× |
| 🇳🇱 Dutch | Latin | 0.85 | 10.91 | 13.3× |
| Latin (Extended) | ||||
| 🇩🇪 German | Latin+ß | 0.74 | 9.36 | 13.6× |
| 🇫🇷 French | Latin+Acc | 0.73 | 8.37 | 15.1× |
| 🇪🇸 Spanish | Latin+ñ | 0.99 | 8.86 | 10.8× |
| 🇵🇹 Portuguese | Latin+Acc | 0.77 | 9.58 | 14.3× |
| 🇵🇱 Polish | Latin+Ext | 0.62 | 7.51 | 14.2× |
| 🇨🇿 Czech | Latin+Háčky | 0.43 | 6.10 | 17.1× |
| 🇹🇷 Turkish | Latin+İ/ı | 0.81 | 6.78 | 11.7× |
| 🇻🇳 Vietnamese | Latin+Tones | 0.41 | 6.38 | 17.9× |
| Cyrillic | ||||
| 🇷🇺 Russian | Cyrillic | 0.54 | 3.41 | 10.6× |
| 🇺🇦 Ukrainian | Cyrillic | 0.56 | 4.03 | 10.6× |
| Greek | ||||
| 🇬🇷 Greek | Greek | 0.31 | 7.04 | 22.5× |
| Caucasian | ||||
| 🇦🇲 Armenian | Armenian | 0.34 | 4.18 | 17.5× |
| 🇬🇪 Georgian | Georgian | 0.65 | 10.56 | 24.2× |
| Semitic | ||||
| 🇮🇱 Hebrew | Hebrew | 0.65 | 9.52 | 13.7× |
| 🇸🇦 Arabic | Arabic | 1.17 | 9.85 | 9.8× |
| 🇮🇷 Persian | Arabic+Ext | 0.41 | 11.83 | 43.1× |
| Indic | ||||
| 🇮🇳 Hindi | Devanagari | 1.25 | 10.99 | 16.3× |
| 🇧🇩 Bengali | Bengali | 0.72 | 11.03 | 25.9× |
| 🇮🇳 Tamil | Tamil | 1.09 | 11.70 | 21.0× |
| CJK & East Asian | ||||
| 🇯🇵 Japanese | CJK+Kana | 0.52 | 11.56 | 26.7× |
| 🇰🇷 Korean | Hangul | 2.98 | 11.58 | 3.5× |
| 🇨🇳 Chinese | CJK | 0.43 | 20.07 | 103.0× |
Minor
- Add: Case-folding for Swift (f621419)
- Add: Case-folding for GoLang (2e310d6)
- Add: Case-folding for JavaScript (93bc9c4)
- Add: Reusable case-insensitive needles with metadata for Rust & C++ (45e3c92)
- Add: Reusable case-insensitive finder for Rust (0c8626d)
- Add: Vietnamese fast path (8ec7de7)
- Add: Armenian & Greek paths (984ace2)
- Add: Central European block (f087711)
- Add: New Cyrillic search kernels (b8e106f)
- Add: Serial verification for Ice Lake search (0cfd2fd)
- Add: Seeding & iterations multipliers for fuzzing (d96cee7)
- Add: `sz_utf8_case_agnostic` API (a0507ee)
- Add: Fast-path Ice Lake case-insensitive substring search for needles ≤ 16 bytes (b836970)
- Add: ASCII fast paths for small inputs (8b136f3)
- Add: Hash-free search kernel for small needles (e5c477d)
- Add: Latin-1 case-folded search (c1c0305)
- Add: Branchless `.empty()` for small strings (ea258c1)
- Add: Draft TR29 Unicode word-boundary iterators (3ca6695)
- Add: Draft case-insensitive search on Ice Lake (4d30daa)
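Several entries above revolve around reusable case-insensitive needles that carry metadata, plus ASCII and small-needle fast paths. The idea of folding the needle once and reusing that work across many haystacks can be sketched in Python as follows. `FoldedNeedle`, its fields, and the 16-byte threshold used as a dispatch hint are hypothetical illustrations, not the Rust, C++, or Python API shipped in this release. The containment check is plain substring search in folded space, with no character-boundary policy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FoldedNeedle:
    """Hypothetical reusable needle: fold once, then search many haystacks."""

    folded: str        # full Unicode case folding of the original needle
    utf8_length: int   # size of the folded needle in UTF-8 bytes
    ascii_only: bool   # True if every folded code point is ASCII

    @classmethod
    def from_text(cls, needle: str) -> "FoldedNeedle":
        folded = needle.casefold()
        return cls(folded, len(folded.encode("utf-8")), folded.isascii())

    def fits_small_kernel(self, limit_bytes: int = 16) -> bool:
        # Hypothetical dispatch hint, modeled on the "needles <= 16 bytes" fast path.
        return self.utf8_length <= limit_bytes

    def contained_in(self, haystack: str) -> bool:
        # Only the haystack is folded per call; the needle's folding is amortized.
        if self.ascii_only and haystack.isascii():
            # For pure-ASCII text, full case folding reduces to mapping A-Z to a-z.
            return self.folded in haystack.lower()
        return self.folded in haystack.casefold()
```

The ASCII branch only applies when the haystack is also pure ASCII: non-ASCII characters such as the Kelvin sign "K" (U+212A) or the long s "ſ" (U+017F) fold into ASCII letters, so any other haystack must go through full folding.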
Patch
- Fix: Pointer cast in GoLang (fb20b6c)
- Docs: Badges, CLI, & inconsistencies (b9aa985)
- Docs: UTF-8 Fold & Search with perf numbers (b69c49d)
- Improve: Prefetch on massive inputs (48a8ccb)
- Fix: Missing `span::operator==0` for new NVCC benchmarks (9b6911a)
- Fix: Shadowing template param on NVCC (908d7f9)
- Fix: Rust UTF-8 iterator doctest (7fff78b)
- Make: Install `curl` on Alpine for Rust kit (85af5b5)
- Docs: Arm NEON case-folding plans (8a3f25d)
- Improve: Generalize case-invariant logic (1239bea)
- Improve: Faster ASCII kernels for ≤ 3 probes (e6626a8)
- Improve: Deduplicate benchmark input tokens (68287fc)
- Improve: Western European register pressure (fe94e1c)
- Improve: Greek alarm with less register pressure (22dc88c)
- Improve: Separate "alarm" functions for danger zones (936fc22)
- Make: Install `bash` on Alpine for Rust toolchain (999ec64)
- Improve: Flatten danger zone checks (a8e3f66)
- Improve: VPSHUFB & VPTERNLOG for search (a315ee8)
- Make: Bump Rust & Go CI (7847671)
- Improve: Higher-efficiency Ice Lake kernels (dce1773)
- Fix: Generalize static asserts to 32-bit archs (bde1fad)
- Fix: Missing `NULL`, use `SZ_NULL` instead (7e3dd35)
- Fix: Handle failed downloads of UCD specs (13bc864)
- Fix: Micro case-fold in Georgian path (09ca314)
- Fix: Missing `sized_match_t` constructor (916b23e)
- Improve: Test more problematic chars (46b7135)
- Fix: Shrink step proportional to danger zones (e67f3f7)
- Fix: Outdated case-insensitive metadata in Rust (45e81f7)
- Fix: Missing danger marker in Western kernel (cbcd685)
- Fix: Mid-rune serial matches (be360c7)
- Fix: Vietnamese odd-even fold in ZMM (409a44d)
- Fix: Modifiers exclusion from case-less chars (5948374)
- Fix: Danger zone length (be52b86)
- Improve: Reproduce Ice Lake bugs (ac31704)
- Fix: Eastern European case folding (900033e)
- Fix: Dispatch Central European path (2a8a9b6)
- Fix: Ban "ss" prefix/suffix for Western European path (66cade8)
- Improve: Tighten safety profiles (6eee0a9)
- Fix: Serial match verification mid-character (d927ea7)
- Improve: New regression tests for ligatures (29e9cd2)
- Fix: Compile-time Ice-Lake dispatch (7dd6209)
- Docs: Describe problematic chars (368bf5d)
- Improve: New probe refinement & tail verification (d9ceb8c)
- Improve: Detect more danger zones (d327cf8)
- Improve: Better fuzzing for substring search (9ad155e)
- Improve: Case folding variables naming (4ee616c)
- Fix: Case-folding around Glagolitic E2 ranges (68cd557)
- Improve: Fuzzing case-folding equivalence (4dcbe62)
- Fix: Check for incomplete set of 3-byte chars in case_fold_ice (ad036ef)
- Fix: Match new reusable needle ABI in Rust & Python (fa30741)
- Fix: Folding Greek final sigma in AVX-512 (bda1321)
- Fix: Handling Micro sign and Armenian ligatures (39516b3)
- Improve: Deduplicate body/tail kernel logic (d9e2409)
- Fix: Pass stress-tests under 10x multiple (1e050ad)
- Improve: Share abstractions for match validation (55c7c92)
- Fix: Cleaner script-specific window tracking (5a1ba33)
- Improve: Case-insensitive test coverage (8b2385d)
- Improve: Propagate metadata between queries (e3a6bb6)
- Fix: Detecting bicameral chars on Ice Lake (65c6c98)
- Improve: Faster test suite (74060ed)
- Fix: Classifying Armenian as bicameral (0ab964e)
- Improve: Test case-insensitive search against fold+find (aabb45e)
- Fix: 's' removed from the ASCII path (524ec7b)
- Docs: Policy for historical S sign 'ſ' (U+017F) (2226e25)
- Improve: Test coverage for case-insensitive search (a9a7d85)
- Fix: Mostly passing tests (24e43aa)
- Improve: Simpler design for Ice-Lake case-insensitive search (9ce5a6f)
- Fix: Match new rune safety profiles (b313a8c)
- Fix: 'k' and C6 policy for Vietnamese (8974271)
- Improve: New safety profiles (afa24f7)
- Fix: Stepping logic for safe slices under 16 bytes (b66ebd4)
- Improve: Default safe-window selection (1588a2d)
- Fix: Multiplication/division signs on Vietnamese path (5e0de6c)
- Fix: Special Cyrillic folding cases (fb09351)
- Improve: Print env settings at start (b4e269e)
- Fix: Serial fallback for archaic Polytonic Greek chars (659c2c7)
- Fix: Remove Ligature detection from the hot path (351236c)
- Fix: Mask offsets and Latin-A/B extensions (6aa2893)
- Fix: Using enum masks for character safety profiles (f30fb20)
- Improve: Uniform function naming (e751fbc)
- Improve: Uniform logic for case-insensitive search (a534597)
- Improve: Check "safe windows" even for small needles (b56ee54)
- Improve: New safety profiles for Unicode scripts (a6d75b9)
- Fix: Stale folded rune state (32b3df4)
- Improve: Cleaner Ice Lake kernels (4c6cf68)
- Fix: Stale `pending_idx` in fast ASCII iterator (8614658)
- Fix: Missing `Strs.tape` accessors (718f9c1)
- Fix: Named args in `hmac_sha256` (29c5732)
- Docs: Missing API coverage (217055e)
- Improve: Log running tests (473d9db)
- Docs: Explore more scripts with examples (01f493b)
- Improve: 40% smaller fast-path selector state (3d6b244)
- Improve: Reuse needle anomalies logic (094eba8)
- Improve: Test more scripts (917bc03)
- Fix: Passing case-insensitive tests (9d5b9e3)
- Improve: Cleaner case-insensitive fuzzing (ddc8b7a)
- Improve: Remove unnecessary checks in ci-find (c19f11c)
- Fix: Case-insensitive search passes test (726bbbd)
- Improve: Case-insensitive search for Ry, Vi, El, Hy (Am) (fd89a88)
- Improve: Vietnamese fast case-folding path (025b36d)
- Improve: Self-equality & overflow protection (b325b1c)
- Docs: Small-string safety comments (83ca417)
- Improve: Length-returning small-string API (be9e2d7)
- Improve: Avoid modulo division (024f677)
- Improve: Use ring-buffers for O(1) prefix hashes (2db5e54)
- Docs: Missing table info (c8b6ae1)
- Improve: Cleaner Raita kernels - unstable (f642fa9)
- Improve: Faster LUT on Ice Lake and Zen4+ (b13aef4)
- Improve: Avoid UTF-8 checks in case-fold (d8aac4a)
- Improve: Faster serial baselines for ASCII needles (e5227ad)
- Improve: Optional `start`/`end` for folded find (62ad6f7)
- Improve: Faster optional UTF-8 validation (7edba6f)
- Fix: Folding "中ABC" on Ice Lake (20dbef3)
- Improve: Boundary condition fold tests (bbea84f)
- Docs: Explain convoluted control-flow (44b6279)
- Fix: Gracefully handle Unicode spec download issues (44412bf)
- Fix: Require continuous substitution matrices (20ac49a)
- Make: Ignore UV lock (2c3d35d)
- Docs: Inconsistent UTF-8 fold explanations (c7a3012)
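Two of the patch entries, ring buffers for O(1) prefix hashes and avoiding modulo division, point at a classic trick. The Python sketch below shows one way these ideas can fit together, assuming a Rabin-Karp style rolling hash over a stream of case-folded code points. Because the folded haystack is produced on the fly rather than stored, the rune leaving the window is kept in a small power-of-two ring buffer that is indexed with a bit-mask instead of `%`, and the hash wraps at 64 bits instead of reducing modulo a prime. `BASE`, the 64-bit wrap-around, and all function names are illustrative assumptions, not StringZilla's actual kernels.

```python
from typing import Iterator, List

MASK64 = (1 << 64) - 1   # wrap-around arithmetic instead of a modulo by a prime
BASE = 1_000_003         # hypothetical hash multiplier


def folded_runes(text: str) -> Iterator[int]:
    """Yield case-folded code points one by one, never materializing the folded string."""
    for char in text:
        for folded in char.casefold():
            yield ord(folded)


def rolling_find_all(haystack: str, needle: str) -> List[int]:
    """Return start offsets, in folded code points, of every match of `needle`."""
    target = [ord(c) for c in needle.casefold()]
    k = len(target)
    if k == 0:
        return []
    capacity = 1
    while capacity < k:            # power-of-two ring, so indexing is a bit-mask, not `%`
        capacity <<= 1
    mask = capacity - 1
    ring = [0] * capacity          # holds the most recent folded runes
    top_power = pow(BASE, k - 1, 1 << 64)
    target_hash = 0
    for rune in target:
        target_hash = (target_hash * BASE + rune) & MASK64
    window_hash = 0
    matches: List[int] = []
    for count, rune in enumerate(folded_runes(haystack)):
        if count >= k:
            outgoing = ring[(count - k) & mask]   # O(1) lookup of the rune leaving the window
            window_hash = (window_hash - outgoing * top_power) & MASK64
        ring[count & mask] = rune
        window_hash = (window_hash * BASE + rune) & MASK64
        if count + 1 >= k and window_hash == target_hash:
            start = count + 1 - k
            window = [ring[(start + i) & mask] for i in range(k)]
            if window == target:           # rule out hash collisions
                matches.append(start)
    return matches
```

Since the ring holds at least `k` slots, the outgoing rune is read before any newer rune can overwrite its slot. The same buffer also lets each hash hit be verified without re-folding the haystack.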