Releases: ashvardanian/StringZilla
v4.6: Fast Search for Georgian & Hashing w/out Stack Protection 🔞
- New fast case-insensitive substring search path for Georgian 🇬🇪
- Faster hashing with compile-time dispatch w/out stack protection 🔞
- Faster byteset search on Haswell CPUs, thanks to @Caturra000 👏
- Fewer inline-related warnings in C++ builds, thanks to @inflooper 👏
Minor
- Add: Georgian fast path (e02cb00)
Patch
- Fix: `inline static` warnings with C++23 modules (#287) (374adbf)
- Improve: Reduce startup overhead for `sz_find_byteset_haswell` (#293) (7f2899a)
- Fix: Missing Georgian dispatch (5f91e7e)
- Improve: Drop stack-protection in hashing on GCC (23801e4)
- Improve: Reduce repeated reviews (2e5784b)
- Improve: Faster `sz_size_bit_ceil` (4003057)
- Improve: Avoid ZMM-to-stack spill in Skylake comparisons (d94a010)
- Make: Use relative install path for C sources (690d775)
v4.5.1: UTF-8 "Find-All" Iterators for Python 🐍
v4.5: 50x Faster Case-Insensitive Search for All of Unicode 17 in AVX-512
Below are the performance numbers comparing the search throughput of unique "word" tokens across various languages of the Leipzig Wikipedia Corpora for a case-insensitive substring search that respects all Unicode 17.0 case-folding rules. This is arguably the only library providing full Unicode spec compliance for search operations besides the PCRE2 library, which is often order(s) of magnitude slower than even our serial baseline due to the extreme complexity of combining a complete RegEx engine with Unicode compliance.
| Corpora Language | Script | Serial Baseline, GB/s | AVX-512 for Ice Lake+, GB/s | Speedup |
|---|---|---|---|---|
| Latin (Basic) | | | | |
| 🇬🇧 English | Latin | 1.15 | 10.93 | 11.9× |
| 🇮🇹 Italian | Latin | 0.81 | 10.63 | 14.7× |
| 🇳🇱 Dutch | Latin | 0.85 | 10.91 | 13.3× |
| Latin (Extended) | | | | |
| 🇩🇪 German | Latin+ß | 0.74 | 9.36 | 13.6× |
| 🇫🇷 French | Latin+Acc | 0.73 | 8.37 | 15.1× |
| 🇪🇸 Spanish | Latin+ñ | 0.99 | 8.86 | 10.8× |
| 🇵🇹 Portuguese | Latin+Acc | 0.77 | 9.58 | 14.3× |
| 🇵🇱 Polish | Latin+Ext | 0.62 | 7.51 | 14.2× |
| 🇨🇿 Czech | Latin+Háčky | 0.43 | 6.10 | 17.1× |
| 🇹🇷 Turkish | Latin+İ/ı | 0.81 | 6.78 | 11.7× |
| 🇻🇳 Vietnamese | Latin+Tones | 0.41 | 6.38 | 17.9× |
| Cyrillic | | | | |
| 🇷🇺 Russian | Cyrillic | 0.54 | 3.41 | 10.6× |
| 🇺🇦 Ukrainian | Cyrillic | 0.56 | 4.03 | 10.6× |
| Greek | | | | |
| 🇬🇷 Greek | Greek | 0.31 | 7.04 | 22.5× |
| Caucasian | | | | |
| 🇦🇲 Armenian | Armenian | 0.34 | 4.18 | 17.5× |
| 🇬🇪 Georgian | Georgian | 0.65 | 10.56 | 24.2× |
| Semitic | | | | |
| 🇮🇱 Hebrew | Hebrew | 0.65 | 9.52 | 13.7× |
| 🇸🇦 Arabic | Arabic | 1.17 | 9.85 | 9.8× |
| 🇮🇷 Persian | Arabic+Ext | 0.41 | 11.83 | 43.1× |
| Indic | | | | |
| 🇮🇳 Hindi | Devanagari | 1.25 | 10.99 | 16.3× |
| 🇧🇩 Bengali | Bengali | 0.72 | 11.03 | 25.9× |
| 🇮🇳 Tamil | Tamil | 1.09 | 11.70 | 21.0× |
| CJK & East Asian | | | | |
| 🇯🇵 Japanese | CJK+Kana | 0.52 | 11.56 | 26.7× |
| 🇰🇷 Korean | Hangul | 2.98 | 11.58 | 3.5× |
| 🇨🇳 Chinese | CJK | 0.43 | 20.07 | 103.0× |
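For intuition about what a serial case-insensitive search does at its simplest, here is a minimal sketch restricted to ASCII. The names `demo_fold_ascii` and `demo_find_ci_ascii` are hypothetical and are not part of StringZilla; the library's kernels additionally handle multi-byte folds and expansions across all of Unicode, which this sketch ignores.

```c
#include <stddef.h>

// Folds one ASCII byte to lowercase; non-letters pass through unchanged.
static unsigned char demo_fold_ascii(unsigned char c) {
    return (c >= 'A' && c <= 'Z') ? (unsigned char)(c + ('a' - 'A')) : c;
}

// Naive "fold, then compare" search, ASCII-only by assumption.
// Returns the byte offset of the first match, or (size_t)-1 if there is none.
size_t demo_find_ci_ascii(char const *haystack, size_t h_len, //
                          char const *needle, size_t n_len) {
    if (n_len > h_len) return (size_t)-1;
    for (size_t i = 0; i + n_len <= h_len; ++i) {
        size_t j = 0;
        while (j < n_len && demo_fold_ascii((unsigned char)haystack[i + j]) ==
                                demo_fold_ascii((unsigned char)needle[j]))
            ++j;
        if (j == n_len) return i;
    }
    return (size_t)-1;
}
```

Searching `"Hello WORLD"` for `"world"` with this sketch yields offset 6. A reference of this kind is mostly useful for cross-checking a faster implementation on inputs where folding never changes the byte length.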
Minor
- Add: Case-folding for Swift (f621419)
- Add: Case-folding for GoLang (2e310d6)
- Add: Case-folding for JavaScript (93bc9c4)
- Add: Reusable case-insensitive needles with metadata for Rust & C++ (45e3c92)
- Add: Reusable case-insensitive finder for Rust (0c8626d)
- Add: Vietnamese fast path (8ec7de7)
- Add: Armenian & Greek paths (984ace2)
- Add: Central European block (f087711)
- Add: New Cyrillic search kernels (b8e106f)
- Add: Serial verification for Ice Lake search (0cfd2fd)
- Add: Seeding & iterations multipliers for fuzzing (d96cee7)
- Add: `sz_utf8_case_agnostic` API (a0507ee)
- Add: Fast-path Ice Lake case-insensitive substring search for needles <= 16 bytes (b836970)
- Add: ASCII fast paths for small inputs (8b136f3)
- Add: Hash-free search kernel for small needles (e5c477d)
- Add: Latin-1 case-folded search (c1c0305)
- Add: Branchless `.empty()` for small strings (ea258c1)
- Add: Draft TR29 Unicode word-bound iterators (3ca6695)
- Add: Draft case-insensitive search on Ice Lake (4d30daa)
Patch
- Fix: Pointer cast in GoLang (fb20b6c)
- Docs: Badges, CLI, & inconsistencies (b9aa985)
- Docs: UTF-8 Fold & Search with Perf numbers (b69c49d)
- Improve: Prefetch on massive inputs (48a8ccb)
- Fix: Missing `span::operator==0` for new NVCC benchmarks (9b6911a)
- Fix: Shadowing template param on NVCC (908d7f9)
- Fix: Rust UTF-8 iterator doctest (7fff78b)
- Make: Install `curl` on Alpine for Rust kit (85af5b5)
- Docs: Arm NEON case-folding plans (8a3f25d)
- Improve: Generalize case-invariant logic (1239bea)
- Improve: Faster ASCII kernels for ≤ 3 probes (e6626a8)
- Improve: Deduplicate benchmark input tokens (68287fc)
- Improve: Western European register pressure (fe94e1c)
- Improve: Greek alarm with less register pressure (22dc88c)
- Improve: Separate "alarm" functions for danger zones (936fc22)
- Make: Install `bash` on Alpine for Rust toolchain (999ec64)
- Improve: Flatten danger zone checks (a8e3f66)
- Improve: VPSHUFB & VPTERNLOG for search (a315ee8)
- Make: Bump Rust & Go CI (7847671)
- Improve: Higher-efficiency Ice Lake kernels (dce1773)
- Fix: Generalize static asserts to 32-bit archs (bde1fad)
- Fix: `NULL` missing - use `SZ_NULL` (7e3dd35)
- Fix: Handle failed downloads of UCD specs (13bc864)
- Fix: Micro case-fold in Georgian path (09ca314)
- Fix: Missing `sized_match_t` constructor (916b23e)
- Improve: Test more problematic chars (46b7135)
- Fix: Shrink step proportional to danger zones (e67f3f7)
- Fix: Outdated case-insensitive metadata in Rust (45e81f7)
- Fix: Missing danger marker in Western kernel (cbcd685)
- Fix: Mid-rune serial matches (be360c7)
- Fix: Vietnamese odd-even fold in ZMM (409a44d)
- Fix: Modifiers exclusion from case-less chars (5948374)
- Fix: Danger zone length (be52b86)
- Improve: Reproduce Ice Lake bugs (ac31704)
- Fix: Eastern European case folding (900033e)
- Fix: Dispatch Central European path (2a8a9b6)
- Fix: Ban "ss" prefix/suffix for Western European path (66cade8)
- Improve: Tighten safety profiles (6eee0a9)
- Fix: Serial match verification mid-character (d927ea7)
- Improve: New regression tests for ligatures (29e9cd2)
- Fix: Compile-time Ice-Lake dispatch (7dd6209)
- Docs: Describe problematic chars (368bf5d)
- Improve: New probe refinement & tail verification (d9ceb8c)
- Improve: Detect more danger zones (d327cf8)
- Improve: Better fuzzing for substring search (9ad155e)
- Improve: Case folding variables naming (4ee616c)
- Fix: Case-folding around Glagolitic E2 ranges (68cd557)
- Improve: Fuzzing case-folding equivalence (4dcbe62)
- Fix: Check for incomplete set of 3-byte chars in case_fold_ice (ad036ef)
- Fix: Match new reusable needle ABI in Rust & Python (fa30741)
- Fix: Folding Greek final sigma in AVX-512 (bda1321)
- Fix: Handling Micro sign and Armenian ligatures (39516b3)
- Improve: Deduplicate body/tail kernel logic (d9e2409)
- Fix: Pass stress-tests under 10x multiple (1e050ad)
- Improve: Share abstractions for match validation (55c7c92)
- Fix: Cleaner script-specific window tracking (5a1ba33)
- Improve: Case-insensitive test coverage (8b2385d)
- Improve: Propagate metadata between queries (e3a6bb6)
- Fix: Detecting bicameral chars on Ice Lake (65c6c98)
- Improve: Faster test suite (74060ed)
- Fix: Classifying Armenian as bicameral (0ab964e)
- Improve: Test case-insensitive search against fold+find (aabb45e)
- Fix: 's' removed from the ASCII path (524ec7b)
- Docs: Policy for historical S sign 'ſ' (U+017F) (2226e25)
- Improve: Test coverage for case-insensitive search (a9a7d85)
- Fix: Mostly passing tests (24e43aa)
- Improve: Simpler design for Ice-Lake case-insensitive search (9ce5a6f)
- Fix: Match new rune safety profiles (b313a8c)
- Fix: 'k' and C6 policy for Vietnamese (8974271)
- Improve: New safety profiles (afa24f7)
- Fix: Stepping logic for safe slices under 16 bytes (b66ebd4)
- Improve: Default safe-window selection (1588a2d)
- Fix: Multiplication/division signs on Vietnamese path (5e0de6c)
- Fix: Special Cyrillic folding cases (fb09351)
- Improve: Print env settings at start (b4e269e)
- Fix: Serial fallback for archaic Polytonic Greek chars (659c2c7)
- Fix: Remove Ligature detection from the hot path (351236c)
- Fix: Mask offsets and Latin-A/B extensions (6aa2893)
- Fix: Using enum masks for character safety profiles (f30fb20)
- Improve: Uniform function naming (e751fbc)
- Improve: Uniform logic for case-insensitive search (a534597)
- Improve: Check "safe windows" even for small needles (b56ee54)
- Improve: New safety profiles for Unicode scripts (a6d75b9)
- Fix: Stale folded rune state (32b3df4)
- Improve: Cleaner Ice Lake kernels (4c6cf68)
- Fix: S...
Release v4.4.2
v4.4.1: Harden C 99 API with `static n` Array Arguments
Added the `sz_at_least(n)` macro for C99's static array parameter syntax, enabling compile-time bounds checking on fixed-size array arguments. In C mode, Clang will now warn when passing undersized arrays to annotated functions. The macro expands to nothing in C++ for compatibility.

```c
// Compiler can now warn if the digest buffer is smaller than 32 bytes
void sz_sha256_state_digest(..., sz_u8_t digest[sz_at_least(32)]);
// Lookup tables must be at least 256 bytes
void sz_lookup(..., char const lut[sz_at_least(256)]);
```

See the LWN.net article for background on this feature and its use in the Linux kernel.
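As a rough illustration of how such a macro can work, here is a minimal sketch. The names `demo_at_least` and `demo_sum4` are hypothetical stand-ins; the actual definition of `sz_at_least` is not shown in these notes and may differ.

```c
#include <stddef.h>

// Hypothetical stand-in for `sz_at_least`; the real definition may differ.
#if defined(__cplusplus)
#define demo_at_least(n) // C++ has no `static n` array parameters
#else
#define demo_at_least(n) static n // C99: callers must pass at least `n` elements
#endif

// Sums the first 4 bytes; the annotation states the minimum buffer size
// in the signature, so a compiler can diagnose undersized arguments.
unsigned demo_sum4(unsigned char const bytes[demo_at_least(4)]) {
    unsigned total = 0;
    for (size_t i = 0; i < 4; ++i) total += bytes[i];
    return total;
}
```

With this sketch, passing a 2-element array to `demo_sum4` in C mode is the kind of call a compiler may flag, while the same header still parses as C++.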
Patch
v4.4: Case-Folding UTF-8 in AVX-512
To my knowledge, this is the first ever properly vectorized case-folding (aka .to_lower()) implementation compliant with Unicode (v17) and using SIMD (AVX-512 for Intel Ice Lake and newer). The results are remarkable across most languages, but it wasn't trivial to achieve. Unlike dense linear algebra workloads, such as in SimSIMD, no shared logic holds across all languages and code points here. After all, Unicode began in 1989 and covers languages and writing systems that took thousands of years to develop and decades to be organized into a standardized set of rules.
This implementation focuses on locale-independent conversion. It covers every one of 1000+ character folding rules in CaseFolding.txt of the Unicode spec, including:
- simple cases, like ASCII English letters: 'A' → 'a'.
- complex Latin extensions, where one codepoint expands into multiple characters: 'ẞ' → "ss".
- ligatures and mathematical symbols, like 'ﬃ' → "ffi".
- less common bicameral alphabets, including Armenian, Georgian, Vietnamese, and others.
- fast `memcpy`-like paths for unicameral scripts, like Chinese, Japanese, and Korean.
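To make the rule categories above concrete, here is a toy sketch that folds ASCII plus three hard-coded multi-byte rules and copies everything else unchanged. The function `demo_fold` is hypothetical and is not the StringZilla API or algorithm; a real folder has to cover every rule in CaseFolding.txt.

```c
#include <stddef.h>
#include <string.h>

// Toy UTF-8 case folder (hypothetical, not the StringZilla API).
// Handles ASCII plus three hard-coded rules; all other bytes are copied as-is.
// For these particular rules the output never exceeds the input length,
// so `out` needs room for `n` bytes. Returns the number of bytes written.
size_t demo_fold(char const *in, size_t n, char *out) {
    size_t w = 0;
    for (size_t i = 0; i < n;) {
        unsigned char c = (unsigned char)in[i];
        if (c >= 'A' && c <= 'Z') { // simple case: 'A' -> 'a'
            out[w++] = (char)(c + ('a' - 'A'));
            i += 1;
        } else if (n - i >= 2 && memcmp(in + i, "\xC3\x9F", 2) == 0) { // U+00DF -> "ss"
            memcpy(out + w, "ss", 2);
            w += 2, i += 2;
        } else if (n - i >= 3 && memcmp(in + i, "\xE1\xBA\x9E", 3) == 0) { // U+1E9E -> "ss"
            memcpy(out + w, "ss", 2);
            w += 2, i += 3;
        } else if (n - i >= 3 && memcmp(in + i, "\xEF\xAC\x83", 3) == 0) { // U+FB03 -> "ffi"
            memcpy(out + w, "ffi", 3);
            w += 3, i += 3;
        } else { // caseless or unhandled byte: plain copy
            out[w++] = in[i++];
        }
    }
    return w;
}
```

Even this toy shows why folding is not a byte-for-byte mapping: one code point can turn into several, and the output length is only known after the rules have been applied. Full Unicode folding can also grow the byte length, so a complete implementation needs a larger output buffer than this sketch assumes.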
To benchmark all of those, I've extended the StringWars benchmarks with new `bench_unicode.rs` and `bench_unicode.py` scripts and a `bench_unicode.md` report produced for two dozen datasets pulled from the Leipzig Wikipedia corpora. On most languages the performance is great, with Georgian and Vietnamese being the exceptions for now:
| Language | Standard 🦀 | StringZilla 🦀 | Speedup 🦀 | Standard 🐍 | StringZilla 🐍 | Speedup 🐍 |
|---|---|---|---|---|---|---|
| English 🇬🇧 | 482 MB/s | 7.53 GB/s | 16x | 257 MB/s | 3.14 GB/s | 12x |
| German 🇩🇪 | 432 MB/s | 2.59 GB/s | 6x | 260 MB/s | 1.81 GB/s | 7x |
| Russian 🇷🇺 | 217 MB/s | 2.20 GB/s | 10x | 470 MB/s | 1.56 GB/s | 3x |
| French 🇫🇷 | 346 MB/s | 1.84 GB/s | 5x | 274 MB/s | 1.37 GB/s | 5x |
| Greek 🇬🇷 | 220 MB/s | 1.00 GB/s | 5x | 431 MB/s | 779 MB/s | 2x |
| Armenian 🇦🇲 | 223 MB/s | 908 MB/s | 4x | 470 MB/s | 746 MB/s | 2x |
| Vietnamese 🇻🇳 | 265 MB/s | 352 MB/s | 1x | 340 MB/s | 291 MB/s | 1x |
| Arabic 🇸🇦 | 232 MB/s | 1004 MB/s | 4x | 467 MB/s | 1.80 GB/s | 4x |
| Bengali 🇧🇩 | 314 MB/s | 6.17 GB/s | 20x | 694 MB/s | 2.91 GB/s | 4x |
| Chinese 🇨🇳 | 325 MB/s | 1.21 GB/s | 4x | 697 MB/s | 886 MB/s | 1x |
| Czech 🇨🇿 | 322 MB/s | 827 MB/s | 3x | 292 MB/s | 688 MB/s | 2x |
| Dutch 🇳🇱 | 471 MB/s | 4.73 GB/s | 10x | 262 MB/s | 2.97 GB/s | 11x |
| Farsi 🇮🇷 | 235 MB/s | 858 MB/s | 4x | 475 MB/s | 1.42 GB/s | 3x |
| Georgian 🇬🇪 | 294 MB/s | 192 MB/s | 1x | 689 MB/s | 488 MB/s | 1x |
| Hebrew 🇮🇱 | 233 MB/s | 1.01 GB/s | 4x | 473 MB/s | 1.86 GB/s | 4x |
| Hindi 🇮🇳 | 293 MB/s | 6.32 GB/s | 22x | 682 MB/s | 3.14 GB/s | 5x |
| Italian 🇮🇹 | 439 MB/s | 2.29 GB/s | 5x | 268 MB/s | 1.93 GB/s | 7x |
| Japanese 🇯🇵 | 330 MB/s | 3.51 GB/s | 11x | 726 MB/s | 2.00 GB/s | 3x |
| Korean 🇰🇷 | 314 MB/s | 861 MB/s | 3x | 623 MB/s | 2.80 GB/s | 4x |
| Lithuanian 🇱🇹 | 352 MB/s | 864 MB/s | 2x | 274 MB/s | 728 MB/s | 3x |
| Polish 🇵🇱 | 364 MB/s | 939 MB/s | 3x | 277 MB/s | 786 MB/s | 3x |
| Portuguese 🇧🇷 | 395 MB/s | 2.38 GB/s | 6x | 270 MB/s | 1.79 GB/s | 7x |
| Spanish 🇪🇸 | 414 MB/s | 2.38 GB/s | 6x | 272 MB/s | 1.80 GB/s | 7x |
| Tamil 🇮🇳 | 306 MB/s | 6.05 GB/s | 20x | 712 MB/s | 3.03 GB/s | 4x |
| Turkish 🇹🇷 | 326 MB/s | 852 MB/s | 3x | 284 MB/s | 706 MB/s | 2x |
| Ukrainian 🇺🇦 | 217 MB/s | 2.09 GB/s | 10x | 476 MB/s | 1.58 GB/s | 3x |
For a complete comparison, go to StringWars 😉
Minor
- Add: Fast path for Georgian case-folding (fa7422c)
- Add: Case-insensitive ops for Python (d88e30a)
- Add: Dispatch case-insensitive search (4ae91c0)
- Add: Serial case-insensitive find & compare (4b18f05)
Patch
- Fix: Eszett hex parsing warnings in Clang (8b27080)
- Fix: Avoid `__builtin` missing on MSVC (fdc95f3)
- Fix: Uninitialized values warning (b84c83e)
- Improve: Safer & faster case-folding on Ice Lake (bcd5d16)
- Improve: Case-folding on Ice Lake (bb23b60)
- Fix: Move Ice Lake kernels out of Haswell scope (b7cc2c4)
- Improve: Rename functions towards `utf8_case*` (44fbb92)
- Improve: Faster serial Unicode folding (aa1b21b)
- Improve: Re-group folding by char-length (c3586e2)
- Docs: Avoid locale-specific Unicode rules (333a778)
- Docs: Emoji-free doc section titles (#284) (dc11b40)
v4.3: Tokenizing UTF-8 with SIMD ㊗️
On AMD Zen5 Turin CPUs, StringZilla provides the following throughput for splitting around whitespace and newline characters on datasets in 5 vastly different languages. Chinese and Korean texts, for example, are both made of mostly 3-byte letters, but Korean uses a lot of whitespace characters for syllable separation, while Chinese doesn't use any. French and English both use a lot of single-byte whitespace characters, but French uses many accented letters that are 2 bytes long in UTF-8.
| Library | English | Chinese | Arabic | French | Korean |
|---|---|---|---|---|---|
| Split around 8 newline combinations: | | | | | |
| `stringzilla::utf8_newline_splits` | 15.45 GiB/s | 16.65 GiB/s | 18.34 GiB/s | 14.52 GiB/s | 16.71 GiB/s |
| `stdlib::split(char::is_unicode_newline)` | 1.90 GiB/s | 1.93 GiB/s | 1.82 GiB/s | 1.78 GiB/s | 1.81 GiB/s |
| Split around 25 whitespace characters: | | | | | |
| `stringzilla::utf8_whitespace_splits` | 0.82 GiB/s | 2.40 GiB/s | 2.40 GiB/s | 0.92 GiB/s | 1.88 GiB/s |
| `stdlib::split(char::is_whitespace)` | 0.77 GiB/s | 1.87 GiB/s | 1.04 GiB/s | 0.72 GiB/s | 0.98 GiB/s |
| `icu::WhiteSpace` | 0.11 GiB/s | 0.16 GiB/s | 0.15 GiB/s | 0.12 GiB/s | 0.15 GiB/s |
On Apple M2 Pro:
| Library | English | Chinese | Arabic | French | Korean |
|---|---|---|---|---|---|
| Split around 8 newline combinations: | | | | | |
| `stringzilla::utf8_newline_splits` | 5.69 GiB/s | 6.24 GiB/s | 6.58 GiB/s | 6.70 GiB/s | 6.29 GiB/s |
| `stdlib::split(char::is_unicode_newline)` | 1.12 GiB/s | 1.11 GiB/s | 1.11 GiB/s | 1.11 GiB/s | 1.13 GiB/s |
| Split around 25 whitespace characters: | | | | | |
| `stringzilla::utf8_whitespace_splits` | 0.57 GiB/s | 2.45 GiB/s | 1.18 GiB/s | 0.61 GiB/s | 0.92 GiB/s |
| `stdlib::split(char::is_whitespace)` | 0.59 GiB/s | 1.16 GiB/s | 0.99 GiB/s | 0.63 GiB/s | 0.89 GiB/s |
| `icu::WhiteSpace` | 0.10 GiB/s | 0.16 GiB/s | 0.14 GiB/s | 0.11 GiB/s | 0.14 GiB/s |
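The 1-, 2-, and 3-byte character widths mentioned above follow directly from the UTF-8 lead byte. The sketch below shows the scalar logic; `demo_utf8_width` and `demo_utf8_count` are hypothetical helpers for illustration, not StringZilla functions, and they assume the input is already valid UTF-8.

```c
#include <stddef.h>

// Width in bytes of a UTF-8 sequence, judged by its lead byte.
// Returns 0 for a continuation byte or an invalid lead byte.
size_t demo_utf8_width(unsigned char lead) {
    if (lead < 0x80) return 1;           // 0xxxxxxx: ASCII, including ' ' and '\n'
    if ((lead & 0xE0) == 0xC0) return 2; // 110xxxxx: e.g. accented Latin letters
    if ((lead & 0xF0) == 0xE0) return 3; // 1110xxxx: e.g. most CJK and Hangul
    if ((lead & 0xF8) == 0xF0) return 4; // 11110xxx: supplementary planes
    return 0;
}

// Counts code points by counting every byte that is not a continuation
// byte (10xxxxxx). Assumes valid UTF-8 input.
size_t demo_utf8_count(char const *text, size_t n) {
    size_t count = 0;
    for (size_t i = 0; i < n; ++i) count += ((unsigned char)text[i] & 0xC0) != 0x80;
    return count;
}
```

The counting loop has no data-dependent branches, which is the property that makes this kind of scan a natural fit for SIMD.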
Minor
- Add: UTF-8 case-folding placeholders (15bcc43)
- Add: UTF-8 serial case-folding (65b652f)
- Add: SVE2 kernels for UTF-8 (d4504be)
- Add: Skip-ahead UTF-8 iterator interface (958be10)
- Add: NEON UTF-8 tokenization kernels (0259f58)
- Add: `try_replace_all` for Rust (35ed227)
- Add: NEON UTF-8 placeholders (f1fcdc5)
- Add: Lazy UTF-8 views for Rust (c08dc0c)
- Add: `sz_utf8_unpack_upto64` for iterators (3ea1857)
- Add: UTF-8 length counting 15x faster (49d9da0)
- Add: `utf8.h` for new `valid` and `find_nth` interfaces (e0465d5)
- Add: UTF-8 bound checks for Rust (e7b4b9e)
- Add: UTF-8 boundary detection (f1e5318)
Patch
- Make: `SZ_ENFORCE_SVE_OVER_NEON=0` by default (da5687d)
- Improve: Fewer loads in SVE2 and no fast paths (a06583a)
- Make: Bump macOS-13 → 15 in CI (98b8802)
- Improve: Fewer registers for `e280xx` masks in SVE2 (5434ebf)
- Improve: Faster SVE2 & Neon logic (bd9ddf5)
- Fix: NEON whitespace & newline equivalence (016c44a)
- Improve: UTF-8 equivalence checks (786a322)
- Fix: Missing `i8` greater-than in AVX2 (dd4c4b0)
- Fix: MSVC-compatible `uint8x16_t` init (97cf851)
- Improve: Consistent var. names in UTF-8 tokenizers (5c6a32a)
- Fix: Aligned state compilation in NEON (31e4c8b)
- Fix: Missing `svcompact_u8` in SVE2 (302af92)
- Improve: Include SVE2 benchmarks (4f558e1)
- Fix: Incorrect literal bound for test input (5e0f3ea)
- Improve: `skip_empty` arg for Python compatibility (0279383)
- Improve: Consistent split-iterator across languages (07c4d1c)
- Improve: Case-folding bump from Unicode 16 to 17 (9daa2a7)
- Fix: UBSAN issues in `hash.h` (36fa527)
- Docs: On complexity of case-insensitive substring search (ac5cb2f)
- Make: Bump Rust deps & drop ICU (ebc4296)
- Improve: New case-folding ABI (82528a7)
- Make: Separate file for UTF-8 unpacking (567cf17)
- Improve: Check UTF-8 case-folding (bf0ff0d)
- Make: Deprecate current UTF-32 unpacking code (b2b96f4)
- Fix: Misplaced UTF-8 skip in StringZilla (b838127)
- Fix: `svmatch`-ing zero characters in SVE2 kernels (6f045aa)
- Improve: Use fewer registers in SVE2 code (e52f4a1)
- Fix: `short` implicit casts (00bacfc)
- Improve: Test CRLF corner cases (0edc81f)
- Improve: Faster `utf8_count_neon` w/out u64 unpacking in loop (b583fa8)
- Improve: Fast path for 1-byte whitespace in NEON (73da441)
- Fix: Compile-time AES/SHA dispatch for Apple (8c34baf)
- Improve: More UTF-8 whitespace tokenization tests (8bb0324)
- Fix: `no_std` builds and doctests (bb699e9)
- Improve: Test UTF-8 decoding ops (849bff2)
- Fix: Out of bounds access in `sz_sha256_*_ice` (2bceb8d)
- Make: Correct `env` fields for `.vscode/tasks.json` (dda7704)
- Improve: Unlimited chunk size for UTF-8 iterators (aad09a4)
- Make: Tune Rust analyzer to use less RAM (ced9636)
- Fix: Skip U+001C, U+001D, U+001E (aca0473)
- Improve: Avoid optimization in more benchmarks (f979ed9)
- Improve: Fast path for UTF-8 whitespaces (a3c407f)
- Make: Build just 1 target for VS Code debug (26b0074)
- Fix: Signed comparisons for UTF-8 boundaries (f532ea2)
- Make: Redefining `SZ_DEBUG=0` in CMake (febbdac)
Release v4.2.3
Patch
- Fix: Missing bounds checks in Rust (#273) (5219a4d)
- Fix: Type-casting UBs of `movemask` bitsets (7c42b98)
- Fix: Handling a larger `order` array (32b6350)
- Fix: `head_length` is pre-decremented to zero (1c5c7e8)
- Fix: Avoid `std::enable_if` for non-STL builds (568d90c)
- Fix: Lifetime of temp strings in ranges (73ce811)
Release v4.2.2
Patch
- Improve: LUTs in SVE (3d886d3)
- Make: Linux cross-compile matching Release CI (524b0d7)
- Fix: Check for Arm Neon support on windows (30320b7)
- Make: Removed pyarrow from windows arms python tests (eab8c3c)
- Make: Exclude KERNEL32.dll from stringzilla_bare checks (9edb804)
- Make: Disabled SVE when using MSVC (04c985b)
- Make: Use correct arch on windows for stringzillas/cuda (3fcd947)
- Make: Updated target arch for windows tests. (e6460e1)
- Fix: Disable windows min/max macros (00e902f)
- Fix: Replace processthreadsapi.h with windows.h (f09e4f9)
- Make: Expand CMAKE_HOST_SYSTEM_PROCESSOR (fe09f8d)
- Make: Revert `--sysroot` cross-compile commands (579c82d)
- Fix: Accessing `ARM64_CNTVCT` on Windows (5e6777d)
- Make: Avoid redefining `arch=armv8.2-a` in pragmas (636147d)
- Make: Expand CMAKE_HOST_SYSTEM_PROCESSOR (1f90f6c)
- Make: Link to `libc++` in LLVM builds on MacOS (1c8b29b)
- Make: Revert `_M_ARM64=1` flags for MSVC (25311a6)
- Make: Enable Posix extensions for Python builds (9fe4f7c)
- Make: Missing macros for `winnt.h(169)` C1189 error (8ef98a9)
- Fix: Reading `mrs` w/out inline Asm on MSVC (d804c9f)
- Make: Override `--sysroot` for "Cross Compile" builds (d3d901d)
- Make: Use valid arch flags on MSVC (5aba122)
- Make: Cross compile checks now correct for MSVC (7664f67)
- Make: Windows arm now uses the correct compiler (7c2e9a0)
- Make: cmake set ARCHIVE_OUTPUT_DIRECTORY to binary dir (f1ec210)
- Make: Use ninja for windows deploy builds (0af43c8)
- Make: Fixed Windows deploy (8ff2ad7)
- Make: Include experimental Arm cross-compilation (4d86312)
v4.2.1: SHA-256 for JS, Swift, Go
Exposing SHA-256 to GoLang was tricky. Clang worked fine. GCC failed. It turned out that GCC was too shy about inlining my code, resulting in excessive stack space usage... Now, JavaScript, Swift, and GoLang bindings all support incremental SHA-256 procedures 🥳. Thanks to @MarekKnapek for reducing the stack memory usage of the serial SHA variant!
Moreover, thanks to @laurenspriem for highlighting the SIGILL when probing ID registers on older Arm CPUs. I've now guarded the first `mrs` probes with signal handlers. It's an ugly solution, but it may work 😅 I've also improved the capability detection code on Arm-based Windows machines, using the OS-specific `<processthreadsapi.h>` functionality, so now not only pure NEON, but also NEON+SHA+AES kernels, should be dispatched just fine!
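For readers unfamiliar with the trick, guarding a potentially trapping probe with a signal handler can be sketched on POSIX roughly as follows. This assumes a `sigsetjmp`/`siglongjmp` approach, which may not match what StringZilla actually does; all `demo_*` names are hypothetical, and `raise(SIGILL)` stands in for a real `mrs` instruction so the sketch runs on any POSIX machine.

```c
#define _POSIX_C_SOURCE 200809L
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>

static sigjmp_buf demo_jump;

// On SIGILL, unwind back to the guard instead of terminating the process.
static void demo_on_sigill(int sig) {
    (void)sig;
    siglongjmp(demo_jump, 1);
}

// Runs `probe` under a temporary SIGILL handler.
// Returns 1 if the probe completed, 0 if it trapped or the handler
// could not be installed. Not thread-safe: uses one global jump buffer.
int demo_guarded_probe(void (*probe)(void)) {
    struct sigaction action, previous;
    action.sa_handler = demo_on_sigill;
    sigemptyset(&action.sa_mask);
    action.sa_flags = 0;
    if (sigaction(SIGILL, &action, &previous) != 0) return 0;
    if (sigsetjmp(demo_jump, 1) != 0) { // re-entered from the handler
        sigaction(SIGILL, &previous, NULL);
        return 0;
    }
    probe();
    sigaction(SIGILL, &previous, NULL);
    return 1;
}

// Stand-ins for a real register probe: one traps, one does not.
void demo_probe_traps(void) { raise(SIGILL); }
void demo_probe_ok(void) {}
```

Passing a non-zero second argument to `sigsetjmp` saves the signal mask, so `SIGILL` is unblocked again after jumping out of the handler and a later probe can be guarded the same way.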
Thanks to @ashbob999, StringZilla is also getting more stable Windows builds and stringzilla_bare coverage in our CI 🦺
Patch
- Make: Removed rand/free/malloc stubs when avoiding libc (0148282)
- Make: Deploy stringzilla_bare for windows (e4ddce8)
- Make: Added .lib file to uploaded windows archives (2dc6936)
- Make: Add MSVC bare builds back (5cc5f01)
- Make: Added stringzilla_bare checks (bbc5cca)
- Fix: Avoid unused POSIX extensions on macOS (aeb06a5)
- Make: Deprecate old cross-compilation scripts (2f34c2d)
- Improve: Drop `-pedantic` for POSIX extensions (e99d557)
- Make: Pre-define CMake properties, like `-lpthread` and pointer size (7722bb1)
- Improve: `serialize_capability` for Ice Lake on Clang (58f8cf9)
- Make: Skip compiler checks for cross-compilation (60988f3)
- Fix: Unused `capabilities` in Arm macOS builds (511a09e)
- Docs: Listing `./scripts` and StringWars (5af84dd)
- Make: Pass `-D CMAKE_SYSROOT` in cross-compiling CI (a26fc73)
- Fix: Suppress unused `alloc` warnings (4868d7f)
- Make: Reduce CMake nesting (dda024d)
- Make: Propagate cross-compilation settings (5070321)
- Improve: Detect NEON+SHA+AES via WinAPI (3b175f8)
- Fix: Probe `mrs` to avoid `SIGILL` on older Arm (d2f8e97)
- Fix: Isolate & skip SHA-256 tests in Go with GCC (0874b13)
- Fix: Deprecates `sz_checksum` (97f9ecf)
- Make: More aggressive inlining (e8f33c1)
- Make: Uniform hardware specs logging (f826dfc)
- Improve: Expose `Capabilities` to GoLang (5f2cc97)
- Improve: Branchless serial SHA-256 block processor (fe7efe2)
- Fix: Missing modulo in SHA #254 (5a513b7)
- Improve: Smaller stack usage in SHA-256 (#253) (a298be0)
- Fix: No `noescape`/`nocallback` for stateful hashes (f8d321f)
- Fix: Violating u32/u64 aliasing (7e55e5c)
- Fix: Missing SSE flags for SHA (403b28b)
- Improve: `io.Writer` & `hash.Hash64` interface for Go (05f89ca)
- Improve: Expose `sz_dispatch_table_init` for Go (5ff7ba1)
- Fix: Missing Goldmont & Ice SHA dispatch (e29bded)
- Fix: Supporting unaligned SHA-256 states (c770e48)
- Fix: Missing `C.sz_checksum` (652735d)
- Fix: Hex formatting in Swift on Linux (fc65328)
- Improve: SHA for Go, JS, Swift (a165322)