Releases: ashvardanian/StringZilla

v4.6: Fast Search for Georgian & Hashing w/out Stack Protection 🔞

26 Dec 21:36

  • New fast case-insensitive substring search path for Georgian 🇬🇪
  • Faster hashing with compile-time dispatch w/out stack protection 🔞
  • Faster byteset search on Haswell CPUs, thanks to @Caturra000 👏
  • Fewer inline-related warnings in C++ builds, thanks to @inflooper 👏
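The hashing bullet refers to skipping stack protection. GCC and Clang let you drop the stack-protector canary for a single function with the `no_stack_protector` attribute, which removes the canary store and check from short, hot code. The following is a minimal sketch under stated assumptions. It assumes that attribute, detected with `__has_attribute`. It uses FNV-1a as a stand-in hash. `EXAMPLE_NO_STACK_PROTECTOR` and `example_fnv1a64` are hypothetical names, and these notes don't say which mechanism StringZilla's commit actually uses.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical macro: disables the stack protector for one function when the
   compiler advertises the `no_stack_protector` attribute; otherwise expands to nothing. */
#if defined(__has_attribute)
#if __has_attribute(no_stack_protector)
#define EXAMPLE_NO_STACK_PROTECTOR __attribute__((no_stack_protector))
#endif
#endif
#ifndef EXAMPLE_NO_STACK_PROTECTOR
#define EXAMPLE_NO_STACK_PROTECTOR
#endif

/* FNV-1a (64-bit) as a stand-in hash; StringZilla's own hashing kernels differ. */
EXAMPLE_NO_STACK_PROTECTOR
static uint64_t example_fnv1a64(void const *data, size_t length) {
    unsigned char const *bytes = (unsigned char const *)data;
    uint64_t hash = 0xcbf29ce484222325ull; /* FNV-1a 64-bit offset basis */
    for (size_t i = 0; i != length; ++i) {
        hash ^= bytes[i];         /* mix in one byte */
        hash *= 0x100000001b3ull; /* multiply by the 64-bit FNV prime */
    }
    return hash;
}
```

Because the attribute is probed at compile time, compilers that lack it (such as MSVC) build the same function with their default settings.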

Minor

  • Add: Georgian fast path (e02cb00)

Patch

  • Fix: inline static warnings with C++23 modules (#287) (374adbf)
  • Improve: Reduce startup overhead for sz_find_byteset_haswell (#293) (7f2899a)
  • Fix: Missing Georgian dispatch (5f91e7e)
  • Improve: Drop stack-protection in hashing on GCC (23801e4)
  • Improve: Reduce repeated reviews (2e5784b)
  • Improve: Faster sz_size_bit_ceil (4003057)
  • Improve: Avoid ZMM-to-stack spill in Skylake comparisons (d94a010)
  • Make: Use relative install path for C sources (690d775)
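Byteset search finds the first byte that belongs to an arbitrary set of byte values. The sketch below is a minimal serial reference for what a kernel like `sz_find_byteset_haswell` computes, using a 256-bit membership bitmap. The `example_*` names are hypothetical. The Haswell kernel itself is vectorized AVX2 code, not this loop.

```c
#include <stddef.h>
#include <stdint.h>

/* A 256-bit membership set: bit `b` is set if byte value `b` belongs to the set. */
typedef struct {
    uint64_t words[4];
} example_byteset_t;

static void example_byteset_init(example_byteset_t *set, char const *members, size_t count) {
    for (int i = 0; i != 4; ++i) set->words[i] = 0;
    for (size_t i = 0; i != count; ++i) {
        unsigned char b = (unsigned char)members[i];
        set->words[b >> 6] |= 1ull << (b & 63); /* word index, then bit within the word */
    }
}

static int example_byteset_contains(example_byteset_t const *set, unsigned char b) {
    return (int)((set->words[b >> 6] >> (b & 63)) & 1u);
}

/* Returns a pointer to the first byte of `text` that belongs to `set`, or NULL. */
static char const *example_find_byteset(char const *text, size_t length, example_byteset_t const *set) {
    for (size_t i = 0; i != length; ++i)
        if (example_byteset_contains(set, (unsigned char)text[i])) return text + i;
    return NULL;
}
```

Building the set is a per-call cost that does not depend on the text length, so for short inputs such setup can dominate the run time.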

v4.5.1: UTF-8 "Find-All" Iterators for Python 🐍

15 Dec 16:21

Patch

  • Fix: Shared library function signatures (4b6f40f)
  • Make: Ignore blaming for recent chores :) (8a4cacd)
  • Fix: Cleaner haystack/needle buffer (de44c05)
  • Improve: Python iterator over case-insensitive matches (3782f9a)
  • Make: NPM Trusted Publishing in CI (738b07f)

v4.5: 50x Faster Case-Insensitive Search for All of Unicode 17 in AVX-512

15 Dec 00:28

Below are the throughput numbers for case-insensitive substring search of unique "word" tokens across various languages of the Leipzig Wikipedia Corpora, respecting all Unicode 17.0 case-folding rules. Besides PCRE2, this is arguably the only library providing full Unicode compliance for search operations, and PCRE2 is often one or more orders of magnitude slower than even our serial baseline due to the complexity of combining a complete RegEx engine with Unicode compliance.

| Language | Script | Serial baseline, GB/s | AVX-512 on Ice Lake+, GB/s | Speedup |
| --- | --- | ---: | ---: | ---: |
| Latin (Basic) | | | | |
| 🇬🇧 English | Latin | 1.15 | 10.93 | 11.9× |
| 🇮🇹 Italian | Latin | 0.81 | 10.63 | 14.7× |
| 🇳🇱 Dutch | Latin | 0.85 | 10.91 | 13.3× |
| Latin (Extended) | | | | |
| 🇩🇪 German | Latin+ß | 0.74 | 9.36 | 13.6× |
| 🇫🇷 French | Latin+Acc | 0.73 | 8.37 | 15.1× |
| 🇪🇸 Spanish | Latin+ñ | 0.99 | 8.86 | 10.8× |
| 🇵🇹 Portuguese | Latin+Acc | 0.77 | 9.58 | 14.3× |
| 🇵🇱 Polish | Latin+Ext | 0.62 | 7.51 | 14.2× |
| 🇨🇿 Czech | Latin+Háčky | 0.43 | 6.10 | 17.1× |
| 🇹🇷 Turkish | Latin+İ/ı | 0.81 | 6.78 | 11.7× |
| 🇻🇳 Vietnamese | Latin+Tones | 0.41 | 6.38 | 17.9× |
| Cyrillic | | | | |
| 🇷🇺 Russian | Cyrillic | 0.54 | 3.41 | 10.6× |
| 🇺🇦 Ukrainian | Cyrillic | 0.56 | 4.03 | 10.6× |
| Greek | | | | |
| 🇬🇷 Greek | Greek | 0.31 | 7.04 | 22.5× |
| Caucasian | | | | |
| 🇦🇲 Armenian | Armenian | 0.34 | 4.18 | 17.5× |
| 🇬🇪 Georgian | Georgian | 0.65 | 10.56 | 24.2× |
| Semitic | | | | |
| 🇮🇱 Hebrew | Hebrew | 0.65 | 9.52 | 13.7× |
| 🇸🇦 Arabic | Arabic | 1.17 | 9.85 | 9.8× |
| 🇮🇷 Persian | Arabic+Ext | 0.41 | 11.83 | 43.1× |
| Indic | | | | |
| 🇮🇳 Hindi | Devanagari | 1.25 | 10.99 | 16.3× |
| 🇧🇩 Bengali | Bengali | 0.72 | 11.03 | 25.9× |
| 🇮🇳 Tamil | Tamil | 1.09 | 11.70 | 21.0× |
| CJK & East Asian | | | | |
| 🇯🇵 Japanese | CJK+Kana | 0.52 | 11.56 | 26.7× |
| 🇰🇷 Korean | Hangul | 2.98 | 11.58 | 3.5× |
| 🇨🇳 Chinese | CJK | 0.43 | 20.07 | 103.0× |
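The numbers above cover full Unicode folding. Under full folding, a match can change length (for example, 'ß' folds to "ss"), so folded and original byte offsets don't line up. For the ASCII subset, though, folding is a one-to-one byte mapping, and the semantics reduce to "fold both sides, then compare". The following is a minimal quadratic reference with hypothetical `example_*` names. It is similar in spirit to testing against fold+find, but it is not StringZilla's kernel.

```c
#include <stddef.h>

/* ASCII-only simple case folding: 'A'..'Z' -> 'a'..'z'; every other byte is unchanged. */
static unsigned char example_ascii_fold(unsigned char c) {
    return (c >= 'A' && c <= 'Z') ? (unsigned char)(c + ('a' - 'A')) : c;
}

/* Returns the offset of the first ASCII-case-insensitive match of `needle` in `haystack`,
   or `haystack_length` if there is none. Quadratic worst case: a reference, not a fast kernel. */
static size_t example_ascii_ifind(char const *haystack, size_t haystack_length,
                                  char const *needle, size_t needle_length) {
    if (needle_length > haystack_length) return haystack_length;
    for (size_t start = 0; start + needle_length <= haystack_length; ++start) {
        size_t matched = 0;
        while (matched != needle_length &&
               example_ascii_fold((unsigned char)haystack[start + matched]) ==
                   example_ascii_fold((unsigned char)needle[matched]))
            ++matched;
        if (matched == needle_length) return start;
    }
    return haystack_length;
}
```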

Minor

  • Add: Case-folding for Swift (f621419)
  • Add: Case-folding for GoLang (2e310d6)
  • Add: Case-folding for JavaScript (93bc9c4)
  • Add: Reusable case-insensitive needles with metadata for Rust & C++ (45e3c92)
  • Add: Reusable case-insensitive finder for Rust (0c8626d)
  • Add: Vietnamese fast path (8ec7de7)
  • Add: Armenian & Greek paths (984ace2)
  • Add: Central European block (f087711)
  • Add: New Cyrillic search kernels (b8e106f)
  • Add: Serial verification for Ice Lake search (0cfd2fd)
  • Add: Seeding & iterations multipliers for fuzzing (d96cee7)
  • Add: sz_utf8_case_agnostic API (a0507ee)
  • Add: Fast-path Ice Lake case-insensitive substring search for needles <= 16 bytes (b836970)
  • Add: ASCII fast paths for small inputs (8b136f3)
  • Add: Hash-free search kernel for small needles (e5c477d)
  • Add: Latin-1 case-folded search (c1c0305)
  • Add: Branchless .empty() for small strings (ea258c1)
  • Add: Draft TR29 Unicode word-bound iterators (3ca6695)
  • Add: Draft case-insensitive search on Ice Lake (4d30daa)

Patch

  • Fix: Pointer cast in GoLang (fb20b6c)
  • Docs: Badges, CLI, & inconsistencies (b9aa985)
  • Docs: UTF-8 Fold & Search with Perf numbers (b69c49d)
  • Improve: Prefetch on massive inputs (48a8ccb)
  • Fix: Missing span::operator==0 for new NVCC benchmarks (9b6911a)
  • Fix: Shadowing template param on NVCC (908d7f9)
  • Fix: Rust UTF-8 iterator doctest (7fff78b)
  • Make: Install curl on Alpine for Rust kit (85af5b5)
  • Docs: Arm NEON case-folding plans (8a3f25d)
  • Improve: Generalize case-invariant logic (1239bea)
  • Improve: Faster ASCII kernels for ≤ 3 probes (e6626a8)
  • Improve: Deduplicate benchmark input tokens (68287fc)
  • Improve: Western European register pressure (fe94e1c)
  • Improve: Greek alarm with less register pressure (22dc88c)
  • Improve: Separate "alarm" functions for danger zones (936fc22)
  • Make: Install bash on Alpine for Rust toolchain (999ec64)
  • Improve: Flatten danger zone checks (a8e3f66)
  • Improve: VPSHUFB & VPTERNLOG for search (a315ee8)
  • Make: Bump Rust & Go CI (7847671)
  • Improve: Higher-efficiency Ice Lake kernels (dce1773)
  • Fix: Generalize static asserts to 32-bit archs (bde1fad)
  • Fix: NULL missing - use SZ_NULL (7e3dd35)
  • Fix: Handle failed downloads of UCD specs (13bc864)
  • Fix: Micro case-fold in Georgian path (09ca314)
  • Fix: Missing sized_match_t constructor (916b23e)
  • Improve: Test more problematic chars (46b7135)
  • Fix: Shrink step proportional to danger zones (e67f3f7)
  • Fix: Outdated case-insensitive metadata in Rust (45e81f7)
  • Fix: Missing danger marker in Western kernel (cbcd685)
  • Fix: Mid-rune serial matches (be360c7)
  • Fix: Vietnamese odd-even fold in ZMM (409a44d)
  • Fix: Modifiers exclusion from case-less chars (5948374)
  • Fix: Danger zone length (be52b86)
  • Improve: Reproduce Ice Lake bugs (ac31704)
  • Fix: Eastern European case folding (900033e)
  • Fix: Dispatch Central European path (2a8a9b6)
  • Fix: Ban "ss" prefix/suffix for Western European path (66cade8)
  • Improve: Tighten safety profiles (6eee0a9)
  • Fix: Serial match verification mid-character (d927ea7)
  • Improve: New regression tests for ligatures (29e9cd2)
  • Fix: Compile-time Ice-Lake dispatch (7dd6209)
  • Docs: Describe problematic chars (368bf5d)
  • Improve: New probe refinement & tail verification (d9ceb8c)
  • Improve: Detect more danger zones (d327cf8)
  • Improve: Better fuzzing for substring search (9ad155e)
  • Improve: Case folding variables naming (4ee616c)
  • Fix: Case-folding around Glagolitic E2 ranges (68cd557)
  • Improve: Fuzzing case-folding equivalence (4dcbe62)
  • Fix: Check for incomplete set of 3-byte chars in case_fold_ice (ad036ef)
  • Fix: Match new reusable needle ABI in Rust & Python (fa30741)
  • Fix: Folding Greek final sigma in AVX-512 (bda1321)
  • Fix: Handling Micro sign and Armenian ligatures (39516b3)
  • Improve: Deduplicate body/tail kernel logic (d9e2409)
  • Fix: Pass stress-tests under 10x multiple (1e050ad)
  • Improve: Share abstractions for match validation (55c7c92)
  • Fix: Cleaner script-specific window tracking (5a1ba33)
  • Improve: Case-insensitive test coverage (8b2385d)
  • Improve: Propagate metadata between queries (e3a6bb6)
  • Fix: Detecting bicameral chars on Ice Lake (65c6c98)
  • Improve: Faster test suite (74060ed)
  • Fix: Classifying Armenian as bicameral (0ab964e)
  • Improve: Test case-insensitive search against fold+find (aabb45e)
  • Fix: 's' removed from the ASCII path (524ec7b)
  • Docs: Policy for historical S sign 'ſ' (U+017F) (2226e25)
  • Improve: Test coverage for case-insensitive search (a9a7d85)
  • Fix: Mostly passing tests (24e43aa)
  • Improve: Simpler design for Ice-Lake case-insensitive search (9ce5a6f)
  • Fix: Match new rune safety profiles (b313a8c)
  • Fix: 'k' and C6 policy for Vietnamese (8974271)
  • Improve: New safety profiles (afa24f7)
  • Fix: Stepping logic for safe slices under 16 bytes (b66ebd4)
  • Improve: Default safe-window selection (1588a2d)
  • Fix: Multiplication/division signs on Vietnamese path (5e0de6c)
  • Fix: Special Cyrillic folding cases (fb09351)
  • Improve: Print env settings at start (b4e269e)
  • Fix: Serial fallback for archaic Polytonic Greek chars (659c2c7)
  • Fix: Remove Ligature detection from the hot path (351236c)
  • Fix: Mask offsets and Latin-A/B extensions (6aa2893)
  • Fix: Using enum masks for character safety profiles (f30fb20)
  • Improve: Uniform function naming (e751fbc)
  • Improve: Uniform logic for case-insensitive search (a534597)
  • Improve: Check "safe windows" even for small needles (b56ee54)
  • Improve: New safety profiles for Unicode scripts (a6d75b9)
  • Fix: Stale folded rune state (32b3df4)
  • Improve: Cleaner Ice Lake kernels (4c6cf68)
  • Fix: S...

Release v4.4.2

04 Dec 19:31

Choose a tag to compare

Release: v4.4.2 [skip ci]

Patch

v4.4.1: Harden C99 API with `static n` Array Arguments

03 Dec 22:37

Added sz_at_least(n) macro for C99's static array parameter syntax, enabling compile-time bounds checking on fixed-size array arguments. In C mode, Clang will now warn when passing undersized arrays to annotated functions. The macro expands to nothing in C++ for compatibility.

// Compiler can now warn if the digest buffer is smaller than 32 bytes
void sz_sha256_state_digest(..., sz_u8_t digest[sz_at_least(32)]);

// Lookup tables must be at least 256 bytes
void sz_lookup(..., char const lut[sz_at_least(256)]);

See the LWN.net article for background on this feature and its use in the Linux kernel.
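The following minimal C99 sketch shows what the annotation buys. `example_at_least` is a hypothetical stand-in that follows the behavior described above: it expands to `static n` in C and to nothing in C++. The real `sz_at_least` definition may differ. `example_lookup` is also hypothetical.

```c
#include <stddef.h>

/* Hypothetical stand-in for sz_at_least(n): `static n` in C, nothing in C++. */
#ifdef __cplusplus
#define example_at_least(n)
#else
#define example_at_least(n) static n
#endif

/* The caller promises `lut` points to at least 256 elements,
   so indexing it with any `unsigned char` value stays in bounds. */
static unsigned char example_lookup(unsigned char const lut[example_at_least(256)], unsigned char index) {
    return lut[index];
}
```

If you pass a smaller array, such as `unsigned char small[16]`, Clang warns that the array argument is too small. The `static` bound also implies that the argument must not be a null pointer.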

Patch

  • Improve: Harden C API with static n arrays (#289) (039c4b4)

v4.4: Case-Folding UTF-8 in AVX-512

29 Nov 14:43

To my knowledge, this is the first properly vectorized case-folding (roughly, .to_lower()) implementation that is compliant with Unicode 17 and uses SIMD (AVX-512 on Intel Ice Lake and newer). The results are remarkable across most languages, but they weren't trivial to achieve. Unlike dense linear algebra workloads, such as those in SimSIMD, no shared logic holds across all languages and code points here. After all, Unicode began in 1989 and covers languages and writing systems that took thousands of years to develop and decades to organize into a standardized set of rules.

This implementation focuses on locale-independent conversion. It covers every one of the 1000+ character-folding rules in CaseFolding.txt of the Unicode spec, including:

  • simple cases, like ASCII English letters: 'A' → 'a'.
  • complex Latin extensions, where one codepoint expands into multiple characters: 'ẞ' → "ss".
  • ligatures and mathematical symbols, like 'ﬃ' (U+FB03) → "ffi".
  • less common bicameral alphabets, including Armenian, Georgian, Vietnamese, and others.
  • fast memcpy-like paths for unicameral scripts, like Chinese, Japanese, and Korean.
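The following minimal serial sketch makes the "one codepoint expands into multiple characters" case concrete. It implements only three rules from CaseFolding.txt: ASCII 'A' through 'Z', plus U+00DF 'ß' and U+1E9E 'ẞ', which both fold to "ss". `example_fold_subset` is a hypothetical name. The release itself covers all 1000+ rules and vectorizes them.

```c
#include <stddef.h>

/* Folds a tiny subset of Unicode case-folding rules on UTF-8 input:
   ASCII 'A'..'Z' -> 'a'..'z', U+00DF (C3 9F) -> "ss", U+1E9E (E1 BA 9E) -> "ss".
   All other bytes are copied unchanged. Returns the number of bytes written to `out`,
   which must hold at least `length` bytes (this subset never grows the text). */
static size_t example_fold_subset(unsigned char const *in, size_t length, unsigned char *out) {
    size_t written = 0;
    for (size_t i = 0; i < length;) {
        unsigned char c = in[i];
        if (c >= 'A' && c <= 'Z') {
            out[written++] = (unsigned char)(c + ('a' - 'A'));
            i += 1;
        } else if (c == 0xC3 && i + 1 < length && in[i + 1] == 0x9F) { /* ß, 2 bytes */
            out[written++] = 's';
            out[written++] = 's';
            i += 2;
        } else if (c == 0xE1 && i + 2 < length && in[i + 1] == 0xBA && in[i + 2] == 0x9E) { /* ẞ, 3 bytes */
            out[written++] = 's';
            out[written++] = 's';
            i += 3;
        } else {
            out[written++] = c;
            i += 1;
        }
    }
    return written;
}
```

Full case folding can make text longer. For example, CaseFolding.txt maps U+0390 (2 bytes in UTF-8) to three code points (6 bytes). A complete API therefore has to size its output buffer for the worst case, not for the input length.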

To benchmark all of those, I've extended the StringWars benchmarks with new bench_unicode.rs and bench_unicode.py scripts and a bench_unicode.md report covering two dozen datasets from the Leipzig Wikipedia corpora. Performance is great for most languages, except Georgian and Vietnamese for now:

| Language | Standard 🦀 | StringZilla 🦀 | Speedup 🦀 | Standard 🐍 | StringZilla 🐍 | Speedup 🐍 |
| --- | ---: | ---: | ---: | ---: | ---: | ---: |
| English 🇬🇧 | 482 MB/s | 7.53 GB/s | 16x | 257 MB/s | 3.14 GB/s | 12x |
| German 🇩🇪 | 432 MB/s | 2.59 GB/s | 6x | 260 MB/s | 1.81 GB/s | 7x |
| Russian 🇷🇺 | 217 MB/s | 2.20 GB/s | 10x | 470 MB/s | 1.56 GB/s | 3x |
| French 🇫🇷 | 346 MB/s | 1.84 GB/s | 5x | 274 MB/s | 1.37 GB/s | 5x |
| Greek 🇬🇷 | 220 MB/s | 1.00 GB/s | 5x | 431 MB/s | 779 MB/s | 2x |
| Armenian 🇦🇲 | 223 MB/s | 908 MB/s | 4x | 470 MB/s | 746 MB/s | 2x |
| Vietnamese 🇻🇳 | 265 MB/s | 352 MB/s | 1x | 340 MB/s | 291 MB/s | 1x |
| Arabic 🇸🇦 | 232 MB/s | 1004 MB/s | 4x | 467 MB/s | 1.80 GB/s | 4x |
| Bengali 🇧🇩 | 314 MB/s | 6.17 GB/s | 20x | 694 MB/s | 2.91 GB/s | 4x |
| Chinese 🇨🇳 | 325 MB/s | 1.21 GB/s | 4x | 697 MB/s | 886 MB/s | 1x |
| Czech 🇨🇿 | 322 MB/s | 827 MB/s | 3x | 292 MB/s | 688 MB/s | 2x |
| Dutch 🇳🇱 | 471 MB/s | 4.73 GB/s | 10x | 262 MB/s | 2.97 GB/s | 11x |
| Farsi 🇮🇷 | 235 MB/s | 858 MB/s | 4x | 475 MB/s | 1.42 GB/s | 3x |
| Georgian 🇬🇪 | 294 MB/s | 192 MB/s | 1x | 689 MB/s | 488 MB/s | 1x |
| Hebrew 🇮🇱 | 233 MB/s | 1.01 GB/s | 4x | 473 MB/s | 1.86 GB/s | 4x |
| Hindi 🇮🇳 | 293 MB/s | 6.32 GB/s | 22x | 682 MB/s | 3.14 GB/s | 5x |
| Italian 🇮🇹 | 439 MB/s | 2.29 GB/s | 5x | 268 MB/s | 1.93 GB/s | 7x |
| Japanese 🇯🇵 | 330 MB/s | 3.51 GB/s | 11x | 726 MB/s | 2.00 GB/s | 3x |
| Korean 🇰🇷 | 314 MB/s | 861 MB/s | 3x | 623 MB/s | 2.80 GB/s | 4x |
| Lithuanian 🇱🇹 | 352 MB/s | 864 MB/s | 2x | 274 MB/s | 728 MB/s | 3x |
| Polish 🇵🇱 | 364 MB/s | 939 MB/s | 3x | 277 MB/s | 786 MB/s | 3x |
| Portuguese 🇧🇷 | 395 MB/s | 2.38 GB/s | 6x | 270 MB/s | 1.79 GB/s | 7x |
| Spanish 🇪🇸 | 414 MB/s | 2.38 GB/s | 6x | 272 MB/s | 1.80 GB/s | 7x |
| Tamil 🇮🇳 | 306 MB/s | 6.05 GB/s | 20x | 712 MB/s | 3.03 GB/s | 4x |
| Turkish 🇹🇷 | 326 MB/s | 852 MB/s | 3x | 284 MB/s | 706 MB/s | 2x |
| Ukrainian 🇺🇦 | 217 MB/s | 2.09 GB/s | 10x | 476 MB/s | 1.58 GB/s | 3x |

For a complete comparison, go to StringWars 😉

Minor

  • Add: Fast path for Georgian case-folding (fa7422c)
  • Add: Case-insensitive ops for Python (d88e30a)
  • Add: Dispatch case-insensitive search (4ae91c0)
  • Add: Serial case-insensitive find & compare (4b18f05)

Patch

  • Fix: Eszett hex parsing warnings in Clang (8b27080)
  • Fix: Avoid __builtin missing on MSVC (fdc95f3)
  • Fix: Uninitialized values warning (b84c83e)
  • Improve: Safer & faster case-folding on Ice Lake (bcd5d16)
  • Improve: Case-folding on Ice Lake (bb23b60)
  • Fix: Move Ice Lake kernels out of Haswell scope (b7cc2c4)
  • Improve: Rename functions towards utf8_case* (44fbb92)
  • Improve: Faster serial Unicode folding (aa1b21b)
  • Improve: Re-group folding by char-length (c3586e2)
  • Docs: Avoid locale-specific Unicode rules (333a778)
  • Docs: Emoji-free doc section titles (#284) (dc11b40)

v4.3: Tokenizing UTF-8 with SIMD ㊗️

26 Nov 13:42

On AMD Zen5 Turin CPUs, StringZilla delivers the following throughput when splitting text around whitespace and newline characters in 5 very different languages. Chinese and Korean texts, for example, consist mostly of 3-byte characters, but Korean uses whitespace to separate words, while Chinese uses virtually none. French and English both use many single-byte whitespace characters, but French has many accented letters that take 2 bytes in UTF-8.
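The byte lengths mentioned above follow from how UTF-8 is encoded. Every code point starts with a non-continuation byte, and continuation bytes always match the bit pattern `10xxxxxx`. Counting code points therefore reduces to counting bytes that are not continuations, a per-byte predicate that maps naturally onto SIMD lanes. The following is a minimal serial sketch. It assumes valid UTF-8 input, and `example_utf8_count` is a hypothetical name.

```c
#include <stddef.h>

/* Counts code points in valid UTF-8 by counting bytes that are NOT
   continuation bytes (continuation bytes match the bit pattern 10xxxxxx). */
static size_t example_utf8_count(unsigned char const *text, size_t length) {
    size_t count = 0;
    for (size_t i = 0; i != length; ++i) count += (text[i] & 0xC0) != 0x80;
    return count;
}
```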

| Library | English | Chinese | Arabic | French | Korean |
| --- | ---: | ---: | ---: | ---: | ---: |
| Split around 8 newline combinations | | | | | |
| stringzilla::utf8_newline_splits | 15.45 GiB/s | 16.65 GiB/s | 18.34 GiB/s | 14.52 GiB/s | 16.71 GiB/s |
| stdlib::split(char::is_unicode_newline) | 1.90 GiB/s | 1.93 GiB/s | 1.82 GiB/s | 1.78 GiB/s | 1.81 GiB/s |
| Split around 25 whitespace characters | | | | | |
| stringzilla::utf8_whitespace_splits | 0.82 GiB/s | 2.40 GiB/s | 2.40 GiB/s | 0.92 GiB/s | 1.88 GiB/s |
| stdlib::split(char::is_whitespace) | 0.77 GiB/s | 1.87 GiB/s | 1.04 GiB/s | 0.72 GiB/s | 0.98 GiB/s |
| icu::WhiteSpace | 0.11 GiB/s | 0.16 GiB/s | 0.15 GiB/s | 0.12 GiB/s | 0.15 GiB/s |

On Apple M2 Pro:

| Library | English | Chinese | Arabic | French | Korean |
| --- | ---: | ---: | ---: | ---: | ---: |
| Split around 8 newline combinations | | | | | |
| stringzilla::utf8_newline_splits | 5.69 GiB/s | 6.24 GiB/s | 6.58 GiB/s | 6.70 GiB/s | 6.29 GiB/s |
| stdlib::split(char::is_unicode_newline) | 1.12 GiB/s | 1.11 GiB/s | 1.11 GiB/s | 1.11 GiB/s | 1.13 GiB/s |
| Split around 25 whitespace characters | | | | | |
| stringzilla::utf8_whitespace_splits | 0.57 GiB/s | 2.45 GiB/s | 1.18 GiB/s | 0.61 GiB/s | 0.92 GiB/s |
| stdlib::split(char::is_whitespace) | 0.59 GiB/s | 1.16 GiB/s | 0.99 GiB/s | 0.63 GiB/s | 0.89 GiB/s |
| icu::WhiteSpace | 0.10 GiB/s | 0.16 GiB/s | 0.14 GiB/s | 0.11 GiB/s | 0.14 GiB/s |
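These notes don't enumerate the "8 newline combinations". The sketch below assumes a plausible set: LF, VT, FF, CR, CR+LF, NEL (U+0085), LS (U+2028), and PS (U+2029). A serial search for the next such separator shows why multi-byte sequences make this harder than a single-byte `memchr`. `example_find_newline` is a hypothetical name, not StringZilla's API.

```c
#include <stddef.h>

/* Finds the next newline in UTF-8 `text`, assuming the set LF, VT, FF, CR, CR+LF,
   NEL (C2 85), LS (E2 80 A8), and PS (E2 80 A9). Returns its byte offset, or `length`
   if none is found, and stores its width in bytes in `*width` (0 when not found). */
static size_t example_find_newline(unsigned char const *text, size_t length, size_t *width) {
    for (size_t i = 0; i != length; ++i) {
        unsigned char c = text[i];
        if (c == '\r') { /* CR, or CR+LF as one 2-byte separator */
            *width = (i + 1 < length && text[i + 1] == '\n') ? 2 : 1;
            return i;
        }
        if (c == '\n' || c == '\v' || c == '\f') {
            *width = 1;
            return i;
        }
        if (c == 0xC2 && i + 1 < length && text[i + 1] == 0x85) { /* NEL */
            *width = 2;
            return i;
        }
        if (c == 0xE2 && i + 2 < length && text[i + 1] == 0x80 &&
            (text[i + 2] == 0xA8 || text[i + 2] == 0xA9)) { /* LS or PS */
            *width = 3;
            return i;
        }
    }
    *width = 0;
    return length;
}
```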

Minor

  • Add: UTF-8 case-folding placeholders (15bcc43)
  • Add: UTF-8 serial case-folding (65b652f)
  • Add: SVE2 kernels for UTF-8 (d4504be)
  • Add: Skip-ahead UTF-8 iterator interface (958be10)
  • Add: NEON UTF-8 tokenization kernels (0259f58)
  • Add: try_replace_all for Rust (35ed227)
  • Add: NEON UTF-8 placeholders (f1fcdc5)
  • Add: Lazy UTF-8 views for Rust (c08dc0c)
  • Add: sz_utf8_unpack_upto64 for iterators (3ea1857)
  • Add: UTF-8 length counting 15x faster (49d9da0)
  • Add: utf8.h for new valid and find_nth interfaces (e0465d5)
  • Add: UTF-8 bound checks for Rust (e7b4b9e)
  • Add: UTF-8 boundary detection (f1e5318)

Patch

  • Make: SZ_ENFORCE_SVE_OVER_NEON=0 by default (da5687d)
  • Improve: Fewer loads in SVE2 and no fast paths (a06583a)
  • Make: Bump macOS-13 → 15 in CI (98b8802)
  • Improve: Fewer registers for e280xx masks in SVE2 (5434ebf)
  • Improve: Faster SVE2 & Neon logic (bd9ddf5)
  • Fix: NEON whitespace & newline equivalence (016c44a)
  • Improve: UTF-8 equivalence checks (786a322)
  • Fix: Missing i8 greater-than in AVX2 (dd4c4b0)
  • Fix: MSVC-compatible uint8x16_t init (97cf851)
  • Improve: Consistent var. names in UTF-8 tokenizers (5c6a32a)
  • Fix: Aligned state compilation in NEON (31e4c8b)
  • Fix: Missing svcompact_u8 in SVE2 (302af92)
  • Improve: Include SVE2 benchmarks (4f558e1)
  • Fix: Incorrect literal bound for test input (5e0f3ea)
  • Improve: skip_empty arg for Python compatibility (0279383)
  • Improve: Consistent split-iterator across languages (07c4d1c)
  • Improve: Case-folding bump from Unicode 16 to 17 (9daa2a7)
  • Fix: UBSAN issues in hash.h (36fa527)
  • Docs: On complexity of case-insensitive substring search (ac5cb2f)
  • Make: Bump Rust deps & drop ICU (ebc4296)
  • Improve: New case-folding ABI (82528a7)
  • Make: Separate file for UTF-8 unpacking (567cf17)
  • Improve: Check UTF-8 case-folding (bf0ff0d)
  • Make: Deprecate current UTF-32 unpacking code (b2b96f4)
  • Fix: Misplaced UTF-8 skip in StringZilla (b838127)
  • Fix: svmatch-ing zero characters in SVE2 kernels (6f045aa)
  • Improve: Use fewer registers in SVE2 code (e52f4a1)
  • Fix: short implicit casts (00bacfc)
  • Improve: Test CRLF corner cases (0edc81f)
  • Improve: Faster utf8_count_neon w/out u64 unpacking in loop (b583fa8)
  • Improve: Fast path for 1-byte whitespace in NEON (73da441)
  • Fix: Compile-time AES/SHA dispatch for Apple (8c34baf)
  • Improve: More UTF-8 whitespace tokenization tests (8bb0324)
  • Fix: no_std builds and doctests (bb699e9)
  • Improve: Test UTF-8 decoding ops (849bff2)
  • Fix: Out of bounds access in sz_sha256_*_ice (2bceb8d)
  • Make: Correct env fields for .vscode/tasks.json (dda7704)
  • Improve: Unlimited chunk size for UTF-8 iterators (aad09a4)
  • Make: Tune Rust analyzer to use less RAM (ced9636)
  • Fix: Skip U+001C, U+001D, U+001E (aca0473)
  • Improve: Avoid optimization in more benchmarks (f979ed9)
  • Improve: Fast path for UTF-8 whitespaces (a3c407f)
  • Make: Build just 1 target for VS Code debug (26b0074)
  • Fix: Signed comparisons for UTF-8 boundaries (f532ea2)
  • Make: Redefining SZ_DEBUG=0 in CMake (febbdac)

Release v4.2.3

27 Oct 16:20

Release: v4.2.3 [skip ci]

Patch

  • Fix: Missing bounds checks in Rust (#273) (5219a4d)
  • Fix: Type-casting UBs of movemask bitsets (7c42b98)
  • Fix: Handling a larger order array (32b6350)
  • Fix: head_length is pre-decremented to zero (1c5c7e8)
  • Fix: Avoid std::enable_if for non-STL builds (568d90c)
  • Fix: Lifetime of temp strings in ranges (73ce811)

Release v4.2.2

26 Oct 21:24

Release: v4.2.2 [skip ci]

Patch

  • Improve: LUTs in SVE (3d886d3)
  • Make: Linux cross-compile matching Release CI (524b0d7)
  • Fix: Check for Arm Neon support on Windows (30320b7)
  • Make: Removed pyarrow from Windows Arm Python tests (eab8c3c)
  • Make: Exclude KERNEL32.dll from stringzilla_bare checks (9edb804)
  • Make: Disabled SVE when using MSVC (04c985b)
  • Make: Use correct arch on Windows for stringzillas/cuda (3fcd947)
  • Make: Updated target arch for Windows tests (e6460e1)
  • Fix: Disable Windows min/max macros (00e902f)
  • Fix: Replace processthreadsapi.h with windows.h (f09e4f9)
  • Make: Expand CMAKE_HOST_SYSTEM_PROCESSOR (fe09f8d)
  • Make: Revert --sysroot cross-compile commands (579c82d)
  • Fix: Accessing ARM64_CNTVCT on Windows (5e6777d)
  • Make: Avoid redefining arch=armv8.2-a in pragmas (636147d)
  • Make: Expand CMAKE_HOST_SYSTEM_PROCESSOR (1f90f6c)
  • Make: Link to libc++ in LLVM builds on macOS (1c8b29b)
  • Make: Revert _M_ARM64=1 flags for MSVC (25311a6)
  • Make: Enable Posix extensions for Python builds (9fe4f7c)
  • Make: Missing macros for winnt.h(169) C1189 error (8ef98a9)
  • Fix: Reading mrs w/out inline Asm on MSVC (d804c9f)
  • Make: Override --sysroot for "Cross Compile" builds (d3d901d)
  • Make: Use valid arch flags on MSVC (5aba122)
  • Make: Cross compile checks now correct for MSVC (7664f67)
  • Make: Windows Arm now uses the correct compiler (7c2e9a0)
  • Make: cmake set ARCHIVE_OUTPUT_DIRECTORY to binary dir (f1ec210)
  • Make: Use ninja for Windows deploy builds (0af43c8)
  • Make: Fixed Windows deploy (8ff2ad7)
  • Make: Include experimental Arm cross-compilation (4d86312)

v4.2.1: SHA-256 for JS, Swift, Go 🫆

12 Oct 13:50

Exposing SHA-256 to GoLang was tricky: Clang worked fine, but GCC failed. It turned out that GCC was too shy about inlining my code, resulting in excessive stack usage... Now the JavaScript, Swift, and GoLang bindings all support incremental SHA-256 procedures 🥳. Thanks to @MarekKnapek for reducing the stack memory usage of the serial SHA variant!

Moreover, thanks to @laurenspriem for highlighting the SIGILL when probing ID registers on older Arm CPUs. I've now guarded the first mrs probes with signal handlers. It's an ugly solution, but it may work 😅 I've also improved the capability detection code on Arm-based Windows machines using the OS-specific <processthreadsapi.h> functionality, so not only pure NEON but also NEON+SHA+AES kernels should now be dispatched correctly!
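The SIGILL guard described above resembles a common POSIX pattern. You install a temporary handler, run the probe under `sigsetjmp`, and `siglongjmp` back if the CPU rejects the instruction. The sketch below assumes a POSIX system. It simulates an unsupported `mrs` read with `raise(SIGILL)`, so it runs on any architecture. All `example_*` names are hypothetical, and this is not StringZilla's actual code.

```c
#define _POSIX_C_SOURCE 200809L
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>

static sigjmp_buf example_probe_jump;

/* Temporary SIGILL handler: jumps back to the sigsetjmp point instead of crashing. */
static void example_on_sigill(int signal_number) {
    (void)signal_number;
    siglongjmp(example_probe_jump, 1);
}

/* Returns 1 if `probe` ran to completion, 0 if it raised SIGILL (or the handler
   could not be installed). Restores the previous SIGILL disposition afterwards. */
static int example_runs_without_sigill(void (*probe)(void)) {
    struct sigaction action, previous;
    memset(&action, 0, sizeof action);
    action.sa_handler = example_on_sigill;
    sigemptyset(&action.sa_mask);
    if (sigaction(SIGILL, &action, &previous) != 0) return 0;

    volatile int completed = 0; /* volatile: read after a possible siglongjmp */
    if (sigsetjmp(example_probe_jump, 1) == 0) { /* 1: save and restore the signal mask */
        probe();
        completed = 1;
    }
    sigaction(SIGILL, &previous, NULL);
    return completed;
}

/* Demo probes. On Arm, a real probe would execute an `mrs` read of an ID register. */
static void example_probe_ok(void) {}
static void example_probe_illegal(void) { raise(SIGILL); }
```

Signal dispositions are process-wide, so this kind of probing is best done once, during single-threaded initialization.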

Thanks to @ashbob999, StringZilla is also getting more stable Windows builds and stringzilla_bare coverage in our CI 🦺

Patch

  • Make: Removed rand/free/malloc stubs when avoiding libc (0148282)
  • Make: Deploy stringzilla_bare for Windows (e4ddce8)
  • Make: Added .lib file to uploaded Windows archives (2dc6936)
  • Make: Add MSVC bare builds back (5cc5f01)
  • Make: Added stringzilla_bare checks (bbc5cca)
  • Fix: Avoid unused POSIX extensions on macOS (aeb06a5)
  • Make: Deprecate old cross-compilation scripts (2f34c2d)
  • Improve: Drop -pedantic for POSIX extensions (e99d557)
  • Make: Pre-define CMake properties, like -lpthread and pointer size (7722bb1)
  • Improve: serialize_capability for Ice Lake on Clang (58f8cf9)
  • Make: Skip compiler checks for cross-compilation (60988f3)
  • Fix: Unused capabilities in Arm macOS builds (511a09e)
  • Docs: Listing ./scripts and StringWars (5af84dd)
  • Make: Pass -D CMAKE_SYSROOT in cross-compiling CI (a26fc73)
  • Fix: Suppress unused alloc warnings (4868d7f)
  • Make: Reduce CMake nesting (dda024d)
  • Make: Propagate cross-compilation settings (5070321)
  • Improve: Detect NEON+SHA+AES via WinAPI (3b175f8)
  • Fix: Probe mrs to avoid SIGILL on older Arm (d2f8e97)
  • Fix: Isolate & skip SHA-256 tests in Go with GCC (0874b13)
  • Fix: Deprecate sz_checksum (97f9ecf)
  • Make: More aggressive inlining (e8f33c1)
  • Make: Uniform hardware specs logging (f826dfc)
  • Improve: Expose Capabilities to GoLang (5f2cc97)
  • Improve: Branchless serial SHA-256 block processor (fe7efe2)
  • Fix: Missing modulo in SHA #254 (5a513b7)
  • Improve: Smaller stack usage in SHA-256 (#253) (a298be0)
  • Fix: No noescape/nocallback for stateful hashes (f8d321f)
  • Fix: Violating u32/u64 aliasing (7e55e5c)
  • Fix: Missing SSE flags for SHA (403b28b)
  • Improve: io.Writer & hash.Hash64 interface for Go (05f89ca)
  • Improve: Expose sz_dispatch_table_init for Go (5ff7ba1)
  • Fix: Missing Goldmont & Ice SHA dispatch (e29bded)
  • Fix: Supporting unaligned SHA-256 states (c770e48)
  • Fix: Missing C.sz_checksum (652735d)
  • Fix: Hex formatting in Swift on Linux (fc65328)
  • Improve: SHA for Go, JS, Swift (a165322)