Releases: ashvardanian/StringZilla

v4.2: Faster Hashing and SHA-256 🫆

07 Oct 19:29

User-facing updates:

  • 🆕 SHA-256 checksums
  • 🆕 Detect compilation settings
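A SHA-256 checksum routine can be cross-checked against Python's standard `hashlib`. The sketch below deliberately uses only the standard library, since the exact names of StringZilla's own bindings are not shown in these notes; the helper `sha256_streaming` is hypothetical and exists only for illustration.

```python
import hashlib


def sha256_streaming(chunks):
    """Hash an iterable of byte chunks incrementally (hypothetical helper)."""
    hasher = hashlib.sha256()
    for chunk in chunks:
        hasher.update(chunk)
    return hasher.hexdigest()


# Feeding the input in pieces must give the same digest as hashing it at once.
one_shot = hashlib.sha256(b"hello world").hexdigest()
streamed = sha256_streaming([b"hello ", b"world"])
assert one_shot == streamed
```

Any alternative SHA-256 implementation should agree with these digests byte for byte, which makes `hashlib` a convenient oracle in tests.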

Implementation details:

  • 🆕 Intel Goldmont capabilities level
  • 🆕 Arm NEON+SHA capabilities level
  • Hardened Rust builds & capability masking
  • Faster buffer filling in sz_hash in NEON backend
  • Fixed tail handling in sz_copy in SVE backend

Minor

  • Add: Check comp-time capabilities (3347be4)
  • Add: sz_cap_goldmont_k capability! (f70e927)
  • Add: neon+sha new capability! (fcb68a4)
  • Add: Sha256 to bench_token (bb077da)
  • Add: hmac_sha256 APIs (bf1971e)
  • Add: Sha256 class for Python (6ae7b75)
  • Add: Initial Sha256 variant for NEON (bd35030)
  • Add: SHA256 for Arm (20672dd)
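As a reminder of what an HMAC-SHA256 routine computes, here is a minimal sketch of the standard HMAC construction on top of SHA-256, checked against Python's `hmac` module. It does not use StringZilla's `hmac_sha256` APIs, whose signatures are not given here; `hmac_sha256_manual` is a hypothetical name.

```python
import hashlib
import hmac

BLOCK_SIZE = 64  # SHA-256 consumes its input in 64-byte blocks


def hmac_sha256_manual(key: bytes, message: bytes) -> bytes:
    """Textbook HMAC over SHA-256 (hypothetical helper for illustration)."""
    # Keys longer than one block are hashed first; shorter keys are zero-padded.
    if len(key) > BLOCK_SIZE:
        key = hashlib.sha256(key).digest()
    key = key.ljust(BLOCK_SIZE, b"\x00")
    inner_pad = bytes(b ^ 0x36 for b in key)
    outer_pad = bytes(b ^ 0x5C for b in key)
    inner = hashlib.sha256(inner_pad + message).digest()
    return hashlib.sha256(outer_pad + inner).digest()


key, message = b"secret-key", b"payload to authenticate"
expected = hmac.new(key, message, hashlib.sha256).digest()
assert hmac.compare_digest(hmac_sha256_manual(key, message), expected)
```

The construction is just two SHA-256 passes with padded keys, so a faster SHA-256 kernel speeds up the HMAC as well.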

Patch

  • Fix: Avoid unaligned SHA loads on ArmV7 (ebf0503)
  • Fix: Sign conversion warning (3c3e5fc)
  • Make: before-all for dnf on Fedora & apt on Debian (fc74452)
  • Make: Consume env-vars for Rust backend builds (222fc39)
  • Improve: Amortize bench_unary costs (d8d19ce)
  • Fix: Init uint32x4_t on MSVC (3dce631)
  • Improve: Bring back SVE2 hash for short inputs (b1c750b)
  • Improve: More sorting tests (61e08ce)
  • Improve: Simplify SVE memory-ops (35e2236)
  • Fix: sz_copy_sve tail issue (2fa818d)
  • Fix: Avoid <arm_neon_sve_bridge.h> (e5b4496)
  • Improve: Different SHA pipeline for AArch64 (c8aafd3)
  • Improve: Try better SHA pipelining (313f71f)
  • Improve: Faster 2-block SHA256 on NEON (9425341)
  • Improve: Deprecate SVE2 hashing (9cb1588)
  • Improve: Try using non-temporal SVE loads (4572a63)
  • Fix: svlasta_u64(svpfalse_b()) UB (4833e83)
  • Improve: Westmere-like hash updates in NEON (064355f)
  • Improve: Hardening Rust builds (d6a9ba6)
  • Fix: Type-casting on Arm (03a0340)

v4.1: Intel Westmere Kernels

02 Oct 18:53

Thanks to @Algunenano and the broader ClickHouse team for their help back-porting StringZilla kernels to older CPUs 🤗
With this release:

  • Substring search and hashing on CPUs from Westmere to Haswell will become at least 2x faster.
  • Inferring Skylake capabilities in dynamic dispatch won't require the VAES extensions, which are only needed for Ice Lake and newer.
  • MSVC will correctly detect Haswell, Ice Lake, and NEON capabilities for compile-time dispatch. MSVC's predefined macros offer no way to tell the other platforms apart.
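The dispatch rule in the second bullet can be pictured as a subset check: a capability level is usable when every feature it requires is present, and nothing more. The sketch below is a simplified model in Python, not StringZilla's actual dispatch code; the feature names, their grouping per level, and the `"serial"` fallback label are all assumptions made for illustration.

```python
# Illustrative feature sets, ordered from newest to oldest level.
# The names and groupings are assumptions for this sketch.
LEVELS = [
    ("ice_lake", {"avx512f", "avx512bw", "vaes"}),
    ("skylake", {"avx512f", "avx512bw"}),  # note: no VAES requirement
    ("haswell", {"avx2"}),
    ("westmere", {"sse4.2", "aes"}),
]


def supported_levels(cpu_features):
    """Return every level whose required features are all present."""
    return [name for name, required in LEVELS if required <= cpu_features]


def best_level(cpu_features):
    """Pick the newest supported level, or a portable fallback."""
    levels = supported_levels(cpu_features)
    return levels[0] if levels else "serial"


skylake_cpu = {"sse4.2", "aes", "avx2", "avx512f", "avx512bw"}
assert best_level(skylake_cpu) == "skylake"
assert best_level(skylake_cpu | {"vaes"}) == "ice_lake"
```

If the `skylake` entry also listed `vaes`, a real Skylake CPU would fall back to the `haswell` level, which is the kind of misclassification this release avoids.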

Patch

  • Fix: Checking for SSE and AES in MSVC (f7d95ad)
  • Docs: More links & GPU mentions (04d6ae3)
  • Improve: Reuse AES-NI since Westmere (a3b7cd5)
  • Fix: Replace Nehalem with Westmere (#249) (fe30683)
  • Fix: VAES debuted in Ice Lake (8cef111)

Release v4.0.15

30 Sep 19:24

Release: v4.0.15 [skip ci]

Patch

  • Improve: Faster unaligned loads in fingerprints (30ad812)
  • Fix: Avoid Ice Lake instructions on older CPUs (96ce576)
  • Improve: Faster streaming hashes on x86 (2ce62ba)
  • Docs: Levenshtein wave shape (8e1f70c)

Release v4.0.14

22 Sep 11:36

Release: v4.0.14 [skip ci]

Release v4.0.13

19 Sep 14:13

Release: v4.0.13 [skip ci]

Patch

  • Make: Pull CUDA within CIBW_BEFORE_ALL (6316330)

v4.0.12: Zero-Copy for Rust and Python

19 Sep 10:30

This release fixes a critical bug where non-owning Strs slices copied the entire parent data during GPU memory allocation instead of just the sliced portion. The fix ensures correct handling of the Apache Arrow-compatible StringTape format, normalizing slice offsets so that zero-copy operations see the right data. GPU memory management is also more efficient: by traversing the parent chain of a view, the library avoids unnecessary re-allocations when the data already resides in GPU memory.
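To make "offset normalization" concrete, here is a minimal pure-Python sketch of an Arrow-style layout: one contiguous byte buffer plus `n + 1` offsets. It models the idea only and is not StringZilla's or StringTape's implementation; `build_tape` and `slice_tape` are hypothetical helpers.

```python
def build_tape(strings):
    """Pack strings into one contiguous buffer plus n + 1 byte offsets."""
    encoded = [s.encode("utf-8") for s in strings]
    offsets = [0]
    for chunk in encoded:
        offsets.append(offsets[-1] + len(chunk))
    return b"".join(encoded), offsets


def slice_tape(data, offsets, start, stop):
    """Take only the bytes covered by [start, stop) and rebase offsets to zero."""
    first, last = offsets[start], offsets[stop]
    normalized = [offset - first for offset in offsets[start:stop + 1]]
    return data[first:last], normalized


data, offsets = build_tape(["hello", "world", "test", "data"])
slice_data, slice_offsets = slice_tape(data, offsets, 1, 3)
assert slice_data == b"worldtest"   # only the slice, not the parent buffer
assert slice_offsets == [0, 5, 9]   # rebased from [5, 10, 14]
```

Transferring `slice_data` with the rebased offsets moves 9 bytes instead of the parent's 18, which is the saving the fix is about.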

A new stringzillas.to_device() function enables explicit GPU memory pre-allocation, useful for testing and performance optimization:

import stringzilla as sz
import stringzillas as szs

# Create strings and slices
strs = sz.Strs(["hello", "world", "test", "data"])
slice_view = strs[1:3] # Non-owning view of ["world", "test"]

# Pre-allocate on GPU (if available)
gpu_strs = szs.to_device(strs)
gpu_slice = szs.to_device(slice_view) # Correctly handles slice offsets

Cross-platform builds are now more stable: Windows ARM64 cross-compilation is fixed, with mutually exclusive architecture flags preventing header conflicts. The CI/CD pipeline now generates stringzillas-cuda packages correctly by propagating environment variables through cibuildwheel. Test coverage has been extended to complex Unicode scenarios, including RTL text, emoji sequences, and different normalization forms. The documentation now includes Rust examples showcasing the zero-copy compute_into APIs with the StringTape format.

Patch

  • Make: Mutually exclusive platform flags (773d959)
  • Fix: Skip to_device tests w/out GPUs found (d701592)
  • Make: Propagate SZ_TARGET into CIBW env (b222e82)
  • Improve: Avoid realloc for on-GPU views (77e67cf)
  • Docs: Zero-copy Rust compute_into API with StringTape (f4ad81e)
  • Improve: Validate to_device(Strs) for unicode (f3c5357)
  • Improve: Pre-send to GPU with to_device (c78cd21)
  • Fix: Same Strs slicing as in StringTape (b4f8d12)

Release v4.0.11

16 Sep 21:51

Release: v4.0.11 [skip ci]

Patch

  • Improve: Striping APIs for Python (75321c5)
  • Docs: Coloring StringZilla green! (c3ccfa6)

Release v4.0.10

15 Sep 13:19

Release: v4.0.10 [skip ci]

Patch

  • Fix: Propagate big-endian & SIMD flags (0d6c4ef)

Release v4.0.9

15 Sep 12:37

Release: v4.0.9 [skip ci]

Patch

  • Make: Stricter SZ_IS_64BIT checks (0edabf5)
  • Make: Revert PyPI trusted publishing (c679e41)
  • Fix: Unmark __i386__ & _M_IX86 as 64-bit (4f3d8dc)
  • Make: Better detect 64-bit Py builds (1b91246)
  • Make: Identical CI job names (d15e779)

Release v4.0.8

15 Sep 01:21

Release: v4.0.8 [skip ci]

Patch

  • Docs: Outdated algorithm details (68dc092)
  • Make: Guess platform for PyPI and sdist (e1966de)
  • Make: Embed SZ_TARGET.env for PyPI sdists (57822a4)
  • Make: Require serial package for parallel PyPI packages (d3ef8c8)
  • Make: Move Py benchmarks to StringWa.rs (52c90da)
  • Improve: Compare Levenshtein to CuDF (fe1e32b)
  • Make: Bump StringTape (da974f5)
  • Improve: Take slices in compute_into (b507a8f)
  • Improve: compute_into APIs for Rust (103019c)
  • Docs: Ship different __description__s (1467001)
  • Make: Suppress warnings in Windows CIBW (a879dd0)
  • Make: Drop Windows before-test override (d19e2fa)
  • Improve: Skip sz PyTests in szs runs (8abcdab)
  • Fix: Windows compilation issues (fee2ffc)
  • Fix: Safer conversion to RawParts (a9cfa22)