v4.0.12: Zero-Copy for Rust and Python

@ashvardanian ashvardanian released this 19 Sep 10:30
· 0 commits to 773d959504ed4daefa83d403f935d1e524d1c4ed since this release

This release fixes a critical bug in which non-owning Strs slices copied the entire parent buffer during GPU memory allocation instead of only the sliced portion. The fix handles the Apache Arrow-compatible StringTape format correctly, normalizing offsets so that zero-copy operations see the right bytes. GPU memory management is also more efficient: the library now traverses the parent chain to detect data that already resides in GPU memory and skips the unnecessary re-allocation.
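
To illustrate what offset normalization means for a slice, here is a minimal pure-Python sketch of an Arrow-style layout: one contiguous byte buffer plus N+1 offsets. The helpers build_tape and slice_tape are hypothetical names used only for illustration and are not part of StringZilla; the library's actual implementation may differ.

```python
from typing import List, Tuple


def build_tape(strings: List[str]) -> Tuple[bytes, List[int]]:
    """Pack strings into one UTF-8 buffer plus N+1 offsets (Arrow-style layout)."""
    data = bytearray()
    offsets = [0]
    for s in strings:
        data += s.encode("utf-8")
        offsets.append(len(data))
    return bytes(data), offsets


def slice_tape(data: bytes, offsets: List[int], start: int, stop: int) -> Tuple[bytes, List[int]]:
    """Take only the bytes covered by strings [start, stop) and rebase offsets to zero."""
    base = offsets[start]
    end = offsets[stop]
    sliced_data = data[base:end]  # only the slice portion, not the whole parent buffer
    sliced_offsets = [o - base for o in offsets[start:stop + 1]]
    return sliced_data, sliced_offsets


data, offsets = build_tape(["hello", "world", "test", "data"])
slice_data, slice_offsets = slice_tape(data, offsets, 1, 3)

print(slice_data)     # b'worldtest'
print(slice_offsets)  # [0, 5, 9]
```

Without the rebasing step, the slice would carry the parent's offsets [5, 10, 14], which are only valid against the full parent buffer. That is the situation in which transferring a slice ends up requiring the entire parent data.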

A new stringzillas.to_device() function enables explicit GPU memory pre-allocation, useful for testing and performance optimization:

import stringzilla as sz
import stringzillas as szs

# Create strings and slices
strs = sz.Strs(["hello", "world", "test", "data"])
slice_view = strs[1:3] # Non-owning view of ["world", "test"]

# Pre-allocate on GPU (if available)
gpu_strs = szs.to_device(strs)
gpu_slice = szs.to_device(slice_view) # Correctly handles slice offsets

Cross-platform builds are now more stable: a fix for Windows ARM64 cross-compilation makes the architecture flags mutually exclusive, which prevents header conflicts. The CI/CD pipeline now generates stringzillas-cuda packages correctly by propagating environment variables through cibuildwheel. Test coverage has been extended to complex Unicode scenarios, including RTL text, emoji sequences, and different normalization forms. The documentation now includes Rust examples of the zero-copy compute_into APIs using the StringTape format.
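
As a rough illustration of why such Unicode inputs stress offset handling, the sketch below uses only the Python standard library to compare code-point counts with UTF-8 byte lengths. The sample strings are illustrative assumptions and are not taken from the project's test suite.

```python
import unicodedata

# Illustrative inputs: RTL text, a zero-width-joiner emoji sequence,
# and the same accented letter in two normalization forms.
samples = {
    "rtl_hebrew": "\u05e9\u05dc\u05d5\u05dd",
    "emoji_zwj_family": "\U0001F468\u200D\U0001F469\u200D\U0001F467",
    "accent_nfc": unicodedata.normalize("NFC", "e\u0301"),
    "accent_nfd": unicodedata.normalize("NFD", "\u00e9"),
}


def describe(text: str) -> tuple:
    """Return (code point count, UTF-8 byte length) for a string."""
    return len(text), len(text.encode("utf-8"))


report = {name: describe(text) for name, text in samples.items()}
for name, (code_points, n_bytes) in report.items():
    print(f"{name}: {code_points} code points, {n_bytes} UTF-8 bytes")
```

Byte offsets into a shared buffer must follow the UTF-8 lengths, not the code-point counts, and the NFC and NFD forms of the same visible character occupy different numbers of bytes.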

Patch

  • Make: Mutually exclusive platform flags (773d959)
  • Fix: Skip to_device tests w/out GPUs found (d701592)
  • Make: Propagate SZ_TARGET into CIBW env (b222e82)
  • Improve: Avoid realloc for on-GPU views (77e67cf)
  • Docs: Zero-copy Rust compute_into API with StringTape (f4ad81e)
  • Improve: Validate to_device(Strs) for unicode (f3c5357)
  • Improve: Pre-send to GPU with to_device (c78cd21)
  • Fix: Same Strs slicing as in StringTape (b4f8d12)