v4.0.12: Zero-Copy for Rust and Python
This release fixes a critical bug where non-owning Strs slices copied the entire parent buffer during GPU memory allocation instead of only the sliced portion. The fix ensures that the Apache Arrow-compatible StringTape format is handled properly, with correct offset normalization for zero-copy operations. GPU memory management is also more efficient: when data already resides in GPU memory, which is detected by traversing a view's chain of parent objects, no unnecessary re-allocation is made.
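To make the slice bug concrete, here is a minimal pure-Python sketch of an Arrow-style string layout: one byte buffer plus N+1 offsets. A non-owning slice shares the parent buffer, so its offsets do not start at zero; copying it correctly means sending only its byte range and rebasing the offsets. `TapeView`, `from_strings`, `naive_transfer_payload`, and `normalize_for_transfer` are hypothetical names used for illustration only; they are not part of the StringZilla API, and the real StringTape/Strs internals may differ.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TapeView:
    """Hypothetical Arrow-style string column: one byte buffer plus N+1 offsets.

    A non-owning slice shares the parent's buffer, so its offsets
    generally do not start at zero.
    """
    data: bytes         # parent buffer, shared by all slices
    offsets: List[int]  # string i spans data[offsets[i]:offsets[i + 1]]

    def __len__(self) -> int:
        return len(self.offsets) - 1

    def __getitem__(self, index: int) -> bytes:
        return self.data[self.offsets[index]:self.offsets[index + 1]]

    def slice(self, start: int, stop: int) -> "TapeView":
        # Zero-copy: reuse the parent buffer, keep only a window of offsets.
        return TapeView(self.data, self.offsets[start:stop + 1])


def from_strings(strings: List[str]) -> TapeView:
    offsets = [0]
    for s in strings:
        offsets.append(offsets[-1] + len(s.encode("utf-8")))
    return TapeView("".join(strings).encode("utf-8"), offsets)


def naive_transfer_payload(view: TapeView) -> bytes:
    # Pre-fix behavior described above: the whole parent buffer is sent.
    return view.data


def normalize_for_transfer(view: TapeView) -> TapeView:
    # Fixed behavior: send only the slice's bytes and rebase offsets to zero.
    first, last = view.offsets[0], view.offsets[-1]
    return TapeView(view.data[first:last], [o - first for o in view.offsets])


tape = from_strings(["hello", "world", "test", "data"])
view = tape.slice(1, 3)                     # ["world", "test"]
print(len(naive_transfer_payload(view)))    # 18
fixed = normalize_for_transfer(view)
print(fixed.data, fixed.offsets)            # b'worldtest' [0, 5, 9]
```

For a two-string slice of a four-string column, the naive path ships all 18 bytes, while the normalized copy ships 9 bytes with offsets that a consumer can index from zero.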
A new stringzillas.to_device() function enables explicit GPU memory pre-allocation, useful for testing and performance optimization:
```python
import stringzilla as sz
import stringzillas as szs

# Create strings and slices
strs = sz.Strs(["hello", "world", "test", "data"])
slice_view = strs[1:3]  # Non-owning view of ["world", "test"]

# Pre-allocate on GPU (if available)
gpu_strs = szs.to_device(strs)
gpu_slice = szs.to_device(slice_view)  # Correctly handles slice offsets
```

Cross-platform builds are now more stable, with fixes for Windows ARM64 cross-compilation: architecture flags are now mutually exclusive, which prevents header conflicts. The CI/CD pipeline now correctly generates stringzillas-cuda packages by propagating the SZ_TARGET environment variable through cibuildwheel. Test coverage has been extended to complex Unicode scenarios, including RTL text, emoji sequences, and different normalization forms. Documentation has been extended with Rust examples showcasing the zero-copy compute_into APIs using the StringTape format.
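The notes attribute the avoided re-allocations (77e67cf) to traversing a view's parent chain. Below is a minimal sketch of that idea, assuming each view either owns a buffer or borrows from a parent view: before allocating, the device transfer walks up to the owning buffer and reuses the view if that buffer is already GPU-resident. `Buffer`, `View`, `root_buffer`, and `to_device_sketch` are hypothetical illustrations, not StringZilla internals, and slice bounds are omitted for brevity.

```python
from typing import Optional, Tuple


class Buffer:
    """Hypothetical owning storage that records where it lives."""

    def __init__(self, payload: bytes, on_device: bool = False) -> None:
        self.payload = payload
        self.on_device = on_device


class View:
    """Hypothetical view: owns a Buffer, or borrows from a parent view."""

    def __init__(self, buffer: Optional[Buffer] = None,
                 parent: Optional["View"] = None) -> None:
        if (buffer is None) == (parent is None):
            raise ValueError("pass exactly one of buffer or parent")
        self.buffer = buffer
        self.parent = parent


def root_buffer(view: View) -> Buffer:
    # Walk parent links until reaching the view that owns the storage.
    node = view
    while node.buffer is None:
        node = node.parent
    return node.buffer


def to_device_sketch(view: View) -> Tuple[View, bool]:
    """Return (device-resident view, whether a new allocation happened)."""
    owner = root_buffer(view)
    if owner.on_device:
        return view, False  # already on the GPU: reuse, no re-allocation
    # Stand-in for a real host-to-device copy.
    return View(buffer=Buffer(owner.payload, on_device=True)), True


host = View(buffer=Buffer(b"helloworldtestdata"))
gpu, allocated = to_device_sketch(host)            # first transfer allocates
nested = View(parent=View(parent=gpu))             # e.g. a slice of a slice
same, allocated_again = to_device_sketch(nested)
print(allocated, allocated_again, same is nested)  # True False True
```

Walking the chain costs one pointer hop per nesting level, which is cheap compared with a device allocation plus a host-to-device copy.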
Patch
- Make: Mutually exclusive platform flags (773d959)
- Fix: Skip `to_device` tests w/out GPUs found (d701592)
- Make: Propagate `SZ_TARGET` into `CIBW` env (b222e82)
- Improve: Avoid `realloc` for on-GPU views (77e67cf)
- Docs: Zero-copy Rust `compute_into` API with StringTape (f4ad81e)
- Improve: Validate `to_device(Strs)` for unicode (f3c5357)
- Improve: Pre-send to GPU with `to_device` (c78cd21)
- Fix: Same `Strs` slicing as in StringTape (b4f8d12)