v7.5: Parallelism & Portability

@ashvardanian ashvardanian released this 14 Apr 08:55
· 20 commits to main since this release
  • Built-in OpenMP bundling for JS & Python 🐍
  • Intel Granite Rapids 🪨 F16 → F32 GEMMs 💎
  • Faster bit-vector population counts for Arm NEON 🦾
  • SME compatibility with non-Apple Clang on Apple machines 🍏
  • Hardening against MSan SVE false-positives, thanks to @alexey-milovidov 🦺
  • Hardening against GCC 13 Arm NEON code-gen bugs, thanks to @swasik 🐂
  • _into & _parallel GEMM Rust APIs: reusing memory & ForkUnion pools 🆕
  • De-vectorize serial kernels with compiler flags 🎏
  • Compress source & binary distributions for Windows 🗜️
  • Pre-build & share FreeBSD, PowerPC, RISC-V, & LoongArch libs 🤗
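
For readers unfamiliar with the "F16 → F32 GEMM" wording above, here is a minimal pure-Python sketch of the general idea: inputs are stored in half precision while products are accumulated in single precision. It is not the library's kernel or API. The helper names `to_f16`, `to_f32`, and `gemm_f16_to_f32` are hypothetical, and real hardware may accumulate in a different order.

```python
import struct


def to_f16(value: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", value))[0]


def to_f32(value: float) -> float:
    """Round a Python float to the nearest IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


def gemm_f16_to_f32(a, b):
    """Hypothetical helper: multiply row-major matrices `a` (m x k) and `b` (k x n).

    Inputs are rounded to F16 first; every partial sum is rounded to F32,
    emulating half-precision storage with single-precision accumulation.
    """
    rows, inner, cols = len(a), len(b), len(b[0])
    a16 = [[to_f16(v) for v in row] for row in a]
    b16 = [[to_f16(v) for v in row] for row in b]
    c = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            acc = 0.0
            for k in range(inner):
                # The product of two F16 values fits exactly in F32,
                # so only the running sum needs rounding.
                acc = to_f32(acc + a16[i][k] * b16[k][j])
            c[i][j] = acc
    return c


identity = [[1.0, 0.0], [0.0, 1.0]]
print(gemm_f16_to_f32([[1.0, 2.0], [3.0, 4.0]], identity))
```

Small integers survive the F16 rounding unchanged, while a value such as 0.1 is perturbed on input. That is why the wider accumulator matters: it avoids adding further rounding error on every partial sum.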

Minor

  • Add: NEON popcount kernel for nk_reduce_moments_u1 (2181e0c)
  • Add: Tensor constructors, sealed trait family, div_ceil cleanup (2792279)
  • Add: Span-based matrix _into APIs, parallel Hammings/Jaccards, full-crate docs (99289df)
  • Add: OpenMP for Python & JavaScript (499ecc9)
  • Add: Granite Rapids AMX for F16 & F32 (28036ea)
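
The popcount and Hamming/Jaccard entries above all reduce to counting set bits in packed bit-vectors. The following sketch shows that reduction in plain Python, assuming bit-vectors packed eight bits per byte. The function names are illustrative and are not the library's API; a SIMD kernel would process many bytes per instruction instead of one at a time.

```python
def popcount(word: int) -> int:
    """Count the set bits in a non-negative integer."""
    return bin(word).count("1")


def hamming(a: bytes, b: bytes) -> int:
    """Number of differing bits between two equally sized packed bit-vectors."""
    if len(a) != len(b):
        raise ValueError("bit-vectors must have the same length")
    return sum(popcount(x ^ y) for x, y in zip(a, b))


def jaccard(a: bytes, b: bytes) -> float:
    """Jaccard distance: 1 - |intersection| / |union| over the set bits."""
    if len(a) != len(b):
        raise ValueError("bit-vectors must have the same length")
    intersection = sum(popcount(x & y) for x, y in zip(a, b))
    union = sum(popcount(x | y) for x, y in zip(a, b))
    # Two all-zero vectors are treated as identical (distance 0).
    return 1.0 - intersection / union if union else 0.0


a = bytes([0b11110000, 0b00001111])
b = bytes([0b11000000, 0b00111111])
print(hamming(a, b))  # 4 differing bits
print(jaccard(a, b))
```

In this example the two vectors share 6 set bits out of 10 in their union, so the Jaccard distance is approximately 0.4.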

Patch

  • Fix: Native ISA probe on Apple Clang + compile/runtime glyph (bc13e02)
  • Make: Detect illegal instructions in macOS CI (289cdaf)
  • Fix: Drop -march= on macOS setup.py builds (28aac74)
  • Fix: Exclude std::signal from WASM builds (14814c5)
  • Improve: Drop GNU statement-expression macros in SVE reduce helpers (b8b4ca0)
  • Make: Drop +nosimd from AArch64 baseline (23f5195)
  • Make: Forbid auto-vectorization in portable baseline builds (43e8324)
  • Make: Pin TU baseline to per-arch ABI floor across build systems (453ed5f)
  • Fix: Mitigate GCC 13 wrong BF16 splat in Arm NEON (#346) (fc3d8ec)
  • Improve: Log faulting capability detection (a401f8a)
  • Improve: Log faulting kernel on fatal signals in nk_test (22c7c79)
  • Make: Normalize Python test dependencies across CI and docs (8a0f3d4)
  • Make: Baseline-only ISA for shared-library test, harden Windows CI (1907685)
  • Fix: Wrong compiler probes for SMEBF16 & SMEBI32 (8b19ddb)
  • Make: Log host CPU capabilities in macOS and Windows CI jobs (988eeb2)
  • Fix: Pre-declare OpenMP loop counter, universal libomp for macOS (493a021)
  • Fix: Use int for OpenMP loop counters, absolute libomp install name (ccc0118)
  • Fix: GCC requires +sme prefix in target attribute for _arm_sc* stubs (291dc0a)
  • Fix: Signed OpenMP iterators, source-built libomp, JS KMP guard (dc1ae75)
  • Fix: OpenMP wheel builds on macOS and Windows (f569121)
  • Fix: Add target("sme") to _arm_sc* stubs for GCC compatibility (ad2add0)
  • Fix: Unpoison SVE scalar reductions for MemorySanitizer (#342) (b42eda7)
  • Improve: Move SME runtime stubs to types.h as weak inline definitions (64ca934)
  • Improve: Manual SME streaming control, single enter/exit per API call (6432837)
  • Fix: Update cdist edge-case test for re-added threads= kwarg (50681af)
  • Make: Allow force-enabling ISA targets via environment variables (0e58702)
  • Improve: Abandon F32→F64 via Ozaki on Granite Rapids (94a5f19)
  • Make: FreeBSD, PPC64le, LoongArch, RISC-V releases & compress Windows (a9a0d83)
  • Make: Standardize CI compilers and add Windows test job (9a22ea4)
  • Make: Shrink serial fallbacks with scoped size optimization (83154a8)
  • Make: Compress Windows builds (e30ad3d)
  • Fix: Streaming-compatible stubs for LLVM SME builds (0be7b2f)