v0.8.0
Highlights
- More perf!
  - mx.fast.rms_norm and mx.fast.layer_norm
  - Switching to nanobind substantially reduces overhead
  - Up to 4x faster __setitem__ (e.g. a[...] = b)
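For readers unfamiliar with the two new fused ops, the sketch below writes out the math they compute over the last axis. It is a plain-Python reference, not MLX code. The function names, parameter order, the default eps, and the optional weight/bias handling are assumptions made for illustration. The fused kernels produce the same result in a single pass on the GPU.

```python
import math

# Reference semantics only (assumed, illustrative): the per-vector math behind
# fused RMS norm and layer norm, applied to one 1-D list (the "last axis").

def rms_norm(x, weight, eps=1e-5):
    """Scale x by 1 / sqrt(mean(x^2) + eps), then elementwise by weight."""
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [v * inv_rms * w for v, w in zip(x, weight)]


def layer_norm(x, weight=None, bias=None, eps=1e-5):
    """Subtract the mean, divide by sqrt(var + eps), then apply an optional affine."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n  # population (biased) variance
    inv_std = 1.0 / math.sqrt(var + eps)
    out = [(v - mean) * inv_std for v in x]
    if weight is not None:
        out = [o * w for o, w in zip(out, weight)]
    if bias is not None:
        out = [o + b for o, b in zip(out, bias)]
    return out


print(rms_norm([3.0, 4.0], [1.0, 1.0], eps=0.0))  # ≈ [0.8485, 1.1314]
print(layer_norm([1.0, 2.0, 3.0], eps=0.0))       # ≈ [-1.2247, 0.0, 1.2247]
```

RMS norm skips the mean subtraction and the bias. That is why it is cheaper than layer norm and why both benefit from being fused into one kernel.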
Core
- mx.inverse, CPU only
- vmap over mx.matmul and mx.addmm
- Switch to nanobind from pybind11
- Faster setitem indexing
- mx.fast.rms_norm, token generation benchmark
- mx.fast.layer_norm, token generation benchmark
- vmap for inverse and svd
- Faster non-overlapping pooling
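Supporting vmap over matmul and addmm means a batch of independent matrix products can be written as one mapped call. The loop-based model below shows only the semantics: each slice along the leading axis is processed independently. It is a hypothetical helper, not MLX's vmap, which vectorizes the computation instead of looping. The addmm form beta * c + alpha * (a @ b) is assumed here for illustration.

```python
# Hypothetical, loop-based model of vmap semantics (not MLX's implementation).

def matmul(a, b):
    """Naive 2-D matrix product on lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def addmm(c, a, b, alpha=1.0, beta=1.0):
    """Assumed addmm form: beta * c + alpha * (a @ b)."""
    ab = matmul(a, b)
    return [[beta * c[i][j] + alpha * ab[i][j] for j in range(len(ab[0]))]
            for i in range(len(ab))]


def vmap(f):
    """Apply f to matching slices along axis 0 of every argument."""
    def batched(*args):
        if len({len(arg) for arg in args}) != 1:
            raise ValueError("all arguments need the same batch size")
        return [f(*slices) for slices in zip(*args)]
    return batched


# Two independent 2x2 products, batched along the leading axis.
A = [[[1, 2], [3, 4]], [[0, 1], [1, 0]]]
B = [[[1, 0], [0, 1]], [[2, 3], [4, 5]]]
print(vmap(matmul)(A, B))  # [[[1, 2], [3, 4]], [[4, 5], [2, 3]]]
```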
Optimizers
- Set minimum value in cosine decay scheduler
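The sketch below shows what a minimum value changes in a cosine decay schedule. The rate follows a half cosine from init down to the floor over decay_steps and then stays at the floor. It is plain Python under that assumption and is not MLX's own scheduler code. The parameter name end is an assumption for this illustration.

```python
import math

# Illustrative cosine decay with a floor (`end` is an assumed parameter name).

def cosine_decay(init, decay_steps, end=0.0):
    def schedule(step):
        s = min(step, decay_steps)  # past decay_steps, hold at the floor
        decay = 0.5 * (1.0 + math.cos(math.pi * s / decay_steps))  # 1 -> 0
        return end + decay * (init - end)
    return schedule


lr = cosine_decay(1e-1, 1000, end=1e-5)
print(lr(0), lr(500), lr(1000), lr(5000))
```

Without a floor (end=0.0), the rate reaches exactly zero at decay_steps. A small positive minimum lets training keep making progress after the decay finishes.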
Bugfixes
- Fix bug in multi-dimensional reduction