v0.7.0

awni released this 14 Mar 19:34


63ab0ab

Highlights

Performance improvements for attention ops:
- No-copy broadcast matmul (benchmarks)
- Fewer copies in reshape

Core

Faster broadcast + gemm
- benchmarks
mx.linalg.svd (CPU only)
Fewer copies in reshape
Faster small reductions
- benchmarks

NN

nn.RNN, nn.LSTM, nn.GRU

Bugfixes

Fix bug in depth traversal ordering
Fix two edge case bugs in compilation
Fix bug with modules with dictionaries of weights
Fix bug with scatter which broke MoE training
Fix bug with compilation kernel collision
