Releases · ml-explore/mlx
v0.15.1
v0.15.0
Highlights
- Fast Metal GPU FFTs
  - On average ~30x faster than CPU
  - More benchmarks
- `mx.distributed` with `all_sum` and `all_gather`
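
A minimal sketch of the new distributed primitives, assuming an MPI launch (e.g. `mpirun -np 2 python script.py`):

```python
import mlx.core as mx

world = mx.distributed.init()            # uses MPI when available
x = mx.ones((4,)) * world.rank()

summed = mx.distributed.all_sum(x)       # element-wise sum across processes
gathered = mx.distributed.all_gather(x)  # concatenate x from every process

print(world.rank(), summed, gathered.shape)
```
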
Core
- Added the dlpack device `__dlpack_device__`
- Fast GPU FFTs benchmarks
- Added docs for `mx.distributed`
- Added the `mx.view` op
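
A quick sketch exercising two of the items above: a GPU FFT and the new `mx.view` op, which reinterprets an array's bytes as another dtype:

```python
import mlx.core as mx

x = mx.random.normal((1024,))
spectrum = mx.fft.rfft(x)  # FFTs now run on the GPU (the default device)
mx.eval(spectrum)

# Reinterpret the bits of a float32 as uint32 without copying
bits = mx.view(mx.array([1.0], dtype=mx.float32), mx.uint32)
print(spectrum.shape, bits)
```
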
NN
- `softmin`, `hardshrink`, and `hardtanh` activations
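
A small sketch of the new activations, assuming they are exposed as functions under `mlx.nn` with the names listed above:

```python
import mlx.core as mx
import mlx.nn as nn

x = mx.array([-2.0, -0.3, 0.0, 0.3, 2.0])

print(nn.softmin(x))     # equivalent to softmax of -x
print(nn.hardshrink(x))  # zeroes entries with |x| <= 0.5 (default lambda)
print(nn.hardtanh(x))    # clips to [-1, 1] by default
```
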
Bugfixes
- Fix broadcast bug in bitwise ops
- Allow more buffers for JIT compilation
- Fix matvec vector stride bug
- Fix multi-block sort stride management
- Stable cumprod grad at 0
- Bug fix for a race condition in scan
v0.14.1
v0.14.0
Highlights
- Small-size build that JIT compiles kernels and omits the CPU backend, resulting in a binary under 4 MB
- `mx.gather_qmm`: a quantized equivalent of `mx.gather_mm` that speeds up MoE inference by ~2x
- Grouped 2D convolutions (see the sketch below)
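
A hedged sketch of the grouped-convolution highlight via `mx.conv2d`; the `groups` keyword and the channels-last layout follow MLX's conv conventions:

```python
import mlx.core as mx

# Channels-last: input (N, H, W, C_in), weight (C_out, kH, kW, C_in // groups)
x = mx.random.normal((1, 8, 8, 16))
w = mx.random.normal((32, 3, 3, 4))  # 4 groups: 16 // 4 = 4 channels each
y = mx.conv2d(x, w, groups=4)
print(y.shape)  # (1, 6, 6, 32)
```
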
Core
- `mx.conjugate`
- `mx.conv3d` and `nn.Conv3d`
- List-based indexing
- Started `mx.distributed`, which uses MPI (if installed) for communication across machines:
  - `mx.distributed.init`
  - `mx.distributed.all_gather`
  - `mx.distributed.all_reduce_sum`
- Support conversion to and from dlpack
- `mx.linalg.cholesky` on CPU
- `mx.quantized_matmul` sped up for vector-matrix products
- `mx.trace`
- `mx.block_masked_mm` now supports floating point masks!
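
A short sketch of the dlpack interchange and the CPU Cholesky; NumPy >= 1.22 (for `np.from_dlpack`) is assumed:

```python
import numpy as np
import mlx.core as mx

a = mx.arange(6.0).reshape(2, 3)
na = np.from_dlpack(a)  # MLX -> NumPy via the dlpack protocol
back = mx.array(na)     # and back to MLX

m = mx.array([[4.0, 2.0], [2.0, 3.0]])
l = mx.linalg.cholesky(m, stream=mx.cpu)  # CPU only in this release
print(na.shape, l)
```
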
Fixes
- Error messaging in eval
- Add some missing docs
- Scatter index bug
- The extensions example now compiles and runs
- CPU copy bug with many dimensions
v0.13.1
v0.13.0
Highlights
- Block sparse matrix multiply speeds up MoEs by >2x
- Improved quantization algorithm should work well for all networks
- Improved GPU command submission speeds up training and inference
Core
- Bitwise ops added: `mx.bitwise_[or|and|xor]`, `mx.[left|right]_shift`, and operator overloads (see the sketch after this list)
- Groups added to `Conv1d`
- Added `mx.metal.device_info` to get better-informed memory limits
- Added resettable memory stats
- `mlx.optimizers.clip_grad_norm` and `mlx.utils.tree_reduce` added
- Added `mx.arctan2`
- Unary ops now accept array-like inputs, i.e. one can do `mx.sqrt(2)`
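
A minimal sketch of the bitwise ops, `mx.arctan2`, and the array-like unary inputs noted above:

```python
import mlx.core as mx

a = mx.array([0b1100, 0b1010])
b = mx.array([0b0110, 0b0101])

print(mx.bitwise_and(a, b), a & b)  # the op and its operator overload
print(mx.left_shift(a, 1), a << 1)

print(mx.arctan2(mx.array(1.0), mx.array(1.0)))  # pi / 4
print(mx.sqrt(2))  # unary ops now accept plain scalars
```
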
Bugfixes
- Fixed shape for slice update
- Bugfix in quantize that used slightly wrong scales/biases
- Fixed memory leak for multi-output primitives encountered with gradient checkpointing
- Fixed conversion from other frameworks for all datatypes
- Fixed index overflow for matmul with large batch size
- Fixed initialization ordering that occasionally caused segfaults
v0.12.2
v0.12.0
Highlights
- Faster quantized matmul
  - Up to 40% faster QLoRA or prompt processing, some numbers
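
A hedged sketch of the quantized matmul path; the default `group_size=64`, `bits=4` parameters are assumed:

```python
import mlx.core as mx

w = mx.random.normal((256, 256))
w_q, scales, biases = mx.quantize(w)  # 4-bit, group size 64 by default

x = mx.random.normal((1, 256))
y = mx.quantized_matmul(x, w_q, scales, biases, transpose=True)
print(y.shape)  # (1, 256)
```
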
Core
- `mx.synchronize` to wait for computation dispatched with `mx.async_eval`
- `mx.radians` and `mx.degrees`
- `mx.metal.clear_cache` to return to the OS the memory held by MLX as a cache for future allocations
- Change quantization to always represent 0 exactly (relevant issue)
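
A minimal sketch of how these fit together: `mx.async_eval` dispatches, `mx.synchronize` waits, and `mx.metal.clear_cache` releases the allocator cache:

```python
import mlx.core as mx

x = mx.random.normal((2048, 2048))
y = (x @ x).sum()

mx.async_eval(y)        # dispatch the computation without blocking
# ... overlap other Python-side work here ...
mx.synchronize()        # wait for everything dispatched so far

mx.metal.clear_cache()  # return MLX's cached buffers to the OS
print(y.item())
```
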
Bugfixes
- Fixed quantization of a block with all 0s that produced NaNs
- Fixed the `len` field in the buffer protocol implementation
v0.11.0
v0.10.0
Highlights
- Improvements for LLM generation
  - Reshapeless quant matmul/matvec
  - `mx.async_eval`
  - Async command encoding
Core
- Slightly faster reshapeless quantized gemms
- Option for precise softmax (see the sketch after this list)
- `mx.metal.start_capture` and `mx.metal.stop_capture` for GPU debug/profile
- `mx.expm1`
- `mx.std`
- `mx.meshgrid`
- `mx.random.multivariate_normal` (CPU only)
- `mx.cumsum` (and other scans) for `bfloat`
- Async command encoder with explicit barriers / dependency management
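
A hedged sketch of a GPU capture wrapped around a precise softmax; capturing is assumed to require `MTL_CAPTURE_ENABLED=1` in the environment and a trace path that does not already exist:

```python
import mlx.core as mx

mx.metal.start_capture("mlx_trace.gputrace")  # open a Metal capture file

x = mx.random.normal((512, 512))
y = mx.softmax(x, axis=-1, precise=True)  # the new precise-softmax option
mx.eval(y)

mx.metal.stop_capture()  # finish the trace for inspection in Xcode
```
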
NN
- `nn.Upsample` supports bicubic interpolation
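
A small sketch of bicubic upsampling; the mode name `"cubic"` is an assumption:

```python
import mlx.core as mx
import mlx.nn as nn

x = mx.random.normal((1, 4, 4, 3))              # channels-last: (N, H, W, C)
up = nn.Upsample(scale_factor=2, mode="cubic")  # mode name assumed
print(up(x).shape)                              # (1, 8, 8, 3)
```
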
Misc
- Updated MLX Extension to work with nanobind
Bugfixes
- Fix buffer donation in softmax and fast ops
- Bug in layer norm vjp
- Bug initializing from lists with scalar
- Bug in indexing
- CPU compilation bug
- Multi-output compilation bug
- Fix stack overflow issues in eval and array destruction