Release v0.30.0 · ml-explore/mlx

Highlights

Support for Neural Accelerators on M5 (macOS >= 26.2)

What's Changed

Fix AdamW weight_decay default value in docstring by @goingreen in #2557
Fix dequantize python sig by @wrmsr in #2562
fix copies in sdpa by @awni in #2563
chore: Update Docs With Slice Copy Example by @krishi-saripalli in #2559
Fixed several type annotations in the MLX stubs which degraded to Unknown/Any by @Maalvi14 in #2560
typing: add type hints to mlx.core.array, linalg, and random by @XXXXRT666 in #2565
Set ccache size before building by @zcbenz in #2570
Faster fully depthwise-separable 1D conv by @awni in #2567
Fix a few ccache cache misses by @zcbenz in #2573
Some tweaks in cmake files by @zcbenz in #2574
Add batch offsets for mx.fast.rope by @awni in #2564
[CUDA] Use GEMM with epilogue instead of AddMM by @zcbenz in #2569
[CUDA] Fix alpha not respected when using bias epilogue by @zcbenz in #2578
Fix flaky addmm tests by @zcbenz in #2581
Adding Relu2 by @Goekdeniz-Guelmez in #2582
Add sdpa with sinks by @awni in #2558
[CUDA] Set bias as input when using bias epilogue by @zcbenz in #2584
[CUDA] Fix NCCL stub for release build by @awni in #2587
patch bump by @awni in #2588
Refactor code examples to use 'gelu' by @umbertomig in #2592
Fix metal scan by @awni in #2591
Fix typo in average_gradients function call by @umbertomig in #2594
No copy batch rope by @awni in #2595
Update export function example for array input by @umbertomig in #2598
Expose mx.depends to Python by @awni in #2606
fix: library loading for swift dynamic frameworks by @bilousoleksandr in #2568
Detect cache thrashing in LRUCache by @zcbenz in #2600
Lower sorted QMM gather threshold by @awni in #2609
implement Convolution::output_shape by @josharian in #2601
Avoid producing NaN in attention by @awni in #2608
[CUDA] Recycle CUDA events by @zcbenz in #2604
[CUDA] fix cudaGraphLaunch by @CC-Yeh in #2613
Support pickling array for bfloat16 by @CC-Yeh in #2586
New tuning for small K gemv by @jagrit06 in #2620
Allow None input to compiled functions by @awni in #2621
Compiled should not end in broadcast by @angeloskath in #2622
Bump the version by @angeloskath in #2627
[CUDA] Make CudaEvent work with multi-device by @zcbenz in #2614
Fix incorrect path and typos by @aisk in #2630
Fix for max block dim by @awni in #2631
Compile now can attach arbitrary data to an entry by @angeloskath in #2634
[CUDA] Wait for tasks in cuda by @awni in #2636
Fix status message by @angeloskath in #2638
fix cross entropy axis param by @awni in #2641
Faster triu, tril, where with scalar by @awni in #2644
[CUDA] Add a small column specialization to reduce by @angeloskath in #2642
[CUDA] Fix flaky test by @awni in #2646
Configure CMake to export compile_commands.json by @andportnoy in #2645
Faster complex matmul by @CC-Yeh in #2571
Fix compile when outputs change by @awni in #2648
Speed up compile for node with many parents by @awni in #2649
Fix and refactor row-reduce by @angeloskath in #2650
[CUDA] Fix jit file cache for large kernel names by @angeloskath in #2656
Fix all_gather vjp by @awni in #2654
Fix fast synch when fence is waited before a command buffer is created by @awni in #2657
Fix cumulative operations when axis=None by @aisk in #2653
Export with callback by @awni in #2612
bump patch by @awni in #2658
Enable addmm low-precision cpu by @awni in #2661
Precise sigmoid by @awni in #2659
Debug cuda conv by @awni in #2662
Speed up scalars part 2 by @awni in #2669
Normalize README bullet formatting and other Markdown small fixes by @Mistobaan in #2671
Modified sort behavior when running CPU or Metal to match NumPy/JAX by @Maalvi14 in #2667
remove unused unary file by @awni in #2672
Nccl timeout by @nastya236 in #2673
suppress gcc 10.1 warnings by @awni in #2679
patch bump by @awni in #2680
Improved mx.split() docs by @Maalvi14 in #2689
fix warnings showing up with -Wall by @andresy in #2692
Einsum error msg improvement by @Maalvi14 in #2690
optionally load metallib from framework by @davidkoski in #2702
Fix addmm cpu for beta != 1.0 by @awni in #2699
Add mx.median op by @awni in #2705
bump python by @awni in #2694
Fp8 conversion by @awni in #2686
fix: linux-{fedora}x86_64-build by @incertum in #2707
Add quantize/dequantize for mxfp8 and nvfp4 by @awni in #2688
Migrate CircleCI to GitHub Actions by @madrob in #2716
Fix KeyError for missing domain_uuid_key in Thunderbolt setup by @thechriswebb in #2682
fix memory count bug by @awni in #2717
Fix the order of hosts in the ring by @angeloskath in #2718
Fix docs path by @madrob in #2719
Use faster dequant for fp4 by @awni in #2720
update: add linux fedora container CI - CPP build test only by @incertum in #2722
add null check -- the bundleIdentifier is optional by @davidkoski in #2709
Fix compile multi capture by @awni in #2678
Set up publishing to PyPI and Test-PyPI by @madrob in #2721
Check isnan in maximum / minimum with CPU backend by @aisk in #2652
Fix addmm with empty matrices and beta != 1.0 by @harsh-sutariya in #2715
skip self-hosted runners on forks by @madrob in #2730
only build for macos 14 and up by @awni in #2731
don't test when doing release by @awni in #2734
Make cpu binary_op easily accessible by @angeloskath in #2733
fix property name by @madrob in #2736
Nccl reduce scatter, all gather by @nastya236 in #2727
[CUDA] Reduce use of managed memory by @awni in #2725
Shapeless support for zeros/ones_like by @CC-Yeh in #2726
Compatibility with pip-installed openmpi by @pcuenca in #2741
Fix release builds by @awni in #2746
patch bump by @awni in #2750
Fix dequantize python sig (dtype default) by @wrmsr in #2752
remove circle by @awni in #2753
Fix irregular_strides benchmark shape type by @wrmsr in #2754
Linux on arm by @awni in #2751
minor debugging for publishing by @madrob in #2739
Export custom kernel by @awni in #2756
Fix slice with negative strides by @awni in #2758
[CUDA] Check CUDA error in synchronize by @zcbenz in #2757
fix release by @awni in #2759
[CUDA] cuDNN forward attention by @zcbenz in #2743
Fix exporting with constants by @awni in #2769
Separate test-linux from build-linux/cuda in GitHub Actions by @zcbenz in #2765
[CUDA] Use arch specific targets when possible by @awni in #2771
Fix MPI distributed tests with CUDA backend by @zcbenz in #2775
Fix warnings with cmake 4.1 by @zcbenz in #2774
Use ccache in GitHub Actions by @zcbenz in #2773
[CUDA] Tune ops per buffer based on device by @awni in #2761
fix release 2 by @awni in #2767
Run CI for pushes by @zcbenz in #2777
Remove pip cache in GitHub Actions by @zcbenz in #2776
Build and test with multiple CUDA versions by @zcbenz in #2780
Use std::optional for mask_arr arg by @zcbenz in #2763
Do not run CPU tests in CUDA builds by @zcbenz in #2784
Test every commit in main branch by @zcbenz in #2781
Fix nightly build by @zcbenz in #2785
Remove unneeded tests in nightly build by @zcbenz in #2786
Fix building with CUDA < 12.8 by @zcbenz in #2782
Avoid duplicate CI runs when starting a PR from upstream branch by @zcbenz in #2788
build docs on linux by @awni in #2787
[CUDA] cuDNN backward attention by @zcbenz in #2762
more accurate rope fallback by @awni in #2792
Fix version tag by @awni in #2790
version by @awni in #2797
Add Masked Scatter by @CC-Yeh in #2663
Add Neural Accelerator Support by @jagrit06 in #2772
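
As a point of reference for the new `mx.median` op listed above (#2705), here is a minimal plain-Python sketch of the conventional median definition. It does not call MLX. The helper name `median_reference` is hypothetical, and averaging the two middle values for even-length input is an assumption borrowed from the NumPy convention, which these notes do not confirm for MLX.

```python
def median_reference(values):
    """Reference median of a non-empty sequence of numbers.

    Hypothetical helper for illustration only; not part of MLX.
    """
    if not values:
        raise ValueError("median of an empty sequence is undefined")
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2 == 1:
        # Odd length: the single middle element is the median.
        return float(ordered[mid])
    # Even length: average the two middle elements (assumed convention).
    return (ordered[mid - 1] + ordered[mid]) / 2.0
```

A reference like this may be handy for checking results on small inputs; consult the MLX documentation for the actual signature and axis handling of `mx.median`.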

New Contributors

@goingreen made their first contribution in #2557
@krishi-saripalli made their first contribution in #2559
@Maalvi14 made their first contribution in #2560
@XXXXRT666 made their first contribution in #2565
@umbertomig made their first contribution in #2592
@bilousoleksandr made their first contribution in #2568
@josharian made their first contribution in #2601
@aisk made their first contribution in #2630
@Mistobaan made their first contribution in #2671
@incertum made their first contribution in #2707
@thechriswebb made their first contribution in #2682
@harsh-sutariya made their first contribution in #2715
@pcuenca made their first contribution in #2741

Full Changelog: v0.29.0...v0.30.0
