v0.30.3
Highlights
- Support nvfp4 and mxfp8 quantized ops on Metal
- Support nvfp4 and mxfp8 quantized-quantized matrix-matrix multiplication on CUDA
What's Changed
- Bump the patch version by @angeloskath in #2922
- Faster copy for col contig to row contig by @awni in #2917
- Fix cuda release by @awni in #2925
- Metal logging by @CC-Yeh in #2904
- fix cuda release part 2 by @awni in #2926
- new[CI]: add linux sanitizer tests by @incertum in #2860
- patch bump by @awni in #2927
- Fix CUDA pypi release by @awni in #2929
- Move allocate_workspace to cuda/utils.h by @zcbenz in #2923
- Allow dry run for PyPI release workflow by @zcbenz in #2928
- Set rpath with cmake for CUDA build by @zcbenz in #2932
- Fix nightly build by @zcbenz in #2933
- Set install rpath of python bindings with cmake by @zcbenz in #2934
- Fix pid in local launch by @angeloskath in #2936
- Make CUDA CI run faster by @zcbenz in #2939
- refactor: use perf_counter for accurate benchmarking by @Satyam12singh in #2940
- Fix for non row-contig scales by @awni in #2941
- Fix stubgen by @zcbenz in #2942
- ci: add macOS 26 target by @madrob in #2937
- Fix float64 size in data_types.rst by @pdevine in #2948
- Fixes in mlx.distributed_config by @angeloskath in #2947
- Metal/CPU nvfp4 and mxfp8 by @awni in #2946
- [CUDA] Implement gather_mm_rhs by @zcbenz in #2902
- Fetch nanobind with cmake by @zcbenz in #2949
- refactor: use time.perf_counter for consistent and accurate benchmarking by @Satyam12singh in #2943
- BUG FIX - Addition of missing parameter in random::uniform by @hwiesmann in #2963
- Fix doc issues in mlx.nn.init.he_normal and mlx.nn.hard_tanh by @Redempt1onzzZZ in #2968
- fix numpy dtype bug by @awni in #2960
- QQ linear by @nastya236 in #2931
- fix array allocator with user buffer and deleter by @andresy in #2971
- Swizzle scales by @nastya236 in #2979
- Fix grid_dim_x calculations by @CC-Yeh in #2980
- Add asarray to array_namespace by @Anri-Lombard in #2966
- fix doc by @CC-Yeh in #2988
- replace MLX_IBV_COORDINATOR with MLX_JACCL_COORDINATOR by @Evanev7 in #2986
- Fix RandomBits::is_equivalent to include width by @MillaFleurs in #2978
- Don't try to use NAX at run-time if kernels aren't there by @awni in #2982
- Expose to/from fp8 in Python and don't auto-convert fp8 when loading from safetensors by @awni in #2985
- Allow some non 2D inputs in qqmm by @awni in #2981
New Contributors
- @pdevine made their first contribution in #2948
- @hwiesmann made their first contribution in #2963
- @Anri-Lombard made their first contribution in #2966
- @Evanev7 made their first contribution in #2986
- @MillaFleurs made their first contribution in #2978
Full Changelog: v0.30.1...v0.30.3