Release v0.28.0 · ml-explore/mlx

Highlights

First version of fused SDPA (scaled dot-product attention) vector for CUDA
Convolutions in CUDA
Speed improvements in CUDA normalization layers, softmax, compiled kernels, overheads and more
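
For readers unfamiliar with the first highlight: SDPA stands for scaled dot-product attention, and the "vector" variant presumably covers the case of a single query vector (or very few), as in token-by-token decoding. The sketch below is a plain-Python reference for what such an operation computes. It is not the CUDA kernel from this release, and the function name `sdpa_vector_reference` and its parameters are hypothetical, chosen only for illustration.

```python
import math


def sdpa_vector_reference(q, keys, values, scale=None):
    """Reference scaled dot-product attention for ONE query vector.

    Hypothetical helper for illustration only (not part of MLX).
    q:      list of d floats
    keys:   list of n key vectors, each of length d
    values: list of n value vectors, each of length dv
    """
    if scale is None:
        # Conventional default: 1 / sqrt(head dimension).
        scale = 1.0 / math.sqrt(len(q))

    # Scaled dot product of the query with every key.
    scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]

    # Numerically stable softmax: subtract the max before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Output is the softmax-weighted sum of the value vectors.
    dv = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dv)]


if __name__ == "__main__":
    out = sdpa_vector_reference(
        [1.0, 0.0],
        [[1.0, 0.0], [1.0, 0.0]],
        [[1.0, 2.0], [3.0, 4.0]],
    )
    print(out)
```

A fused kernel computes the same three steps (scores, softmax, weighted sum) in one pass instead of materializing the intermediate results. In MLX the public entry point for this operation is `mx.fast.scaled_dot_product_attention`; whether a given call takes the new fused CUDA path depends on conditions these notes do not state.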

What's Changed

[CUDA] Fix segfault on exit by @awni in #2424
[CUDA] No occupancy query for launch params by @awni in #2426
[CUDA] More sizes for gemv by @awni in #2429
Add more CUDA architectures for PyPi package by @awni in #2427
Use ccache in CI by @zcbenz in #2414
[CUDA] Use aligned vector in Layer Norm and RMS norm by @awni in #2433
Cuda faster softmax by @awni in #2435
Remove the kernel arg from get_launch_args by @zcbenz in #2437
Move arange to its own file by @zcbenz in #2438
Use load_vector in arg_reduce by @zcbenz in #2439
Make CI faster by @zcbenz in #2440
[CUDA] Quantized refactoring by @angeloskath in #2442
fix circular reference by @awni in #2443
[CUDA] Fix gemv regression by @awni in #2445
Fix wrong graph key when using concurrent context by @zcbenz in #2447
Fix custom metal extension by @awni in #2446
Add tests for export including control flow models and quantized models by @junpeiz in #2430
[CUDA] Backward convolution by @zcbenz in #2431
[CUDA] Save primitive inputs faster by @zcbenz in #2449
[CUDA] Vectorize generated kernels by @angeloskath in #2444
[CUDA] Matmul utils initial commit by @angeloskath in #2441
Fix arctan2 grads by @angeloskath in #2453
Use LRU cache for cuda graph by @zcbenz in #2448
Add missing algorithm header to jit_compiler.cpp for Linux builds by @zamderax in #2460
Default install cuda on linux by @awni in #2462
fix wraps compile by @awni in #2461
Feat: add USE_SYSTEM_FMT CMake option by @GaetanLepage in #2219
Use SmallVector for shapes and strides by @zcbenz in #2454
Fix install tags by @awni in #2464
Faster gather qmm sorted test by @awni in #2463
Fix cublas on h100 by @awni in #2466
revert default cuda install by @awni in #2465
feat: support a destinations based in tree flatten/unflatten by @LVivona in #2450
Fix typo in metal command encoder by @angeloskath in #2471
Update CUDA sdpa by @jagrit06 in #2468
version by @awni in #2470

New Contributors

@junpeiz made their first contribution in #2430
@zamderax made their first contribution in #2460
@GaetanLepage made their first contribution in #2219
@LVivona made their first contribution in #2450

Full Changelog: v0.27.1...v0.28.0
