0.49.0

@matthewdouglas matthewdouglas released this 11 Dec 20:51
· 2 commits to main since this release

Highlights

x86-64 CPU Improvements

CPU performance for 4bit quantization is significantly improved on x86-64, with optimized kernel paths for CPUs that support AVX512 or AVX512BF16.
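
To see whether a Linux machine exposes these instruction sets, the following is a minimal sketch that reads the feature flags from `/proc/cpuinfo`. The helper names are hypothetical, and the flag names `avx512f` and `avx512_bf16` are the ones the Linux kernel reports; which exact flags the library checks when selecting a kernel path is an assumption here.

```python
def parse_cpu_flags(cpuinfo_text: str) -> set[str]:
    """Collect CPU feature flags from /proc/cpuinfo-style text."""
    flags: set[str] = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # Each logical core repeats a "flags : ..." line; merge them all.
        if sep and key.strip() == "flags":
            flags.update(value.split())
    return flags


def avx512_support(flags: set[str]) -> dict[str, bool]:
    """Report the two instruction sets named in these release notes.

    Assumption: "avx512f" stands in for AVX512 in general and
    "avx512_bf16" for AVX512BF16 (Linux kernel flag names).
    """
    return {
        "AVX512": "avx512f" in flags,
        "AVX512BF16": "avx512_bf16" in flags,
    }


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo", encoding="utf-8") as fh:
            print(avx512_support(parse_cpu_flags(fh.read())))
    except OSError:
        # /proc/cpuinfo only exists on Linux.
        print("/proc/cpuinfo is not available on this platform")
```

If neither flag is reported, the CPU would presumably not use the newly optimized paths.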

AMD ROCm Experimental Wheels

  • Experimental support for AMD devices is now included in our PyPI wheels on Linux x86-64.
  • We've added additional GPU target devices as outlined in our docs.
  • Support for using the default blocksize of 64 for 4bit was added for RDNA GPUs in #1748.
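
For readers unfamiliar with the term, blocksize is the number of consecutive values that share one scaling factor in blockwise quantization. The pure-Python sketch below is a simplified illustration of that idea only; it is not the library's kernel, and the function names are hypothetical.

```python
import math


def num_blocks(n_values: int, blocksize: int = 64) -> int:
    """Number of blocks (and scales) needed for n_values elements."""
    return math.ceil(n_values / blocksize)


def blockwise_absmax(values: list[float], blocksize: int = 64) -> list[float]:
    """Return one absmax scale per block of `blocksize` consecutive values.

    Illustration only: a real 4bit implementation would also map each
    scaled value to a 4bit code and pack two codes per byte.
    """
    return [
        max(abs(v) for v in values[start:start + blocksize])
        for start in range(0, len(values), blocksize)
    ]


if __name__ == "__main__":
    data = [float(i) for i in range(-100, 100)]  # 200 values
    print(num_blocks(len(data)))   # 200 values need 4 blocks of up to 64
    print(blockwise_absmax(data))  # one scale per block
```

A smaller blocksize means more scales to store but a tighter fit to local value ranges.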

macOS 14+ Wheels

  • We're now publishing wheels for macOS 14+!
  • The 4bit and 8bit quantization features are supported on MPS, but currently only through slow implementations. We plan to enable Metal kernels with improved performance in the future.

🚨 Breaking Changes

  • Dropped support for Python 3.9.
  • Dropped compilation support for Maxwell GPUs in the CUDA backend.
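
Projects that pin this release may want to fail early on an unsupported interpreter. The following is a minimal sketch; the `MIN_PYTHON` constant and the helper name are hypothetical, and the minimum of 3.10 is inferred from the dropped 3.9 support rather than stated in these notes.

```python
import sys

# Assumption: with Python 3.9 dropped, 3.10 is the oldest supported version.
MIN_PYTHON = (3, 10)


def python_is_supported(version_info=sys.version_info) -> bool:
    """Return True if the (major, minor) version meets MIN_PYTHON."""
    return tuple(version_info[:2]) >= MIN_PYTHON


if __name__ == "__main__":
    if not python_is_supported():
        found = ".".join(str(part) for part in sys.version_info[:3])
        print(f"Python {found} is older than the assumed minimum 3.10")
```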

What's Changed

New Contributors

Full Changelog: 0.48.2...0.49.0