MAX 25.2

@Ahajha Ahajha released this 25 Mar 15:30

MAX 25.2 brings significant enhancements for large-scale AI deployment and GPU optimization. This release adds comprehensive NVIDIA Hopper support with high-performance kernels, multi-GPU tensor parallelism for large models such as Llama-3.3-70B, and expanded model support (Phi3, Olmo, Granite). Key additions include GPTQ quantization for memory efficiency, long-context optimizations (in-flight batching, chunked prefill, copy-on-write), and improved kernel caching that reduces compilation times by up to 28%. New Mojo GPU APIs give developers greater control and performance.

For additional details, check out the changelog.