Compressed Tensors v0.14.0
What's Changed
- [decompression] Added qparam decompression by @shanjiaz in #537
- Use standard E8M0 scale format for MXFP4 by @mgoin in #538
- Add W4AFP8 preset scheme by @Etelis in #542
- [MXFP4] Fix scale offset by @dsikka in #541
- [CI] Add mergify and stale PR rules by @dsikka in #543
- [Offload] Fix `delete_offload_parameter`, add `clear_quantization` by @kylesayrs in #539
- [Deprecation] Add deprecation warning for marlin24 format by @Etelis in #544
- [Offload] Add offloading logic by @kylesayrs in #529
- Pin torch by @dsikka in #546
- [Testing] Pin transformers by @kylesayrs in #549
- Limit transformers to <5.0.0 by @dhuangnm in #550
- Modernize python310 type hints in quantization forward by @LudovicoYIN in #548
- [Offload] Remove Accelerate by @kylesayrs in #530
- KV Cache Quantization support deepseek v3 by @zkl-ai in #533
- [Bugfix] Remove assert when dispatched to device by @kylesayrs in #554
- Modernize python310 type hints in quantization by @LudovicoYIN in #553
- [Observers] Change default weight observer to "memoryless_minmax" by @kylesayrs in #540
- Update pytest command to include report options by @dsikka in #557
- Change runner from IBM to GCP for Python tests by @dsikka in #561
- Modernize python310 type hints in utils by @LudovicoYIN in #560
- Update quantization strategy validation for actorder by @dsikka in #556
- [Offload] `DistributedCPUCache` by @kylesayrs in #534
- [MXFP4][GPTQ] Extend rounding to support FP32 by @dsikka in #551
- [Tests] Fix typo, prepare for meta offload tensors by @kylesayrs in #562
- Modernize python310 type hints in compressors by @LudovicoYIN in #563
- Modernize python310 type hints in transform/offload/registry by @LudovicoYIN in #565
- [Offload] [Bugfix] Fix distributed cpu tensor reconstruction by @kylesayrs in #567
- [Offload] [Bugfix] Reserve extra dispatch memory for fragmentation by @kylesayrs in #566
- Remove Neural Magic copyright by @Etelis in #559
- [Transforms] Support loading transforms in transformers by @kylesayrs in #528
- [Offload] `DistributedDeviceCache` by @kylesayrs in #568
- Revert "[Transforms] Support loading transforms in transformers" by @HDCharles in #578
- [Offload] `DiskCache`, `DistributedDiskCache` by @kylesayrs in #535
- [Offload] Make `update_offload_parameter` more async and direct (2) by @kylesayrs in #576
- [Copyright] Add vLLM copyright enforcement by @kylesayrs in #575
- [Bugfix] Handle updating tensors with gradients by @kylesayrs in #580
- [Bugfix] Fix `get_device_memory` for rank>0 by @HDCharles in #582
- Remove upper limit for torch dependency to support 2.10 by @dsikka in #583
- Set seed to fix flaky test by @dsikka in #584
- [Offload] Convert accelerate for loading/saving by @kylesayrs in #572
- [Bugfix] Allow parameter overwrite if shapes do not match by @kylesayrs in #586
- [Bugfix] [Offloading] Even more reserved memory, scaling with model size by @kylesayrs in #587
- Implement init_dist for distributed setup by @HDCharles in #589
- FP8 Block Quantization: Non-Divisible Shape Support by @Etelis in #547
- [Bugfix]: Reduce memory usage when load device does not match dispatch device by @kylesayrs in #592
- [Bugfix] Fix `load_offloaded_model` for qwen3vl8b by @HDCharles in #591
- [Offload] clean up deprecation warnings, which can accumulate to 100k+ warnings by @brian-dellabetta in #593
- [Offload] Deprecate `update_parameter_data` by @kylesayrs in #588
- [Bugfix] Fix `clear_quantization` by @kylesayrs in #596
- Allow broadcasting fp8 by @HDCharles in #603
- Fix ruff for release by @HDCharles in #604
- [Offload] Fully invertible conversion functions by @kylesayrs in #601
- [Offload] Better device/cpu memory estimates when loading with `load_offloaded_model` by @kylesayrs in #605
New Contributors
- @LudovicoYIN made their first contribution in #548
Full Changelog: 0.13.0...0.14.0