Highlights
- New / Updated PTQ Algorithms:
- Improved layer support:
  - Quantization of SDPA without FX #1299
- New export flows:
- Improved examples:
- Allow signed scales #1308
- QONNX export with `dynamo=True` #1234
What's Changed
- Fix (setup): solve incompatibility between isort and yapf by @Giuseppe5 in #1293
- Feat (graph/hadamard): 152 had support by @Giuseppe5 in #1295
- feat (ex/llm): fully parametrise attention quantization by @nickfraser in #1287
- Feat (ex/llm): "auto" dtype by @pablomlago in #1301
- Setup: update transformers version by @Giuseppe5 in #1304
- Feat (brevitas_examples/llm): quant SDPA without FX by @Giuseppe5 in #1299
- Fix (brevitas_examples/llm): update README and yaml by @Giuseppe5 in #1305
- Setup: temporary pin pytest by @Giuseppe5 in #1307
- Feat (brevitas_examples/llm): configurable expansion step by @Giuseppe5 in #1280
- Versioning support in documentation by @Giuseppe5 in #1298
- Feat (graph/rotate): improve R2 region in SDPA by @Giuseppe5 in #1310
- Docs: improve docs build by @Giuseppe5 in #1314
- Fix (graph/rotation): rotation on subset of channels for SDPA by @Giuseppe5 in #1312
- Feat (brevitas_examples/llm): GGUF export by @Giuseppe5 in #1291
- Setup: bump torch version by @Giuseppe5 in #1205
- Fix (ex/imagenet): add forward pass in imagenet ptq example by @Giuseppe5 in #1316
- Feat (qronos): initial implementation of Qronos by @i-colbert in #1311
- Fix (graph/equalize): correct class check during rotation merging by @Giuseppe5 in #1317
- docs (core): Typo fix in docstring by @nickfraser in #1318
- Fix (llm): removing duplicate set_seed function by @i-colbert in #1319
- Fix (ex/common): save scales during optimization by @pablomlago in #1313
- Fix (llm): compatibility with non-uniform RMSNorm shapes by @i-colbert in #1324
- Fix (graph/gpxq): fix memory leak with weight_orig by @Giuseppe5 in #1325
- Shark LLM export by @Giuseppe5 in #1300
- Feat (scaling): rescaled min-max scaling and zero point by @i-colbert in #1320
- Fix (gguf): derived modify_tensors() returns generator by @i-colbert in #1329
- Fix (gpxq): device management of weight_orig for GPxQ by @i-colbert in #1330
- Fix (gguf): resolving zero point permutation issue with LlamaModel by @i-colbert in #1332
- Feat (examples): refactor imagenet and stable_diffusion entrypoints by @pablomlago in #1281
- Feat: skipping rotation optimization with load_checkpoint by @i-colbert in #1331
- Feat (core): Remove assumptions on positiveness of scales by @pablomlago in #1308
- Fix (export/qonnx): Add export support with `dynamo=True` by @nickfraser in #1234
- Fix (core/ops_ste): preserve dtype during clamp by @Giuseppe5 in #1340
- Fix (eq/rotation): find value if passed as kwarg by @nickfraser in #1338
- Feat (ex): tests Stable Diffusion and ImageNet by @pablomlago in #1339
- Feat (ex/sdxl): DDP-like bias correction for SDXL by @pablomlago in #1342
- Feat (graph): Minor refactoring layerwise_layer_handler by @pablomlago in #1335
- Fix (torch_utils): remove deprecated functions by @Giuseppe5 in #1344
- Fix (setup): test against latest 2.1 torch by @Giuseppe5 in #1348
- Rotation fix by @Giuseppe5 in #1334
- Docs (qronos): adding docs and configs by @i-colbert in #1326
- Feat: Added ONNX export to BNN-PYNQ example by @nickfraser in #916
- Fix (deps): set `accelerate<1.10` by @nickfraser in #1352
- Fix (copyright): Fix some missing copyright headers by @nickfraser in #1353
- Fix (ex/llm/benchmark): import error by @nickfraser in #1356
- Setup: temporarily pin diffusers version by @Giuseppe5 in #1360
- Fix (core/scaling): handle edge cases with signed scale by @Giuseppe5 in #1347
- Fix (core/stochastic_round): adjust stochastic round device by @Giuseppe5 in #1359
- Feat (papers): expansion paper configs by @Giuseppe5 in #1355
- Fix (docs): correct link to docs in initial README by @Giuseppe5 in #1361
- Fix (setup.py): Update author and contact in `setup.py` by @nickfraser in #1365
- Docs (README): Update maximum PyTorch version by @nickfraser in #1366
- Docs (getting started): Typo fix in example by @nickfraser in #1367
- requirements: Updated PyTorch, python versions by @nickfraser in #1370
- Feat (core/scaling): Add option to restrict the output of (scale / threshold) by @nickfraser in #1369
- deps (ex) update accelerate version by @nickfraser in #1371
Full Changelog: v0.12.0...v0.12.1