Use model compression pathways #1419


Open
kylesayrs wants to merge 8 commits into main

Conversation

@kylesayrs (Collaborator) commented on May 8, 2025

Purpose

  • Use the in-memory model compression pathway to reduce memory requirements when saving models
  • These changes, along with the postprocessing changes, move users towards a pattern where they are aware of the model's status (frozen/compressed) and call save_pretrained manually (see the sketch below)
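
A minimal sketch of that pattern, condensed from the full script under Testing below (the model and recipe are just examples):

```python
import torch
from transformers import AutoModelForCausalLM

from llmcompressor import oneshot
from llmcompressor.modifiers.quantization import QuantizationModifier

# load the model and apply one-shot quantization; afterwards the model is frozen in memory
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B-Instruct", torch_dtype=torch.bfloat16
)
oneshot(model=model, recipe=QuantizationModifier(targets="Linear", scheme="W4A16"))

# the user saves explicitly; compression happens in memory as part of the save
model.save_pretrained("save_dir", save_compressed=True)
```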

Prerequisites

Changes

  • Modify save_pretrained_wrapper to use compress_model(model) rather than compress(state_dict) (see the sketch after this list)
  • Modify save_pretrained_wrapper so that the state dict is only retrieved if not skipping compression stats
  • Modify save_pretrained_wrapper to save dictionary and python files, even if there is no explicit compressor
  • Modify save_checkpoint (used by training) to decompress after the checkpoint is saved
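
Roughly, the change to the save pathway looks like the sketch below (this assumes compressed-tensors' ModelCompressor interface; the real wrapper also handles configs, recipes, and compression stats):

```python
from compressed_tensors.compressors import ModelCompressor


def save_compressed(model, save_directory: str):
    # build a compressor from the model's quantization/sparsity configuration
    compressor = ModelCompressor.from_pretrained_model(model)

    # previous pathway: materialize a second, full copy of the weights
    # compressed_state_dict = compressor.compress(model, state_dict=model.state_dict())

    # new pathway: compress the model's modules in place, avoiding the duplicate state dict
    compressor.compress_model(model)
    model.save_pretrained(save_directory)
```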

Example/Testing Changes

As far as I can tell, the table below lists all of the instances where a model undergoes saving that is not immediately followed by script exit.

| File Path | Solution |
| --- | --- |
| examples/trl_mixin/ex_trl_constant.py | Decompress in between stages |
| test_oneshot_and_finetune.py | Decompress in between stages |
| examples/quantization_2of4_sparse_w4a16/llama7b_sparse_w4a16.py | Do not save in between stages to avoid a compressed state |
| test_oneshot_and_finetune_with_tokenizer.py | Do not save in between stages to avoid a compressed state |
| test_oneshot_then_finetune.py | No work is required, as the model is decompressed upon loading from disk |
| test_compress_tensor_utils.py | Fix the test to use dispatch_model (which is what transformers actually uses) rather than cpu_offload (see the sketch after this table) |
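
For the test_compress_tensor_utils.py fix, the intent is to place the model with accelerate's dispatch_model, which is what transformers uses under the hood, rather than offloading the whole model with cpu_offload. A rough sketch (the helper name and device map are illustrative, not the actual test code):

```python
from accelerate import dispatch_model


def offload_like_transformers(model):
    # transformers places modules according to a device map (via accelerate's dispatch_model)
    # rather than calling cpu_offload on the whole model; this illustrative map offloads the
    # decoder layers to CPU while keeping GPU 0 as the execution device
    device_map = {"model.embed_tokens": 0, "model.layers": "cpu", "model.norm": 0, "lm_head": 0}
    return dispatch_model(model, device_map=device_map, main_device=0)
```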

Testing

State Dict In Memory (memory timeline comparison: previous vs. now)
oneshot_save.py

```python
import torch
from transformers import AutoModelForCausalLM
from llmcompressor import oneshot
from llmcompressor.modifiers.quantization import QuantizationModifier
from pttp import TensorProfiler

# MODEL_ID = "DeepSeek-V3_local_bf16"
MODEL_ID = "meta-llama/Meta-Llama-3-8B-Instruct"

with TensorProfiler() as prof:
    prof.mark_event("Load model")
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype=torch.bfloat16)

    prof.mark_event("Oneshot")
    oneshot(
        model=model,
        recipe=QuantizationModifier(targets="Linear", scheme="W4A16"),
        trust_remote_code_model=True,
    )

    prof.mark_event("Save model")
    model.save_pretrained("sav_testing", save_compressed=True, skip_compression_stats=True)

prof.save_memory_timeline("save_timeline.png")
```

Signed-off-by: Kyle Sayers <[email protected]>

github-actions bot commented May 8, 2025

👋 Hi! Thank you for contributing to llm-compressor. Please add the ready label when the PR is ready for review.

Note: This is required to complete the testing suite, please only add the label once the PR is code complete and local testing has been performed.

kylesayrs added 2 commits May 14, 2025 11:33
Signed-off-by: Kyle Sayers <[email protected]>
@kylesayrs kylesayrs added the ready When a PR is ready for review label May 14, 2025
Signed-off-by: Kyle Sayers <[email protected]>
@kylesayrs kylesayrs removed the ready When a PR is ready for review label May 14, 2025
@kylesayrs kylesayrs added the ready When a PR is ready for review label May 19, 2025
Signed-off-by: Kyle Sayers <[email protected]>
@kylesayrs kylesayrs marked this pull request as ready for review May 20, 2025 04:44
@kylesayrs kylesayrs changed the title [WIP] Use model compression pathways Use model compression pathways May 20, 2025