
Commit a089bfb

Merge branch 'vllm-project:main' into deprecation/torch-dtype-to-dtype
2 parents: 0f8ba84 + ceed4df

File tree

4 files changed (+11 −4 lines)


setup.py

Lines changed: 1 addition & 1 deletion

@@ -157,7 +157,7 @@ def localversion_func(version: ScmVersion) -> str:
     "pytest>=6.0.0",
     "pytest-mock>=3.6.0",
     "pytest-rerunfailures>=13.0",
-    "lm_eval==0.4.9",
+    "lm_eval==0.4.9.2",
     # test dependencies
     "beautifulsoup4~=4.12.3",
     "cmarkgfm>=2024.1.14",
Lines changed: 9 additions & 0 deletions

@@ -0,0 +1,9 @@
+# Quantizing models without a model definition
+
+`model_free_ptq` provides a PTQ pathway for data-free schemes (such as FP8 Dynamic Per Token or FP8 Block). Specifically, this pathway removes the requirement for a model definition or the need to load the model through transformers. If you are interested in applying a data-free scheme, there are two key scenarios in which this pathway may make sense for your model:
+
+1. The model does not have a model definition available through transformers. This may be the case for a brand-new model that has not yet landed in transformers.
+2. The model is very large (such as Kimi K2 Thinking) and runs into issues with `oneshot`.
+
+
+`model_free_ptq` works directly with the safetensors in the checkpoint, to which observers are applied, thereby removing the requirement for a model definition or transformers.
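The new document describes the pathway, but this page does not show its call signature. Below is a minimal, hypothetical sketch of how such a data-free PTQ entry point might be invoked; the import path and the parameter names `model_stub`, `save_directory`, `scheme`, and `ignore` are illustrative assumptions, not confirmed by this diff — consult the llm-compressor documentation for the actual interface.

```python
# Hypothetical sketch of a model_free_ptq call; the import path and all
# parameter names below are assumptions, not confirmed by this commit.
from llmcompressor import model_free_ptq  # assumed import location

model_free_ptq(
    # Hugging Face stub or local path of the source checkpoint (assumed name).
    model_stub="moonshotai/Kimi-K2-Thinking",
    # Directory where the quantized safetensors are written (assumed name).
    save_directory="Kimi-K2-Thinking-FP8-BLOCK",
    # One of the data-free schemes named in the document above.
    scheme="FP8_BLOCK",
    # Modules to leave unquantized, e.g. the output head (assumed name).
    ignore=["lm_head"],
)
```

Because the observers run over the checkpoint's safetensors rather than an instantiated module graph, no transformers load is needed, which is presumably what makes this pathway viable for very large models such as Kimi K2 Thinking.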

tests/llmcompressor/modifiers/transform/test_correctness.py

Lines changed: 1 addition & 2 deletions

@@ -45,8 +45,7 @@ def test_apply_correctness(
     with torch.no_grad():
         true_output = model(**input)
 
-    modifier.on_initialize(state)
-    modifier.on_start(state, None)
+    modifier.initialize(state)
 
     with torch.no_grad():
         output = model(**input)

tests/llmcompressor/utils/test_helpers.py

Lines changed: 0 additions & 1 deletion

@@ -174,5 +174,4 @@ def hook(module, args):
     with disable_lm_head(model):
         input = {key: value.to("cuda") for key, value in model.dummy_inputs.items()}
         output = model(**input)
-        assert lm_input_device == torch.device("cuda:0")
         assert output.logits.device == torch.device("meta")
