add example of w8a8fp8 for qwen3.5 #2631
base: main
Changes from 1 commit
examples/quantization_w8a8_fp8/qwen3_5_example.py
@@ -0,0 +1,45 @@
```python
import torch
from compressed_tensors.utils import save_mtp_tensors_to_checkpoint
from datasets import load_dataset
from transformers import AutoProcessor, Qwen3_5MoeForConditionalGeneration

from llmcompressor import oneshot
from llmcompressor.modifiers.quantization import QuantizationModifier

# NOTE: This example requires transformers >= v5

MODEL_ID = "/Qwen/Qwen3.5-122B-A10B"
```
zhangxin81 marked this conversation as resolved (outdated).
```python
# Load model.
model = Qwen3_5MoeForConditionalGeneration.from_pretrained(MODEL_ID, dtype="auto")
processor = AutoProcessor.from_pretrained(MODEL_ID)
```
Contributor

Remove the leading slash from the model ID so the example can load the model from the Hugging Face Hub. With the slash, `from_pretrained` treats the ID as an absolute local path (`/Qwen/Qwen3.5-122B-A10B`) instead of resolving it as a Hub model ID. Suggested fix at line 11:

```diff
-MODEL_ID = "/Qwen/Qwen3.5-122B-A10B"
+MODEL_ID = "Qwen/Qwen3.5-122B-A10B"
```
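The reviewer's point can be checked with the standard library alone; `os.path.isabs` (used here for illustration, not necessarily the exact check `from_pretrained` performs) distinguishes the two forms:

```python
import os

# A leading slash makes the ID an absolute filesystem path, so
# from_pretrained would look for a local directory named
# /Qwen/Qwen3.5-122B-A10B instead of resolving a Hub model ID.
with_slash = "/Qwen/Qwen3.5-122B-A10B"
without_slash = "Qwen/Qwen3.5-122B-A10B"

print(os.path.isabs(with_slash))     # True
print(os.path.isabs(without_slash))  # False
```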
```python
# No need to include mtp layers as they are not loaded
# through Qwen3_5MoeForConditionalGeneration
recipe = QuantizationModifier(
    targets="Linear",
    scheme="FP8_DYNAMIC",
    ignore=[
        "re:.*lm_head",
        "re:visual.*",
        "re:model.visual.*",
        "re:.*mlp.gate$",
        "re:.*embed_tokens$",
        "re:.*shared_expert_gate$",
        "re:.*linear_attn.*",
    ],
```
Comment on lines +22 to +30

Contributor

Suggested change to the `ignore` list:

```python
ignore=[
    "re:.*lm_head",
    "re:.*mlp.gate$",
    "re:.*embed_tokens$",
    "re:.*shared_expert_gate$",
],
```
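For intuition on how the `re:`-prefixed patterns behave, here is a sketch assuming they are matched as standard Python regexes against module names (the helper below is illustrative, not the library's actual matcher):

```python
import re

def is_ignored(module_name: str, patterns: list[str]) -> bool:
    """Sketch: treat 're:'-prefixed entries as regexes, others as exact names."""
    for pattern in patterns:
        if pattern.startswith("re:"):
            if re.match(pattern[len("re:"):], module_name):
                return True
        elif pattern == module_name:
            return True
    return False

patterns = ["re:.*mlp.gate$", "re:.*lm_head"]
print(is_ignored("model.layers.0.mlp.gate", patterns))       # True
print(is_ignored("model.layers.0.mlp.gate_proj", patterns))  # False
```

The trailing `$` is what keeps expert projections like `gate_proj` from being caught by the `mlp.gate` router pattern.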
```python
)

# Apply quantization.
oneshot(
    model=model,
    recipe=recipe)
```
Comment on lines +34 to +36

Contributor

Suggested change
Comment on lines +34 to +36

Contributor

Update type hints or documentation to support passing `Modifier` objects as the `recipe` argument. The `recipe` field is annotated as an optional string (`RecipeArguments.recipe: Optional[str]`), but this example, like others in the repository, passes a `QuantizationModifier` instance. At runtime `Recipe.create_instance` accepts `Modifier` objects, so the call works; the annotation just does not reflect it.
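The mismatch flagged above (a string annotation on `recipe` versus the `QuantizationModifier` object the example passes) could be reconciled with a union annotation plus normalization. A minimal sketch, under the assumption that a string is parsed as a YAML recipe while modifiers pass through; the names below are illustrative stand-ins, not llm-compressor's actual classes:

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Modifier:
    # Stand-in for llm-compressor's Modifier base class.
    name: str


# Annotation matching how the examples actually call oneshot():
# a YAML path/string, a single Modifier, or a list of Modifiers.
RecipeInput = Union[str, Modifier, List[Modifier]]


def normalize_recipe(recipe: RecipeInput) -> List[Modifier]:
    # Rough analogue of Recipe.create_instance-style dispatch.
    if isinstance(recipe, Modifier):
        return [recipe]
    if isinstance(recipe, list):
        return list(recipe)
    # In the real library a string would be loaded and parsed as a
    # recipe; a placeholder modifier stands in for that here.
    return [Modifier(name=f"parsed:{recipe}")]


print([m.name for m in normalize_recipe(Modifier(name="quantization"))])
```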
```python
# Save to disk in compressed-tensors format.
SAVE_DIR = MODEL_ID.rstrip("/").split("/")[-1] + "-FP8-Dynamic"
model.save_pretrained(SAVE_DIR)
processor.save_pretrained(SAVE_DIR)

# MTP layers are excluded from the model through Qwen3_5MoeForConditionalGeneration
# Save them as-is from the original checkpoint into the quantized output.
save_mtp_tensors_to_checkpoint(source_model=MODEL_ID, dest_dir=SAVE_DIR)
```
Contributor

The imports `torch` and `load_dataset` are not used in this script. Since `oneshot` can accept a dataset name as a string, `load_dataset` is unnecessary. These should be removed to keep the example clean.
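As a side check on the example's last section, the `SAVE_DIR` expression strips the org prefix from the model ID. Shown here with the slash-free ID suggested earlier in the review:

```python
MODEL_ID = "Qwen/Qwen3.5-122B-A10B"
SAVE_DIR = MODEL_ID.rstrip("/").split("/")[-1] + "-FP8-Dynamic"
print(SAVE_DIR)  # Qwen3.5-122B-A10B-FP8-Dynamic
```

Note the expression yields the same directory name even with the original leading slash, since `split("/")[-1]` keeps only the final path segment.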