45 changes: 45 additions & 0 deletions examples/quantization_w8a8_fp8/qwen3_5_example.py
@@ -0,0 +1,45 @@
import torch
from compressed_tensors.utils import save_mtp_tensors_to_checkpoint
from datasets import load_dataset
Comment on lines +1 to +3
Contributor


medium

The imports torch and load_dataset are not used in this script. Since oneshot can accept a dataset name as a string, load_dataset is unnecessary. These should be removed to keep the example clean.
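
A minimal patch reflecting this suggestion (a sketch; it assumes neither symbol is referenced elsewhere in the script) would be:

```diff
-import torch
 from compressed_tensors.utils import save_mtp_tensors_to_checkpoint
-from datasets import load_dataset
 from transformers import AutoProcessor, Qwen3_5MoeForConditionalGeneration
```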

from transformers import AutoProcessor, Qwen3_5MoeForConditionalGeneration

from llmcompressor import oneshot
from llmcompressor.modifiers.quantization import QuantizationModifier

# NOTE: This example requires transformers >= v5

MODEL_ID = "/Qwen/Qwen3.5-122B-A10B"
zhangxin81 marked this conversation as resolved.

# Load model.
model = Qwen3_5MoeForConditionalGeneration.from_pretrained(MODEL_ID, dtype="auto")
processor = AutoProcessor.from_pretrained(MODEL_ID)
Contributor


⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check the actual file content at the specified lines
cat -n examples/quantization_w8a8_fp8/qwen3_5_example.py | head -50

Repository: vllm-project/llm-compressor

Length of output: 1796


🏁 Script executed:

#!/bin/bash
# Find the oneshot function to verify its signature
find . -type f -name "*.py" -exec grep -l "def oneshot" {} \;

Repository: vllm-project/llm-compressor

Length of output: 174


🏁 Script executed:

#!/bin/bash
# Search for oneshot import and usage patterns
rg "from.*oneshot|import.*oneshot" -A 2 -B 2 | head -50

Repository: vllm-project/llm-compressor

Length of output: 3476


🏁 Script executed:

#!/bin/bash
# Read the oneshot function signature
head -100 src/llmcompressor/entrypoints/oneshot.py | cat -n

Repository: vllm-project/llm-compressor

Length of output: 4747


🏁 Script executed:

#!/bin/bash
# Look for type hints on the recipe parameter
rg "def oneshot" -A 30 src/llmcompressor/entrypoints/oneshot.py

Repository: vllm-project/llm-compressor

Length of output: 1234


🏁 Script executed (no clone):

#!/bin/bash
# Check Hugging Face transformers documentation behavior
# Create a test script to understand path handling
python3 << 'EOF'
# Simulate how HF from_pretrained might handle paths
import os

model_id_with_slash = "/Qwen/Qwen3.5-122B-A10B"
model_id_no_slash = "Qwen/Qwen3.5-122B-A10B"

# Check if leading slash makes it absolute path
print(f"With slash is absolute: {os.path.isabs(model_id_with_slash)}")
print(f"No slash is absolute: {os.path.isabs(model_id_no_slash)}")

# This is how HF would interpret it
print(f"With slash: {os.path.expanduser(model_id_with_slash)}")
print(f"No slash: {os.path.expanduser(model_id_no_slash)}")
EOF

Length of output: 156


🏁 Script executed:

#!/bin/bash
# Check how QuantizationModifier is used in tests with oneshot
rg "oneshot.*recipe=.*QuantizationModifier|QuantizationModifier.*recipe" -B 5 -A 5 | head -80

Repository: vllm-project/llm-compressor

Length of output: 3255


🏁 Script executed:

#!/bin/bash
# Look at actual test examples to see how recipe is passed to oneshot
rg "oneshot\(" -A 5 tests/ | grep -A 5 "recipe=" | head -60

Repository: vllm-project/llm-compressor

Length of output: 3785


🏁 Script executed:

#!/bin/bash
# Check if there's parameter handling that converts QuantizationModifier to something else
rg "QuantizationModifier" examples/ tests/ | head -20

Repository: vllm-project/llm-compressor

Length of output: 2340


🏁 Script executed:

#!/bin/bash
# Check how oneshot handles the recipe parameter internally
rg "def oneshot" -A 100 src/llmcompressor/entrypoints/oneshot.py | head -120

Repository: vllm-project/llm-compressor

Length of output: 4316


🏁 Script executed:

#!/bin/bash
# Look at the parse_args function to see how recipe is processed
rg "def parse_args|recipe" src/llmcompressor/args/ | head -40

Repository: vllm-project/llm-compressor

Length of output: 2087


🏁 Script executed:

#!/bin/bash
# Check the actual implementation of how recipe is used in oneshot
sed -n '200,400p' src/llmcompressor/entrypoints/oneshot.py | cat -n

Repository: vllm-project/llm-compressor

Length of output: 10924


🏁 Script executed:

#!/bin/bash
# Look at the Oneshot.__init__ method to see how recipe is processed
sed -n '150,300p' src/llmcompressor/entrypoints/oneshot.py | cat -n

Repository: vllm-project/llm-compressor

Length of output: 6353


🏁 Script executed:

#!/bin/bash
# Search for how recipe parameter is converted/processed
rg "self.recipe" src/llmcompressor/entrypoints/oneshot.py -B 2 -A 2

Repository: vllm-project/llm-compressor

Length of output: 1029


🏁 Script executed:

#!/bin/bash
# Look at parse_args to see how it handles recipe parameter
rg "def parse_args" -A 50 src/llmcompressor/args/utils.py | head -80

Repository: vllm-project/llm-compressor

Length of output: 1702


🏁 Script executed:

#!/bin/bash
# Check RecipeArguments to see what recipe field accepts
cat src/llmcompressor/args/recipe_arguments.py | head -100

Repository: vllm-project/llm-compressor

Length of output: 1231


🏁 Script executed:

#!/bin/bash
# Search for how modifiers are handled when passed as recipe
rg "isinstance.*recipe.*Modifier|Modifier.*recipe" src/ -A 3 -B 3

Repository: vllm-project/llm-compressor

Length of output: 53


🏁 Script executed:

#!/bin/bash
# Check if there's special handling in parse_args or HfArgumentParser usage
rg "recipe.*isinstance|type.*recipe" src/llmcompressor/args/ -A 3 -B 3

Repository: vllm-project/llm-compressor

Length of output: 53


🏁 Script executed:

#!/bin/bash
# Look for the actual test that passes QuantizationModifier to oneshot
cat -n tests/e2e/e2e_utils.py | grep -A 10 -B 10 "recipe.*QuantizationModifier"

Repository: vllm-project/llm-compressor

Length of output: 993


🏁 Script executed:

#!/bin/bash
# Check HfArgumentParser to see if it has special handling
rg "HfArgumentParser|parse_dict" src/llmcompressor/args/utils.py -B 3 -A 10

Repository: vllm-project/llm-compressor

Length of output: 933


🏁 Script executed:

#!/bin/bash
# Look for how recipe kwarg is handled when it's a Modifier object
rg "if.*kwargs|isinstance.*kwargs.*recipe" src/llmcompressor/entrypoints/oneshot.py -B 3 -A 5

Repository: vllm-project/llm-compressor

Length of output: 682


🏁 Script executed:

#!/bin/bash
# Check if parse_dict from transformers HfArgumentParser allows non-string values
python3 << 'EOF'
# Check what HfArgumentParser.parse_dict does with non-string values
from transformers import HfArgumentParser
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class TestArgs:
    recipe: Optional[str] = field(default=None)

parser = HfArgumentParser(TestArgs)

# Test 1: Passing a string
result1 = parser.parse_dict({"recipe": "path/to/recipe.yaml"})
print(f"Test 1 (string): {result1}")

# Test 2: Passing a non-string object
class DummyModifier:
    pass

result2 = parser.parse_dict({"recipe": DummyModifier()})
print(f"Test 2 (object): {result2}")
print(f"Test 2 recipe type: {type(result2[0].recipe)}")
EOF

Repository: vllm-project/llm-compressor

Length of output: 195


🏁 Script executed:

#!/bin/bash
# Verify the exact model ID and leading slash behavior in HF
python3 << 'EOF'
import os

# The claim is that leading slash breaks from_pretrained
model_id_slash = "/Qwen/Qwen3.5-122B-A10B"
model_id_no_slash = "Qwen/Qwen3.5-122B-A10B"

print(f"Path with slash: {model_id_slash}")
print(f"Is absolute: {os.path.isabs(model_id_slash)}")
print()
print(f"Path without slash: {model_id_no_slash}")
print(f"Is absolute: {os.path.isabs(model_id_no_slash)}")
print()
print("HF transformers treats leading slash as absolute file path,")
print("which would make from_pretrained look for /Qwen/Qwen3.5-122B-A10B")
print("as a local directory instead of a Hub model ID.")
EOF

Repository: vllm-project/llm-compressor

Length of output: 367


🏁 Script executed:

#!/bin/bash
# Check if there are any other examples with similar patterns
rg "MODEL_ID.*=.*/" examples/ | head -20

Repository: vllm-project/llm-compressor

Length of output: 2004


🏁 Script executed:

#!/bin/bash
# Double check the exact code in the file one more time
sed -n '10,16p' examples/quantization_w8a8_fp8/qwen3_5_example.py

Repository: vllm-project/llm-compressor

Length of output: 259


🏁 Script executed:

#!/bin/bash
# Final confirmation: check if there are integration tests that pass modifiers to oneshot successfully
rg "oneshot.*QuantizationModifier|oneshot.*GPTQModifier" tests/ -l

Repository: vllm-project/llm-compressor

Length of output: 93


Remove the leading slash from the model ID so the example can load from Hugging Face Hub.

At Line 11, MODEL_ID starts with a /, making it an absolute file path. Hugging Face from_pretrained() will attempt to load from the local filesystem at /Qwen/Qwen3.5-122B-A10B and fail. Use "Qwen/Qwen3.5-122B-A10B" instead to load from Hub.

Suggested fix
-MODEL_ID = "/Qwen/Qwen3.5-122B-A10B"
+MODEL_ID = "Qwen/Qwen3.5-122B-A10B"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/quantization_w8a8_fp8/qwen3_5_example.py` around lines 11 - 15,
MODEL_ID currently has a leading slash which makes
Qwen3_5MoeForConditionalGeneration.from_pretrained and
AutoProcessor.from_pretrained try to load from the local filesystem; change the
MODEL_ID string by removing the leading "/" so it references the Hugging Face
Hub package name (update the MODEL_ID constant used by from_pretrained calls).


# No need to include mtp layers as they are not loaded
# through Qwen3_5MoeForConditionalGeneration
recipe = QuantizationModifier(
    targets="Linear",
    scheme="FP8_DYNAMIC",
    ignore=[
        "re:.*lm_head",
        "re:visual.*",
        "re:model.visual.*",
        "re:.*mlp.gate$",
        "re:.*embed_tokens$",
        "re:.*shared_expert_gate$",
        "re:.*linear_attn.*",
    ],
Comment on lines +22 to +30
Contributor


medium

The ignore list includes patterns for visual modules (re:visual.*, re:model.visual.*) and linear_attn, which appear to be redundant for the Qwen3_5Moe text model architecture. Removing these makes the recipe more concise and specific to the model.

    ignore=[
        "re:.*lm_head",
        "re:.*mlp.gate$",
        "re:.*embed_tokens$",
        "re:.*shared_expert_gate$",
    ],

)

# Apply quantization.
oneshot(
    model=model,
    recipe=recipe)
Comment on lines +34 to +36
Contributor


medium

The oneshot call is missing a dataset and processor. Providing a calibration dataset is standard for the oneshot workflow to ensure proper initialization of the quantization modifiers and to provide a complete example. Additionally, passing the processor explicitly is recommended since it has already been loaded.

Suggested change
-oneshot(
-    model=model,
-    recipe=recipe)
+oneshot(
+    model=model,
+    dataset="open_platypus",
+    recipe=recipe,
+    processor=processor,
+)

Comment on lines +34 to +36
Contributor


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Read-only verification of oneshot recipe handling and usage patterns
rg -n -C4 '^def oneshot\(' src/llmcompressor/entrypoints/oneshot.py
rg -n -C4 'recipe:' src/llmcompressor/entrypoints/oneshot.py
rg -n -C4 'isinstance\(recipe|parse.*recipe|load.*recipe|Modifier' src/llmcompressor/entrypoints/oneshot.py
rg -n --type=py -C2 'oneshot\(' examples

Repository: vllm-project/llm-compressor

Length of output: 38177


🏁 Script executed:

#!/bin/bash
# Check the definition of recipe in qwen3_5_example.py
head -40 examples/quantization_w8a8_fp8/qwen3_5_example.py | tail -20

Repository: vllm-project/llm-compressor

Length of output: 526


🏁 Script executed:

#!/bin/bash
# Search for how recipe parameter is actually used inside oneshot function
rg -A 20 'def oneshot\(' src/llmcompressor/entrypoints/oneshot.py | head -50

Repository: vllm-project/llm-compressor

Length of output: 850


🏁 Script executed:

#!/bin/bash
# Check if recipe is coerced or converted to string, or if objects are accepted
rg -n 'recipe' src/llmcompressor/entrypoints/oneshot.py | grep -E '(isinstance|type|convert|parse|Modifier)' | head -20

Repository: vllm-project/llm-compressor

Length of output: 241


🏁 Script executed:

#!/bin/bash
# Read full qwen3_5_example.py to see recipe definition
cat -n examples/quantization_w8a8_fp8/qwen3_5_example.py

Repository: vllm-project/llm-compressor

Length of output: 1796


🏁 Script executed:

#!/bin/bash
# Find parse_args function to understand recipe parameter handling
rg -n 'def parse_args' src/llmcompressor/entrypoints/oneshot.py

Repository: vllm-project/llm-compressor

Length of output: 53


🏁 Script executed:

#!/bin/bash
# Check oneshot.py imports and look for parse_args
head -100 src/llmcompressor/entrypoints/oneshot.py

Repository: vllm-project/llm-compressor

Length of output: 4047


🏁 Script executed:

#!/bin/bash
# Search for parse_args in entire src directory
rg -n 'def parse_args' src/

Repository: vllm-project/llm-compressor

Length of output: 277


🏁 Script executed:

#!/bin/bash
# Read parse_args implementation
sed -n '21,150p' src/llmcompressor/args/utils.py

Repository: vllm-project/llm-compressor

Length of output: 1702


🏁 Script executed:

#!/bin/bash
# Check how recipe is handled after parse_args in oneshot function
rg -A 150 'def oneshot\(' src/llmcompressor/entrypoints/oneshot.py | head -200

Repository: vllm-project/llm-compressor

Length of output: 7783


🏁 Script executed:

#!/bin/bash
# Find RecipeArguments class to see how it handles recipe parameter
rg -n 'class RecipeArguments' src/

Repository: vllm-project/llm-compressor

Length of output: 139


🏁 Script executed:

#!/bin/bash
# Read RecipeArguments implementation
find src -name "*recipe*args*" -type f

Repository: vllm-project/llm-compressor

Length of output: 53


🏁 Script executed:

#!/bin/bash
# Read RecipeArguments class
cat -n src/llmcompressor/args/recipe_arguments.py

Repository: vllm-project/llm-compressor

Length of output: 1532


🏁 Script executed:

#!/bin/bash
# Look for how recipe is actually processed in Oneshot class
rg -A 50 'class Oneshot:' src/llmcompressor/entrypoints/oneshot.py | head -80

Repository: vllm-project/llm-compressor

Length of output: 2226


🏁 Script executed:

#!/bin/bash
# Read the Oneshot.__init__ method to see how recipe is actually handled
rg -A 100 'def __init__' src/llmcompressor/entrypoints/oneshot.py | head -150

Repository: vllm-project/llm-compressor

Length of output: 4009


🏁 Script executed:

#!/bin/bash
# Check how Oneshot uses recipe after initialization
rg -n 'self\.recipe' src/llmcompressor/entrypoints/oneshot.py

Repository: vllm-project/llm-compressor

Length of output: 364


🏁 Script executed:

#!/bin/bash
# Look at apply_recipe_modifiers to see how recipe is used
rg -A 100 'def apply_recipe_modifiers' src/llmcompressor/entrypoints/oneshot.py

Repository: vllm-project/llm-compressor

Length of output: 3644


🏁 Script executed:

#!/bin/bash
# Check if there's any special handling for Modifier objects in parse_args or elsewhere
rg -n 'isinstance.*Modifier|Modifier.*recipe' src/llmcompressor/entrypoints/oneshot.py

Repository: vllm-project/llm-compressor

Length of output: 53


🏁 Script executed:

#!/bin/bash
# Check if HfArgumentParser from transformers can handle non-string objects
# Let's look at how parse_dict is actually used and if there's special handling
rg -B 5 -A 15 'parser.parse_dict' src/llmcompressor/args/utils.py

Repository: vllm-project/llm-compressor

Length of output: 855


🏁 Script executed:

#!/bin/bash
# Check if there's any special isinstance or type checking for recipe before parse_args
rg -B 10 'parse_args\(' src/llmcompressor/entrypoints/oneshot.py | head -30

Repository: vllm-project/llm-compressor

Length of output: 457


🏁 Script executed:

#!/bin/bash
# Check what HfArgumentParser.parse_dict does with non-matching types
# Look for how recipe value is actually handled when it's not a string
python3 << 'EOF'
from transformers import HfArgumentParser
from dataclasses import dataclass
from typing import Optional

@dataclass
class TestArgs:
    recipe: Optional[str] = None

parser = HfArgumentParser(TestArgs)

# Test 1: passing a string (normal case)
try:
    result1 = parser.parse_dict({"recipe": "path/to/recipe.yaml"})
    print(f"Test 1 (string): SUCCESS - {result1}")
except Exception as e:
    print(f"Test 1 (string): FAILED - {type(e).__name__}: {e}")

# Test 2: passing a non-string object
class FakeModifier:
    def __init__(self):
        self.name = "test_modifier"

try:
    result2 = parser.parse_dict({"recipe": FakeModifier()})
    print(f"Test 2 (object): SUCCESS - recipe type = {type(result2[0].recipe)}")
    print(f"  recipe value = {result2[0].recipe}")
except Exception as e:
    print(f"Test 2 (object): FAILED - {type(e).__name__}: {e}")
EOF

Repository: vllm-project/llm-compressor

Length of output: 195


🏁 Script executed:

#!/bin/bash
# Check session.initialize to see how it handles recipe parameter
rg -n 'def initialize' src/llmcompressor/core/session_functions.py

Repository: vllm-project/llm-compressor

Length of output: 53


🏁 Script executed:

#!/bin/bash
# Search for how recipe is actually processed in CompressionSession or related classes
rg -n 'def initialize' src/llmcompressor/core/ | head -20

Repository: vllm-project/llm-compressor

Length of output: 186


🏁 Script executed:

#!/bin/bash
# Check session.initialize signature and how it handles recipe
sed -n '74,150p' src/llmcompressor/core/session.py

Repository: vllm-project/llm-compressor

Length of output: 3509


🏁 Script executed:

#!/bin/bash
# Check lifecycle.initialize
sed -n '73,150p' src/llmcompressor/core/lifecycle.py

Repository: vllm-project/llm-compressor

Length of output: 2616


🏁 Script executed:

#!/bin/bash
# Check RecipeInput type definition
rg -n 'RecipeInput' src/llmcompressor/core/lifecycle.py

Repository: vllm-project/llm-compressor

Length of output: 207


🏁 Script executed:

#!/bin/bash
# Find Recipe class and create_instance method
rg -n 'class Recipe' src/ | head -5

Repository: vllm-project/llm-compressor

Length of output: 202


🏁 Script executed:

#!/bin/bash
# Find RecipeInput definition
rg -n 'RecipeInput' src/llmcompressor/recipe/__init__.py

Repository: vllm-project/llm-compressor

Length of output: 170


🏁 Script executed:

#!/bin/bash
# Find RecipeInput definition in recipe module
rg -n 'RecipeInput' src/llmcompressor/recipe/ | head -10

Repository: vllm-project/llm-compressor

Length of output: 424


🏁 Script executed:

#!/bin/bash
# Check Recipe.create_instance to confirm it handles Modifier objects
rg -A 50 'def create_instance' src/llmcompressor/recipe/recipe.py | head -80

Repository: vllm-project/llm-compressor

Length of output: 2198


🏁 Script executed:

#!/bin/bash
# Verify QuantizationModifier is a subclass of Modifier
rg -n 'class QuantizationModifier' src/llmcompressor/modifiers/quantization/

Repository: vllm-project/llm-compressor

Length of output: 192


🏁 Script executed:

#!/bin/bash
# Check if there's any special handling of recipe parameter BEFORE it reaches parse_args
# in the oneshot function
sed -n '249,380p' src/llmcompressor/entrypoints/oneshot.py | tail -100

Repository: vllm-project/llm-compressor

Length of output: 5249


🏁 Script executed:

#!/bin/bash
# Check the full oneshot function body after the signature to see if recipe is processed
rg -A 200 'def oneshot\(' src/llmcompressor/entrypoints/oneshot.py | grep -A 50 'model_args, dataset_args'

Repository: vllm-project/llm-compressor

Length of output: 53


Update type hints or documentation to support QuantizationModifier objects in recipe parameter.

The qwen3_5_example.py passes a QuantizationModifier object as the recipe argument (line 19, 36), but oneshot() at line 262 declares recipe: str | list[str] | None = None, and RecipeArguments declares recipe: str | None. While Recipe.create_instance() does accept Modifier objects via its RecipeInput union type (which includes Modifier | List[Modifier]), the type hints along the call chain—particularly in oneshot() and RecipeArguments—do not reflect this capability. This creates a type safety inconsistency; the example works because Recipe.create_instance() internally handles Modifier objects, but the type annotations should either be updated to include Modifier | List[Modifier], or the docstring should explicitly document that Modifier instances are accepted.
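
As a sketch of the widened annotation (using a stand-in Modifier class, since this snippet does not import llm-compressor; `RecipeInput` here mirrors the union named in the analysis above, and `oneshot_signature_sketch` is a hypothetical illustration, not the real entrypoint):

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union


class Modifier:
    """Stand-in for llmcompressor's Modifier base class (assumption for this sketch)."""


# Union mirroring what Recipe.create_instance() accepts
RecipeInput = Union[str, List[str], Modifier, List[Modifier]]


@dataclass
class RecipeArguments:
    # Widened from Optional[str] so Modifier objects pass type checking
    recipe: Optional[RecipeInput] = field(default=None)


def oneshot_signature_sketch(recipe: Optional[RecipeInput] = None) -> RecipeArguments:
    """Illustrates the proposed oneshot() annotation; the body is illustrative only."""
    return RecipeArguments(recipe=recipe)
```

With a union like this, passing a QuantizationModifier instance would type-check the same way a YAML recipe path string does.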

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/quantization_w8a8_fp8/qwen3_5_example.py` around lines 34 - 36, the
example passes a QuantizationModifier to oneshot(), but oneshot() and
RecipeArguments only type recipe as str | list[str] | None (and
RecipeArguments.recipe: str | None), causing a type mismatch; update the type
hints to accept Modifier | list[Modifier] (or QuantizationModifier |
list[QuantizationModifier]) so the signatures align with
Recipe.create_instance() which accepts RecipeInput (Modifier | list[Modifier] |
str | list[str]), and update any related docstrings to mention Modifier
instances are supported; specifically modify the oneshot() parameter annotation,
RecipeArguments.recipe annotation, and any docs referencing recipe to include
Modifier (or use the Modifier union) so QuantizationModifier passes type
checking.


# Save to disk in compressed-tensors format.
SAVE_DIR = MODEL_ID.rstrip("/").split("/")[-1] + "-FP8-Dynamic"
model.save_pretrained(SAVE_DIR)
processor.save_pretrained(SAVE_DIR)

# MTP layers are excluded from the model through Qwen3_5MoeForConditionalGeneration
# Save them as-is from the original checkpoint into the quantized output.
save_mtp_tensors_to_checkpoint(source_model=MODEL_ID, dest_dir=SAVE_DIR)