
add example of w8a8fp8 for qwen3.5 #2631

Open
zhangxin81 wants to merge 2 commits into vllm-project:main from zhangxin81:dev-zx

Conversation

@zhangxin81

add example of w8a8fp8 for qwen3.5

Signed-off-by: zhangxin81 <115389973+zhangxin81@users.noreply.github.com>
@github-actions

👋 Hi! Thank you for contributing to llm-compressor. Please add the ready label when the PR is ready for review.

Note: This is required to complete the testing suite; please only add the label once the PR is code complete and local testing has been performed.

@coderabbitai
Contributor

coderabbitai Bot commented Apr 20, 2026

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 880be3d4-6bb4-4977-ad18-b2d7f2d22c92

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


Walkthrough

Added a new example script demonstrating FP8 quantization for the Qwen3.5 MoE model using llmcompressor.oneshot. The script loads the model, applies FP8 dynamic quantization to Linear modules, saves the quantized output, and preserves MTP tensors from the original checkpoint.
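
For orientation, here is a minimal sketch of the workflow described above, assembled from the snippets quoted later in this review. The Hub model ID, the output directory name, and the save_mtp_tensors_to_checkpoint call (its argument order in particular) are assumptions, not the merged code; the ignore list follows the trimmed version suggested below.

from compressed_tensors.utils import save_mtp_tensors_to_checkpoint
from transformers import AutoProcessor, Qwen3_5MoeForConditionalGeneration

from llmcompressor import oneshot
from llmcompressor.modifiers.quantization import QuantizationModifier

# Assumed Hub ID; see the MODEL_ID discussion below.
MODEL_ID = "Qwen/Qwen3.5-122B-A10B"

# Load the model and processor.
model = Qwen3_5MoeForConditionalGeneration.from_pretrained(MODEL_ID, dtype="auto")
processor = AutoProcessor.from_pretrained(MODEL_ID)

# FP8 dynamic quantization of Linear modules, skipping quantization-sensitive layers.
recipe = QuantizationModifier(
    targets="Linear",
    scheme="FP8_DYNAMIC",
    ignore=[
        "re:.*lm_head",
        "re:.*mlp.gate$",
        "re:.*embed_tokens$",
        "re:.*shared_expert_gate$",
    ],
)

# FP8_DYNAMIC is data-free, so no calibration dataset is strictly required here.
oneshot(model=model, recipe=recipe)

# Save the compressed model and processor, then copy the MTP (multi-token
# prediction) tensors from the original checkpoint into the quantized one.
SAVE_DIR = MODEL_ID.rstrip("/").split("/")[-1] + "-FP8-DYNAMIC"  # assumed naming
model.save_pretrained(SAVE_DIR)
processor.save_pretrained(SAVE_DIR)
save_mtp_tensors_to_checkpoint(model, MODEL_ID, SAVE_DIR)  # argument order is an assumption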

Changes

Cohort: FP8 Quantization Example
File(s): examples/quantization_w8a8_fp8/qwen3_5_example.py
Summary: New example script implementing the FP8 quantization workflow: model loading, QuantizationModifier configuration with the FP8_DYNAMIC scheme, oneshot quantization application, model/processor saving, and MTP tensor preservation from the original checkpoint.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested labels

qwen, fp8, enhancement

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
  • Title check: ✅ Passed. The title clearly and specifically summarizes the main change: adding an example of w8a8fp8 quantization for the Qwen3.5 model.
  • Description check: ✅ Passed. The description is related to the changeset, as it references the same example addition of w8a8fp8 quantization for Qwen3.5, though it lacks detail.
  • Docstring Coverage: ✅ Passed. No functions were found in the changed files to evaluate docstring coverage; skipping the docstring coverage check.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

@mergify mergify Bot added the documentation (Improvements or additions to documentation) label Apr 20, 2026
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request introduces an example script for quantizing the Qwen 3.5 MoE model to FP8 using the llmcompressor library. The review feedback suggests several improvements to make the example more robust and cleaner: removing unused imports (torch and load_dataset), updating the MODEL_ID to a portable Hugging Face identifier instead of an absolute local path, and simplifying the quantization recipe by removing redundant ignore patterns for visual modules. Additionally, it is recommended to include a calibration dataset and the processor in the oneshot call to provide a complete and functional workflow.

Comment on lines +1 to +3
import torch
from compressed_tensors.utils import save_mtp_tensors_to_checkpoint
from datasets import load_dataset
Contributor


medium

The imports torch and load_dataset are not used in this script. Since oneshot can accept a dataset name as a string, load_dataset is unnecessary. These should be removed to keep the example clean.

Comment thread: examples/quantization_w8a8_fp8/qwen3_5_example.py (Outdated)
Comment on lines +22 to +30
ignore=[
"re:.*lm_head",
"re:visual.*",
"re:model.visual.*",
"re:.*mlp.gate$",
"re:.*embed_tokens$",
"re:.*shared_expert_gate$",
"re:.*linear_attn.*",
],
Contributor


medium

The ignore list includes patterns for visual modules (re:visual.*, re:model.visual.*) and linear_attn, which appear to be redundant for the Qwen3_5Moe text model architecture. Removing these makes the recipe more concise and specific to the model.

    ignore=[
        "re:.*lm_head",
        "re:.*mlp.gate$",
        "re:.*embed_tokens$",
        "re:.*shared_expert_gate$",
    ],

Comment on lines +34 to +36
oneshot(
model=model,
recipe=recipe)
Contributor


medium

The oneshot call is missing a dataset and processor. Providing a calibration dataset is standard for the oneshot workflow to ensure proper initialization of the quantization modifiers and to provide a complete example. Additionally, passing the processor explicitly is recommended since it has already been loaded.

Suggested change
-oneshot(
-    model=model,
-    recipe=recipe)
+oneshot(
+    model=model,
+    dataset="open_platypus",
+    recipe=recipe,
+    processor=processor,
+)

@coderabbitai coderabbitai Bot added the enhancement (New feature or request), qwen (For any PR / issue related to Qwen support), and fp8 (For any issue / PR related to FP8 support) labels Apr 20, 2026
Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@examples/quantization_w8a8_fp8/qwen3_5_example.py`:
- Around line 34-36: The example passes a QuantizationModifier to oneshot(), but
oneshot() and RecipeArguments only type recipe as str | list[str] | None (and
RecipeArguments.recipe: str | None), causing a type mismatch; update the type
hints to accept Modifier | list[Modifier] (or QuantizationModifier |
list[QuantizationModifier]) so the signatures align with
Recipe.create_instance() which accepts RecipeInput (Modifier | list[Modifier] |
str | list[str]), and update any related docstrings to mention Modifier
instances are supported; specifically modify the oneshot() parameter annotation,
RecipeArguments.recipe annotation, and any docs referencing recipe to include
Modifier (or use the Modifier union) so QuantizationModifier passes type
checking.
- Around line 11-15: MODEL_ID currently has a leading slash which makes
Qwen3_5MoeForConditionalGeneration.from_pretrained and
AutoProcessor.from_pretrained try to load from the local filesystem; change the
MODEL_ID string by removing the leading "/" so it references the Hugging Face
Hub package name (update the MODEL_ID constant used by from_pretrained calls).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 886ef430-4134-47e9-b7d7-0aee0b2e381a

📥 Commits

Reviewing files that changed from the base of the PR and between fe706bb and 16802c1.

📒 Files selected for processing (1)
  • examples/quantization_w8a8_fp8/qwen3_5_example.py

Comment on lines +11 to +15
MODEL_ID = "/Qwen/Qwen3.5-122B-A10B"

# Load model.
model = Qwen3_5MoeForConditionalGeneration.from_pretrained(MODEL_ID, dtype="auto")
processor = AutoProcessor.from_pretrained(MODEL_ID)
Contributor


⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check the actual file content at the specified lines
cat -n examples/quantization_w8a8_fp8/qwen3_5_example.py | head -50

Repository: vllm-project/llm-compressor

Length of output: 1796


🏁 Script executed:

#!/bin/bash
# Find the oneshot function to verify its signature
find . -type f -name "*.py" -exec grep -l "def oneshot" {} \;

Repository: vllm-project/llm-compressor

Length of output: 174


🏁 Script executed:

#!/bin/bash
# Search for oneshot import and usage patterns
rg "from.*oneshot|import.*oneshot" -A 2 -B 2 | head -50

Repository: vllm-project/llm-compressor

Length of output: 3476


🏁 Script executed:

#!/bin/bash
# Read the oneshot function signature
head -100 src/llmcompressor/entrypoints/oneshot.py | cat -n

Repository: vllm-project/llm-compressor

Length of output: 4747


🏁 Script executed:

#!/bin/bash
# Look for type hints on the recipe parameter
rg "def oneshot" -A 30 src/llmcompressor/entrypoints/oneshot.py

Repository: vllm-project/llm-compressor

Length of output: 1234


🏁 Script executed (no clone):

#!/bin/bash
# Check Hugging Face transformers documentation behavior
# Create a test script to understand path handling
python3 << 'EOF'
# Simulate how HF from_pretrained might handle paths
import os

model_id_with_slash = "/Qwen/Qwen3.5-122B-A10B"
model_id_no_slash = "Qwen/Qwen3.5-122B-A10B"

# Check if leading slash makes it absolute path
print(f"With slash is absolute: {os.path.isabs(model_id_with_slash)}")
print(f"No slash is absolute: {os.path.isabs(model_id_no_slash)}")

# This is how HF would interpret it
print(f"With slash: {os.path.expanduser(model_id_with_slash)}")
print(f"No slash: {os.path.expanduser(model_id_no_slash)}")
EOF

Length of output: 156


🏁 Script executed:

#!/bin/bash
# Check how QuantizationModifier is used in tests with oneshot
rg "oneshot.*recipe=.*QuantizationModifier|QuantizationModifier.*recipe" -B 5 -A 5 | head -80

Repository: vllm-project/llm-compressor

Length of output: 3255


🏁 Script executed:

#!/bin/bash
# Look at actual test examples to see how recipe is passed to oneshot
rg "oneshot\(" -A 5 tests/ | grep -A 5 "recipe=" | head -60

Repository: vllm-project/llm-compressor

Length of output: 3785


🏁 Script executed:

#!/bin/bash
# Check if there's parameter handling that converts QuantizationModifier to something else
rg "QuantizationModifier" examples/ tests/ | head -20

Repository: vllm-project/llm-compressor

Length of output: 2340


🏁 Script executed:

#!/bin/bash
# Check how oneshot handles the recipe parameter internally
rg "def oneshot" -A 100 src/llmcompressor/entrypoints/oneshot.py | head -120

Repository: vllm-project/llm-compressor

Length of output: 4316


🏁 Script executed:

#!/bin/bash
# Look at the parse_args function to see how recipe is processed
rg "def parse_args|recipe" src/llmcompressor/args/ | head -40

Repository: vllm-project/llm-compressor

Length of output: 2087


🏁 Script executed:

#!/bin/bash
# Check the actual implementation of how recipe is used in oneshot
sed -n '200,400p' src/llmcompressor/entrypoints/oneshot.py | cat -n

Repository: vllm-project/llm-compressor

Length of output: 10924


🏁 Script executed:

#!/bin/bash
# Look at the Oneshot.__init__ method to see how recipe is processed
sed -n '150,300p' src/llmcompressor/entrypoints/oneshot.py | cat -n

Repository: vllm-project/llm-compressor

Length of output: 6353


🏁 Script executed:

#!/bin/bash
# Search for how recipe parameter is converted/processed
rg "self.recipe" src/llmcompressor/entrypoints/oneshot.py -B 2 -A 2

Repository: vllm-project/llm-compressor

Length of output: 1029


🏁 Script executed:

#!/bin/bash
# Look at parse_args to see how it handles recipe parameter
rg "def parse_args" -A 50 src/llmcompressor/args/utils.py | head -80

Repository: vllm-project/llm-compressor

Length of output: 1702


🏁 Script executed:

#!/bin/bash
# Check RecipeArguments to see what recipe field accepts
cat src/llmcompressor/args/recipe_arguments.py | head -100

Repository: vllm-project/llm-compressor

Length of output: 1231


🏁 Script executed:

#!/bin/bash
# Search for how modifiers are handled when passed as recipe
rg "isinstance.*recipe.*Modifier|Modifier.*recipe" src/ -A 3 -B 3

Repository: vllm-project/llm-compressor

Length of output: 53


🏁 Script executed:

#!/bin/bash
# Check if there's special handling in parse_args or HfArgumentParser usage
rg "recipe.*isinstance|type.*recipe" src/llmcompressor/args/ -A 3 -B 3

Repository: vllm-project/llm-compressor

Length of output: 53


🏁 Script executed:

#!/bin/bash
# Look for the actual test that passes QuantizationModifier to oneshot
cat -n tests/e2e/e2e_utils.py | grep -A 10 -B 10 "recipe.*QuantizationModifier"

Repository: vllm-project/llm-compressor

Length of output: 993


🏁 Script executed:

#!/bin/bash
# Check HfArgumentParser to see if it has special handling
rg "HfArgumentParser|parse_dict" src/llmcompressor/args/utils.py -B 3 -A 10

Repository: vllm-project/llm-compressor

Length of output: 933


🏁 Script executed:

#!/bin/bash
# Look for how recipe kwarg is handled when it's a Modifier object
rg "if.*kwargs|isinstance.*kwargs.*recipe" src/llmcompressor/entrypoints/oneshot.py -B 3 -A 5

Repository: vllm-project/llm-compressor

Length of output: 682


🏁 Script executed:

#!/bin/bash
# Check if parse_dict from transformers HfArgumentParser allows non-string values
python3 << 'EOF'
# Check what HfArgumentParser.parse_dict does with non-string values
from transformers import HfArgumentParser
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class TestArgs:
    recipe: Optional[str] = field(default=None)

parser = HfArgumentParser(TestArgs)

# Test 1: Passing a string
result1 = parser.parse_dict({"recipe": "path/to/recipe.yaml"})
print(f"Test 1 (string): {result1}")

# Test 2: Passing a non-string object
class DummyModifier:
    pass

result2 = parser.parse_dict({"recipe": DummyModifier()})
print(f"Test 2 (object): {result2}")
print(f"Test 2 recipe type: {type(result2[0].recipe)}")
EOF

Repository: vllm-project/llm-compressor

Length of output: 195


🏁 Script executed:

#!/bin/bash
# Verify the exact model ID and leading slash behavior in HF
python3 << 'EOF'
import os

# The claim is that leading slash breaks from_pretrained
model_id_slash = "/Qwen/Qwen3.5-122B-A10B"
model_id_no_slash = "Qwen/Qwen3.5-122B-A10B"

print(f"Path with slash: {model_id_slash}")
print(f"Is absolute: {os.path.isabs(model_id_slash)}")
print()
print(f"Path without slash: {model_id_no_slash}")
print(f"Is absolute: {os.path.isabs(model_id_no_slash)}")
print()
print("HF transformers treats leading slash as absolute file path,")
print("which would make from_pretrained look for /Qwen/Qwen3.5-122B-A10B")
print("as a local directory instead of a Hub model ID.")
EOF

Repository: vllm-project/llm-compressor

Length of output: 367


🏁 Script executed:

#!/bin/bash
# Check if there are any other examples with similar patterns
rg "MODEL_ID.*=.*/" examples/ | head -20

Repository: vllm-project/llm-compressor

Length of output: 2004


🏁 Script executed:

#!/bin/bash
# Double check the exact code in the file one more time
sed -n '10,16p' examples/quantization_w8a8_fp8/qwen3_5_example.py

Repository: vllm-project/llm-compressor

Length of output: 259


🏁 Script executed:

#!/bin/bash
# Final confirmation: check if there are integration tests that pass modifiers to oneshot successfully
rg "oneshot.*QuantizationModifier|oneshot.*GPTQModifier" tests/ -l

Repository: vllm-project/llm-compressor

Length of output: 93


Remove the leading slash from the model ID so the example can load from Hugging Face Hub.

At Line 11, MODEL_ID starts with a /, making it an absolute file path. Hugging Face from_pretrained() will attempt to load from the local filesystem at /Qwen/Qwen3.5-122B-A10B and fail. Use "Qwen/Qwen3.5-122B-A10B" instead to load from Hub.

Suggested fix
-MODEL_ID = "/Qwen/Qwen3.5-122B-A10B"
+MODEL_ID = "Qwen/Qwen3.5-122B-A10B"
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-MODEL_ID = "/Qwen/Qwen3.5-122B-A10B"
+MODEL_ID = "Qwen/Qwen3.5-122B-A10B"
 # Load model.
 model = Qwen3_5MoeForConditionalGeneration.from_pretrained(MODEL_ID, dtype="auto")
 processor = AutoProcessor.from_pretrained(MODEL_ID)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/quantization_w8a8_fp8/qwen3_5_example.py` around lines 11 - 15,
MODEL_ID currently has a leading slash which makes
Qwen3_5MoeForConditionalGeneration.from_pretrained and
AutoProcessor.from_pretrained try to load from the local filesystem; change the
MODEL_ID string by removing the leading "/" so it references the Hugging Face
Hub package name (update the MODEL_ID constant used by from_pretrained calls).

Comment on lines +34 to +36
oneshot(
model=model,
recipe=recipe)
Contributor


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Read-only verification of oneshot recipe handling and usage patterns
rg -n -C4 '^def oneshot\(' src/llmcompressor/entrypoints/oneshot.py
rg -n -C4 'recipe:' src/llmcompressor/entrypoints/oneshot.py
rg -n -C4 'isinstance\(recipe|parse.*recipe|load.*recipe|Modifier' src/llmcompressor/entrypoints/oneshot.py
rg -n --type=py -C2 'oneshot\(' examples

Repository: vllm-project/llm-compressor

Length of output: 38177


🏁 Script executed:

#!/bin/bash
# Check the definition of recipe in qwen3_5_example.py
head -40 examples/quantization_w8a8_fp8/qwen3_5_example.py | tail -20

Repository: vllm-project/llm-compressor

Length of output: 526


🏁 Script executed:

#!/bin/bash
# Search for how recipe parameter is actually used inside oneshot function
rg -A 20 'def oneshot\(' src/llmcompressor/entrypoints/oneshot.py | head -50

Repository: vllm-project/llm-compressor

Length of output: 850


🏁 Script executed:

#!/bin/bash
# Check if recipe is coerced or converted to string, or if objects are accepted
rg -n 'recipe' src/llmcompressor/entrypoints/oneshot.py | grep -E '(isinstance|type|convert|parse|Modifier)' | head -20

Repository: vllm-project/llm-compressor

Length of output: 241


🏁 Script executed:

#!/bin/bash
# Read full qwen3_5_example.py to see recipe definition
cat -n examples/quantization_w8a8_fp8/qwen3_5_example.py

Repository: vllm-project/llm-compressor

Length of output: 1796


🏁 Script executed:

#!/bin/bash
# Find parse_args function to understand recipe parameter handling
rg -n 'def parse_args' src/llmcompressor/entrypoints/oneshot.py

Repository: vllm-project/llm-compressor

Length of output: 53


🏁 Script executed:

#!/bin/bash
# Check oneshot.py imports and look for parse_args
head -100 src/llmcompressor/entrypoints/oneshot.py

Repository: vllm-project/llm-compressor

Length of output: 4047


🏁 Script executed:

#!/bin/bash
# Search for parse_args in entire src directory
rg -n 'def parse_args' src/

Repository: vllm-project/llm-compressor

Length of output: 277


🏁 Script executed:

#!/bin/bash
# Read parse_args implementation
sed -n '21,150p' src/llmcompressor/args/utils.py

Repository: vllm-project/llm-compressor

Length of output: 1702


🏁 Script executed:

#!/bin/bash
# Check how recipe is handled after parse_args in oneshot function
rg -A 150 'def oneshot\(' src/llmcompressor/entrypoints/oneshot.py | head -200

Repository: vllm-project/llm-compressor

Length of output: 7783


🏁 Script executed:

#!/bin/bash
# Find RecipeArguments class to see how it handles recipe parameter
rg -n 'class RecipeArguments' src/

Repository: vllm-project/llm-compressor

Length of output: 139


🏁 Script executed:

#!/bin/bash
# Read RecipeArguments implementation
find src -name "*recipe*args*" -type f

Repository: vllm-project/llm-compressor

Length of output: 53


🏁 Script executed:

#!/bin/bash
# Read RecipeArguments class
cat -n src/llmcompressor/args/recipe_arguments.py

Repository: vllm-project/llm-compressor

Length of output: 1532


🏁 Script executed:

#!/bin/bash
# Look for how recipe is actually processed in Oneshot class
rg -A 50 'class Oneshot:' src/llmcompressor/entrypoints/oneshot.py | head -80

Repository: vllm-project/llm-compressor

Length of output: 2226


🏁 Script executed:

#!/bin/bash
# Read the Oneshot.__init__ method to see how recipe is actually handled
rg -A 100 'def __init__' src/llmcompressor/entrypoints/oneshot.py | head -150

Repository: vllm-project/llm-compressor

Length of output: 4009


🏁 Script executed:

#!/bin/bash
# Check how Oneshot uses recipe after initialization
rg -n 'self\.recipe' src/llmcompressor/entrypoints/oneshot.py

Repository: vllm-project/llm-compressor

Length of output: 364


🏁 Script executed:

#!/bin/bash
# Look at apply_recipe_modifiers to see how recipe is used
rg -A 100 'def apply_recipe_modifiers' src/llmcompressor/entrypoints/oneshot.py

Repository: vllm-project/llm-compressor

Length of output: 3644


🏁 Script executed:

#!/bin/bash
# Check if there's any special handling for Modifier objects in parse_args or elsewhere
rg -n 'isinstance.*Modifier|Modifier.*recipe' src/llmcompressor/entrypoints/oneshot.py

Repository: vllm-project/llm-compressor

Length of output: 53


🏁 Script executed:

#!/bin/bash
# Check if HfArgumentParser from transformers can handle non-string objects
# Let's look at how parse_dict is actually used and if there's special handling
rg -B 5 -A 15 'parser.parse_dict' src/llmcompressor/args/utils.py

Repository: vllm-project/llm-compressor

Length of output: 855


🏁 Script executed:

#!/bin/bash
# Check if there's any special isinstance or type checking for recipe before parse_args
rg -B 10 'parse_args\(' src/llmcompressor/entrypoints/oneshot.py | head -30

Repository: vllm-project/llm-compressor

Length of output: 457


🏁 Script executed:

#!/bin/bash
# Check what HfArgumentParser.parse_dict does with non-matching types
# Look for how recipe value is actually handled when it's not a string
python3 << 'EOF'
from transformers import HfArgumentParser
from dataclasses import dataclass
from typing import Optional

@dataclass
class TestArgs:
    recipe: Optional[str] = None

parser = HfArgumentParser(TestArgs)

# Test 1: passing a string (normal case)
try:
    result1 = parser.parse_dict({"recipe": "path/to/recipe.yaml"})
    print(f"Test 1 (string): SUCCESS - {result1}")
except Exception as e:
    print(f"Test 1 (string): FAILED - {type(e).__name__}: {e}")

# Test 2: passing a non-string object
class FakeModifier:
    def __init__(self):
        self.name = "test_modifier"

try:
    result2 = parser.parse_dict({"recipe": FakeModifier()})
    print(f"Test 2 (object): SUCCESS - recipe type = {type(result2[0].recipe)}")
    print(f"  recipe value = {result2[0].recipe}")
except Exception as e:
    print(f"Test 2 (object): FAILED - {type(e).__name__}: {e}")
EOF

Repository: vllm-project/llm-compressor

Length of output: 195


🏁 Script executed:

#!/bin/bash
# Check session.initialize to see how it handles recipe parameter
rg -n 'def initialize' src/llmcompressor/core/session_functions.py

Repository: vllm-project/llm-compressor

Length of output: 53


🏁 Script executed:

#!/bin/bash
# Search for how recipe is actually processed in CompressionSession or related classes
rg -n 'def initialize' src/llmcompressor/core/ | head -20

Repository: vllm-project/llm-compressor

Length of output: 186


🏁 Script executed:

#!/bin/bash
# Check session.initialize signature and how it handles recipe
sed -n '74,150p' src/llmcompressor/core/session.py

Repository: vllm-project/llm-compressor

Length of output: 3509


🏁 Script executed:

#!/bin/bash
# Check lifecycle.initialize
sed -n '73,150p' src/llmcompressor/core/lifecycle.py

Repository: vllm-project/llm-compressor

Length of output: 2616


🏁 Script executed:

#!/bin/bash
# Check RecipeInput type definition
rg -n 'RecipeInput' src/llmcompressor/core/lifecycle.py

Repository: vllm-project/llm-compressor

Length of output: 207


🏁 Script executed:

#!/bin/bash
# Find Recipe class and create_instance method
rg -n 'class Recipe' src/ | head -5

Repository: vllm-project/llm-compressor

Length of output: 202


🏁 Script executed:

#!/bin/bash
# Find RecipeInput definition
rg -n 'RecipeInput' src/llmcompressor/recipe/__init__.py

Repository: vllm-project/llm-compressor

Length of output: 170


🏁 Script executed:

#!/bin/bash
# Find RecipeInput definition in recipe module
rg -n 'RecipeInput' src/llmcompressor/recipe/ | head -10

Repository: vllm-project/llm-compressor

Length of output: 424


🏁 Script executed:

#!/bin/bash
# Check Recipe.create_instance to confirm it handles Modifier objects
rg -A 50 'def create_instance' src/llmcompressor/recipe/recipe.py | head -80

Repository: vllm-project/llm-compressor

Length of output: 2198


🏁 Script executed:

#!/bin/bash
# Verify QuantizationModifier is a subclass of Modifier
rg -n 'class QuantizationModifier' src/llmcompressor/modifiers/quantization/

Repository: vllm-project/llm-compressor

Length of output: 192


🏁 Script executed:

#!/bin/bash
# Check if there's any special handling of recipe parameter BEFORE it reaches parse_args
# in the oneshot function
sed -n '249,380p' src/llmcompressor/entrypoints/oneshot.py | tail -100

Repository: vllm-project/llm-compressor

Length of output: 5249


🏁 Script executed:

#!/bin/bash
# Check the full oneshot function body after the signature to see if recipe is processed
rg -A 200 'def oneshot\(' src/llmcompressor/entrypoints/oneshot.py | grep -A 50 'model_args, dataset_args'

Repository: vllm-project/llm-compressor

Length of output: 53


Update type hints or documentation to support QuantizationModifier objects in recipe parameter.

The qwen3_5_example.py passes a QuantizationModifier object as the recipe argument (line 19, 36), but oneshot() at line 262 declares recipe: str | list[str] | None = None, and RecipeArguments declares recipe: str | None. While Recipe.create_instance() does accept Modifier objects via its RecipeInput union type (which includes Modifier | List[Modifier]), the type hints along the call chain—particularly in oneshot() and RecipeArguments—do not reflect this capability. This creates a type safety inconsistency; the example works because Recipe.create_instance() internally handles Modifier objects, but the type annotations should either be updated to include Modifier | List[Modifier], or the docstring should explicitly document that Modifier instances are accepted.
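
For illustration, a minimal sketch of the widened annotations this comment asks for, assuming the Modifier base class is importable from llmcompressor.modifiers; the real signatures in oneshot.py and recipe_arguments.py carry many more parameters and may differ:

from __future__ import annotations

from dataclasses import dataclass, field

from llmcompressor.modifiers import Modifier


def oneshot(
    model=None,
    recipe: str | list[str] | Modifier | list[Modifier] | None = None,  # widened union
    **kwargs,
):
    """Sketch: `recipe` may be a YAML recipe path/string or Modifier instance(s)."""
    ...


@dataclass
class RecipeArguments:
    # Previously annotated as str | None; widened so a QuantizationModifier
    # (as passed in the example) type-checks cleanly.
    recipe: str | list[str] | Modifier | list[Modifier] | None = field(default=None)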

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/quantization_w8a8_fp8/qwen3_5_example.py` around lines 34 - 36, The
example passes a QuantizationModifier to oneshot(), but oneshot() and
RecipeArguments only type recipe as str | list[str] | None (and
RecipeArguments.recipe: str | None), causing a type mismatch; update the type
hints to accept Modifier | list[Modifier] (or QuantizationModifier |
list[QuantizationModifier]) so the signatures align with
Recipe.create_instance() which accepts RecipeInput (Modifier | list[Modifier] |
str | list[str]), and update any related docstrings to mention Modifier
instances are supported; specifically modify the oneshot() parameter annotation,
RecipeArguments.recipe annotation, and any docs referencing recipe to include
Modifier (or use the Modifier union) so QuantizationModifier passes type
checking.

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: zhangxin81 <115389973+zhangxin81@users.noreply.github.com>
Collaborator

@brian-dellabetta brian-dellabetta left a comment


Hi @zhangxin81 , thanks for submitting this PR, but we have a similar Qwen3.5 example already available with lots of overlap with yours -- https://github.com/vllm-project/llm-compressor/blob/main/examples/quantization_w4a16_fp4/mxfp4/qwen3.5_example.py.

Did you have trouble finding the example in our examples/ folder? We are looking into sorting by model architecture.

@mergify mergify Bot added the two-reviews (When a PR requires two reviews) label Apr 22, 2026
@mergify
Contributor

mergify Bot commented Apr 22, 2026

Merge Protections

Your pull request matches the following merge protections and will not be merged until they are valid.

🔴 Require two reviews

Waiting for:

  • #approved-reviews-by >= 2 (this rule is failing)

PRs labelled "two-reviews" must have at least two approving reviews before merging.

  • #approved-reviews-by >= 2
  • #changes-requested-reviews-by = 0


Labels

  • documentation: Improvements or additions to documentation
  • enhancement: New feature or request
  • fp8: For any issue / PR related to FP8 support
  • qwen: For any PR / issue related to Qwen support
  • two-reviews: When a PR requires two reviews

Projects

None yet

Development


2 participants