
Conversation

@mengniwang95 (Contributor) commented Jan 13, 2026

User description

Type of Change

Update example

Description

Fix VLLM_FLASHINFER_DISABLE_Q_QUANTIZATION for the fp8 KV cache


PR Type

Bug fix


Description

  • Fix VLLM_FLASHINFER_DISABLE_Q_QUANTIZATION for the fp8 KV cache

Diagram Walkthrough

flowchart LR
  A["Set VLLM_FLASHINFER_DISABLE_Q_QUANTIZATION"] -- "from 0 to 1" --> B["For fp8 kv"]

File Walkthrough

Relevant files

Bug fix
  run_benchmark.sh
  examples/pytorch/multimodal-modeling/quantization/auto_round/llama4/run_benchmark.sh

    • Update VLLM_FLASHINFER_DISABLE_Q_QUANTIZATION to 1 for the fp8 KV cache
    +1/-1
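
For orientation, a minimal sketch of how a benchmark script might gate this flag on the KV-cache dtype; the kv_cache_dtype variable and the if-block are illustrative assumptions, and only the export line itself comes from this PR:

  # Hypothetical surrounding logic; only the export line is from the PR diff.
  kv_cache_dtype=fp8                # assumed script variable, not shown in the diff
  if [ "${kv_cache_dtype}" = "fp8" ]; then
      # Per this PR, query quantization is disabled when the KV cache is fp8.
      export VLLM_FLASHINFER_DISABLE_Q_QUANTIZATION=1
  fi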

@PRAgent4INC (Collaborator) commented:

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 2 🔵🔵⚪⚪⚪
🧪 No relevant tests
🔒 No security concerns identified
⚡ Recommended focus areas for review

Configuration Change

The change sets VLLM_FLASHINFER_DISABLE_Q_QUANTIZATION to 1 for fp8 kv. Ensure this change aligns with the intended behavior and does not introduce any unintended side effects.

export VLLM_FLASHINFER_DISABLE_Q_QUANTIZATION=1
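
As a sanity check, a hedged example of exercising the flag with an fp8 KV cache under the FlashInfer backend; VLLM_ATTENTION_BACKEND and --kv-cache-dtype are standard vLLM options, while the model ID is purely illustrative:

  # Illustrative invocation; the model ID is an assumption, not taken from this PR.
  export VLLM_ATTENTION_BACKEND=FLASHINFER
  export VLLM_FLASHINFER_DISABLE_Q_QUANTIZATION=1
  vllm serve meta-llama/Llama-4-Scout-17B-16E-Instruct --kv-cache-dtype fp8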

@PRAgent4INC (Collaborator) commented:

PR Code Suggestions ✨

Explore these optional code suggestions:

Category: General
Suggestion: Confirm quantization setting correctness

Verify that setting VLLM_FLASHINFER_DISABLE_Q_QUANTIZATION to 1 is correct for the
intended behavior when kv_cache_dtype is fp8.

examples/pytorch/multimodal-modeling/quantization/auto_round/llama4/run_benchmark.sh [55]

+export VLLM_FLASHINFER_DISABLE_Q_QUANTIZATION=1
-export VLLM_FLASHINFER_DISABLE_Q_QUANTIZATION=0
Suggestion importance[1-10]: 5


Why: The suggestion asks to verify the correctness of the quantization setting, which is a reasonable request but does not provide a direct improvement to the code. It should not receive a score above 7.

Impact: Low
