
Conversation

@mengniwang95 (Contributor) commented Jan 13, 2026

User description

Type of Change

Update example

Description

Fix VLLM_FLASHINFER_DISABLE_Q_QUANTIZATION for the fp8 KV cache


PR Type

Bug fix


Description

  • Fix VLLM_FLASHINFER_DISABLE_Q_QUANTIZATION for the fp8 KV cache

Diagram Walkthrough

flowchart LR
  A["Set VLLM_FLASHINFER_DISABLE_Q_QUANTIZATION"] -- "from 0 to 1" --> B["For fp8 kv"]

File Walkthrough

Relevant files

Bug fix
  run_benchmark.sh
  examples/pytorch/multimodal-modeling/quantization/auto_round/llama4/run_benchmark.sh

    • Update VLLM_FLASHINFER_DISABLE_Q_QUANTIZATION to 1 for the fp8 KV cache
    +1/-1
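
For orientation, a minimal sketch of how a benchmark script might gate this flag on the KV-cache dtype; the kv_cache_dtype variable and the if-block are illustrative assumptions, and only the export line itself comes from this PR:

  # Hypothetical surrounding logic; only the export line is from the PR diff.
  kv_cache_dtype=fp8                # assumed script variable, not shown in the diff
  if [ "${kv_cache_dtype}" = "fp8" ]; then
      # Per this PR, query quantization is disabled when the KV cache is fp8.
      export VLLM_FLASHINFER_DISABLE_Q_QUANTIZATION=1
  fi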

@PRAgent4INC (Collaborator) commented:

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 2 🔵🔵⚪⚪⚪
🧪 No relevant tests
🔒 No security concerns identified
⚡ Recommended focus areas for review

Configuration Change

The change sets VLLM_FLASHINFER_DISABLE_Q_QUANTIZATION to 1 for fp8 kv. Ensure this change aligns with the intended behavior and does not introduce any unintended side effects.

export VLLM_FLASHINFER_DISABLE_Q_QUANTIZATION=1
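
As a sanity check, a hedged example of exercising the flag with an fp8 KV cache under the FlashInfer backend; VLLM_ATTENTION_BACKEND and --kv-cache-dtype are standard vLLM options, while the model ID is purely illustrative:

  # Illustrative invocation; the model ID is an assumption, not taken from this PR.
  export VLLM_ATTENTION_BACKEND=FLASHINFER
  export VLLM_FLASHINFER_DISABLE_Q_QUANTIZATION=1
  vllm serve meta-llama/Llama-4-Scout-17B-16E-Instruct --kv-cache-dtype fp8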

@PRAgent4INC (Collaborator) commented:

PR Code Suggestions ✨

Explore these optional code suggestions:

Category: General
Suggestion: Confirm quantization setting correctness

Verify that setting VLLM_FLASHINFER_DISABLE_Q_QUANTIZATION to 1 is correct for the
intended behavior when kv_cache_dtype is fp8.

examples/pytorch/multimodal-modeling/quantization/auto_round/llama4/run_benchmark.sh [55]

+export VLLM_FLASHINFER_DISABLE_Q_QUANTIZATION=1
-export VLLM_FLASHINFER_DISABLE_Q_QUANTIZATION=0
Suggestion importance[1-10]: 5


Why: The suggestion asks to verify the correctness of the quantization setting, which is a reasonable request but does not provide a direct improvement to the code. It should not receive a score above 7.

Impact: Low
