[rollout] fix Qwen3-VL video metadata handling in vLLM #662

Merged
hiyouga merged 1 commit into hiyouga:main from rank-Yu:fix on Mar 10, 2026

@rank-Yu commented Mar 9, 2026

fix #646

@gemini-code-assist

Summary of Changes

Hello, I'm Gemini Code Assist [1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request fixes a bug in video metadata handling for the Qwen3-VL model in the vLLM rollout worker. It adds a flag that controls whether video metadata is returned and enables it only when the processor is Qwen3VLProcessor, so that this processor can handle video inputs correctly.

Highlights

  • Video Metadata Handling: Introduced a new return_metadata parameter to the process_video function in verl/utils/dataset.py, allowing callers to specify whether video metadata should be returned.
  • Qwen3-VL Specific Fix: Modified verl/workers/rollout/vllm_rollout_spmd.py to conditionally enable the return of video metadata specifically when the Qwen3VLProcessor is in use, addressing a bug in its video processing.
  • Processing Pipeline Update: Updated the _process_multi_modal_data and generate_sequences functions in verl/workers/rollout/vllm_rollout_spmd.py to propagate and utilize the new return_video_metadata flag throughout the video processing pipeline.

Changelog

  • verl/utils/dataset.py
    • Added a return_metadata parameter to the process_video function signature.
    • Passed the new return_metadata parameter to the fetch_video function call.
  • verl/workers/rollout/vllm_rollout_spmd.py
    • Added a return_video_metadata parameter to the _process_multi_modal_data function.
    • Updated the call to process_video within _process_multi_modal_data to include the return_video_metadata argument.
    • Introduced a new instance attribute self.return_video_metadata in the __init__ method, which is set to True if the processor is identified as Qwen3VLProcessor.
    • Modified the generate_sequences method to pass self.return_video_metadata to the _process_multi_modal_data function call.
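
The flow described in the changelog can be pictured with a minimal, self-contained sketch. It does not import verl or vLLM: `fetch_video` is a stand-in stub, the data shapes and metadata fields (`fps`, `total_num_frames`) are assumptions, and `RolloutSketch` and `OtherProcessor` are hypothetical names. Only `process_video`, `fetch_video`, `_process_multi_modal_data`, `generate_sequences`, `return_metadata`, `return_video_metadata`, and `Qwen3VLProcessor` come from the summary, and the real signatures may differ.

```python
from typing import Any


def fetch_video(path: str, return_metadata: bool = False) -> Any:
    """Stand-in for the real video loader (hypothetical behavior)."""
    frames = [f"{path}:frame{i}" for i in range(2)]  # fake decoded frames
    if return_metadata:
        # Assumed metadata fields, for illustration only.
        return frames, {"fps": 2.0, "total_num_frames": len(frames)}
    return frames


def process_video(path: str, return_metadata: bool = False) -> Any:
    # The new parameter is forwarded unchanged to fetch_video.
    return fetch_video(path, return_metadata=return_metadata)


def _process_multi_modal_data(videos: list[str], return_video_metadata: bool = False) -> dict:
    # Each video is processed with the same flag.
    return {"video": [process_video(v, return_metadata=return_video_metadata) for v in videos]}


class Qwen3VLProcessor:  # placeholder class, not the real processor
    pass


class OtherProcessor:  # hypothetical processor that needs no metadata
    pass


class RolloutSketch:
    def __init__(self, processor: object) -> None:
        # Per the changelog: True only when the processor is Qwen3VLProcessor.
        # Matching on the class name is an assumption based on the review text.
        self.return_video_metadata = type(processor).__name__ == "Qwen3VLProcessor"

    def generate_sequences(self, videos: list[str]) -> dict:
        return _process_multi_modal_data(videos, self.return_video_metadata)
```

In this sketch, a rollout built with `Qwen3VLProcessor()` yields `(frames, metadata)` pairs under the `"video"` key, while any other processor yields bare frame lists, so existing models are unaffected by the new flag.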

Activity

  • No specific activity has been recorded for this pull request yet.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.


@gemini-code-assist (bot) left a comment


Code Review

This pull request introduces changes to handle video metadata for Qwen3-VL models within vLLM, primarily by adding a return_metadata flag that is propagated through the video processing pipeline. The logic seems sound and addresses the issue described. I have a couple of suggestions to enhance code quality and maintainability. Specifically, I recommend replacing the generic Any type hint with a more specific Union type to improve type safety, and using isinstance for type checking instead of relying on class name strings, which can be brittle.
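
The two suggestions in this review can be illustrated with a small sketch. The classes, the `VideoInput` alias, and the helper names below are hypothetical stand-ins, not code from the PR, and the exact `Union` the reviewer proposed is not shown in this conversation.

```python
from typing import Union


class Qwen3VLProcessor:  # hypothetical stand-in for the real processor class
    pass


class PatchedQwen3VLProcessor(Qwen3VLProcessor):  # subclass with a different name
    pass


# Hypothetical alias: frames alone, or (frames, metadata) when metadata is requested.
Frames = list[str]
VideoInput = Union[Frames, tuple[Frames, dict]]


def needs_metadata_by_name(processor: object) -> bool:
    # Brittle: fails if the class is renamed or subclassed.
    return processor.__class__.__name__ == "Qwen3VLProcessor"


def needs_metadata_by_type(processor: object) -> bool:
    # Also matches subclasses, but requires the class to be importable.
    return isinstance(processor, Qwen3VLProcessor)


def split_video(video: VideoInput) -> tuple[Frames, Union[dict, None]]:
    # Narrow the Union explicitly before using the value.
    if isinstance(video, tuple):
        frames, metadata = video
        return frames, metadata
    return video, None
```

With a subclass such as `PatchedQwen3VLProcessor`, the name comparison returns False while the `isinstance` check returns True, which is the brittleness the review points out. The trade-off is that `isinstance` needs the class to be importable in the running environment.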

Comment thread verl/utils/dataset.py
Comment thread verl/workers/rollout/vllm_rollout_spmd.py
@hiyouga merged commit 6842bb2 into hiyouga:main on Mar 10, 2026
1 check passed
@rank-Yu deleted the fix branch on March 10, 2026 09:02

Development

Successfully merging this pull request may close these issues.

metadata is None

2 participants