[rollout] fix Qwen3-VL video metadata handling in vLLM by rank-Yu · Pull Request #662 · hiyouga/EasyR1

rank-Yu · 2026-03-09T17:03:33Z

gemini-code-assist · 2026-03-09T17:03:49Z

Summary of Changes

Hello, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request fixes a bug in video metadata handling for the Qwen3-VL model in the vLLM rollout path. It adds a flag that controls whether video metadata is returned and enables that flag only when the Qwen3VLProcessor is in use, so that this processor handles video inputs correctly while other processors keep their existing behavior.

Highlights

- Video Metadata Handling: Introduced a new return_metadata parameter to the process_video function in verl/utils/dataset.py, allowing callers to specify whether video metadata should be returned.
- Qwen3-VL Specific Fix: Modified verl/workers/rollout/vllm_rollout_spmd.py to conditionally enable the return of video metadata specifically when the Qwen3VLProcessor is in use, addressing a bug in its video processing.
- Processing Pipeline Update: Updated the _process_multi_modal_data and generate_sequences functions in verl/workers/rollout/vllm_rollout_spmd.py to propagate and utilize the new return_video_metadata flag throughout the video processing pipeline.

Changelog

verl/utils/dataset.py
- Added a return_metadata parameter to the process_video function signature.
- Passed the new return_metadata parameter to the fetch_video function call.
verl/workers/rollout/vllm_rollout_spmd.py
- Added a return_video_metadata parameter to the _process_multi_modal_data function.
- Updated the call to process_video within _process_multi_modal_data to include the return_video_metadata argument.
- Introduced a new instance attribute self.return_video_metadata in the __init__ method, which is set to True if the processor is identified as Qwen3VLProcessor.
- Modified the generate_sequences method to pass self.return_video_metadata to the _process_multi_modal_data function call.
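
The flag propagation described in the changelog can be sketched roughly as follows. This is a minimal, self-contained illustration under stated assumptions, not the code from the pull request: `fetch_video` is a hypothetical stub (the real helper's signature and return value are not shown in this thread), `Qwen3VLProcessor` and `OtherProcessor` are local placeholder classes, `RolloutSketch` is an invented stand-in for the rollout worker, and the class-name comparison is only a guess at how the processor is "identified".

```python
from typing import Any, Optional


class Qwen3VLProcessor:
    """Local placeholder for the real processor class (assumption)."""


class OtherProcessor:
    """Placeholder for any processor that does not need video metadata."""


def fetch_video(video: str, return_metadata: bool = False) -> Any:
    """Hypothetical stub; the real helper decodes frames from a video file."""
    frames = [f"{video}:frame{i}" for i in range(2)]
    if return_metadata:
        # Assumed shape (frames, metadata); the real structure may differ.
        return frames, {"fps": 2.0, "total_num_frames": len(frames)}
    return frames


def process_video(video: str, return_metadata: bool = False) -> Any:
    # Forward the new flag to the fetching helper, as the changelog describes.
    return fetch_video(video, return_metadata=return_metadata)


def _process_multi_modal_data(
    multi_modal_data: dict, return_video_metadata: bool = False
) -> Optional[dict]:
    if "videos" not in multi_modal_data:
        return None
    videos = [
        process_video(v, return_metadata=return_video_metadata)
        for v in multi_modal_data["videos"]
    ]
    return {"video": videos}


class RolloutSketch:
    """Invented stand-in for the rollout worker; only the flag logic is shown."""

    def __init__(self, processor: Any) -> None:
        # Enable metadata only for the Qwen3-VL processor.
        # The name comparison is an assumption about the actual check.
        self.return_video_metadata = (
            processor is not None
            and type(processor).__name__ == "Qwen3VLProcessor"
        )

    def generate_sequences(self, multi_modal_data: dict) -> Optional[dict]:
        # Only the multimodal preprocessing step is sketched here.
        return _process_multi_modal_data(
            multi_modal_data, self.return_video_metadata
        )
```

In this sketch the decision is made once in `__init__` and then threaded through every call, so processors other than Qwen3-VL keep receiving plain frames.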

Activity

No specific activity has been recorded for this pull request yet.

¹ Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

gemini-code-assist

Code Review

This pull request introduces changes to handle video metadata for Qwen3-VL models within vLLM, primarily by adding a return_metadata flag that is propagated through the video processing pipeline. The logic seems sound and addresses the issue described. I have a couple of suggestions to enhance code quality and maintainability. Specifically, I recommend replacing the generic Any type hint with a more specific Union type to improve type safety, and using isinstance for type checking instead of relying on class name strings, which can be brittle.
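
The two review suggestions can be illustrated with a small sketch. All names below (`VideoFrames`, `VideoMetadata`, `VideoOutput`, `check_by_name`, `check_by_isinstance`, and the placeholder processor classes) are hypothetical and chosen for illustration only; the actual types and checks in the pull request are not shown in this thread.

```python
from typing import Any, Dict, List, Tuple, Union

# Hypothetical aliases: a more specific return type than a bare `Any`.
VideoFrames = List[Any]
VideoMetadata = Dict[str, Any]
VideoOutput = Union[VideoFrames, Tuple[VideoFrames, VideoMetadata]]


class Qwen3VLProcessor:
    """Local placeholder for the real processor class (assumption)."""


class CustomProcessor(Qwen3VLProcessor):
    """A subclass whose name no longer matches the string being compared."""


def process_video(frames: VideoFrames, return_metadata: bool = False) -> VideoOutput:
    # The Union return type documents both possible shapes for callers.
    if return_metadata:
        return frames, {"total_num_frames": len(frames)}
    return frames


def check_by_name(processor: Any) -> bool:
    # String comparison: misses subclasses and silently breaks on renames.
    return type(processor).__name__ == "Qwen3VLProcessor"


def check_by_isinstance(processor: Any) -> bool:
    # isinstance follows the class hierarchy, so subclasses are recognized.
    return isinstance(processor, Qwen3VLProcessor)
```

One trade-off to keep in mind: an `isinstance` check needs the class to be importable at the call site, which in real code may depend on the installed library version, whereas a name comparison does not.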

Fix Qwen3-VL video metadata handling in vLLM

48c1a34

gemini-code-assist Bot reviewed Mar 9, 2026

Comment thread verl/utils/dataset.py

Comment thread verl/workers/rollout/vllm_rollout_spmd.py

hiyouga approved these changes Mar 10, 2026

hiyouga merged commit 6842bb2 into hiyouga:main Mar 10, 2026
1 check passed

rank-Yu deleted the fix branch March 10, 2026 09:02

hiyouga merged 1 commit into hiyouga:main from rank-Yu:fix
