Change input shape for 3D position_ids of Qwen 2.5 VL with M-RoPE #3400
yatarkan merged 4 commits into openvinotoolkit:master
Conversation
Pull request overview
This PR modifies the handling of 3D position_ids in continuous batching for models that use M-RoPE (Multi-dimensional Rotary Position Embedding), specifically Qwen2-VL models. The change replaces the incorrect flattening of position_ids ([3, 1, N] → [3*N]) with a proper squeeze of the pseudo-batch dimension ([3, 1, N] → [3, N]).
Changes:
- Fixed M-RoPE position_ids handling by squeezing the pseudo-batch dimension instead of flattening all dimensions
- Added more specific shape validation (checking for shape[0]==3 and shape[1]==1) before reshaping
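The difference between the old flattening and the new squeeze can be illustrated with a minimal standalone sketch. Note this uses a plain `std::vector<std::size_t>` as a stand-in for `ov::Shape`, not the actual `ov::Tensor` API the PR touches:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for ov::Shape; the real code calls position_ids.get_shape() and
// position_ids.set_shape() on an ov::Tensor.
using Shape = std::vector<std::size_t>;

// Squeeze the pseudo-batch dimension of an M-RoPE position_ids shape:
// [3, 1, total_token_num] -> [3, total_token_num].
// Any other shape is returned unchanged.
Shape squeeze_mrope_position_ids(const Shape& shape) {
    if (shape.size() == 3 && shape[0] == 3 && shape[1] == 1) {
        return {shape[0], shape[2]};
    }
    return shape;
}
```

For example, `squeeze_mrope_position_ids({3, 1, 17})` yields `{3, 17}`, preserving the three M-RoPE components as a separate dimension, whereas the previous flattening would have produced a rank-1 `{51}` and lost that structure.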
```cpp
if (position_ids.get_shape().size() == 3 && position_ids.get_shape()[0] == 3 &&
    position_ids.get_shape()[1] == 1) {
    // M-RoPE: squeeze pseudo-batch dim [3, 1, total_token_num] -> [3, total_token_num]
    const auto& position_ids_shape = position_ids.get_shape();
    position_ids.set_shape({position_ids_shape[0], position_ids_shape[2]});
}
```
This change affects M-RoPE position_ids handling for models like Qwen2-VL, but there's no test coverage for Qwen2-VL models in continuous batching mode. The test `test_vlm_continuous_batching_generate_vs_add_request` uses `@parametrize_one_model_pa`, which only tests `MODEL_IDS[0]` (`tiny-random-minicpmv-2_6`), not the Qwen2-VL models that use M-RoPE with 3D position_ids.
According to custom coding guideline 1000000 rule 9: "Tests have been updated or added to cover the new code." Since this fix addresses M-RoPE-specific position_ids handling, tests should be added or updated to verify that Qwen2-VL models work correctly with continuous batching, particularly in the generation phase where 3D position_ids [3, 1, N] are created.
```cpp
if (position_ids.get_shape().size() == 3 && position_ids.get_shape()[0] == 3 &&
    position_ids.get_shape()[1] == 1) {
    // M-RoPE: squeeze pseudo-batch dim [3, 1, total_token_num] -> [3, total_token_num]
    const auto& position_ids_shape = position_ids.get_shape();
```
The hardcoded check for shape[0] == 3 and shape[1] == 1 is specific to M-RoPE (Multi-dimensional Rotary Position Embedding) used by Qwen2-VL models. This is fragile because:
- It assumes M-RoPE always uses exactly 3 dimensions and a batch size of 1
- Other models with 3D position_ids but different shapes would bypass this logic
- There's no documentation explaining why these specific values are checked
Consider:
- Adding a comment explaining that these values are M-RoPE specific constants
- Adding a model capability flag or configuration to indicate M-RoPE support rather than relying on shape inference
- Validating that these constants match the actual M-RoPE implementation in Qwen2VL (currently hardcoded as shape [3, 1, N] in qwen2vl/classes.cpp:1208)
Suggested change:

```cpp
// Qwen2-VL M-RoPE compatibility:
// Qwen2-VL uses Multi-dimensional Rotary Position Embedding (M-RoPE), where position_ids
// are produced with a fixed 3D layout [3, 1, total_token_num]. The leading dimension (3)
// corresponds to M-RoPE's internal components and the middle dimension is a pseudo-batch
// of size 1. The model implementation (see qwen2vl/classes.cpp, around line 1208) expects
// a squeezed layout [3, total_token_num] at inference time.
//
// To support Qwen2-VL without changing other models, handle only this specific M-RoPE
// shape by removing the pseudo-batch dimension.
constexpr std::size_t qwen2vl_mrope_components = 3;  // M-RoPE internal components
constexpr std::size_t qwen2vl_mrope_batch = 1;       // pseudo-batch dimension
const auto& position_ids_shape = position_ids.get_shape();
if (position_ids_shape.size() == 3 &&
    position_ids_shape[0] == qwen2vl_mrope_components &&
    position_ids_shape[1] == qwen2vl_mrope_batch) {
    // M-RoPE: squeeze pseudo-batch dim [3, 1, total_token_num] -> [3, total_token_num]
    position_ids.set_shape({position_ids_shape[0], position_ids_shape[2]});
}
```
```cpp
if (position_ids.get_shape().size() == 3 && position_ids.get_shape()[0] == 3 &&
    position_ids.get_shape()[1] == 1) {
    // M-RoPE: squeeze pseudo-batch dim [3, 1, total_token_num] -> [3, total_token_num]
    const auto& position_ids_shape = position_ids.get_shape();
    position_ids.set_shape({position_ids_shape[0], position_ids_shape[2]});
}
```
The PR description is incomplete and does not follow the template. According to the custom coding guideline 1000000, rule 1: "PR description must be aligned with pull_request_template.md and its checklist must be filled out."
The description should include:
- A summary of the change (M-RoPE position_ids shape handling fix)
- Why this change is needed
- Whether tests have been added or updated
- Ticket information if applicable
- All checklist items must be completed
Requires: openvinotoolkit/openvino#34365

The tests will pass once we merge the runtime PR: openvinotoolkit/openvino#34365
```diff
  // TODO: remove it when GEMMA3 ticket-171180 is fixed
  bool requires_sdpa(const std::filesystem::path& models_dir) {
      auto vlm_config = utils::from_config_json_if_exists<VLMConfig>(models_dir, "config.json");
-     return vlm_config.model_type == VLMModelType::GEMMA3;
+     return vlm_config.model_type == VLMModelType::QWEN2_VL ||
+            vlm_config.model_type == VLMModelType::QWEN2_5_VL ||
+            vlm_config.model_type == VLMModelType::GEMMA3;
  }
```
PR description does not follow the repository PR template: the checklist items are missing and the ticket placeholder format (CVS-###) is not used. Please update the PR description to match .github/pull_request_template.md (include the checklist with checked/unchecked items and the CVS ticket line).
```cpp
if (position_ids.get_shape().size() == 3 && position_ids.get_shape()[0] == 3 &&
    position_ids.get_shape()[1] == 1) {
    // M-RoPE: squeeze pseudo-batch dim [3, 1, total_token_num] -> [3, total_token_num]
    const auto& position_ids_shape = position_ids.get_shape();
    position_ids.set_shape({position_ids_shape[0], position_ids_shape[2]});
}
```
The new M-RoPE position_ids shape adjustment ([3, 1, total_token_num] -> [3, total_token_num]) is a behavior change in the ContinuousBatching inference path, but there does not appear to be any C++ test coverage asserting the expected position_ids rank/shape for VLM/Qwen CB runs. Please add a regression test that fails if 3D position_ids are flattened or otherwise reshaped incorrectly.
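A shape-level regression check could look like the following sketch. It is standalone and uses a `std::vector<std::size_t>` stand-in for `ov::Shape` rather than the real `ov::Tensor` and the continuous-batching test fixtures, which are assumptions here; a real test would hook into the CB pipeline:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Shape = std::vector<std::size_t>;

// Reproduces the reshape under test: squeeze [3, 1, N] -> [3, N],
// leaving other shapes untouched.
Shape reshape_position_ids(const Shape& shape) {
    if (shape.size() == 3 && shape[0] == 3 && shape[1] == 1) {
        return {shape[0], shape[2]};
    }
    return shape;
}

// Regression check: M-RoPE position_ids must come out rank-2 with a
// leading dimension of 3, and must not be flattened to rank-1 [3 * N].
void check_mrope_not_flattened(std::size_t total_token_num) {
    const Shape squeezed = reshape_position_ids({3, 1, total_token_num});
    assert(squeezed.size() == 2);                    // rank 2, not flattened to rank 1
    assert(squeezed[0] == 3);                        // M-RoPE components preserved
    assert(squeezed[1] == total_token_num);          // token dimension preserved
    assert(squeezed != Shape{3 * total_token_num});  // not the old [3*N] flattening
}
```

Such a check fails immediately if the reshape regresses to flattening, which is exactly the behavior this PR fixes.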
603d66f
The analysis has shown that the correct shape for the 3D position_ids tensor of Qwen 2.5 VL in ContinuousBatching mode is not a flattened one, but [3, total_token_num], which preserves the correct 3D semantics of the model's M-RoPE. Instead of reshaping on the transformation side, it is correct and logical to provide the proper tensor from the GenAI side.
Signed-off-by: Andrii Staikov <andrii.staikov@intel.com>