Merged
8 changes: 5 additions & 3 deletions src/cpp/src/continuous_batching/model_runner.hpp
@@ -514,9 +514,11 @@ class ModelRunner {
         if (hidden_state_input && hidden_state_input.get_size() > 0) {
             m_request.set_tensor("hidden_states", hidden_state_input);
         }
-        if (position_ids.get_shape().size() == 3) {
-            // flatten positions ids for 3D position ids case
-            position_ids.set_shape({ov::shape_size(position_ids.get_shape())});
+        if (position_ids.get_shape().size() == 3 && position_ids.get_shape()[0] == 3 &&
+            position_ids.get_shape()[1] == 1) {
+            // M-RoPE: squeeze pseudo-batch dim [3, 1, total_token_num] -> [3, total_token_num]
+            const auto& position_ids_shape = position_ids.get_shape();
Comment on lines +517 to +520
Copilot AI Feb 26, 2026
The hardcoded check for shape[0] == 3 and shape[1] == 1 is specific to M-RoPE (Multi-dimensional Rotary Position Embedding) used by Qwen2-VL models. This is fragile because:

  1. It assumes M-RoPE always uses exactly 3 dimensions and a batch size of 1
  2. Other models with 3D position_ids but different shapes would bypass this logic
  3. There's no documentation explaining why these specific values are checked

Consider:

  • Adding a comment explaining that these values are M-RoPE specific constants
  • Adding a model capability flag or configuration to indicate M-RoPE support rather than relying on shape inference
  • Validating that these constants match the actual M-RoPE implementation in Qwen2VL (currently hardcoded as shape [3, 1, N] in qwen2vl/classes.cpp:1208)
Suggested change
if (position_ids.get_shape().size() == 3 && position_ids.get_shape()[0] == 3 &&
position_ids.get_shape()[1] == 1) {
// M-RoPE: squeeze pseudo-batch dim [3, 1, total_token_num] -> [3, total_token_num]
const auto& position_ids_shape = position_ids.get_shape();
// Qwen2-VL M-RoPE compatibility:
// Qwen2-VL uses Multi-dimensional Rotary Position Embedding (M-RoPE), where position_ids
// are produced with a fixed 3D layout [3, 1, total_token_num]. The leading dimension (3)
// corresponds to M-RoPE's internal components and the middle dimension is a pseudo-batch
// of size 1. The model implementation (see qwen2vl/classes.cpp, around line 1208) expects
// a squeezed layout [3, total_token_num] at inference time.
//
// To support Qwen2-VL without changing other models, handle only this specific M-RoPE
// shape by removing the pseudo-batch dimension.
constexpr std::size_t qwen2vl_mrope_components = 3; // M-RoPE internal components
constexpr std::size_t qwen2vl_mrope_batch = 1; // pseudo-batch dimension
const auto& position_ids_shape = position_ids.get_shape();
if (position_ids_shape.size() == 3 &&
position_ids_shape[0] == qwen2vl_mrope_components &&
position_ids_shape[1] == qwen2vl_mrope_batch) {
// M-RoPE: squeeze pseudo-batch dim [3, 1, total_token_num] -> [3, total_token_num]

+            position_ids.set_shape({position_ids_shape[0], position_ids_shape[2]});
         }
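Viewed in isolation, the new guard behaves differently from the flatten it replaces. A minimal sketch, using a plain std::vector as a stand-in for ov::Shape and a hypothetical helper name (adjust_position_ids_shape is not part of the actual change):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for ov::Shape; adjust_position_ids_shape is a hypothetical
// helper mirroring the guard added in model_runner.hpp.
using Shape = std::vector<std::size_t>;

// Squeeze only the M-RoPE layout [3, 1, N] -> [3, N];
// every other shape is passed through unchanged.
Shape adjust_position_ids_shape(const Shape& s) {
    if (s.size() == 3 && s[0] == 3 && s[1] == 1) {
        return {s[0], s[2]};  // drop the pseudo-batch dimension
    }
    return s;
}
```

Under the old code, any rank-3 shape was collapsed to rank 1; with the guard, a shape such as {3, 2, 7} now reaches the model untouched.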
Comment on lines +517 to 522
Copilot AI Feb 26, 2026

This change affects M-RoPE position_ids handling for models like Qwen2-VL, but there's no test coverage for Qwen2-VL models in continuous batching mode. The test test_vlm_continuous_batching_generate_vs_add_request uses @parametrize_one_model_pa which only tests MODEL_IDS[0] (tiny-random-minicpmv-2_6), not the Qwen2-VL models that use M-RoPE with 3D position_ids.

According to custom coding guideline 1000000 rule 9: "Tests have been updated or added to cover the new code." Since this fix addresses M-RoPE-specific position_ids handling, tests should be added or updated to verify that Qwen2-VL models work correctly with continuous batching, particularly in the generation phase where 3D position_ids [3, 1, N] are created.

Copilot generated this review using guidance from repository custom instructions.
Comment on lines +517 to 522
Copilot AI Feb 26, 2026

The PR description is incomplete and does not follow the template. According to the custom coding guideline 1000000, rule 1: "PR description must be aligned with pull_request_template.md and its checklist must be filled out."

The description should include:

  • A summary of the change (M-RoPE position_ids shape handling fix)
  • Why this change is needed
  • Whether tests have been added or updated
  • Ticket information if applicable
  • All checklist items must be completed

Comment on lines +517 to 522
Copilot AI Mar 11, 2026

The new M-RoPE position_ids shape adjustment ([3, 1, total_token_num] -> [3, total_token_num]) is a behavior change in the ContinuousBatching inference path, but there does not appear to be any C++ test coverage asserting the expected position_ids rank/shape for VLM/Qwen CB runs. Please add a regression test that fails if 3D position_ids are flattened or otherwise reshaped incorrectly.
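As a sketch of the property such a regression test could pin down, the following hedged example uses plain std::vector shapes and hypothetical names (a real test would run the ContinuousBatching pipeline on a Qwen2-VL model and inspect the tensor handed to the request), contrasting the old flatten with the rank the M-RoPE path expects:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Shape = std::vector<std::size_t>;

// Old behavior: collapse any position_ids shape to rank 1.
Shape old_flatten(const Shape& s) {
    std::size_t total = 1;
    for (std::size_t d : s) total *= d;
    return {total};
}

// The property a regression test would assert: after adjustment,
// M-RoPE position_ids must still be rank 2 with 3 components.
bool is_valid_mrope_layout(const Shape& s) {
    return s.size() == 2 && s[0] == 3;
}
```

The old flatten turns [3, 1, N] into [3*N], which fails this check; the squeezed [3, N] layout passes.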

// typical LLM parameters
if (!m_cached_position_ids) {
6 changes: 2 additions & 4 deletions src/cpp/src/visual_language/pipeline.cpp
@@ -637,12 +637,10 @@ class VLMPipeline::VLMPipelineImpl : public VLMPipelineBase{
     }
 };

-// TODO: remove it when QWEN ticket-167316/GEMMA3 ticket-171180 is fixed
+// TODO: remove it when GEMMA3 ticket-171180 is fixed
 bool requires_sdpa(const std::filesystem::path& models_dir) {
     auto vlm_config = utils::from_config_json_if_exists<VLMConfig>(models_dir, "config.json");
-    return vlm_config.model_type == VLMModelType::QWEN2_VL ||
-           vlm_config.model_type == VLMModelType::QWEN2_5_VL ||
-           vlm_config.model_type == VLMModelType::GEMMA3;
+    return vlm_config.model_type == VLMModelType::GEMMA3;
 }
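The simplified predicate now keys off a single model type. A hedged sketch of that logic, with a trimmed-down stand-in for the VLMModelType enum (the real enum and config loading live in the visual_language sources and are not reproduced here):

```cpp
#include <cassert>

// Trimmed stand-in for the real VLMModelType enum.
enum class VLMModelType { QWEN2_VL, QWEN2_5_VL, GEMMA3 };

// After this change only Gemma3 still forces SDPA; the Qwen2-VL
// family (ticket-167316) takes the regular attention path.
bool requires_sdpa(VLMModelType model_type) {
    return model_type == VLMModelType::GEMMA3;
}
```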
Comment on lines +640 to 644
Copilot AI Mar 11, 2026

PR description does not follow the repository PR template: the checklist items are missing and the ticket placeholder format (CVS-###) is not used. Please update the PR description to match .github/pull_request_template.md (include the checklist with checked/unchecked items and the CVS ticket line).


VLMPipeline::VLMPipeline(