Commit 603d66f

Change input shape for 3D position_ids of Qwen 2.5 VL with M-RoPE (openvinotoolkit#3400)
The analysis has shown that the correct shape for the 3D position_ids tensor of Qwen 2.5 VL in the ContinuousBatching mode is not a flattened one, but [3, total_token_num]. This preserves the correct 3D semantics of the model's M-RoPE. Instead of reshaping on the transformation side, it is correct and logical to provide the proper tensor from the GenAI side.

- Ticket: [CVS-167316](https://jira.devtools.intel.com/browse/CVS-167316)

Signed-off-by: Andrii Staikov <andrii.staikov@intel.com>

Co-authored-by: Yaroslav Tarkan <yaroslav.tarkan@intel.com>
1 parent 0e5f39c commit 603d66f

File tree

2 files changed, +7 -7 lines changed

src/cpp/src/continuous_batching/model_runner.hpp

Lines changed: 5 additions & 3 deletions

@@ -514,9 +514,11 @@ class ModelRunner {
         if (hidden_state_input && hidden_state_input.get_size() > 0) {
             m_request.set_tensor("hidden_states", hidden_state_input);
         }
-        if (position_ids.get_shape().size() == 3) {
-            // flatten positions ids for 3D position ids case
-            position_ids.set_shape({ov::shape_size(position_ids.get_shape())});
+        if (position_ids.get_shape().size() == 3 && position_ids.get_shape()[0] == 3 &&
+            position_ids.get_shape()[1] == 1) {
+            // M-RoPE: squeeze pseudo-batch dim [3, 1, total_token_num] -> [3, total_token_num]
+            const auto& position_ids_shape = position_ids.get_shape();
+            position_ids.set_shape({position_ids_shape[0], position_ids_shape[2]});
         }
         // typical LLM parameters
         if (!m_cached_position_ids) {

src/cpp/src/visual_language/pipeline.cpp

Lines changed: 2 additions & 4 deletions

@@ -637,12 +637,10 @@ class VLMPipeline::VLMPipelineImpl : public VLMPipelineBase{
     }
 };

-// TODO: remove it when QWEN ticket-167316/GEMMA3 ticket-171180 is fixed
+// TODO: remove it when GEMMA3 ticket-171180 is fixed
 bool requires_sdpa(const std::filesystem::path& models_dir) {
     auto vlm_config = utils::from_config_json_if_exists<VLMConfig>(models_dir, "config.json");
-    return vlm_config.model_type == VLMModelType::QWEN2_VL ||
-        vlm_config.model_type == VLMModelType::QWEN2_5_VL ||
-        vlm_config.model_type == VLMModelType::GEMMA3;
+    return vlm_config.model_type == VLMModelType::GEMMA3;
 }

 VLMPipeline::VLMPipeline(
