Open
Description
OpenVINO Version
2026.0.0
Operating System
Windows 11
Device used for inference
GPU
Framework
None
Model used
No response
Issue description
In multi_tensor_variable_state.cpp in the intel_gpu plugin, the rearrange_cache function checks the same element-size condition twice. The second branch is therefore unreachable, so FP32 KV cache data is never copied and the output is silently garbage.
Buggy code:
if (ov::element::Type(kv_layout.data_type).size() == 2)
    copy_element<uint16_t>(...);
else if (ov::element::Type(kv_layout.data_type).size() == 2) // ← duplicate, never true
    copy_element<uint32_t>(...);
Expected: The second condition should be == 4 to handle 4-byte data such as fp32 or int32.
Impact: When running beam search on GPU with FP32 precision and calling get_state(), the output buffer is never written and contains uninitialized memory. This causes silently corrupted results with no error or warning.
Fix:
if (ov::element::Type(kv_layout.data_type).size() == 2)
    copy_element<uint16_t>(...);
else if (ov::element::Type(kv_layout.data_type).size() == 4)
    copy_element<uint32_t>(...);
If requested, I can open a related PR.
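For illustration only, here is a minimal standalone sketch of the size-based dispatch described in the fix. It does not use the plugin's real code: the `copy_element` signature, `copy_by_element_size`, and `roundtrip_fp32` are hypothetical names invented for this example, and the buffers are plain byte arrays rather than GPU memory. The sketch also assumes that an unhandled element width should fail loudly, which is a design suggestion and not something the plugin is known to do.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>

// Hypothetical stand-in for the plugin's copy_element: copies one element of
// type T from index src_idx in src to index dst_idx in dst (byte buffers).
template <typename T>
void copy_element(const std::uint8_t* src, std::uint8_t* dst,
                  std::size_t src_idx, std::size_t dst_idx) {
    assert(src != nullptr && dst != nullptr);
    std::memcpy(dst + dst_idx * sizeof(T), src + src_idx * sizeof(T), sizeof(T));
}

// Dispatch on the element width in bytes. A switch makes a duplicated case a
// compile error, and the default case turns an unhandled width into an
// exception instead of a silently unwritten destination buffer.
void copy_by_element_size(std::size_t elem_size,
                          const std::uint8_t* src, std::uint8_t* dst,
                          std::size_t src_idx, std::size_t dst_idx) {
    switch (elem_size) {
    case 2:  // e.g. fp16
        copy_element<std::uint16_t>(src, dst, src_idx, dst_idx);
        break;
    case 4:  // e.g. fp32 or int32
        copy_element<std::uint32_t>(src, dst, src_idx, dst_idx);
        break;
    default:
        throw std::invalid_argument("unsupported element size");
    }
}

// Helper for the example: pushes one float through the 4-byte path and
// returns what arrives in the destination buffer.
float roundtrip_fp32(float value) {
    static_assert(sizeof(float) == 4, "example assumes a 4-byte float");
    std::uint8_t src[sizeof(float)];
    std::uint8_t dst[sizeof(float)] = {0, 0, 0, 0};
    std::memcpy(src, &value, sizeof(float));
    copy_by_element_size(sizeof(float), src, dst, 0, 0);
    float out = 0.0f;
    std::memcpy(&out, dst, sizeof(float));
    return out;
}
```

With the buggy duplicate condition, the 4-byte path would never run and `roundtrip_fp32` would return the zero-initialized destination value instead of its input. Using a `switch` here is one way to make this class of mistake harder to write, since compilers reject two identical `case` labels.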
Step-by-step reproduction
No response
Relevant log output
Issue submission checklist
- I'm reporting an issue. It's not a question.
- I checked the problem with the documentation, FAQ, open issues, Stack Overflow, etc., and have not found a solution.
- There is reproducer code and related data files such as images, videos, models, etc.