Conversation

@rkazants (Collaborator) commented Oct 13, 2025

What does this PR do?

It restores the cache_position input in the exported Whisper model. The regression was introduced in #1457.

Fixes https://jira.devtools.intel.com/browse/CVS-174805

Before submitting

  • [N/A] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • [N/A] Did you make sure to update the documentation with your changes?
  • [N/A] Did you write any new necessary tests?

Signed-off-by: Kazantsev, Roman <[email protected]>
@IlyasMoutawwakil (Member) commented

Thanks @rkazants, I don't have access to the Jira ticket. Can you please elaborate on why cache_position is necessary for you, and why our testing didn't catch it / fail without it?

common_inputs = super().inputs
if self._behavior is not ConfigBehavior.ENCODER and self.use_past_in_inputs:
    # since https://github.com/huggingface/transformers/pull/31166
    common_inputs["cache_position"] = {0: "decoder_sequence_length"}
Collaborator commented on this snippet:

Could you please add a condition somewhere in the tests that checks that the exported OpenVINO decoder model has a cache_position input? Also, are stateless Whisper models affected as well? If so, we should check for that case too.
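
A minimal sketch of such a check (the model class is real, but the checkpoint and attribute path are assumptions about how a test could be written, not the final test code):

from optimum.intel import OVModelForSpeechSeq2Seq

# Export a small Whisper checkpoint to OpenVINO IR (checkpoint choice is illustrative).
model = OVModelForSpeechSeq2Seq.from_pretrained("openai/whisper-tiny", export=True)

# Collect the tensor names declared as inputs on the exported decoder ov.Model.
decoder_input_names = {name for port in model.decoder.model.inputs for name in port.get_names()}
assert "cache_position" in decoder_input_names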

@nikita-savelyevv (Collaborator) left a comment

LGTM. I have one comment about test coverage.

@rkazants (Collaborator, Author) commented Oct 13, 2025

Thanks @rkazants, I don't have access to the Jira ticket. Can you please elaborate on why cache_position is necessary for you

It is important for the NPU device. This affects both the static (NPU) and stateful (CPU/GPU) Whisper GenAI pipelines.

why our testing didn't catch it / fail without it?

No tests :)

@IlyasMoutawwakil (Member) commented Oct 13, 2025

This affects both the static (NPU) and stateful (CPU/GPU) Whisper GenAI pipelines.

How? 😅 Please elaborate.
Correct me if I'm wrong, but the idea of the inputs attribute is to only include necessary inputs, i.e. the ones needed to do correct inference?
For example, all encoder-decoder models can take a decoder attention mask tensor, but we only add it for the ones that truly need it / use it / can't generate it correctly internally (Pix2Struct, for example).
The cache position argument works the same way: the model generates it internally and correctly as a range, using the shape of the input ids and the shape of the KV cache (except in transformers 4.43 to 4.45, where it couldn't be generated internally correctly).
My question is: why is it needed here all the time in the case of NPU? Is it really an NPU thing, or an openvino.genai thing, i.e. does openvino.genai use this input and is that the real reason why we need to keep it? Because from an inference / traced-graph standpoint, the cache position input is not needed 🤔
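
For context, a minimal sketch of how a decoder can derive the cache positions internally from the input ids and the number of tokens already stored in the KV cache (illustrative only; the helper name is an assumption, not transformers code):

import torch

def make_cache_position(input_ids: torch.Tensor, past_length: int) -> torch.Tensor:
    # New token positions start right after the tokens already stored in the KV cache.
    seq_len = input_ids.shape[1]
    return torch.arange(past_length, past_length + seq_len, device=input_ids.device)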

def inputs(self):
    common_inputs = super().inputs
    if self._behavior is not ConfigBehavior.ENCODER and self.use_past_in_inputs:
        # since https://github.com/huggingface/transformers/pull/31166
@IlyasMoutawwakil (Member) commented on this snippet, Oct 13, 2025:

This comment only explains why it's needed from version 4.43 to 4.45, which is already covered in https://github.com/huggingface/optimum-onnx/blob/main/optimum/exporters/onnx/model_configs.py#L2009. It doesn't explain why it's always needed with OpenVINO.

@IlyasMoutawwakil (Member) commented Oct 13, 2025

Checking the openvino.genai code, it seems that this line is executed whether or not the model has a cache position input/tensor:
https://github.com/openvinotoolkit/openvino.genai/blob/696abc354dbe005af6b4c760aafc1c1921c02319/src/cpp/src/whisper/models/statefull_decoder.cpp#L32
In other words, it's an inference issue, not an export issue. The problem can be solved there by simply checking for the existence of cache_position before applying the function 🤔
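
The referenced openvino.genai code is C++, but as an illustration only, here is a sketch of that kind of check using the OpenVINO Python API (the IR file name and helper are assumptions, not the genai implementation):

import numpy as np
import openvino as ov

core = ov.Core()
# Illustrative path to the exported decoder IR.
decoder = core.read_model("openvino_decoder_model.xml")
# Tensor names the exported model actually declares as inputs.
decoder_input_names = {name for port in decoder.inputs for name in port.get_names()}

def maybe_set_cache_position(request, past_len, seq_len):
    # Only fill cache_position when the exported IR declares such an input.
    if "cache_position" in decoder_input_names:
        positions = np.arange(past_len, past_len + seq_len, dtype=np.int64)
        request.set_tensor("cache_position", ov.Tensor(positions))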

@IlyasMoutawwakil (Member) commented

No tests :)

Nothing failed in our tests, not because there are no tests, but because our inference code supports both having and not having the cache position input: https://github.com/huggingface/optimum-intel/blob/main/optimum/intel/openvino/modeling_seq2seq.py#L1017 😉

@echarlaix (Collaborator) commented

The problem can be solved there by simply checking for the existence of cache_position before applying the function 🤔

Yes, it makes sense to fix this directly in openvino.genai; I'm not sure I understand why we would need this here.

@rkazants (Collaborator, Author) commented

The decision is to fix this on the GenAI side, so that it handles IRs without a cache_position input.

@rkazants closed this on Oct 13, 2025.