Conversation

@rkazants (Collaborator) commented Oct 13, 2025

What does this PR do?

It restores the cache_position input in the exported Whisper model. The regression was introduced in #1457.

Fixes https://jira.devtools.intel.com/browse/CVS-174805

Before submitting

  • [N/A] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • [N/A] Did you make sure to update the documentation with your changes?
  • [N/A] Did you write any new necessary tests?

Signed-off-by: Kazantsev, Roman <[email protected]>
@IlyasMoutawwakil (Member) commented

Thanks @rkazants, I don't have access to the Jira ticket. Can you please elaborate on why cache_position is necessary for you, and why our testing didn't catch it / fail without it?

common_inputs = super().inputs
if self._behavior is not ConfigBehavior.ENCODER and self.use_past_in_inputs:
    # since https://github.com/huggingface/transformers/pull/31166
    common_inputs["cache_position"] = {0: "decoder_sequence_length"}
Collaborator commented on this snippet:

Could you please add a condition somewhere in the tests that checks that the exported OpenVINO decoder model has a cache_position input? Also, are stateless Whisper models affected as well? If so, we should check for that case too.
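
A minimal sketch of such a check (the model class is real, but the checkpoint and attribute path are assumptions about how a test could be written, not the final test code):

from optimum.intel import OVModelForSpeechSeq2Seq

# Export a small Whisper checkpoint to OpenVINO IR (checkpoint choice is illustrative).
model = OVModelForSpeechSeq2Seq.from_pretrained("openai/whisper-tiny", export=True)

# Collect the tensor names declared as inputs on the exported decoder ov.Model.
decoder_input_names = {name for port in model.decoder.model.inputs for name in port.get_names()}
assert "cache_position" in decoder_input_names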

@nikita-savelyevv (Collaborator) left a comment

LGTM. I have one comment about test coverage.

@rkazants (Collaborator, Author) commented Oct 13, 2025

Thanks @rkazants, I don't have access to the Jira ticket. Can you please elaborate on why cache_position is necessary for you

It is important for the NPU device. This affects both the static (NPU) and stateful (CPU/GPU) Whisper GenAI pipelines.

why our testing didn't catch it / fail without it?

No tests :)

@IlyasMoutawwakil (Member) commented Oct 13, 2025

This affects both the static (NPU) and stateful (CPU/GPU) Whisper GenAI pipelines.

How? 😅 Please elaborate.
Correct me if I'm wrong, but the idea of the inputs attribute is to only include necessary inputs, i.e. the ones needed to do correct inference?
For example, all encoder-decoder models can take a decoder attention mask tensor, but we only add it for the ones that truly need it / use it / can't generate it correctly internally (Pix2Struct, for example).
The cache position argument works the same way: the model generates it internally and correctly as a range, using the shape of the input ids and the shape of the KV cache (except in transformers 4.43 to 4.45, where it couldn't be generated internally correctly).
My question is: why is it needed here all the time in the case of NPU? Is it really an NPU thing, or an openvino.genai thing, i.e. does openvino.genai use this input and is that the real reason why we need to keep it? Because from an inference / traced-graph standpoint, the cache position input is not needed 🤔
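
For context, a minimal sketch of how a decoder can derive the cache positions internally from the input ids and the number of tokens already stored in the KV cache (illustrative only; the helper name is an assumption, not transformers code):

import torch

def make_cache_position(input_ids: torch.Tensor, past_length: int) -> torch.Tensor:
    # New token positions start right after the tokens already stored in the KV cache.
    seq_len = input_ids.shape[1]
    return torch.arange(past_length, past_length + seq_len, device=input_ids.device)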

def inputs(self):
    common_inputs = super().inputs
    if self._behavior is not ConfigBehavior.ENCODER and self.use_past_in_inputs:
        # since https://github.com/huggingface/transformers/pull/31166
@IlyasMoutawwakil (Member) commented on this snippet, Oct 13, 2025:

This comment only explains why it's needed from version 4.43 to 4.45, which is already covered in https://github.com/huggingface/optimum-onnx/blob/main/optimum/exporters/onnx/model_configs.py#L2009. It doesn't explain why it's always needed with OpenVINO.

@IlyasMoutawwakil (Member) commented Oct 13, 2025

Checking the openvino.genai code, it seems that this line is executed whether or not the model has a cache position input/tensor:
https://github.com/openvinotoolkit/openvino.genai/blob/696abc354dbe005af6b4c760aafc1c1921c02319/src/cpp/src/whisper/models/statefull_decoder.cpp#L32
In other words, it's an inference issue, not an export issue. The problem can be solved there by simply checking for the existence of cache_position before applying the function 🤔
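
The referenced openvino.genai code is C++, but as an illustration only, here is a sketch of that kind of check using the OpenVINO Python API (the IR file name and helper are assumptions, not the genai implementation):

import numpy as np
import openvino as ov

core = ov.Core()
# Illustrative path to the exported decoder IR.
decoder = core.read_model("openvino_decoder_model.xml")
# Tensor names the exported model actually declares as inputs.
decoder_input_names = {name for port in decoder.inputs for name in port.get_names()}

def maybe_set_cache_position(request, past_len, seq_len):
    # Only fill cache_position when the exported IR declares such an input.
    if "cache_position" in decoder_input_names:
        positions = np.arange(past_len, past_len + seq_len, dtype=np.int64)
        request.set_tensor("cache_position", ov.Tensor(positions))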

@IlyasMoutawwakil (Member) commented

No tests :)

Nothing failed in our tests, not because there are no tests, but because our inference code supports both having and not having the cache position input: https://github.com/huggingface/optimum-intel/blob/main/optimum/intel/openvino/modeling_seq2seq.py#L1017 😉

@echarlaix (Collaborator) commented

The problem can be solved there by simply checking for the existence of cache_position before applying the function 🤔

Yes, it makes sense to fix this directly in openvino.genai; I'm not sure I understand why we would need this here.

@rkazants (Collaborator, Author) commented

The decision is to fix this on the GenAI side, so that it handles IRs without a cache_position input.

@rkazants closed this on Oct 13, 2025.