[OV] Recover cache_position for Whisper #1469
Conversation
Signed-off-by: Kazantsev, Roman <[email protected]>
Thanks @rkazants. I don't have access to the Jira ticket, so can you please elaborate on why cache_position is necessary for you, and why our testing didn't catch this / fail without it?
common_inputs = super().inputs
if self._behavior is not ConfigBehavior.ENCODER and self.use_past_in_inputs:
    # since https://github.com/huggingface/transformers/pull/31166
    common_inputs["cache_position"] = {0: "decoder_sequence_length"}
Could you please add a check in the tests that verifies the exported OpenVINO decoder model has a cache_position input? Also, are stateless Whisper models affected? If so, we should cover that case too.
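A minimal sketch of such a check, assuming the decoder IR is already exported to disk (the path below is illustrative, not the actual test fixture):

import openvino as ov

core = ov.Core()
# Illustrative path; in a real test this would come from the export fixture.
decoder = core.read_model("whisper_ov/openvino_decoder_with_past_model.xml")
# Collect every tensor name declared by the model's inputs.
input_names = {name for port in decoder.inputs for name in port.get_names()}
assert "cache_position" in input_names, "exported decoder is missing cache_position"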
LGTM. I have one comment about test coverage.
It is important for the NPU device. This affects both the static (NPU) and stateful (CPU/GPU) Whisper GenAI pipelines.
No tests :)
How? 😅 Please elaborate.
def inputs(self):
    common_inputs = super().inputs
    if self._behavior is not ConfigBehavior.ENCODER and self.use_past_in_inputs:
        # since https://github.com/huggingface/transformers/pull/31166
This comment only explains why it's needed from version 4.43 to 4.45, which is already covered in https://github.com/huggingface/optimum-onnx/blob/main/optimum/exporters/onnx/model_configs.py#L2009. It doesn't explain why it's always needed with OpenVINO.
Checking the openvino.genai code, it seems that this line is executed whether or not the model has a cache_position input/tensor.
Nothing failed in our tests, not because there are no tests, but because our inference code supports both having and not having the cache_position input: https://github.com/huggingface/optimum-intel/blob/main/optimum/intel/openvino/modeling_seq2seq.py#L1017 😉
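A simplified sketch of that tolerant pattern (the helper name and argument layout are illustrative, not the actual modeling_seq2seq.py code):

import numpy as np
import openvino as ov

def add_cache_position(decoder: ov.CompiledModel, inputs: dict, past_len: int, seq_len: int) -> dict:
    # Feed cache_position only when the compiled decoder declares that input,
    # so IRs exported with or without it both run unchanged.
    input_names = {name for port in decoder.inputs for name in port.get_names()}
    if "cache_position" in input_names:
        inputs["cache_position"] = np.arange(past_len, past_len + seq_len)
    return inputs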
Yes, it makes sense to fix this directly in OpenVINO GenAI; I'm not sure I understand why we would need it here.
The decision is to fix this on the GenAI side so that it handles IRs without a cache_position input.
What does this PR do?
It recovers the cache_position input in the exported Whisper decoder model. The regression was introduced in #1457.
Fixes https://jira.devtools.intel.com/browse/CVS-174805
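One way to verify the fix locally is to re-export a small Whisper checkpoint and inspect the decoder's declared inputs (the model id is illustrative, and the decoder attribute layout is an assumption about optimum-intel internals):

from optimum.intel import OVModelForSpeechSeq2Seq

# Export a small checkpoint through the patched exporter.
model = OVModelForSpeechSeq2Seq.from_pretrained("openai/whisper-tiny", export=True)
# The OVDecoder wrapper keeps the underlying ov.Model; check its input names.
names = {n for port in model.decoder.model.inputs for n in port.get_names()}
print("cache_position" in names)  # expected: True after this PR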
Before submitting