Conversation

@nikita-savelyevv (Collaborator) commented Oct 8, 2025

What does this PR do?

Executing VLMs after static quantization but before serialization leads to the original, non-quantized models being run for inference instead. For example, in the case of the vision embeddings model within a VLM pipeline, the quantized model replaces the original at the OVModelForVisualCausalLM level, but not at the OVVisionEmbedding level.

This is a quick fix for VLMs only; the full fix is in #1461.
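The failure mode described above can be illustrated with a minimal sketch. The class and attribute names below are hypothetical stand-ins, not the actual optimum-intel API; the point is only the stale-reference pattern: a wrapper keeps its own reference to the model, so replacing the model at the pipeline level alone leaves the wrapper running the original.

```python
class VisionEmbedding:
    """Submodel wrapper that keeps its own reference to the model."""
    def __init__(self, model):
        self.model = model

    def __call__(self, x):
        return self.model(x)


class VisualCausalLM:
    """Pipeline holding both the model and its wrapper (hypothetical)."""
    def __init__(self, vision_model):
        self.vision_model = vision_model
        self.vision_embedding = VisionEmbedding(vision_model)

    def quantize(self):
        # Bug: replacing the model only at the pipeline level would leave
        # self.vision_embedding.model pointing at the original model.
        self.vision_model = lambda x: ("quantized", x)
        # Fix: propagate the replacement into the wrapper as well.
        self.vision_embedding.model = self.vision_model


pipeline = VisualCausalLM(lambda x: ("original", x))
pipeline.quantize()
print(pipeline.vision_embedding(1))  # ('quantized', 1)
```

Without the last line of `quantize()`, the final call would return `('original', 1)`: exactly the "original non-quantized model is inferred" behavior this PR fixes for VLMs.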

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

@IlyasMoutawwakil (Member) left a comment


LGTM ! Thanks !

@echarlaix (Collaborator) left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the fix !

@echarlaix echarlaix merged commit f9cff03 into huggingface:main Oct 8, 2025
23 of 37 checks passed


3 participants