Qwen3 OpenVINO support failing #1446

@tomaarsen

Description

Hello!

Bug Report overview

  • Exporting Qwen/Qwen3-Embedding-0.6B to OpenVINO results in all-NaN outputs and various warnings.

Details

Running the following script results in all NaNs:

from optimum.intel.openvino import OVModelForFeatureExtraction
from transformers import AutoTokenizer

model = OVModelForFeatureExtraction.from_pretrained("Qwen/Qwen3-Embedding-0.6B")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-Embedding-0.6B")

sentences = ["This is an example sentence", "Each sentence is converted"]

for sentence in sentences:
    inputs = tokenizer(sentence, return_tensors="pt")
    outputs = model(**inputs)
    print(outputs.last_hidden_state)

Output:

Importing `MambaCache` from `transformers.cache_utils` is deprecated and will be removed in a future version. Please import it from `transformers` or `transformers.models.mamba.cache_mamba` instead.
No OpenVINO files were found for Qwen/Qwen3-Embedding-0.6B, setting `export=True` to convert the model to the OpenVINO IR. Don't forget to save the resulting model with `.save_pretrained()`
`loss_type=None` was set in the config but it is unrecognised.Using the default loss: `ForCausalLMLoss`.
[sic]\transformers\masking_utils.py:190: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if (padding_length := kv_length + kv_offset - attention_mask.shape[-1]) > 0:
[sic]\transformers\masking_utils.py:218: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if padding_mask is not None and padding_mask.shape[-1] > kv_length:
[sic]\transformers\integrations\sdpa_attention.py:82: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!   
  is_causal = query.shape[2] > 1 and attention_mask is None and getattr(module, "is_causal", True)
tensor([[[nan, nan, nan,  ..., nan, nan, nan],
         [nan, nan, nan,  ..., nan, nan, nan],
         [nan, nan, nan,  ..., nan, nan, nan],
         [nan, nan, nan,  ..., nan, nan, nan],
         [nan, nan, nan,  ..., nan, nan, nan],
         [nan, nan, nan,  ..., nan, nan, nan]]])
tensor([[[nan, nan, nan,  ..., nan, nan, nan],
         [nan, nan, nan,  ..., nan, nan, nan],
         [nan, nan, nan,  ..., nan, nan, nan],
         [nan, nan, nan,  ..., nan, nan, nan],
         [nan, nan, nan,  ..., nan, nan, nan]]])

I would expect this to approximately match the results with AutoModel:

from transformers import AutoTokenizer
from transformers import AutoModel

model = AutoModel.from_pretrained("Qwen/Qwen3-Embedding-0.6B")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-Embedding-0.6B")

sentences = ["This is an example sentence", "Each sentence is converted"]

for sentence in sentences:
    inputs = tokenizer(sentence, return_tensors="pt")
    outputs = model(**inputs)
    print(outputs.last_hidden_state)

Output:

tensor([[[  2.8325, -17.7200,   0.1776,  ...,  -6.9551, -12.0463,   0.8086],
         [ -0.2501,  -5.6465,  -1.3577,  ...,   1.5168,  -0.8063,  -3.2892],
         [  1.7796,   0.4740,  -1.3742,  ...,  -1.9026,  -0.6497,  -0.7862],
         [  3.3753,  -5.5741,  -1.3082,  ...,  -2.4205,  -0.3497,  -2.8036],
         [ -0.5573,  -7.5063,  -0.8194,  ...,  -0.1285,   2.5724,  -3.2815],
         [ -4.4455,  -1.7790,  -0.9880,  ...,   1.1086,   4.6749,  -1.2746]]],
       grad_fn=<MulBackward0>)
tensor([[[ 2.5925e+00, -6.6789e+00,  9.9559e-03,  ..., -6.1397e+00,
          -1.2755e+01,  3.6784e-01],
         [-5.9988e-01, -1.1165e+01, -8.8903e-01,  ...,  1.1954e+00,
           4.8131e-01, -9.6972e-01],
         [-1.5776e+00, -7.8559e+00, -1.2252e+00,  ..., -1.7063e+00,
           9.8415e-01,  2.0721e+00],
         [ 2.1419e+00, -1.1065e+01, -1.0917e+00,  ..., -5.8802e-01,
          -3.2069e+00, -4.4548e+00],
         [ 2.0169e-01,  1.6636e+00, -1.0134e+00,  ..., -6.3104e-01,
           4.3270e+00, -1.5783e+00]]], grad_fn=<MulBackward0>)
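
For reference, a quick parity check between the two backends could look like this (a sketch; single sentence, no pooling):

import torch
from optimum.intel.openvino import OVModelForFeatureExtraction
from transformers import AutoModel, AutoTokenizer

model_id = "Qwen/Qwen3-Embedding-0.6B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
pt_model = AutoModel.from_pretrained(model_id)
ov_model = OVModelForFeatureExtraction.from_pretrained(model_id)

inputs = tokenizer("This is an example sentence", return_tensors="pt")
with torch.no_grad():
    pt_out = pt_model(**inputs).last_hidden_state
ov_out = ov_model(**inputs).last_hidden_state

# The OpenVINO output is currently all NaN; once fixed, the max abs diff should be small
print("NaNs in OpenVINO output:", torch.isnan(ov_out).any().item())
print("max abs diff vs transformers:", (pt_out - ov_out).abs().max().item())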

I'm using transformers==4.55.4 and optimum-intel==1.25.2.
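
One thing that might be worth testing (an assumption on my part, not verified): if the NaNs come from reduced-precision inference, forcing f32 execution via `ov_config` should make them disappear:

from optimum.intel.openvino import OVModelForFeatureExtraction

# Assumption: the NaNs may stem from fp16/bf16 execution; f32 here is a diagnostic, not a fix
model = OVModelForFeatureExtraction.from_pretrained(
    "Qwen/Qwen3-Embedding-0.6B",
    ov_config={"INFERENCE_PRECISION_HINT": "f32"},
)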

See UKPLab/sentence-transformers#3515 for more details. This is affecting an attempt to convert a Sentence Transformer model to OpenVINO.
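
For context, the Sentence Transformers path that surfaces this routes through the same optimum-intel export (a sketch, assuming sentence-transformers >= 3.2 with its OpenVINO backend):

from sentence_transformers import SentenceTransformer

# backend="openvino" loads the model via optimum-intel's OVModelForFeatureExtraction,
# so it hits the same all-NaN outputs as the direct reproduction above
model = SentenceTransformer("Qwen/Qwen3-Embedding-0.6B", backend="openvino")
embeddings = model.encode(["This is an example sentence", "Each sentence is converted"])
print(embeddings)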

cc @santhoshtr

- Tom Aarsen
