[OpenVINO] Support Zamba2 by OpenVINO #1354

rkazants · 2025-06-20T08:37:14Z

What does this PR do?

Support Zyphra/Zamba2-1.2B-Instruct-v2

from transformers import AutoTokenizer
from optimum.intel.openvino import OVModelForCausalLM

# Load tokenizer and OpenVINO model
model_dir = "Zyphra/Zamba2-1.2B-Instruct-v2"
tokenizer = AutoTokenizer.from_pretrained(model_dir)
model = OVModelForCausalLM.from_pretrained(model_dir)

# Prepare input
prompt = "What is the capital of France?"
inputs = tokenizer(prompt, return_tensors="pt")

# Run inference
output_ids = model.generate(**inputs, max_new_tokens=50)
output_text = tokenizer.decode(output_ids[0], skip_special_tokens=True)

print(output_text)

Before submitting

[N/A] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
Did you make sure to update the documentation with your changes?
Did you write any new necessary tests?

Signed-off-by: Kazantsev, Roman <[email protected]>

HuggingFaceDocBuilderDev · 2025-06-20T08:42:27Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

echarlaix

Thanks for the addition @rkazants

optimum/exporters/openvino/model_configs.py

optimum/exporters/openvino/model_patcher.py

Signed-off-by: Kazantsev, Roman <[email protected]>

echarlaix

Thanks a lot for iterating @rkazants. It looks like the tests are not passing; would you mind taking a look?

optimum/exporters/openvino/model_configs.py

echarlaix · 2025-08-05T14:16:36Z

optimum/exporters/openvino/model_configs.py

+            past_key_values = []
+            # generate tuples of (key, value, conv_state, ssm_state)
+            for i in range(self.num_hidden_layers):
+                kv_shape = (self.batch_size, self.num_attention_heads, 1, self.head_dim)


Why not use self.sequence_length here?

Suggested change

-                kv_shape = (self.batch_size, self.num_attention_heads, 1, self.head_dim)
+                kv_shape = (self.batch_size, self.num_attention_heads, self.sequence_length, self.head_dim)
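For readers following the thread, here is a minimal sketch of what the per-layer dummy shapes look like when the key/value sequence dimension is configurable rather than fixed to 1. The function name `dummy_cache_shapes` and the conv/SSM state shapes are hypothetical placeholders, not taken from the PR; only the (key, value, conv_state, ssm_state) ordering and the `kv_shape` layout come from the diff above.

```python
from typing import List, Tuple

Shape = Tuple[int, ...]


def dummy_cache_shapes(
    num_hidden_layers: int,
    batch_size: int,
    num_attention_heads: int,
    sequence_length: int,
    head_dim: int,
    conv_state_shape: Shape,
    ssm_state_shape: Shape,
) -> List[Tuple[Shape, Shape, Shape, Shape]]:
    """Return one (key, value, conv_state, ssm_state) shape tuple per layer."""
    # Same layout as kv_shape in the diff, with the third dimension
    # taken from sequence_length instead of the literal 1.
    kv_shape = (batch_size, num_attention_heads, sequence_length, head_dim)
    return [
        (kv_shape, kv_shape, conv_state_shape, ssm_state_shape)
        for _ in range(num_hidden_layers)
    ]


shapes = dummy_cache_shapes(
    num_hidden_layers=2,
    batch_size=1,
    num_attention_heads=4,
    sequence_length=16,
    head_dim=8,
    conv_state_shape=(1, 32, 4),    # placeholder, not the real Zamba2 shape
    ssm_state_shape=(1, 4, 8, 16),  # placeholder, not the real Zamba2 shape
)
print(shapes[0][0])
```

Passing `sequence_length=1` reproduces the shape from the original line, so the two variants differ only in that one argument.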

echarlaix · 2025-08-05T14:17:34Z

optimum/exporters/openvino/model_patcher.py

+                value_cache = []
+                # inputs passed in an order of (key, value, conv_state, ssm_state)
+                for idx in range(num_hidden_layers):
+                    batch_size = past_key_values[4 * idx].size(0)


Not very important, but this could be moved outside the loop.
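A rough sketch of the hoisting being suggested, assuming the flat input really is laid out as (key, value, conv_state, ssm_state) per layer as the diff comment states. The helper name `regroup_flat_cache` and the stand-in objects are hypothetical. The sketch reads `.shape[0]` so it runs without PyTorch; on real tensors that is equivalent to `.size(0)` in the diff.

```python
from types import SimpleNamespace


def regroup_flat_cache(past_key_values, num_hidden_layers):
    """Split a flat (key, value, conv_state, ssm_state) * N list into four lists."""
    # The batch dimension is shared by all layers, so read it once up front
    # instead of recomputing it on every loop iteration.
    batch_size = past_key_values[0].shape[0]

    key_cache, value_cache, conv_states, ssm_states = [], [], [], []
    for idx in range(num_hidden_layers):
        key, value, conv_state, ssm_state = past_key_values[4 * idx : 4 * idx + 4]
        key_cache.append(key)
        value_cache.append(value)
        conv_states.append(conv_state)
        ssm_states.append(ssm_state)
    return batch_size, key_cache, value_cache, conv_states, ssm_states


def fake(*shape):
    # Stand-in "tensor": only the .shape attribute matters for this sketch.
    return SimpleNamespace(shape=shape)


flat = []
for _ in range(2):
    # Placeholder shapes, not the real Zamba2 state shapes.
    flat += [fake(3, 4, 1, 8), fake(3, 4, 1, 8), fake(3, 32, 4), fake(3, 4, 8, 16)]

batch_size, key_cache, value_cache, conv_states, ssm_states = regroup_flat_cache(flat, 2)
print(batch_size, len(key_cache), len(ssm_states))
```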

echarlaix · 2025-08-05T14:17:53Z

optimum/exporters/openvino/model_patcher.py

+                attention_mask=attention_mask,
+                position_ids=position_ids,
+                past_key_values=wrapped_cache_params,
+                # cache_position=cache_position,


echarlaix · 2025-08-05T14:35:34Z

optimum/exporters/openvino/model_patcher.py

+                ssm_states = []
+                key_cache = []
+                value_cache = []
+                # inputs passed in an order of (key, value, conv_state, ssm_state)


OK, so the length of past_key_values is always 4 * num_hidden_layers, is this correct? If yes, would you mind adding?
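If the answer is yes, one way to make that invariant explicit is a small length check before the flat inputs are consumed. This is only a sketch: the helper name `check_flat_cache_length` is hypothetical and the thread does not say what form the requested addition should take.

```python
def check_flat_cache_length(past_key_values, num_hidden_layers):
    """Raise if the flat cache does not hold exactly four entries per layer."""
    expected = 4 * num_hidden_layers
    if len(past_key_values) != expected:
        raise ValueError(
            f"expected {expected} cache inputs (4 per layer for "
            f"{num_hidden_layers} layers), got {len(past_key_values)}"
        )


# Eight entries for two layers satisfies the invariant, so nothing is raised.
check_flat_cache_length(list(range(8)), num_hidden_layers=2)
```

Raising a ValueError rather than using a bare assert keeps the check active even when Python runs with optimizations enabled.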

Signed-off-by: Kazantsev, Roman <[email protected]>

optimum/exporters/openvino/model_patcher.py

tests/openvino/test_export.py

tests/openvino/test_exporters_cli.py

Co-authored-by: Ella Charlaix <[email protected]>

tests/openvino/test_exporters_cli.py

optimum/exporters/openvino/model_patcher.py

Signed-off-by: Kazantsev, Roman <[email protected]>

IlyasMoutawwakil · 2025-10-13T09:37:52Z

@rkazants can you please rebase your branch, run styling and make sure the tests are passing 🙏

Revert "" This reverts commit b11d517.

…zamba2_ov

[OpenVINO] Support Zamba2 by OpenVINO

283403f

Signed-off-by: Kazantsev, Roman <[email protected]>

rkazants requested review from IlyasMoutawwakil and echarlaix June 20, 2025 08:38

echarlaix reviewed Jun 24, 2025

Merge remote-tracking branch 'upstream/main' into support_zamba2_ov

e6ef129