
Conversation

@rkazants (Collaborator) commented Sep 29, 2025

What does this PR do?

Command to export the model:

optimum-cli export openvino -m openbmb/MiniCPM-o-2_6 MiniCPM-o-2_6 --task=image-text-to-text --trust-remote-code

Example of inference:

from optimum.intel.openvino import OVModelForVisualCausalLM
from transformers import AutoProcessor
from PIL import Image
import requests

model_id = "openbmb/MiniCPM-o-2_6"

processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
prompt = "<|im_start|>user\n(<image>./</image>)\nWhat is in the image?<|im_end|>\n<|im_start|>assistant\n"
image = Image.open(requests.get("https://github.com/openvinotoolkit/openvino_notebooks/assets/29454499/d5fbbd1a-d484-415c-88cb-9986625b7b11", stream=True).raw).convert("RGB")

model = OVModelForVisualCausalLM.from_pretrained(model_id, trust_remote_code=True)

inputs = processor([prompt], [image], return_tensors="pt")
result = model.generate(**inputs, max_new_tokens=20)

print(processor.tokenizer.batch_decode(result[:, inputs["input_ids"].shape[1]:]))
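The slicing `result[:, inputs["input_ids"].shape[1]:]` in the last line keeps only the newly generated tokens, since `generate()` returns the prompt tokens followed by the model's continuation. A minimal sketch of that logic with plain Python lists (the token IDs are hypothetical, no model involved):

```python
# generate() output = prompt tokens + newly generated tokens;
# slicing off the prompt length leaves only the model's answer.
prompt_ids = [101, 7592, 2088]                           # hypothetical input_ids
generated = prompt_ids + [2009, 2003, 1037, 4937, 102]   # hypothetical generate() output

# Same idea as result[:, inputs["input_ids"].shape[1]:] on a tensor.
new_tokens = generated[len(prompt_ids):]
print(new_tokens)  # [2009, 2003, 1037, 4937, 102]
```

Decoding `new_tokens` instead of `generated` is what keeps the echoed prompt out of the printed answer.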

Before submitting

  • [N/A] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

@rkazants rkazants requested a review from echarlaix October 2, 2025 11:43
@IlyasMoutawwakil (Member) left a comment

LGTM! I left a question and a nit suggestion.
Thanks for the addition!

"minicpm": "katuni4ka/tiny-random-minicpm",
"minicpm3": "katuni4ka/tiny-random-minicpm3",
"minicpmv": "katuni4ka/tiny-random-minicpmv-2_6",
"minicpmo": "rkazants/tiny-random-MiniCPM-o-2_6",
Member

This model will slow down our CI greatly, it is 400MB 🫨
https://huggingface.co/rkazants/tiny-random-MiniCPM-o-2_6/tree/main

@rkazants (Collaborator, Author) commented Oct 3, 2025

This is the minimal size I managed to achieve. For comparison, minicpmv is about 300MB and it is tested: https://huggingface.co/katuni4ka/tiny-random-minicpmv-2_6/tree/main

Member

It should be reduced as well.

@rkazants (Collaborator, Author) commented Oct 6, 2025

I reduced it to 144MB. The minimal hidden_size for the llm part is 128: https://huggingface.co/rkazants/tiny-random-MiniCPM-o-2_6/blob/main/modeling_minicpmo.py#L209
That also impacts the apm and tts module sizes.

@IlyasMoutawwakil, @echarlaix, I propose doing any further reduction in follow-up PR(s) if there are ideas. My other colleagues are waiting for this PR, so let us not block the merge over the tiny model's size. We know that the implemented logic passes the tests in GHA.
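As a back-of-the-envelope way to sanity-check such sizes: checkpoint size is roughly parameter count times bytes per parameter. A sketch with a hypothetical parameter count (not the actual MiniCPM-o figure), assuming fp32 (4 bytes) vs fp16 (2 bytes) storage:

```python
# Checkpoint size ≈ parameters × bytes per parameter (metadata overhead ignored).
def checkpoint_mb(n_params: int, bytes_per_param: int = 4) -> float:
    """Rough on-disk size in MB for a given parameter count."""
    return n_params * bytes_per_param / (1024 * 1024)

# Hypothetical: a 36M-parameter tiny-random model.
n = 36_000_000
print(round(checkpoint_mb(n, 4), 1))  # fp32: 137.3 MB
print(round(checkpoint_mb(n, 2), 1))  # fp16: 68.7 MB
```

This kind of estimate makes it easy to see which config dimensions dominate a tiny-random checkpoint before re-exporting it.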

Collaborator

Completely agree with @IlyasMoutawwakil's comment, we should be super careful with the size of our tiny random models so as not to slow down the CI. Could you expand on the constraints on the different models' parameters, @rkazants? In https://huggingface.co/rkazants/tiny-random-MiniCPM-o-2_6/blob/main/config.json#L20, for example, I see d_model / decoder_ffn_dim / encoder_ffn_dim set to 1024, 1024 and 4096 respectively.
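To illustrate why those three dimensions matter for size: the two FFN weight matrices in a transformer layer each scale with d_model × ffn_dim, so shrinking both dimensions cuts per-layer parameters multiplicatively. A sketch using the values quoted above plus hypothetical reduced values:

```python
# Per-layer FFN weights: up-projection (d_model, ffn_dim) and
# down-projection (ffn_dim, d_model); biases omitted for simplicity.
def ffn_weight_params(d_model: int, ffn_dim: int) -> int:
    return 2 * d_model * ffn_dim

current = ffn_weight_params(1024, 4096)  # encoder values seen in config.json
reduced = ffn_weight_params(128, 512)    # hypothetical smaller values

print(current)             # 8388608 parameters per FFN sublayer
print(reduced)             # 131072 parameters per FFN sublayer
print(current // reduced)  # 64x reduction
```

Of course, as noted below, these dimensions are coupled to the other modalities' modules, so they cannot be shrunk in isolation.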

Collaborator

Also, if the PR really needs to be merged asap, I'm ok with keeping this model, but I would like a follow-up PR that changes it to a smaller model, or, if that cannot be done due to modeling constraints, more information on what the constraints are and why it cannot be done. Would that sound reasonable, @rkazants?

@rkazants (Collaborator, Author) commented Oct 6, 2025

Discussed offline with @echarlaix to proceed with the merge.
I will take this action item for further optimization. Indeed, there is room for optimization, e.g. in d_model and encoder_ffn_dim, but it will take some time: varying these parameter values requires adjusting several parameters in other modalities, which in turn requires a deeper understanding of the model.
Thanks!

@echarlaix echarlaix merged commit 82a9ed7 into huggingface:main Oct 7, 2025
33 of 37 checks passed