-
Notifications
You must be signed in to change notification settings - Fork 3k
Description
OpenVINO Version
2025.4.1
Operating System
Other (Please specify in description)
Device used for inference
GPU
Framework
None
Model used
qwen3-30b-a3b
Issue description
On Intel Ultra 7 265K with 32GB RAM + 128GB swap file, Kbuntu 25.10.
The model works well on the CPU with about 16GB RAM usage. When running on the iGPU, memory usage goes up until it fills the entire 32GB and then everything is killed.
The buffer length was reduced to 2048 but it didn't help.
The model is on the list of "AI Models verified for OpenVINO".
Step-by-step reproduction
-
Exported from optimum-cli
-
On CPU it worked:
LD_PRELOAD="/opt/openvino/ovms/lib/libopenvino_tokenizers.so
/opt/openvino/ovms/lib/libopenvino_genai.so"
/opt/openvino/ovms/bin/ovms
--model_repository_path /opt/openvino/models
--model_name Qwen3-30B-int4
--task text_generation
--port 9001
--rest_port 8000
--target_device CPU -
On GPU it crashed:
LD_PRELOAD="/opt/openvino/ovms/lib/libopenvino_tokenizers.so
/opt/openvino/ovms/lib/libopenvino_genai.so"
/opt/openvino/ovms/bin/ovms
--model_repository_path /opt/openvino/models
--model_name Qwen3-30B-int4
--task text_generation
--port 9001
--rest_port 8000
--target_device GPU
Relevant log output
Issue submission checklist
- I'm reporting an issue. It's not a question.
- I checked the problem with the documentation, FAQ, open issues, Stack Overflow, etc., and have not found a solution.
- There is reproducer code and related data files such as images, videos, models, etc.