Skip to content

[Bug]: qwen3-30b-a3b on ovms, works on CPU, crashs with out of memory on iGPU #34187

@ikirsh

Description

@ikirsh

OpenVINO Version

2025.4.1

Operating System

Other (Please specify in description)

Device used for inference

GPU

Framework

None

Model used

qwen3-30b-a3b

Issue description

On Intel Ultra 7 265K with 32GB RAM + 128GB swap file, Kbuntu 25.10.

The model works well on the CPU with about 16GB RAM usage. When running on the iGPU, memory usage goes up until it fills the entire 32GB and then everything is killed.

The buffer length was reduced to 2048 but it didn't help.

The model is on the list of "AI Models verified for OpenVINO".

Step-by-step reproduction

  1. Exported from optimum-cli

  2. On CPU it worked:
    LD_PRELOAD="/opt/openvino/ovms/lib/libopenvino_tokenizers.so
    /opt/openvino/ovms/lib/libopenvino_genai.so"
    /opt/openvino/ovms/bin/ovms
    --model_repository_path /opt/openvino/models
    --model_name Qwen3-30B-int4
    --task text_generation
    --port 9001
    --rest_port 8000
    --target_device CPU

  3. On GPU it crashed:
    LD_PRELOAD="/opt/openvino/ovms/lib/libopenvino_tokenizers.so
    /opt/openvino/ovms/lib/libopenvino_genai.so"
    /opt/openvino/ovms/bin/ovms
    --model_repository_path /opt/openvino/models
    --model_name Qwen3-30B-int4
    --task text_generation
    --port 9001
    --rest_port 8000
    --target_device GPU

Relevant log output

Issue submission checklist

  • I'm reporting an issue. It's not a question.
  • I checked the problem with the documentation, FAQ, open issues, Stack Overflow, etc., and have not found a solution.
  • There is reproducer code and related data files such as images, videos, models, etc.

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions