2025.4.1 is a minor release with bug fixes and improvements based on OpenVINO 2025.4.1.
Preview:
Added preview support for the GPT-OSS agentic use case.
As of 2025.4.1, the best accuracy is achieved with:
- --pipeline_type LM (without continuous batching and concurrency)
- --target_device GPU (this configuration was validated on Lunar Lake, Arrow Lake-H, and Intel Arc Battlemage dGPU with >=16 GB VRAM)
- INT4 model precision

A launch sketch with these settings follows below.
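A minimal sketch of a server launched with the recommended preview settings. Only --pipeline_type LM and --target_device GPU come from this release note; the model name, mounted path, and port are illustrative placeholders, so adjust them to your deployment.

```
# Sketch only: model name, mount path, and port are placeholders.
# --pipeline_type LM and --target_device GPU match the recommended
# preview configuration from this release.
docker run --rm -it --device /dev/dri -p 8000:8000 \
  -v $(pwd)/models:/models \
  openvino/model_server:2025.4.1-gpu \
  --model_name gpt-oss --model_path /models/gpt-oss \
  --rest_port 8000 \
  --target_device GPU \
  --pipeline_type LM
```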
Bug fixes:
- Fixed escaping of whitespace characters in string arguments for the qwen3coder tool-call parser.
- Changed handling of requests to the chat/completions endpoint with streaming and usage tracking for LLM pipelines without continuous batching. Such pipelines do not track generated tokens, and previously the last chunk was not delivered to the client, which could result in a missing token in the response. The last chunk is now delivered with token usage set to 0, which should be ignored (see the example request after this list).
- Minor documentation and demos fixes
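For illustration, a streamed request with usage reporting might look like the sketch below, assuming the OpenAI-compatible /v3/chat/completions REST endpoint; the model name and port are placeholders.

```
# Placeholder model name and port; streamed chat completion with usage.
curl -s http://localhost:8000/v3/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "gpt-oss",
        "stream": true,
        "stream_options": {"include_usage": true},
        "messages": [{"role": "user", "content": "Hello"}]
      }'
# On LLM pipelines without continuous batching, the final chunk now
# arrives with usage values set to 0, which clients should ignore.
```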
You can use the OpenVINO Model Server public Docker images based on Ubuntu via the following commands:
docker pull openvino/model_server:2025.4.1 - CPU device support with image based on Ubuntu 24.04
docker pull openvino/model_server:2025.4.1-gpu - GPU, NPU and CPU device support with image based on Ubuntu 24.04
or use the provided binary packages. Only packages with the suffix _python_on include Python support.
There is also an additional distribution channel at https://storage.openvinotoolkit.org/repositories/openvino_model_server/packages/2025.4.1/