Enhanced support for Qwen3-MOE models and gpt-oss-20b
These models now deliver improved performance, accuracy, and robust concurrent request handling with continuous batching. They are available in pre-optimized OpenVINO™ format directly on the Hugging Face hub, making them easy to deploy. Check the demos to learn how to use them (a minimal client sketch follows the list below):
- Integration with agentic frameworks
- Integration with Visual Studio Code
- Integration with OpenWebUI
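As a minimal sketch of using one of these pre-optimized models, the snippet below sends a chat completion request to a running model server through its OpenAI-compatible REST API with the official `openai` Python client. The base URL, port, and served model name are assumptions that depend on how your server instance was started.

```python
from openai import OpenAI

# Assumed address of a locally running OpenVINO Model Server instance
# exposing an OpenAI-compatible REST API; adjust to your deployment.
client = OpenAI(base_url="http://localhost:8000/v3", api_key="unused")

response = client.chat.completions.create(
    model="Qwen3-30B-A3B",  # placeholder: name under which the model is served
    messages=[{"role": "user", "content": "Summarize continuous batching in one sentence."}],
    max_tokens=128,
)
print(response.choices[0].message.content)
```

Because the server handles concurrent requests with continuous batching, several such clients can send requests in parallel against the same served model.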
Added support for Qwen3-VL
This model family provides function calling capabilities, enabling this vision language model to be used in agentic scenarios. Usage examples are included in the demos mentioned above.
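The sketch below illustrates, under the same assumptions about the server address and served model name, how a function call could be requested from a Qwen3-VL model using the standard OpenAI `tools` format; the `get_weather` tool definition is purely hypothetical.

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v3", api_key="unused")  # assumed address

# Hypothetical tool definition following the OpenAI "tools" schema.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Return the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="Qwen3-VL-8B",  # placeholder: name under which the model is served
    messages=[{"role": "user", "content": "What is the weather like in Warsaw?"}],
    tools=tools,
)

choice = response.choices[0]
if choice.finish_reason == "tool_calls":  # the model decided to call a tool
    call = choice.message.tool_calls[0]
    print(call.function.name, call.function.arguments)
```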
Extended /image endpoint to support inpainting and outpainting capabilities.
It is now possible to pass an input image along with a mask to edit selected parts of the image (inpainting) or to extend it beyond its original borders (outpainting).
Check the image generation demo to see how to use these capabilities.
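As a rough illustration, and assuming the extended endpoint follows the OpenAI images edit request format, an inpainting call could look like the sketch below; the server address, served model name, and file names are assumptions.

```python
import base64
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v3", api_key="unused")  # assumed address

# The mask marks the region of the input image that should be regenerated.
with open("input.png", "rb") as image, open("mask.png", "rb") as mask:
    response = client.images.edit(
        model="FLUX.1-dev",  # placeholder: name under which the image model is served
        image=image,
        mask=mask,
        prompt="Replace the masked area with a wooden bridge",
        size="1024x1024",
    )

# Assuming the response carries base64-encoded image data, as in the OpenAI API.
with open("edited.png", "wb") as out:
    out.write(base64.b64decode(response.data[0].b64_json))
```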
Other improvements and fixes:
- Server logs now report the current KV cache allocation alongside current usage metrics. With dynamic cache size (the default setting), the allocation automatically scales at runtime based on request concurrency and the processed context length.
- Generation request cancellation is now supported for NPU devices, where requests from disconnected clients will be cancelled.
- The finish reason now returns `tool_calls` when the model generates a function call, in line with OpenAI API standards.
- Corrected token usage reporting in the last streaming event of text generation with NPU execution.
- Added an extra streaming event right after the first token is generated, in line with the OpenAI API. This corrects TTFT metric benchmarking with tools that rely on streaming events (see the streaming sketch after this list).
- Enhanced error handling for Hugging Face Hub model pulling and downloading now includes retry and resume capabilities to address network connectivity issues with large model files. Download operations can recover from earlier network connectivity errors, and failures are reported in the logs when recovery is not possible.
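The sketch below shows how a streaming-based benchmarking tool can now measure TTFT from the first streamed token and read token usage from the last streaming event. It assumes the same server address and served model name as above, and that the server honors the OpenAI `stream_options` field.

```python
import time
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v3", api_key="unused")  # assumed address

start = time.perf_counter()
first_token_time = None

stream = client.chat.completions.create(
    model="Qwen3-30B-A3B",  # placeholder: name under which the model is served
    messages=[{"role": "user", "content": "Write a haiku about batching."}],
    stream=True,
    stream_options={"include_usage": True},  # request token usage in the final event
)

for event in stream:
    if event.choices and event.choices[0].delta.content:
        if first_token_time is None:
            first_token_time = time.perf_counter()  # TTFT: time until the first token arrives
        print(event.choices[0].delta.content, end="", flush=True)
    if event.usage is not None:  # usage is attached to the last streaming event
        print(f"\ncompletion tokens: {event.usage.completion_tokens}")

print(f"TTFT: {first_token_time - start:.3f} s")
```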
You can use the OpenVINO Model Server public Docker images based on Ubuntu via the following commands:
- `docker pull openvino/model_server:2026.1` - CPU device support with image based on Ubuntu 24.04
- `docker pull openvino/model_server:2026.1-gpu` - GPU, NPU and CPU device support with image based on Ubuntu 24.04
or use the provided binary packages. Only packages with the `_python_on` suffix include Python support.
There is also an additional distribution channel via https://storage.openvinotoolkit.org/repositories/openvino_model_server/packages/2026.1.0/
Check the instructions on how to install the binary package. The prebuilt image is also available on the Red Hat Ecosystem Catalog.