
OpenVINO Model Server 2026.1



@dtrawins dtrawins released this 07 Apr 15:28
· 6 commits to releases/2026/1 since this release
3e5cbf5

Enhanced support for Qwen3-MOE models and gpt-oss-20b

These models now deliver improved performance, accuracy, and robust concurrent request handling with continuous batching. They are available in pre-optimized OpenVINO™ format directly on the Hugging Face hub, making them easy to deploy. Check the demos for how to use them.
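As a sketch of how a client might talk to one of these served models: OVMS exposes an OpenAI-compatible chat completions API, so a JSON body like the one below can be posted to the server. The model name, port, and endpoint path in the comment are assumptions for illustration, not confirmed by these notes.

```python
import json

def build_chat_request(model: str, prompt: str, stream: bool = False) -> dict:
    # OpenAI-compatible chat completion payload, as accepted by the
    # model server's chat completions endpoint.
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": stream,
    }

payload = build_chat_request("gpt-oss-20b", "Summarize continuous batching in one sentence.")
# POST this JSON to e.g. http://localhost:8000/v3/chat/completions
# (host, port, and path are assumptions; check the demos for the exact URL).
print(json.dumps(payload))
```

Setting `"stream": True` instead makes the server return the streaming events discussed in the fixes below.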

Added support for Qwen3-VL

This model family adds function calling capabilities, enabling the vision language model in agentic scenarios. Usage examples are included in the demos mentioned above.
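To illustrate the function calling flow, here is a minimal sketch of detecting a tool call in an OpenAI-style response, where `finish_reason` is `tool_calls` when the model emits a function call. The tool schema and the sample response are invented for illustration, not real server output.

```python
import json

# A tool definition in the OpenAI function-calling schema (hypothetical tool).
TOOLS = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

def extract_tool_calls(response: dict) -> list:
    # Return (name, arguments) pairs when the model finished with a tool call.
    choice = response["choices"][0]
    if choice.get("finish_reason") != "tool_calls":
        return []
    return [
        (tc["function"]["name"], json.loads(tc["function"]["arguments"]))
        for tc in choice["message"]["tool_calls"]
    ]

# Illustrative response shaped like the OpenAI API (fabricated for this sketch).
sample = {
    "choices": [{
        "finish_reason": "tool_calls",
        "message": {"tool_calls": [{
            "function": {"name": "get_weather", "arguments": "{\"city\": \"Berlin\"}"}
        }]},
    }]
}
print(extract_tool_calls(sample))
```

An agent loop would execute the extracted function and send its result back as a `tool` role message.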

Extended the /image endpoint with inpainting and outpainting capabilities.

It is now possible to pass an input image along with a mask to edit parts of the image or to extend it.
See the image generation demo for how to use these capabilities.
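The mask tells the server which regions of the input image may be regenerated. As a sketch of the usual convention (nonzero mask pixels mark the editable region; the exact mask format the endpoint accepts should be checked in the image generation demo), here is how a client might sanity-check a mask before submitting it:

```python
def editable_pixel_count(mask: list[list[int]]) -> int:
    # Count nonzero mask pixels, i.e. the region the server may repaint.
    # (Convention assumed here; verify against the image generation demo.)
    return sum(1 for row in mask for px in row if px != 0)

# 4x4 mask: only the inner 2x2 block is editable (an inpainting region).
mask = [
    [0, 0, 0, 0],
    [0, 255, 255, 0],
    [0, 255, 255, 0],
    [0, 0, 0, 0],
]
assert editable_pixel_count(mask) > 0, "mask selects nothing to edit"
print(editable_pixel_count(mask))  # prints 4
```

For outpainting, the mask would instead cover the border area added around the original image.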

Other improvements and fixes:

  • Server logs now report current KV cache allocation alongside current usage metrics. With a dynamic cache size (the default setting), allocation automatically scales at runtime based on request concurrency and processed context length.
  • Generation request cancellation is now supported for NPU devices, where requests from disconnected clients will be cancelled.
  • The finish reason now returns tool_calls when the model generates a function call, in line with OpenAI API standards.
  • Corrected token usage reporting in the last streaming event of text generation with NPU execution.
  • Added an extra streaming event right after the first token is generated, in line with the OpenAI API. This corrects TTFT metric benchmarking with tools that rely on streaming events.
  • Enhanced error handling for Hugging Face Hub model downloads: retry and resume capabilities address network connectivity issues with large model files. Downloads can now recover from transient network errors, and unrecoverable failures are reported in the logs.
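The extra first-token event makes TTFT directly observable from the stream. A minimal sketch of how a benchmarking client could compute TTFT from event timestamps (the timestamps below are fabricated, and this helper is not part of OVMS):

```python
def ttft_seconds(request_sent: float, event_times: list[float]) -> float:
    # Time to first token: delay between sending the request and the
    # first streaming event that carries generated content.
    if not event_times:
        raise ValueError("no streaming events received")
    return event_times[0] - request_sent

# Fabricated timestamps (seconds): request sent at t=10.0, events afterwards.
events = [10.35, 10.40, 10.45, 10.50]
print(f"TTFT: {ttft_seconds(10.0, events):.2f}s")  # prints "TTFT: 0.35s"
```

Without the extra event, the first observable chunk could arrive only with the second token, biasing this measurement.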

You can pull the OpenVINO Model Server public Docker images, based on Ubuntu, with the following commands:

  • docker pull openvino/model_server:2026.1 - CPU device support with image based on Ubuntu 24.04
  • docker pull openvino/model_server:2026.1-gpu - GPU, NPU and CPU device support with image based on Ubuntu 24.04

or use the provided binary packages. Only packages with the suffix _python_on include Python support.

An additional distribution channel is available at https://storage.openvinotoolkit.org/repositories/openvino_model_server/packages/2026.1.0/

Check the instructions on how to install the binary package. The prebuilt image is also available in the Red Hat Ecosystem Catalog.