Commit 5203745

[release branch] Correct gpt export param in case export_model.py step is omitted (#3870)

If the user skips the manual export step, the demo falls back to model conversion via ovms.exe (optimum-cli). In that scenario, the int4 precision setting was missing (#3868).

1 parent 7bc56cf commit 5203745

File tree

  • demos/continuous_batching/agentic_ai

1 file changed: +2 −2 lines

demos/continuous_batching/agentic_ai/README.md

Lines changed: 2 additions & 2 deletions

@@ -242,7 +242,7 @@ ovms.exe --rest_port 8000 --source_model Qwen/Qwen3-Coder-30B-A3B-Instruct --mod
 :::{tab-item} gpt-oss-20b
 :sync: gpt-oss-20b
 ```bat
-ovms.exe --rest_port 8000 --source_model openai/gpt-oss-20b --model_repository_path models --tool_parser gptoss --reasoning_parser gptoss --target_device GPU --task text_generation --pipeline_type LM
+ovms.exe --rest_port 8000 --source_model openai/gpt-oss-20b --model_repository_path models --tool_parser gptoss --reasoning_parser gptoss --target_device GPU --task text_generation --pipeline_type LM --weight_format int4
 ```
 > **Note:** Use `--pipeline_type LM` for export and `--target_device GPU` for deployment. Expect continuous batching and CPU support in weekly or 2026.0+ releases.
 :::

@@ -475,7 +475,7 @@ docker run -d --user $(id -u):$(id -g) --rm -p 8000:8000 -v $(pwd)/models:/model
 ```bash
 docker run -d --user $(id -u):$(id -g) --rm -p 8000:8000 -v $(pwd)/models:/models --device /dev/dri --group-add=$(stat -c "%g" /dev/dri/render* | head -n 1) openvino/model_server:weekly \
 --rest_port 8000 --source_model openai/gpt-oss-20b --model_repository_path models \
---tool_parser gptoss --reasoning_parser gptoss --target_device GPU --task text_generation --enable_prefix_caching true --pipeline_type LM
+--tool_parser gptoss --reasoning_parser gptoss --target_device GPU --task text_generation --enable_prefix_caching true --pipeline_type LM --weight_format int4
 ```
 > **Note:** Use `--pipeline_type LM` for export and `--target_device GPU` for deployment. Expect continuous batching and CPU support in weekly or 2026.0+ releases.
 :::
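Once either of the commands above has exported and started serving the model, a quick sanity check is to call the server's OpenAI-compatible chat completions API. The sketch below is a minimal, hypothetical example assuming the defaults shown above (REST port 8000, model name `openai/gpt-oss-20b`, and the OVMS `/v3` API path); it is not part of the commit itself.

```python
import json
import urllib.request

# Assumptions (from the deployment commands above): the server listens on
# --rest_port 8000 and serves the model under its --source_model name.
BASE_URL = "http://localhost:8000/v3"
MODEL = "openai/gpt-oss-20b"


def build_chat_request(prompt: str, model: str = MODEL) -> dict:
    """Assemble a minimal chat completions payload for the served model."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 128,
    }


def query(prompt: str) -> str:
    """POST the payload to a running server and return the reply text.

    Requires a live deployment; this function performs a network call.
    """
    req = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(build_chat_request(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

# Usage (with the server running):
#   reply = query("Write a hello-world in Python.")
```

If the export was done without `--weight_format int4` (the bug this commit fixes), the conversion step would default to a different precision, so re-exporting after pulling this change is the simplest remedy.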
