Add vLLM CPU inference support for docker compose setup#1967
zahidulhaque wants to merge 11 commits into open-edge-platform:main from
Conversation
Signed-off-by: Zahidul Haque <zahidul.haque@intel.com>
vllm-cpu-service:
  profiles:
    - vllm
  image: ${VLLM_IMAGE:-public.ecr.aws/q9t5s3a7/vllm-cpu-release-repo:v0.13.0}
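Since the service sits behind a Compose profile, a user has to opt in explicitly. A minimal sketch of how that could look, assuming the standard Compose `.env` mechanism next to the compose file (the `COMPOSE_PROFILES` variable is standard Docker Compose; the image value simply restates the default above):

```
# .env — enable the optional vLLM CPU backend
COMPOSE_PROFILES=vllm
VLLM_IMAGE=public.ecr.aws/q9t5s3a7/vllm-cpu-release-repo:v0.13.0
```

Alternatively, `docker compose --profile vllm up -d` activates the profile for a single invocation without touching `.env`.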
This tag works on an Ice Lake device but not on an Arrow Lake device: the container is stuck in a restarting state. I tried the latest tag, v0.17.1, which works after removing `- "--disable-log-requests"`.
Should we also add a note about the vLLM parameters that are open to being overridden, such as VLLM_MAX_NUM_BATCHED_TOKENS, VLLM_BLOCK_SIZE, etc., and refer the user to the vLLM docs for a description of each parameter?
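Such a note could point at a `.env` override along these lines (the values are purely illustrative; the variable names feed the compose defaults shown in this diff, and the semantics of the underlying vLLM options are documented upstream):

```
# .env — illustrative tuning overrides for the vLLM CPU backend
VLLM_MAX_NUM_BATCHED_TOKENS=4096   # scheduler token budget per step (example value)
VLLM_BLOCK_SIZE=16                 # KV-cache block size (example value)
```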
    - "${VLLM_HOST_PORT:-8200}:8000"
  ipc: "host"
  environment:
    no_proxy: ${no_proxy},localhost
minio is missing from no_proxy.
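A fix along the lines suggested, assuming the MinIO service is reachable under the hostname `minio` in this compose network, would be:

```yaml
environment:
  no_proxy: ${no_proxy},localhost,minio
```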
    VLLM_RPC_TIMEOUT: ${VLLM_RPC_TIMEOUT:-100000}
    VLLM_ALLOW_LONG_MAX_MODEL_LEN: ${VLLM_ALLOW_LONG_MAX_MODEL_LEN:-1}
    VLLM_ENGINE_ITERATION_TIMEOUT_S: ${VLLM_ENGINE_ITERATION_TIMEOUT_S:-120}
    VLLM_CPU_NUM_OF_RESERVED_CPU: ${VLLM_CPU_NUM_OF_RESERVED_CPU:-0}
Add a VLLM_LOGGING_LEVEL parameter that the user can override to get debug logs.
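A sketch of the suggested addition in the same defaulted-variable style as the other environment entries (VLLM_LOGGING_LEVEL is vLLM's logging-level environment variable; INFO is its usual default):

```yaml
environment:
  VLLM_LOGGING_LEVEL: ${VLLM_LOGGING_LEVEL:-INFO}  # set to DEBUG for verbose engine logs
```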
    - "${VLLM_MAX_NUM_BATCHED_TOKENS:-2048}"
    - "--max-num-seqs"
    - "${VLLM_MAX_NUM_SEQS:-256}"
    - "--disable-log-requests"
Add an override for --max-model-len: the default 4096-token context length will block summary of summaries.
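In the same command-list style as the existing flags, the suggested override could look like this (`--max-model-len` is a standard vLLM server flag; the 8192 default here is only an example value):

```yaml
command:
  - "--max-model-len"
  - "${VLLM_MAX_MODEL_LEN:-8192}"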
Description
Add support for vLLM as an alternative inference backend for the Video Search and Summarization application. This change allows users to run both VLM captioning and LLM summarization tasks using vLLM on CPU, without requiring GPU resources or OpenVINO Model Server (OVMS) microservices.
Fixes # (issue)
Key Changes:
Benefits:
Any Newly Introduced Dependencies
No new third-party dependencies are introduced. The vLLM service uses a pre-built Docker image (public.ecr.aws/q9t5s3a7/vllm-cpu-release-repo:v0.13.0) that is already compiled and optimized. The solution leverages the existing environment configuration and adds no new library dependencies to the project.
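Since the image serves vLLM's OpenAI-compatible API (mapped to host port 8200 in this compose file), a quick smoke test could build a chat-completion request like the sketch below. The endpoint path follows the OpenAI API convention vLLM implements; the model name is a placeholder, and the actual POST is left commented out because it needs the stack running.

```python
import json

# Hypothetical endpoint from this compose setup: host port 8200 -> container port 8000.
BASE_URL = "http://localhost:8200/v1/chat/completions"

def build_payload(model: str, prompt: str, max_tokens: int = 128) -> dict:
    """Build an OpenAI-compatible chat-completion request body for vLLM."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

payload = build_payload("placeholder-model", "Summarize the last video chunk.")
body = json.dumps(payload)

# With the stack up, the request would be sent roughly like this:
# import urllib.request
# req = urllib.request.Request(BASE_URL, data=body.encode(),
#                              headers={"Content-Type": "application/json"})
# print(urllib.request.urlopen(req).read().decode())
print(body)
```

Swapping the placeholder model for whichever model the VLM captioning or LLM summarization service is configured to load is all that should be needed to exercise the backend end to end.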
How Has This Been Tested?
Checklist: