ci(test): use vLLM for embeddings in CI#193

Open
nathan-weinberg wants to merge 1 commit into opendatahub-io:main from nathan-weinberg:vllm-embed

Conversation

@nathan-weinberg
Collaborator

@nathan-weinberg nathan-weinberg commented Jan 9, 2026

What does this PR do?

we have up until now used sentence-transformers for embeddings within our CI environment

this commit migrates that to a vLLM container, as this is our primary targeted use case

Closes #171

Test Plan

CI should pass if all is well

Summary by CodeRabbit

  • Chores
    • Split inference and embedding into separate vLLM containers with distinct startup and health checks for clearer isolation.
    • Added VLLM_EMBEDDING_URL and updated embedding model reference to use the vllm-embedding path.
    • Updated startup/cleanup steps, logs, and labels to reference vLLM inference and embedding containers separately.


@coderabbitai
Contributor

coderabbitai bot commented Jan 9, 2026

📝 Walkthrough

Split single vLLM setup into two containers: vllm-inference and vllm-embedding, add CI env var VLLM_EMBEDDING_URL, update workflow logging/cleanup, and adjust smoke test environment to use the new embedding service endpoint.

Changes

| Cohort / File(s) | Summary |
|---|---|
| **GitHub Actions setup**<br>`.github/actions/setup-vllm/action.yml` | Rename action metadata to "vLLM"; replace the single vllm run with parallel `vllm-inference` and `vllm-embedding` docker runs; add readiness loops and update related log messages and container names/ports (8000 inference, 8001 embeddings). |
| **Workflow configuration**<br>`.github/workflows/redhat-distro-container.yml` | Change `EMBEDDING_MODEL` to `vllm-embedding/...`; add `VLLM_EMBEDDING_URL=http://localhost:8001/v1` to the job env; update log collection and cleanup to reference `vllm-inference` and `vllm-embedding`. |
| **Tests / Smoke script**<br>`tests/smoke.sh` | Remove sentence-transformers envs (`ENABLE_SENTENCE_TRANSFORMERS`, `EMBEDDING_PROVIDER`); add `VLLM_EMBEDDING_URL` to the docker run envs so tests target the new embedding container. |
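The readiness loops mentioned for the setup action presumably follow the usual poll-until-healthy pattern. A minimal, standalone sketch of that pattern (the real probe would be something like `curl -sf http://localhost:8001/health`; here it is stubbed out so the sketch runs on its own, and the container name and timeout are assumptions):

```shell
#!/bin/sh
# Poll-until-ready pattern, as a vLLM setup action might use it.
# probe_health stands in for: curl -sf http://localhost:8001/health
attempt=0
probe_health() {
  attempt=$((attempt + 1))
  [ "$attempt" -ge 3 ]   # pretend the server comes up on the 3rd poll
}

timeout_s=30
waited=0
until probe_health; do
  waited=$((waited + 1))
  if [ "$waited" -ge "$timeout_s" ]; then
    echo "vllm-embedding did not become ready within ${timeout_s}s" >&2
    exit 1
  fi
  echo "waiting for vllm-embedding... (${waited}s)"
done
echo "vllm-embedding is ready"
```

In CI the same loop would sleep between polls and, ideally, also check that the container is still alive (a point raised later in this review thread).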

Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant GH as GitHub Actions
    participant Docker as Docker Engine
    participant Inf as vllm-inference (container)
    participant Emb as vllm-embedding (container)
    participant Test as smoke tests

    GH->>Docker: run vllm-inference (expose :8000 -> /v1)
    Docker-->>Inf: container starts
    GH->>Docker: run vllm-embedding (expose :8001 -> /v1)
    Docker-->>Emb: container starts
    GH->>Inf: readiness probe GET /health
    GH->>Emb: readiness probe GET /health
    GH->>Test: trigger smoke tests (VLLM_EMBEDDING_URL=http://localhost:8001/v1)
    Test->>Inf: send inference requests
    Test->>Emb: send embedding requests
```

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Poem

🐰 A hop to ports eight-thousand and one,

Two tubs of vLLM basking in the sun,
One for answers, one for embedding art,
Tests dance lively — each plays its part,
I twitch my nose and bless this clever start.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)
  • Docstring Coverage ⚠️ Warning: docstring coverage is 0.00%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)
  • Description Check: skipped because CodeRabbit's high-level summary is enabled.
  • Title Check: the title 'ci(test): use vLLM for embeddings in CI' clearly and concisely summarizes the main change, migrating CI to use vLLM containers for embeddings instead of sentence-transformers.
  • Linked Issues Check: the PR implements all coding requirements from issue #171: it updates the setup-vllm action to start a vLLM embeddings container, configures the workflow to use vLLM for embeddings, and removes the sentence-transformers configuration.
  • Out of Scope Changes Check: all changes are directly aligned with issue #171's objectives (vLLM container setup, embeddings configuration, and removal of sentence-transformers dependencies); no unrelated modifications detected.



@nathan-weinberg changed the title from "enhance(test): use vLLM for embeddings in CI" to "ci(test): use vLLM for embeddings in CI" on Jan 9, 2026
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In @.github/actions/setup-vllm/action.yml:
- Around line 30-47: Update the "Start vLLM container for embeddings" step to
use the organization image (replace quay.io/nweinber/... with your org registry
like quay.io/<org>/vllm-cpu:ac9f933-granite125m), add explicit model flags to
mirror the inference container (include --model /root/.cache/Qwen3-0.6B and
--served-model-name Qwen/Qwen3-0.6B), and remove the overly broad
--privileged=true flag (optionally replace with a minimal capability such as
--cap-add SYS_NICE if needed); also apply the same registry change for the
inference container image referenced elsewhere.
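Taken literally, the bot's suggested step would look something like the sketch below. This is only an illustration of the prompt above, not a verified configuration: `<org>` is a placeholder namespace from the prompt, and the `--model`/`--served-model-name` values are copied from the prompt rather than confirmed against the image.

```shell
# Hypothetical corrected "Start vLLM container for embeddings" step,
# per the review prompt: org-owned image, explicit model flags,
# minimal capability instead of --privileged.
docker run -d --name vllm-embedding \
  --cap-add SYS_NICE \
  -p 8001:8001 \
  quay.io/<org>/vllm-cpu:ac9f933-granite125m \
  --model /root/.cache/Qwen3-0.6B \
  --served-model-name Qwen/Qwen3-0.6B \
  --host 0.0.0.0 \
  --port 8001
```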
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b8bf85c and d1a9ede.

📒 Files selected for processing (3)
  • .github/actions/setup-vllm/action.yml
  • .github/workflows/redhat-distro-container.yml
  • tests/smoke.sh
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: build-test-push (linux/amd64)
  • GitHub Check: Summary
🔇 Additional comments (6)
.github/actions/setup-vllm/action.yml (2)

1-2: LGTM: Capitalization fix for vLLM.

The action name and description now correctly use "vLLM" capitalization, improving consistency with the project's branding.


6-28: Container name update is properly implemented throughout the codebase.

All references to the vLLM containers have been correctly updated to use the new names vllm-inference and vllm-embedding across workflows, configurations, and related scripts. No stale references remain.

.github/workflows/redhat-distro-container.yml (3)

53-55: LGTM: Embedding configuration migrated to vLLM.

The embedding model source has been correctly updated from sentence-transformers/ to vllm-embedding/ prefix, and the new VLLM_EMBEDDING_URL points to port 8001, which aligns with the embedding container setup in the action file.


154-155: LGTM: Log collection updated for both containers.

The log collection step now correctly references both vllm-inference and vllm-embedding containers, ensuring debugging information is captured from both services.


191-191: LGTM: Cleanup properly handles both vLLM containers.

The cleanup step now removes both vllm-inference and vllm-embedding containers, preventing resource leaks in the CI environment.

tests/smoke.sh (1)

22-22: VLLM_EMBEDDING_URL is properly defined in the workflow.

The variable is set at the job-level environment in .github/workflows/redhat-distro-container.yml (line 55) with value http://localhost:8001/v1, making it available to the smoke test step that executes ./tests/smoke.sh (line 138).

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In @.github/actions/setup-vllm/action.yml:
- Around line 32-47: The docker run command block has a trailing backslash after
the --port 8001 argument which is followed by a blank line and a comment,
causing a bash line-continuation syntax error; remove the trailing backslash
after "--port 8001" (or move the comment onto the same continued line) so the
docker run invocation is properly terminated before the health-check echo and
timeout lines in the GitHub Actions run script.
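The fix the bot proposes amounts to terminating the command cleanly before the health check. Roughly (the image reference is abbreviated to a placeholder variable; the actual flags are in the action file):

```shell
# Fixed shape: no trailing backslash after the final argument,
# so the docker run command ends before the health-check lines.
docker run -d --name vllm-embedding \
  "$IMAGE" \
  --host 0.0.0.0 \
  --port 8001

# health check
echo "Waiting for vllm-embedding to become ready..."
```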
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d1a9ede and 5590417.

📒 Files selected for processing (3)
  • .github/actions/setup-vllm/action.yml
  • .github/workflows/redhat-distro-container.yml
  • tests/smoke.sh
🚧 Files skipped from review as they are similar to previous changes (2)
  • tests/smoke.sh
  • .github/workflows/redhat-distro-container.yml
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: build-test-push (linux/amd64)
  • GitHub Check: Summary
🔇 Additional comments (4)
.github/actions/setup-vllm/action.yml (4)

1-2: LGTM! Correct capitalization.

The metadata correctly uses "vLLM" capitalization, which aligns with the project's branding.


6-28: LGTM! Clear naming for multi-container setup.

The updates properly distinguish the inference container from the new embeddings container with explicit naming and improved log messages.


38-38: Verify the use of a personal namespace for the container image.

The container image quay.io/nweinber/vllm-cpu:ac9f933-granite125m appears to use a personal Quay.io namespace rather than an organizational registry. This could impact:

  • Long-term availability and maintenance
  • Trust and security for production CI
  • Consistency with the inference container which uses quay.io/higginsd/...

Consider moving this image to an official organizational registry (e.g., under quay.io/opendatahub-io/) for production use.


34-40: Verify that the model is configured within the container image, as no model flags are present in the docker run command.

The embedding container's docker run command lacks the model configuration flags present in the inference container (--model, --served-model-name, --max-model-len). The inference container explicitly specifies these, while the embedding container only sets --host and --port.

If ibm-granite/granite-embedding-125m-english is baked into the image quay.io/nweinber/vllm-cpu:ac9f933-granite125m as a default, this is acceptable. Otherwise, the container will fail to start. Please confirm this is intentional.
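If the model is not baked in, the embedding run would need flags mirroring the inference container, along these lines. This is a sketch based on the model named in the comment; the flag values are assumptions, not verified against the image.

```shell
# Hypothetical embedding container start with explicit model flags,
# mirroring the inference container's configuration style.
docker run -d --name vllm-embedding \
  -p 8001:8001 \
  quay.io/nweinber/vllm-cpu:ac9f933-granite125m \
  --model ibm-granite/granite-embedding-125m-english \
  --served-model-name ibm-granite/granite-embedding-125m-english \
  --host 0.0.0.0 \
  --port 8001
```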

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In @.github/actions/setup-vllm/action.yml:
- Around line 30-47: The workflow step named "Start vLLM container for
embeddings" uses a personal quay image tag
"quay.io/nweinber/vllm-cpu:ac9f933-granite125m" which may be private; replace
that image with an official or organizational image (e.g., "quay.io/modh/vllm"
or your org's published vllm image) and update any other occurrences (the
inference step referencing "quay.io/higginsd/vllm-cpu") to the same vetted
registry; ensure the run step still passes the same CLI flags (--host, --port)
and add a brief comment in the action.yml explaining why the org/official image
is required for CI reliability.
🧹 Nitpick comments (1)
.github/actions/setup-vllm/action.yml (1)

36-36: Consider removing --privileged flag if not required.

Both vLLM containers use --privileged=true, which grants extensive permissions. For CPU-only vLLM containers, this flag may not be necessary. Consider testing without it to reduce the security surface area.

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5590417 and 55ca92e.

📒 Files selected for processing (3)
  • .github/actions/setup-vllm/action.yml
  • .github/workflows/redhat-distro-container.yml
  • tests/smoke.sh
🚧 Files skipped from review as they are similar to previous changes (1)
  • tests/smoke.sh
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: build-test-push (linux/amd64)
  • GitHub Check: Summary
🔇 Additional comments (5)
.github/actions/setup-vllm/action.yml (2)

1-2: LGTM! Correct capitalization.

The metadata updates correctly reflect the vLLM project's standard capitalization.


6-28: LGTM! Clear naming for multiple containers.

The inference container setup is correctly updated with descriptive naming that distinguishes it from the new embedding container.

.github/workflows/redhat-distro-container.yml (3)

153-154: LGTM! Proper log collection for both containers.

The log collection correctly captures output from both vLLM containers with descriptive filenames.


186-190: LGTM! Complete cleanup of all containers.

The cleanup step correctly removes both vLLM containers (vllm-inference and vllm-embedding) along with the other service containers. The container names match those defined in the setup action.


53-55: The environment variable changes are correctly integrated.

The provider name vllm-embedding is a recognized provider in llama-stack's embedding configuration (defined in distribution/config.yaml), and VLLM_EMBEDDING_URL is properly passed to the test suite in tests/smoke.sh (line 22). The configuration aligns with the distribution setup and correctly points the embedding service to port 8001.
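The wiring described above presumably amounts to passing the variable through to the stack container in `tests/smoke.sh`, something like the sketch below (the image variable and the inference URL env are placeholders; only `VLLM_EMBEDDING_URL` is confirmed by the review):

```shell
# Sketch of the smoke-test docker run env wiring; $DISTRO_IMAGE is a
# placeholder for the distribution image under test.
docker run --rm --network host \
  -e VLLM_URL="http://localhost:8000/v1" \
  -e VLLM_EMBEDDING_URL="${VLLM_EMBEDDING_URL:-http://localhost:8001/v1}" \
  "$DISTRO_IMAGE"
```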

@nathan-weinberg nathan-weinberg added the do-not-merge Apply to PRs that should not be merged (yet) label Jan 9, 2026
Collaborator

@skamenan7 skamenan7 left a comment

LGTM. Thanks.

One question: do you have a sense of the startup time for the embedding container? Wondering if we could lower the timeout, or add some progress output so it's clear things are still working during the wait.

@skamenan7
Copy link
Collaborator

skamenan7 commented Jan 9, 2026

Couple more observations after a second look:

  1. Using images from individual namespaces (higginsd, nweinber) is brittle; they can change or disappear unexpectedly. Can we use org-owned images and pin by digest so CI is reproducible?

  2. Running two vLLM instances (inference + embedding) on a single CI runner might be risky. vLLM can try to use all available CPU cores by default, so the two containers may fight each other and cause slow or flaky startup.
     Suggestion: set something like OMP_NUM_THREADS=2 (or similar) on both containers (and/or Docker CPU limits) to constrain resource usage.

  3. Env var wiring: double-checking, does the llama-stack container actually consume VLLM_EMBEDDING_URL yet? I see it added to smoke.sh, but if the stack config doesn't map this env var to the embedding provider config, it could be ignored or fall back silently.

  4. Fail fast on dead container: re the timeout loop, we can avoid the 15-minute hang by checking liveness inside the wait loop and dumping logs immediately:

     ```shell
     if ! docker ps -q -f name=vllm-inference | grep -q .; then
       echo "vllm-inference container died unexpectedly"
       docker logs vllm-inference
       exit 1
     fi
     ```

     (repeat similarly for vllm-embedding)
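The resource-constraint suggestion in point 2 above could be applied roughly like this (the limit values are illustrative, not tuned, and the image variable is a placeholder):

```shell
# Constrain the inference container so two vLLM instances don't
# contend for every core on the runner; repeat for vllm-embedding.
docker run -d --name vllm-inference \
  --cpus=2 \
  -e OMP_NUM_THREADS=2 \
  -p 8000:8000 \
  "$INFERENCE_IMAGE" \
  --host 0.0.0.0 \
  --port 8000
```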

@nathan-weinberg
Collaborator Author

> LGTM. Thanks.

But it isn't working in its present state?

@mergify
Contributor

mergify bot commented Jan 16, 2026

This pull request has merge conflicts that must be resolved before it can be merged. @nathan-weinberg please rebase it. https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Jan 16, 2026
@github-actions
Contributor

This pull request has been automatically marked as stale because it has not had activity within 60 days. It will be automatically closed if no further activity occurs within 30 days.

@github-actions github-actions bot added the stale label Mar 18, 2026

Labels

  • do-not-merge: Apply to PRs that should not be merged (yet)
  • needs-rebase
  • stale

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Use vLLM image for testing with embeddings

3 participants