ci(test): use vLLM for embeddings in CI#193

Open
nathan-weinberg wants to merge 1 commit into opendatahub-io:main from nathan-weinberg:vllm-embed

Conversation

@nathan-weinberg
Collaborator

@nathan-weinberg nathan-weinberg commented Jan 9, 2026

What does this PR do?

we have up until now used sentence-transformers for embeddings within our CI environment

this commit migrates that to a vLLM container, as this is our primary targeted use case

Closes #171

Test Plan

CI should pass if all is well

Summary by CodeRabbit

  • Chores
    • Split inference and embedding into separate vLLM containers with distinct startup and health checks for clearer isolation.
    • Added VLLM_EMBEDDING_URL and updated embedding model reference to use the vllm-embedding path.
    • Updated startup/cleanup steps, logs, and labels to reference vLLM inference and embedding containers separately.


@coderabbitai
Contributor

coderabbitai bot commented Jan 9, 2026

📝 Walkthrough

Split single vLLM setup into two containers: vllm-inference and vllm-embedding, add CI env var VLLM_EMBEDDING_URL, update workflow logging/cleanup, and adjust smoke test environment to use the new embedding service endpoint.

Changes

| Cohort / File(s) | Summary |
|---|---|
| **GitHub Actions setup**<br>`.github/actions/setup-vllm/action.yml` | Rename action metadata to "vLLM"; replace the single vllm run with parallel `vllm-inference` and `vllm-embedding` docker runs; add readiness loops and update related log messages and container names/ports (8000 inference, 8001 embeddings). |
| **Workflow configuration**<br>`.github/workflows/redhat-distro-container.yml` | Change `EMBEDDING_MODEL` to `vllm-embedding/...`; add `VLLM_EMBEDDING_URL=http://localhost:8001/v1` to the job env; update log collection and cleanup to reference `vllm-inference` and `vllm-embedding`. |
| **Tests / Smoke script**<br>`tests/smoke.sh` | Remove sentence-transformers envs (`ENABLE_SENTENCE_TRANSFORMERS`, `EMBEDDING_PROVIDER`); add `VLLM_EMBEDDING_URL` to the docker run envs so tests target the new embedding container. |
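The readiness loops mentioned for the setup action presumably follow the usual poll-until-healthy pattern. A minimal, standalone sketch of that pattern (the real probe would be something like `curl -sf http://localhost:8001/health`; here it is stubbed out so the sketch runs on its own, and the container name and timeout are assumptions):

```shell
#!/bin/sh
# Poll-until-ready pattern, as a vLLM setup action might use it.
# probe_health stands in for: curl -sf http://localhost:8001/health
attempt=0
probe_health() {
  attempt=$((attempt + 1))
  [ "$attempt" -ge 3 ]   # pretend the server comes up on the 3rd poll
}

timeout_s=30
waited=0
until probe_health; do
  waited=$((waited + 1))
  if [ "$waited" -ge "$timeout_s" ]; then
    echo "vllm-embedding did not become ready within ${timeout_s}s" >&2
    exit 1
  fi
  echo "waiting for vllm-embedding... (${waited}s)"
done
echo "vllm-embedding is ready"
```

In CI the same loop would sleep between polls and, ideally, also check that the container is still alive (a point raised later in this review thread).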

Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant GH as GitHub Actions
    participant Docker as Docker Engine
    participant Inf as vllm-inference (container)
    participant Emb as vllm-embedding (container)
    participant Test as smoke tests

    GH->>Docker: run vllm-inference (expose :8000 -> /v1)
    Docker-->>Inf: container starts
    GH->>Docker: run vllm-embedding (expose :8001 -> /v1)
    Docker-->>Emb: container starts
    GH->>Inf: readiness probe GET /health
    GH->>Emb: readiness probe GET /health
    GH->>Test: trigger smoke tests (VLLM_EMBEDDING_URL=http://localhost:8001/v1)
    Test->>Inf: send inference requests
    Test->>Emb: send embedding requests
```

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Poem

🐰 A hop to ports eight-thousand and one,

Two tubs of vLLM basking in the sun,
One for answers, one for embedding art,
Tests dance lively — each plays its part,
I twitch my nose and bless this clever start.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)
  • Docstring Coverage ⚠️ Warning: docstring coverage is 0.00%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)
  • Description Check: skipped because CodeRabbit's high-level summary is enabled.
  • Title Check: the title 'ci(test): use vLLM for embeddings in CI' clearly and concisely summarizes the main change, migrating CI to use vLLM containers for embeddings instead of sentence-transformers.
  • Linked Issues Check: the PR implements all coding requirements from issue #171: it updates the setup-vllm action to start a vLLM embeddings container, configures the workflow to use vLLM for embeddings, and removes the sentence-transformers configuration.
  • Out of Scope Changes Check: all changes are directly aligned with issue #171's objectives (vLLM container setup, embeddings configuration, and removal of sentence-transformers dependencies); no unrelated modifications detected.



@nathan-weinberg changed the title from "enhance(test): use vLLM for embeddings in CI" to "ci(test): use vLLM for embeddings in CI" on Jan 9, 2026
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In @.github/actions/setup-vllm/action.yml:
- Around line 30-47: Update the "Start vLLM container for embeddings" step to
use the organization image (replace quay.io/nweinber/... with your org registry
like quay.io/<org>/vllm-cpu:ac9f933-granite125m), add explicit model flags to
mirror the inference container (include --model /root/.cache/Qwen3-0.6B and
--served-model-name Qwen/Qwen3-0.6B), and remove the overly broad
--privileged=true flag (optionally replace with a minimal capability such as
--cap-add SYS_NICE if needed); also apply the same registry change for the
inference container image referenced elsewhere.
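Taken literally, the bot's suggested step would look something like the sketch below. This is only an illustration of the prompt above, not a verified configuration: `<org>` is a placeholder namespace from the prompt, and the `--model`/`--served-model-name` values are copied from the prompt rather than confirmed against the image.

```shell
# Hypothetical corrected "Start vLLM container for embeddings" step,
# per the review prompt: org-owned image, explicit model flags,
# minimal capability instead of --privileged.
docker run -d --name vllm-embedding \
  --cap-add SYS_NICE \
  -p 8001:8001 \
  quay.io/<org>/vllm-cpu:ac9f933-granite125m \
  --model /root/.cache/Qwen3-0.6B \
  --served-model-name Qwen/Qwen3-0.6B \
  --host 0.0.0.0 \
  --port 8001
```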
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b8bf85c and d1a9ede.

📒 Files selected for processing (3)
  • .github/actions/setup-vllm/action.yml
  • .github/workflows/redhat-distro-container.yml
  • tests/smoke.sh
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: build-test-push (linux/amd64)
  • GitHub Check: Summary
🔇 Additional comments (6)
.github/actions/setup-vllm/action.yml (2)

1-2: LGTM: Capitalization fix for vLLM.

The action name and description now correctly use "vLLM" capitalization, improving consistency with the project's branding.


6-28: Container name update is properly implemented throughout the codebase.

All references to the vLLM containers have been correctly updated to use the new names vllm-inference and vllm-embedding across workflows, configurations, and related scripts. No stale references remain.

.github/workflows/redhat-distro-container.yml (3)

53-55: LGTM: Embedding configuration migrated to vLLM.

The embedding model source has been correctly updated from sentence-transformers/ to vllm-embedding/ prefix, and the new VLLM_EMBEDDING_URL points to port 8001, which aligns with the embedding container setup in the action file.


154-155: LGTM: Log collection updated for both containers.

The log collection step now correctly references both vllm-inference and vllm-embedding containers, ensuring debugging information is captured from both services.


191-191: LGTM: Cleanup properly handles both vLLM containers.

The cleanup step now removes both vllm-inference and vllm-embedding containers, preventing resource leaks in the CI environment.

tests/smoke.sh (1)

22-22: VLLM_EMBEDDING_URL is properly defined in the workflow.

The variable is set at the job-level environment in .github/workflows/redhat-distro-container.yml (line 55) with value http://localhost:8001/v1, making it available to the smoke test step that executes ./tests/smoke.sh (line 138).

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In @.github/actions/setup-vllm/action.yml:
- Around line 32-47: The docker run command block has a trailing backslash after
the --port 8001 argument which is followed by a blank line and a comment,
causing a bash line-continuation syntax error; remove the trailing backslash
after "--port 8001" (or move the comment onto the same continued line) so the
docker run invocation is properly terminated before the health-check echo and
timeout lines in the GitHub Actions run script.
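The fix the bot proposes amounts to terminating the command cleanly before the health check. Roughly (the image reference is abbreviated to a placeholder variable; the actual flags are in the action file):

```shell
# Fixed shape: no trailing backslash after the final argument,
# so the docker run command ends before the health-check lines.
docker run -d --name vllm-embedding \
  "$IMAGE" \
  --host 0.0.0.0 \
  --port 8001

# health check
echo "Waiting for vllm-embedding to become ready..."
```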
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d1a9ede and 5590417.

📒 Files selected for processing (3)
  • .github/actions/setup-vllm/action.yml
  • .github/workflows/redhat-distro-container.yml
  • tests/smoke.sh
🚧 Files skipped from review as they are similar to previous changes (2)
  • tests/smoke.sh
  • .github/workflows/redhat-distro-container.yml
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: build-test-push (linux/amd64)
  • GitHub Check: Summary
🔇 Additional comments (4)
.github/actions/setup-vllm/action.yml (4)

1-2: LGTM! Correct capitalization.

The metadata correctly uses "vLLM" capitalization, which aligns with the project's branding.


6-28: LGTM! Clear naming for multi-container setup.

The updates properly distinguish the inference container from the new embeddings container with explicit naming and improved log messages.


38-38: Verify the use of a personal namespace for the container image.

The container image quay.io/nweinber/vllm-cpu:ac9f933-granite125m appears to use a personal Quay.io namespace rather than an organizational registry. This could impact:

  • Long-term availability and maintenance
  • Trust and security for production CI
  • Consistency with the inference container which uses quay.io/higginsd/...

Consider moving this image to an official organizational registry (e.g., under quay.io/opendatahub-io/) for production use.


34-40: Verify that the model is configured within the container image, as no model flags are present in the docker run command.

The embedding container's docker run command lacks the model configuration flags present in the inference container (--model, --served-model-name, --max-model-len). The inference container explicitly specifies these, while the embedding container only sets --host and --port.

If ibm-granite/granite-embedding-125m-english is baked into the image quay.io/nweinber/vllm-cpu:ac9f933-granite125m as a default, this is acceptable. Otherwise, the container will fail to start. Please confirm this is intentional.
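If the model is not baked in, the embedding run would need flags mirroring the inference container, along these lines. This is a sketch based on the model named in the comment; the flag values are assumptions, not verified against the image.

```shell
# Hypothetical embedding container start with explicit model flags,
# mirroring the inference container's configuration style.
docker run -d --name vllm-embedding \
  -p 8001:8001 \
  quay.io/nweinber/vllm-cpu:ac9f933-granite125m \
  --model ibm-granite/granite-embedding-125m-english \
  --served-model-name ibm-granite/granite-embedding-125m-english \
  --host 0.0.0.0 \
  --port 8001
```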

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In @.github/actions/setup-vllm/action.yml:
- Around line 30-47: The workflow step named "Start vLLM container for
embeddings" uses a personal quay image tag
"quay.io/nweinber/vllm-cpu:ac9f933-granite125m" which may be private; replace
that image with an official or organizational image (e.g., "quay.io/modh/vllm"
or your org's published vllm image) and update any other occurrences (the
inference step referencing "quay.io/higginsd/vllm-cpu") to the same vetted
registry; ensure the run step still passes the same CLI flags (--host, --port)
and add a brief comment in the action.yml explaining why the org/official image
is required for CI reliability.
🧹 Nitpick comments (1)
.github/actions/setup-vllm/action.yml (1)

36-36: Consider removing --privileged flag if not required.

Both vLLM containers use --privileged=true, which grants extensive permissions. For CPU-only vLLM containers, this flag may not be necessary. Consider testing without it to reduce the security surface area.

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5590417 and 55ca92e.

📒 Files selected for processing (3)
  • .github/actions/setup-vllm/action.yml
  • .github/workflows/redhat-distro-container.yml
  • tests/smoke.sh
🚧 Files skipped from review as they are similar to previous changes (1)
  • tests/smoke.sh
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: build-test-push (linux/amd64)
  • GitHub Check: Summary
🔇 Additional comments (5)
.github/actions/setup-vllm/action.yml (2)

1-2: LGTM! Correct capitalization.

The metadata updates correctly reflect the vLLM project's standard capitalization.


6-28: LGTM! Clear naming for multiple containers.

The inference container setup is correctly updated with descriptive naming that distinguishes it from the new embedding container.

.github/workflows/redhat-distro-container.yml (3)

153-154: LGTM! Proper log collection for both containers.

The log collection correctly captures output from both vLLM containers with descriptive filenames.


186-190: LGTM! Complete cleanup of all containers.

The cleanup step correctly removes both vLLM containers (vllm-inference and vllm-embedding) along with the other service containers. The container names match those defined in the setup action.


53-55: The environment variable changes are correctly integrated.

The provider name vllm-embedding is a recognized provider in llama-stack's embedding configuration (defined in distribution/config.yaml), and VLLM_EMBEDDING_URL is properly passed to the test suite in tests/smoke.sh (line 22). The configuration aligns with the distribution setup and correctly points the embedding service to port 8001.
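The wiring described above presumably amounts to passing the variable through to the stack container in `tests/smoke.sh`, something like the sketch below (the image variable and the inference URL env are placeholders; only `VLLM_EMBEDDING_URL` is confirmed by the review):

```shell
# Sketch of the smoke-test docker run env wiring; $DISTRO_IMAGE is a
# placeholder for the distribution image under test.
docker run --rm --network host \
  -e VLLM_URL="http://localhost:8000/v1" \
  -e VLLM_EMBEDDING_URL="${VLLM_EMBEDDING_URL:-http://localhost:8001/v1}" \
  "$DISTRO_IMAGE"
```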

@nathan-weinberg nathan-weinberg added the do-not-merge Apply to PRs that should not be merged (yet) label Jan 9, 2026
Collaborator

@skamenan7 skamenan7 left a comment

LGTM. Thanks.

One question: do you have a sense of the startup time for the embedding container? Wondering if we could lower the timeout, or add some progress output so it's clear things are still working during the wait.

@skamenan7
Copy link
Collaborator

skamenan7 commented Jan 9, 2026

Couple more observations after a second look:

  1. Using images from individual namespaces (higginsd, nweinber) is brittle; they can change or disappear unexpectedly. Can we use org-owned images and pin by digest so CI is reproducible?

  2. Running two vLLM instances (inference + embedding) on a single CI runner might be risky. vLLM can try to use all available CPU cores by default, so the two containers may fight each other and cause slow or flaky startup.
     Suggestion: set something like OMP_NUM_THREADS=2 (or similar) on both containers (and/or Docker CPU limits) to constrain resource usage.

  3. Env var wiring: double-checking, does the llama-stack container actually consume VLLM_EMBEDDING_URL yet? I see it added to smoke.sh, but if the stack config doesn't map this env var to the embedding provider config, it could be ignored or fall back silently.

  4. Fail fast on dead container: re the timeout loop, we can avoid the 15-minute hang by checking liveness inside the wait loop and dumping logs immediately:

     ```shell
     if ! docker ps -q -f name=vllm-inference | grep -q .; then
       echo "vllm-inference container died unexpectedly"
       docker logs vllm-inference
       exit 1
     fi
     ```

     (repeat similarly for vllm-embedding)
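The resource-constraint suggestion in point 2 above could be applied roughly like this (the limit values are illustrative, not tuned, and the image variable is a placeholder):

```shell
# Constrain the inference container so two vLLM instances don't
# contend for every core on the runner; repeat for vllm-embedding.
docker run -d --name vllm-inference \
  --cpus=2 \
  -e OMP_NUM_THREADS=2 \
  -p 8000:8000 \
  "$INFERENCE_IMAGE" \
  --host 0.0.0.0 \
  --port 8000
```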

@nathan-weinberg
Collaborator Author

> LGTM. Thanks.

But it isn't working in its present state?

@mergify
Contributor

mergify bot commented Jan 16, 2026

This pull request has merge conflicts that must be resolved before it can be merged. @nathan-weinberg please rebase it. https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Jan 16, 2026
@github-actions
Contributor

This pull request has been automatically marked as stale because it has not had activity within 60 days. It will be automatically closed if no further activity occurs within 30 days.

@github-actions github-actions bot added the stale label Mar 18, 2026

Labels

  • do-not-merge: Apply to PRs that should not be merged (yet)
  • needs-rebase
  • stale

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Use vLLM image for testing with embeddings

3 participants