ci(test): use vLLM for embeddings in CI #193
nathan-weinberg wants to merge 1 commit into opendatahub-io:main
Conversation
📝 Walkthrough
Split the single vLLM setup into two containers.

Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes
🚥 Pre-merge checks: ✅ 4 passed | ❌ 1 failed (1 warning)
Force-pushed from c9f293d to d1a9ede
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In @.github/actions/setup-vllm/action.yml:
- Around line 30-47: Update the "Start vLLM container for embeddings" step to
use the organization image (replace quay.io/nweinber/... with your org registry
like quay.io/<org>/vllm-cpu:ac9f933-granite125m), add explicit model flags to
mirror the inference container (include --model /root/.cache/Qwen3-0.6B and
--served-model-name Qwen/Qwen3-0.6B), and remove the overly broad
--privileged=true flag (optionally replace with a minimal capability such as
--cap-add SYS_NICE if needed); also apply the same registry change for the
inference container image referenced elsewhere.
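A sketch of what the corrected embedding step's docker run could look like (the org registry path and model flags below are illustrative assumptions drawn from the review comments, not the actual published image or verified defaults):

```bash
# Hypothetical corrected invocation for the embeddings container.
# The registry path and model name are placeholders, not verified values.
docker run -d --name vllm-embedding \
  -p 8001:8001 \
  quay.io/<org>/vllm-cpu:ac9f933-granite125m \
  --model ibm-granite/granite-embedding-125m-english \
  --served-model-name ibm-granite/granite-embedding-125m-english \
  --host 0.0.0.0 \
  --port 8001
```

Note the absence of `--privileged=true`, per the review's suggestion to drop it or replace it with a minimal capability.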
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- .github/actions/setup-vllm/action.yml
- .github/workflows/redhat-distro-container.yml
- tests/smoke.sh
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: build-test-push (linux/amd64)
- GitHub Check: Summary
🔇 Additional comments (6)
.github/actions/setup-vllm/action.yml (2)
1-2: LGTM: Capitalization fix for vLLM. The action name and description now correctly use "vLLM" capitalization, improving consistency with the project's branding.
6-28: Container name update is properly implemented throughout the codebase. All references to the vLLM containers have been correctly updated to use the new names vllm-inference and vllm-embedding across workflows, configurations, and related scripts. No stale references remain.

.github/workflows/redhat-distro-container.yml (3)
53-55: LGTM: Embedding configuration migrated to vLLM. The embedding model source has been correctly updated from the sentence-transformers/ to the vllm-embedding/ prefix, and the new VLLM_EMBEDDING_URL points to port 8001, which aligns with the embedding container setup in the action file.
154-155: LGTM: Log collection updated for both containers. The log collection step now correctly references both the vllm-inference and vllm-embedding containers, ensuring debugging information is captured from both services.
191-191: LGTM: Cleanup properly handles both vLLM containers. The cleanup step now removes both the vllm-inference and vllm-embedding containers, preventing resource leaks in the CI environment.

tests/smoke.sh (1)
22-22: VLLM_EMBEDDING_URL is properly defined in the workflow. The variable is set in the job-level environment in .github/workflows/redhat-distro-container.yml (line 55) with value http://localhost:8001/v1, making it available to the smoke test step that executes ./tests/smoke.sh (line 138).
Force-pushed from d1a9ede to 5590417
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In @.github/actions/setup-vllm/action.yml:
- Around line 32-47: The docker run command block has a trailing backslash after
the --port 8001 argument which is followed by a blank line and a comment,
causing a bash line-continuation syntax error; remove the trailing backslash
after "--port 8001" (or move the comment onto the same continued line) so the
docker run invocation is properly terminated before the health-check echo and
timeout lines in the GitHub Actions run script.
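The failure mode described above looks roughly like this (flags abbreviated; `...` stands for the other arguments, and the comments restate the review's diagnosis rather than independently verified bash behavior):

```bash
# Broken (per the review): the trailing backslash after --port 8001
# continues the command onto the following blank line and comment.
docker run -d --name vllm-embedding ... \
  --port 8001 \

# Fixed: drop the continuation after the final argument so the
# docker run invocation terminates before the health-check lines.
docker run -d --name vllm-embedding ... \
  --port 8001
```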
📒 Files selected for processing (3)
- .github/actions/setup-vllm/action.yml
- .github/workflows/redhat-distro-container.yml
- tests/smoke.sh
🚧 Files skipped from review as they are similar to previous changes (2)
- tests/smoke.sh
- .github/workflows/redhat-distro-container.yml
⏰ Context from checks skipped due to timeout of 90000ms. (2)
- GitHub Check: build-test-push (linux/amd64)
- GitHub Check: Summary
🔇 Additional comments (4)
.github/actions/setup-vllm/action.yml (4)
1-2: LGTM! Correct capitalization. The metadata correctly uses "vLLM" capitalization, which aligns with the project's branding.
6-28: LGTM! Clear naming for multi-container setup. The updates properly distinguish the inference container from the new embeddings container with explicit naming and improved log messages.
38-38: Verify the use of a personal namespace for the container image. The container image quay.io/nweinber/vllm-cpu:ac9f933-granite125m appears to use a personal Quay.io namespace rather than an organizational registry. This could impact:
- Long-term availability and maintenance
- Trust and security for production CI
- Consistency with the inference container, which uses quay.io/higginsd/...

Consider moving this image to an official organizational registry (e.g., under quay.io/opendatahub-io/) for production use.
34-40: Verify that the model is configured within the container image, as no model flags are present in the docker run command. The embedding container's docker run command lacks the model configuration flags present in the inference container (--model, --served-model-name, --max-model-len). The inference container explicitly specifies these, while the embedding container only sets --host and --port.

If ibm-granite/granite-embedding-125m-english is baked into the image quay.io/nweinber/vllm-cpu:ac9f933-granite125m as a default, this is acceptable. Otherwise, the container will fail to start. Please confirm this is intentional.
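One way to check whether a default model is baked into the image is to start the container without model flags and query vLLM's OpenAI-compatible models endpoint (a sketch; the image tag comes from this PR, while the timeout value is arbitrary):

```bash
docker run -d --name vllm-embedding -p 8001:8001 \
  quay.io/nweinber/vllm-cpu:ac9f933-granite125m

# Wait for the server to come up, then list the served model IDs.
timeout 300 bash -c \
  'until curl -sf http://localhost:8001/v1/models > /dev/null; do sleep 5; done'
curl -s http://localhost:8001/v1/models | jq -r '.data[].id'
```

If no default model is configured, the container exits and the loop times out, which is exactly the failure mode the review warns about.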
we have up until now used sentence-transformers for embeddings within our CI environment

this commit migrates this to a vLLM container as this is our primary targeted use case

Signed-off-by: Nathan Weinberg <nweinber@redhat.com>
Force-pushed from 5590417 to 55ca92e
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In @.github/actions/setup-vllm/action.yml:
- Around line 30-47: The workflow step named "Start vLLM container for
embeddings" uses a personal quay image tag
"quay.io/nweinber/vllm-cpu:ac9f933-granite125m" which may be private; replace
that image with an official or organizational image (e.g., "quay.io/modh/vllm"
or your org's published vllm image) and update any other occurrences (the
inference step referencing "quay.io/higginsd/vllm-cpu") to the same vetted
registry; ensure the run step still passes the same CLI flags (--host, --port)
and add a brief comment in the action.yml explaining why the org/official image
is required for CI reliability.
🧹 Nitpick comments (1)
.github/actions/setup-vllm/action.yml (1)
36-36: Consider removing the --privileged flag if not required. Both vLLM containers use --privileged=true, which grants extensive permissions. For CPU-only vLLM containers, this flag may not be necessary. Consider testing without it to reduce the security surface area.
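If testing shows full privileges are unnecessary, the flag could be narrowed along these lines (SYS_NICE is a guess at a minimal capability for CPU thread scheduling, not a verified vLLM requirement; the image path is the one from this PR):

```bash
# Instead of --privileged=true, grant only what is needed (if anything):
docker run -d --name vllm-embedding \
  --cap-add SYS_NICE \
  -p 8001:8001 \
  quay.io/nweinber/vllm-cpu:ac9f933-granite125m \
  --host 0.0.0.0 --port 8001
```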
📒 Files selected for processing (3)
- .github/actions/setup-vllm/action.yml
- .github/workflows/redhat-distro-container.yml
- tests/smoke.sh
🚧 Files skipped from review as they are similar to previous changes (1)
- tests/smoke.sh
⏰ Context from checks skipped due to timeout of 90000ms. (2)
- GitHub Check: build-test-push (linux/amd64)
- GitHub Check: Summary
🔇 Additional comments (5)
.github/actions/setup-vllm/action.yml (2)
1-2: LGTM! Correct capitalization. The metadata updates correctly reflect the vLLM project's standard capitalization.
6-28: LGTM! Clear naming for multiple containers. The inference container setup is correctly updated with descriptive naming that distinguishes it from the new embedding container.
.github/workflows/redhat-distro-container.yml (3)
153-154: LGTM! Proper log collection for both containers. The log collection correctly captures output from both vLLM containers with descriptive filenames.
186-190: LGTM! Complete cleanup of all containers. The cleanup step correctly removes both vLLM containers (vllm-inference and vllm-embedding) along with the other service containers. The container names match those defined in the setup action.
53-55: The environment variable changes are correctly integrated. The provider name vllm-embedding is a recognized provider in llama-stack's embedding configuration (defined in distribution/config.yaml), and VLLM_EMBEDDING_URL is properly passed to the test suite in tests/smoke.sh (line 22). The configuration aligns with the distribution setup and correctly points the embedding service to port 8001.
skamenan7 left a comment:
LGTM. Thanks.
one question: any idea what the startup time is like for the embedding container? wondering if we could lower the timeout or add some progress output so it's clear things are still working during the wait.
Couple more observations after a second look:
But it isn't working in its present state?
This pull request has merge conflicts that must be resolved before it can be merged. @nathan-weinberg please rebase it. https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork
This pull request has been automatically marked as stale because it has not had activity within 60 days. It will be automatically closed if no further activity occurs within 30 days.
What does this PR do?
we have up until now used sentence-transformers for embeddings within our CI environment

this commit migrates this to a vLLM container as this is our primary targeted use case
Closes #171
Test Plan
CI should pass if all is well
Summary by CodeRabbit