feat: add Containerfile for building vllm CPU images #203
nathan-weinberg wants to merge 4 commits into opendatahub-io:main
Conversation
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior in your CodeRabbit review settings.
📝 Walkthrough
Adds a vLLM CPU Containerfile and README, a new GitHub Actions workflow to build/test/publish vLLM CPU images, and parameterizes the setup-vllm action to accept `VLLM_IMAGE` and `VLLM_MODE` environment inputs.
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks: ✅ 4 passed | ❌ 1 failed (inconclusive)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings.
Force-pushed 5193fc8 to 5125766
Force-pushed 5125766 to f688455
Actionable comments posted: 7
🤖 Fix all issues with AI agents
In @.github/workflows/vllm-cpu-container.yml:
- Around line 38-39: The workflow defines EMBEDDING_MODEL but the Containerfile
only accepts a single MODEL_NAME so the embedding model is never included;
either remove EMBEDDING_MODEL/EMBEDDING_TAG from the workflow and simplify the
image tag to only use INFERENCE_MODEL, or extend the Containerfile to accept a
second build arg (e.g., ARG EMBEDDING_MODEL) and update the image build step to
pass --build-arg EMBEDDING_MODEL=${{ github.event.inputs.embedding_model }} and
adjust the Containerfile logic that installs or copies models to handle both
MODEL_NAME and EMBEDDING_MODEL (and update the final image tag to reflect both
models if desired) so the workflow variables and the Containerfile are
consistent.
- Around line 66-77: The GitHub Actions build step "Build image" (id: build,
using docker/build-push-action) fails because the Containerfile requires the
MODEL_NAME build-arg; add MODEL_NAME to the step's with -> build-arg list so the
action passes MODEL_NAME (e.g., from the env: ${{ env.MODEL_NAME }}). Ensure the
added build-arg key matches "MODEL_NAME" exactly and is passed alongside
existing inputs (context, file, platforms, tags, etc.) so the Containerfile
validation no longer errors.
- Around line 79-84: The workflow step "Setup vllm for image test" sets
VLLM_IMAGE to `${{ env.IMAGE_NAME }}` which omits the tag and will default to
:latest, causing the action `./.github/actions/setup-vllm` to pull a
non-existent image; change VLLM_IMAGE to include the full tag by combining `${{
env.IMAGE_NAME }}` with `${{ env.INFERENCE_TAG }}` and `${{ env.EMBEDDING_TAG
}}` (e.g. `${{ env.IMAGE_NAME }}:${{ env.INFERENCE_TAG }}-${{ env.EMBEDDING_TAG
}}`) so the setup action uses the actual built image tag.
- Around line 95-106: The publish step for docker/build-push-action is missing
the MODEL_NAME build argument; update the publish job (the step with id
"publish" that uses docker/build-push-action@2634353...) to include build-args
with MODEL_NAME set to the workflow env (e.g., add a build-args entry mapping
MODEL_NAME to ${{ env.MODEL_NAME }} under the with: block) so the image is built
and tagged with the same model configuration as the build step.
In `@vllm/Containerfile`:
- Line 1: Add a short provenance section to vllm/README.md documenting the base
image public.ecr.aws/q9t5s3a7/vllm-cpu-release-repo used in Containerfile: state
what the image contains (OS, preinstalled vLLM and dependencies), why this
specific AWS ECR account/image was chosen (e.g., builds provided by upstream or
verified maintainers, compatibility with our runtime), and how it maps to
official vLLM releases (include release tag or digest and link to upstream vLLM
release or repo). Also mention how to verify the image (digest/checksum or a
link to the ECR repo) and any trust/maintainer contact or audit steps for future
maintainers.
In `@vllm/README.md`:
- Around line 17-23: Update the example Docker build command in vllm/README.md
to use the vllm directory as the build context and explicitly reference its
Containerfile; replace the current `docker build .` invocation with a call that
passes `-f vllm/Containerfile` and `vllm` as the context so the gated model
example uses the same context/path fix as the other examples.
- Around line 7-11: The Docker build in the README uses '.' as the context but
doesn't point to the Containerfile in the vllm directory, which will fail from
the repo root; update the command in README.md to pass the correct context and
Dockerfile path by using the -f vllm/Containerfile flag and vllm as the build
context (or alternatively note to cd into the vllm directory), ensuring the
referenced Containerfile name "vllm/Containerfile" and the build args (e.g.,
MODEL_NAME) remain unchanged.
🧹 Nitpick comments (1)
.github/actions/setup-vllm/action.yml (1)
14-14: Quote the environment variable and add validation. The `$VLLM_IMAGE` variable should be quoted to prevent word-splitting issues. Additionally, consider adding validation to fail early with a clear error message if the variable is not set.

♻️ Suggested improvement

```diff
 run: |
   # Start vllm container
+  if [ -z "${VLLM_IMAGE}" ]; then
+    echo "ERROR: VLLM_IMAGE environment variable is required" >&2
+    exit 1
+  fi
   docker run -d \
     --name vllm \
     --privileged=true \
     --net=host \
-    $VLLM_IMAGE \
+    "${VLLM_IMAGE}" \
     --host 0.0.0.0 \
```
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
- .github/actions/setup-vllm/action.yml
- .github/workflows/redhat-distro-container.yml
- .github/workflows/vllm-cpu-container.yml
- vllm/Containerfile
- vllm/README.md
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: build-test-push (linux/amd64)
- GitHub Check: Summary
🔇 Additional comments (3)
.github/workflows/redhat-distro-container.yml (1)
127-128: Consider using the official organization image reference. The `VLLM_IMAGE` points to a personal Quay.io account (higginsd). Once the new `vllm-cpu-container.yml` workflow publishes images to `quay.io/opendatahub/vllm-cpu`, this reference should be updated to use the official organization image for consistency and maintainability. Is this a temporary reference for initial testing? If so, consider adding a TODO comment or tracking issue.
vllm/Containerfile (2)
24-26: LGTM! The ENTRYPOINT pattern is well-designed: it uses `exec` for proper signal handling (PID 1), quotes the variable for safety, and correctly forwards additional arguments via the `"$@"` with `"--"` pattern.
16-22: No issue. The `hf download` command is a documented shorthand alias for `huggingface-cli download` in the HuggingFace Hub CLI, and the options used (`--local-dir`, `--token`) are officially supported. The command syntax is correct. Likely an incorrect or invalid review comment.
Force-pushed f688455 to 34b71dd
Actionable comments posted: 4
🤖 Fix all issues with AI agents
In @.github/workflows/redhat-distro-container.yml:
- Around line 127-128: The workflow hardcodes VLLM_IMAGE to a personal quay
namespace (VLLM_IMAGE: quay.io/higginsd/vllm-cpu:65393ee064-qwen3); update the
workflow to stop using a personal image by either switching VLLM_IMAGE to the
official registry image (e.g., quay.io/opendatahub/vllm-cpu) or make VLLM_IMAGE
a configurable input/env/ repository variable (expose as a workflow input or use
a repo/organization secret or variable) so the image reference can be updated
without changing code.
In @.github/workflows/vllm-cpu-container.yml:
- Around line 68-79: The Docker build step (action with id "build" using
docker/build-push-action) is missing required build-args INFERENCE_MODEL and
EMBEDDING_MODEL; update the step to pass them via with.build-args (or separate
with entries) e.g. add build-args: INFERENCE_MODEL=${{ env.INFERENCE_MODEL }}
EMBEDDING_MODEL=${{ env.EMBEDDING_MODEL }} (or add individual keys build-arg:
INFERENCE_MODEL and EMBEDDING_MODEL) so the Containerfile validation receives
non-empty values during the build.
In `@vllm/Containerfile`:
- Around line 21-32: The Dockerfile loop that downloads models using
INFERENCE_MODEL and EMBEDDING_MODEL should avoid passing the token via the
--token argument; instead, read the secret from /run/secrets/hf_token and inject
it as the HF_TOKEN environment variable for the hf download invocation (for both
branches that currently call hf download) so the token is not exposed on the
command line — locate the RUN --mount block that references MODEL_CACHE_DIR,
/run/secrets/hf_token and the hf download calls and change them to set HF_TOKEN
from the secret in-line for the hf download command rather than using --token
"${HF_TOKEN}".
♻️ Duplicate comments (2)
vllm/Containerfile (1)
1-1: Document the base image source and provenance. The base image `public.ecr.aws/q9t5s3a7/vllm-cpu-release-repo` lacks documentation about its origin, contents, and relationship to official vLLM releases. Add provenance information to `vllm/README.md` for security transparency and maintainability.
.github/workflows/vllm-cpu-container.yml (1)
118-129: Critical: Missing build arguments in publish step. Same issue as the build step: the publish step also needs `INFERENCE_MODEL` and `EMBEDDING_MODEL` build arguments for the Containerfile.

🐛 Proposed fix

```diff
 - name: Publish image to Quay.io
   id: publish
   if: contains(fromJSON('["push", "workflow_dispatch"]'), github.event_name)
   uses: docker/build-push-action@263435318d21b8e681c14492fe198d362a7d2c83 # v6.18.0
   with:
     context: .
     file: vllm/Containerfile
     platforms: ${{ matrix.platform }}
     push: true
     tags: ${{ env.IMAGE_NAME }}:${{ env.INFERENCE_TAG }}-${{ env.EMBEDDING_TAG }}
+    build-args: |
+      INFERENCE_MODEL=${{ env.INFERENCE_MODEL }}
+      EMBEDDING_MODEL=${{ env.EMBEDDING_MODEL }}
     cache-from: type=gha
     cache-to: type=gha,mode=max
```
🧹 Nitpick comments (3)
vllm/Containerfile (1)
34-34: ENTRYPOINT lacks default arguments for model serving. The entrypoint is `vllm serve` without specifying which model to serve. The container user must provide the model path as a runtime argument. Consider documenting this requirement or providing a default CMD.

💡 Suggested improvement

```diff
 ENTRYPOINT ["vllm", "serve"]
+CMD ["${MODEL_CACHE_DIR}/${INFERENCE_MODEL}"]
```

Alternatively, document in README.md that users must specify the model path when running the container.
.github/workflows/vllm-cpu-container.yml (2)
50-55: Tag derivation logic may produce unexpected results for some model names. The shell parameter expansion `${VAR#*/}` removes everything up to the first `/`, then `${VAR%-*}` removes everything from the last `-`. This works for `Qwen/Qwen3-0.6B` → `Qwen3`, but may produce unexpected results for models with different naming patterns (e.g., `org/model-name-v2` → `model-name`). Consider adding validation or using a more explicit transformation.

💡 Example of more robust tag generation

```diff
 - name: Set image tag components
   run: |
-    INFERENCE_TEMP="${INFERENCE_MODEL#*/}"
-    EMBEDDING_TEMP="${EMBEDDING_MODEL#*/}"
-    echo "INFERENCE_TAG=${INFERENCE_TEMP%-*}" >> "$GITHUB_ENV"
-    echo "EMBEDDING_TAG=${EMBEDDING_TEMP%-*}" >> "$GITHUB_ENV"
+    # Extract model name, replace invalid chars for Docker tags
+    INFERENCE_TAG=$(echo "${INFERENCE_MODEL#*/}" | tr '/' '-' | tr '[:upper:]' '[:lower:]')
+    EMBEDDING_TAG=$(echo "${EMBEDDING_MODEL#*/}" | tr '/' '-' | tr '[:upper:]' '[:lower:]')
+    echo "INFERENCE_TAG=${INFERENCE_TAG}" >> "$GITHUB_ENV"
+    echo "EMBEDDING_TAG=${EMBEDDING_TAG}" >> "$GITHUB_ENV"
```
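The two expansions discussed in this comment can be checked directly in any POSIX shell. A runnable sketch (model names taken from the review above):

```shell
INFERENCE_MODEL="Qwen/Qwen3-0.6B"
EMBEDDING_MODEL="ibm-granite/granite-embedding-125m-english"

# '#*/' strips the shortest prefix ending in '/', i.e. the org name.
INFERENCE_TEMP="${INFERENCE_MODEL#*/}"     # Qwen3-0.6B
EMBEDDING_TEMP="${EMBEDDING_MODEL#*/}"     # granite-embedding-125m-english

# '%-*' strips the shortest suffix starting with '-', i.e. from the last dash.
echo "${INFERENCE_TEMP%-*}"    # Qwen3
echo "${EMBEDDING_TEMP%-*}"    # granite-embedding-125m
```

This makes the concern concrete: the result depends on where the last dash falls, so version suffixes are dropped but multi-hyphen names keep everything up to the final dash.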
89-93: Cleanup steps may fail if the container doesn't exist. The cleanup steps use `docker rm -f`, which will error if the container doesn't exist (e.g., if the setup step was skipped due to `workflow_dispatch`). Consider adding `|| true` to prevent workflow failure.

💡 Proposed fix

```diff
 - name: Cleanup vllm-inference
   if: always()
   shell: bash
   run: |
-    docker rm -f vllm-inference
+    docker rm -f vllm-inference || true

 - name: Cleanup vllm-embedding
   if: always()
   shell: bash
   run: |
-    docker rm -f vllm-embedding
+    docker rm -f vllm-embedding || true
```

Also applies to: 103-107
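The `|| true` pattern proposed above can be sketched in isolation; here `false` stands in for a `docker rm -f` call that fails because the container is absent:

```shell
# '||' runs the right-hand side only when the left side fails, so the
# compound command always exits 0 and the workflow step passes.
false || true
echo "exit status: $?"   # exit status: 0
```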
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
- .github/actions/setup-vllm/action.yml
- .github/workflows/redhat-distro-container.yml
- .github/workflows/vllm-cpu-container.yml
- vllm/Containerfile
- vllm/README.md
🚧 Files skipped from review as they are similar to previous changes (2)
- vllm/README.md
- .github/actions/setup-vllm/action.yml
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Summary
🔇 Additional comments (2)
vllm/Containerfile (2)
31-31: No action required. The `${model_path}/original` directory removal is safe: it contains only metadata and provenance information from the Hugging Face Hub, not model weights. The actual model weights reside in `${model_path}` and remain intact after this cleanup. vLLM's runtime has no dependency on the `original/` directory. This is a standard container optimization to reduce image size. Likely an incorrect or invalid review comment.
10-12: Model paths are correctly configured; vLLM receives model paths via runtime arguments. The `MODEL_CACHE_DIR=/root/.cache` is correct. Models download to `/root/.cache/{model_name}` as expected. The `vllm serve` ENTRYPOINT passes the full model path at runtime: as shown in `vllm/README.md`, you provide `--model /root/.cache/{model_name}` when running the container. The setup is correct and works as designed.
✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.
Force-pushed 06199c1 to bd12c1d
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
.github/workflows/redhat-distro-container.yml (1)
123-130: Fix vLLM container name mismatch (breaks logs + cleanup). `setup-vllm` now starts `--name vllm-$VLLM_MODE`, but this workflow still references `vllm` in `docker logs` / `docker rm`. With `VLLM_MODE: inference`, the container name is `vllm-inference`.

Proposed fix (define once; reuse everywhere)

```diff
 jobs:
   build-test-push:
     runs-on: ubuntu-latest
     env:
+      VLLM_MODE: inference
+      VLLM_CONTAINER_NAME: vllm-inference
       VERTEX_AI_PROJECT: ${{ secrets.VERTEX_AI_PROJECT }}
       VERTEX_AI_LOCATION: us-central1
       GCP_WORKLOAD_IDENTITY_PROVIDER: ${{ secrets.GCP_WORKLOAD_IDENTITY_PROVIDER }}
@@
     - name: Setup vllm for image test
       if: github.event_name != 'workflow_dispatch'
       id: vllm
       uses: ./.github/actions/setup-vllm
-      env:
-        VLLM_IMAGE: quay.io/higginsd/vllm-cpu:65393ee064-qwen3
-        VLLM_MODE: inference
+      env:
+        VLLM_IMAGE: ${{ vars.VLLM_CPU_IMAGE || 'quay.io/opendatahub/vllm-cpu:latest' }}
+        VLLM_MODE: ${{ env.VLLM_MODE }}
@@
-        docker logs vllm > logs/vllm.log 2>&1 || echo "Failed to get vllm logs" > logs/vllm.log
+        docker logs "${VLLM_CONTAINER_NAME}" > logs/vllm.log 2>&1 || echo "Failed to get vllm logs" > logs/vllm.log
@@
-        docker rm -f vllm llama-stack postgres
+        docker rm -f "${VLLM_CONTAINER_NAME}" llama-stack postgres
```

Also applies to: 148-157, 188-192
🤖 Fix all issues with AI agents
In @.github/actions/setup-vllm/action.yml:
- Around line 9-26: The VLLM_ARGS model paths omit the namespace directory used
by the Containerfile cache, causing missing model files; update the VLLM_ARGS
assignments (controlled by VLLM_MODE) so the --model paths include the repo
namespace directory (e.g., change /root/.cache/Qwen3-0.6B to
/root/.cache/Qwen/Qwen3-0.6B and /root/.cache/granite-embedding-125m-english to
/root/.cache/ibm-granite/granite-embedding-125m-english) while preserving the
--served-model-name values and the docker run usage of $VLLM_ARGS and
$VLLM_MODE.
- Around line 9-25: Add a pre-run validation that VLLM_IMAGE is set and
non-empty and fail fast with a clear error before constructing VLLM_ARGS or
calling docker run; check the environment variable VLLM_IMAGE and if empty print
a descriptive error (e.g., "Error: VLLM_IMAGE must be set to a valid Docker
image") and exit with non-zero status so the subsequent docker run --name
vllm-$VLLM_MODE $VLLM_IMAGE $VLLM_ARGS call never executes with a missing image
name.
In `@vllm/Containerfile`:
- Line 5: Update the RUN invocation that currently uses `uv pip install
"huggingface-hub[cli]"` so it succeeds when no virtualenv is active: add the
`--system` flag to `uv pip install` (i.e., `uv pip install --system ...`) or,
alternatively, detect `uv` availability and fall back to plain `pip install
"huggingface-hub[cli]"` if `uv` is not present; adjust the `RUN` line that
contains `uv pip install "huggingface-hub[cli]"` accordingly.
In `@vllm/README.md`:
- Around line 34-65: The README examples use --model paths without the repo
namespace but the Containerfile stores models under /root/.cache/<repo_id>, so
update the two docker run examples to include the repo namespace in the --model
argument (change --model /root/.cache/Qwen3-0.6B to --model
/root/.cache/Qwen/Qwen3-0.6B and change --model
/root/.cache/granite-embedding-125m-english to --model
/root/.cache/ibm-granite/granite-embedding-125m-english) so the container can
find the downloaded model.
♻️ Duplicate comments (3)
.github/workflows/redhat-distro-container.yml (1)
123-130: Stop hardcoding a personal `VLLM_IMAGE` reference. This still points at `quay.io/higginsd/...`, which is brittle for CI and downstream users.

Proposed fix (use repo/organization variable with a safe default)

```diff
 - name: Setup vllm for image test
   if: github.event_name != 'workflow_dispatch'
   id: vllm
   uses: ./.github/actions/setup-vllm
   env:
-    VLLM_IMAGE: quay.io/higginsd/vllm-cpu:65393ee064-qwen3
+    VLLM_IMAGE: ${{ vars.VLLM_CPU_IMAGE || 'quay.io/opendatahub/vllm-cpu:latest' }}
     VLLM_MODE: inference
```

vllm/Containerfile (2)
1-1: Document + pin base image provenance (digest) for supply-chain clarity. This base image's origin/provenance still isn't explained, and it's not pinned to a digest.
21-32: Avoid passing HF tokens on the command line. Even with BuildKit secrets, `--token "${HF_TOKEN}"` can leak via process args/logging; `hf download` supports the `HF_TOKEN` env variable.

Proposed fix (HF_TOKEN env, optional secret mount)

```diff
-RUN --mount=type=secret,id=hf_token \
+RUN --mount=type=secret,id=hf_token,required=false \
     for model in "${INFERENCE_MODEL}" "${EMBEDDING_MODEL}"; do \
         model_path="${MODEL_CACHE_DIR}/${model}" && \
         mkdir -p "${model_path}" && \
         if [ -f /run/secrets/hf_token ]; then \
-            HF_TOKEN=$(cat /run/secrets/hf_token) && \
-            hf download "${model}" --local-dir "${model_path}" --token "${HF_TOKEN}"; \
+            HF_TOKEN="$(cat /run/secrets/hf_token)" hf download "${model}" --local-dir "${model_path}"; \
         else \
             hf download "${model}" --local-dir "${model_path}"; \
         fi && \
         rm -rf /root/.cache/huggingface "${model_path}/original"; \
     done
```

Based on: Docker BuildKit Dockerfile secret mount `required=` default behavior, and huggingface-hub `hf download` honoring the `HF_TOKEN` environment variable.
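The core of the suggested fix is the `VAR=value cmd` prefix form, which puts the token in the child process's environment rather than its argv. A small sketch (`printenv` stands in for the real `hf download` call, and the temp file for `/run/secrets/hf_token`):

```shell
# Write a stand-in secret file (illustrative token value).
secret_file=$(mktemp)
echo "hf_example_token" > "$secret_file"

# The token reaches the child via its environment, not its command line:
HF_TOKEN="$(cat "$secret_file")" printenv HF_TOKEN   # hf_example_token

# ...and is not left set in the parent shell afterwards:
echo "${HF_TOKEN:-unset}"   # unset

rm -f "$secret_file"
```

Because the value never appears as a command-line argument, it cannot show up in `ps` output or verbose command logging, which is the leak path the review is closing.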
🧹 Nitpick comments (1)
vllm/README.md (1)
7-13: Consider using `vllm/` as the build context (faster + clearer). `docker build . --file vllm/Containerfile` works, but it sends the whole repo as context; `vllm/` is sufficient here.

Proposed tweak

```diff
-DOCKER_BUILDKIT=1 docker build . \
+DOCKER_BUILDKIT=1 docker build vllm \
     --build-arg INFERENCE_MODEL="Qwen/Qwen3-0.6B" \
     --build-arg EMBEDDING_MODEL="ibm-granite/granite-embedding-125m-english" \
     --tag opendatahub/vllm-cpu:Qwen3-granite-embedding-125m \
     --file vllm/Containerfile
```

Also applies to: 19-27
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
- .github/actions/setup-vllm/action.yml
- .github/workflows/redhat-distro-container.yml
- .github/workflows/vllm-cpu-container.yml
- vllm/Containerfile
- vllm/README.md
🚧 Files skipped from review as they are similar to previous changes (1)
- .github/workflows/vllm-cpu-container.yml
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: build-test-push (linux/amd64)
- GitHub Check: build-test-push (linux/amd64)
- GitHub Check: Summary
Force-pushed bd12c1d to f6a427a
Actionable comments posted: 3
🤖 Fix all issues with AI agents
In @.github/actions/setup-vllm/action.yml:
- Around line 14-15: The legacy VLLM_ARGS assignment for when VLLM_MODE ==
"legacy" uses the wrong model path (/root/.cache/Qwen3-0.6B); change the model
path in the VLLM_ARGS string to include the namespace directory (e.g.,
/root/.cache/Qwen/Qwen3-0.6B) so it matches how models are downloaded; update
the VLLM_ARGS line (the string containing --model and --served-model-name) to
reference /root/.cache/<repo_id>/<model_name> (for example
/root/.cache/Qwen/Qwen3-0.6B).
In @.github/workflows/vllm-cpu-container.yml:
- Around line 46-47: Update the checkout action reference used in the workflow:
replace the pinned SHA value in the uses line (currently
"actions/checkout@8e8c483db84b4bee98b60c0593521ed34d9990e8") with the patch tag
"actions/checkout@v6.0.2" so the workflow uses the latest v6.0.2 release; commit
the change to the workflow file.
In `@vllm/README.md`:
- Around line 1-68: The README ends with a truncated sentence ("Additional vLLM
arguments can be passed directly"); update the vllm/README.md final paragraph to
complete the thought by stating that additional vLLM CLI flags can be appended
when running the container (i.e., pass extra vLLM arguments after the image name
in the docker run command), and ensure the line containing that phrase
("Additional vLLM arguments can be passed directly") is replaced with a clear,
finished sentence referencing the runtime invocation.
♻️ Duplicate comments (3)
vllm/Containerfile (2)
5-5: Add `--system` flag to `uv pip install`. Without `--system`, `uv pip install` will refuse to install into the system Python when not running inside a virtualenv. This will cause the build to fail.

🐛 Proposed fix

```diff
-RUN uv pip install "huggingface-hub[cli]"
+RUN uv pip install --system "huggingface-hub[cli]"
```
21-32: Pass HF token via environment variable instead of CLI argument. Reading the token into a shell variable and passing it via `--token "${HF_TOKEN}"` can expose the token in process listings. The `hf download` command honors the `HF_TOKEN` environment variable directly.

🔒 Proposed fix

```diff
 RUN --mount=type=secret,id=hf_token \
     for model in "${INFERENCE_MODEL}" "${EMBEDDING_MODEL}"; do \
         model_path="${MODEL_CACHE_DIR}/${model}" && \
         mkdir -p "${model_path}" && \
         if [ -f /run/secrets/hf_token ]; then \
-            HF_TOKEN=$(cat /run/secrets/hf_token) && \
-            hf download "${model}" --local-dir "${model_path}" --token "${HF_TOKEN}"; \
+            HF_TOKEN=$(cat /run/secrets/hf_token) hf download "${model}" --local-dir "${model_path}"; \
         else \
             hf download "${model}" --local-dir "${model_path}"; \
         fi && \
         rm -rf /root/.cache/huggingface "${model_path}/original"; \
     done
```

.github/actions/setup-vllm/action.yml (1)
9-27: Add validation for the `VLLM_IMAGE` environment variable. The `docker run` command uses `$VLLM_IMAGE` without validation. If unset or empty, Docker will fail with the confusing error `"docker run" requires at least 1 argument`. Add an early check for a clear error message.

🐛 Proposed fix

```diff
 # Set VLLM_ARGS based on VLLM_MODE
+if [[ -z "${VLLM_IMAGE:-}" ]]; then
+  echo "Error: VLLM_IMAGE environment variable must be set" >&2
+  exit 1
+fi
+
 if [[ "$VLLM_MODE" == "inference" ]]; then
```
🧹 Nitpick comments (1)
.github/workflows/vllm-cpu-container.yml (1)
49-54: Tag generation produces simplified names; verify this is intentional. The tag extraction logic removes version info:
- `Qwen/Qwen3-0.6B` → tag `Qwen3`
- `ibm-granite/granite-embedding-125m-english` → tag `granite-embedding-125m`

This creates short, readable tags but loses version specificity. If the same model family with different versions is used, tags could collide. Verify this behavior is acceptable for your use case.
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
- .github/actions/setup-vllm/action.yml
- .github/workflows/redhat-distro-container.yml
- .github/workflows/vllm-cpu-container.yml
- vllm/Containerfile
- vllm/README.md
🚧 Files skipped from review as they are similar to previous changes (1)
- .github/workflows/redhat-distro-container.yml
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: build-test-push (linux/amd64)
- GitHub Check: build-test-push (linux/amd64)
- GitHub Check: Summary
🔇 Additional comments (2)
.github/actions/setup-vllm/action.yml (1)
29-39: Health check logic looks good. The port determination correctly maps embedding mode to 8001 and others to 8000. The 15-minute timeout is reasonable for CPU-based model loading.
.github/workflows/vllm-cpu-container.yml (1)
67-151: Build, test, and publish flow looks well-structured.
- Build step correctly passes both `INFERENCE_MODEL` and `EMBEDDING_MODEL` as build-args
- Test steps use the correct full image tag `${{ env.IMAGE_NAME }}:${{ env.INFERENCE_TAG }}-${{ env.EMBEDDING_TAG }}`
- Publish step mirrors the build configuration correctly
- Log gathering with `if: always()` ensures debugging info is captured on failure

Previous issues regarding missing build-args and image tag mismatches have been addressed.
✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.
Force-pushed 1ce168f to 428fbb6
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In @.github/workflows/vllm-cpu-container.yml:
- Around line 145-159: The publish step ("Publish image to Quay.io", id:
publish, using docker/build-push-action) is missing the secrets input so gated
model downloads fail; add a secrets block under the step's with: parameters and
pass HF_TOKEN (e.g., secrets: | HF_TOKEN=${{ secrets.HF_TOKEN }}) so the
docker/build-push-action has the HF_TOKEN available during the push/build like
the earlier build step.
- Around line 67-81: The build step ("Build image", id: build) doesn't pass the
secret needed by the Containerfile's RUN --mount=type=secret,id=hf_token, so
gated model downloads will run unauthenticated; update the
docker/build-push-action invocation (uses: docker/build-push-action@...) to
include the secrets input and map the HF token (e.g. add a secrets block under
with that provides hf_token=${{ secrets.HF_TOKEN }}), ensuring the Containerfile
can access the hf_token secret during build.
♻️ Duplicate comments (3)
vllm/Containerfile (2)
5-5: Add `--system` flag to `uv pip install`. Without `--system`, `uv pip install` refuses to install into the system Python when no virtualenv is active.

Proposed fix

```diff
-RUN uv pip install "huggingface-hub[cli]"
+RUN uv pip install --system "huggingface-hub[cli]"
```
21-32: Pass HF_TOKEN as environment variable instead of command-line argument. The `--token "${HF_TOKEN}"` argument can expose the token in process listings or verbose logs. The `hf download` command honors the `HF_TOKEN` environment variable directly.

Proposed fix

```diff
 RUN --mount=type=secret,id=hf_token \
     for model in "${INFERENCE_MODEL}" "${EMBEDDING_MODEL}"; do \
         model_path="${MODEL_CACHE_DIR}/${model}" && \
         mkdir -p "${model_path}" && \
         if [ -f /run/secrets/hf_token ]; then \
-            HF_TOKEN=$(cat /run/secrets/hf_token) && \
-            hf download "${model}" --local-dir "${model_path}" --token "${HF_TOKEN}"; \
+            HF_TOKEN=$(cat /run/secrets/hf_token) hf download "${model}" --local-dir "${model_path}"; \
         else \
             hf download "${model}" --local-dir "${model_path}"; \
         fi && \
         rm -rf /root/.cache/huggingface "${model_path}/original"; \
     done
```

.github/workflows/vllm-cpu-container.yml (1)
46-47: Update checkout action to latest patch version. The workflow uses `actions/checkout@8e8c483db84b4bee98b60c0593521ed34d9990e8` (v6.0.1), but v6.0.2 was released on Jan 9, 2026.

Proposed fix

```diff
 - name: Checkout repository
-  uses: actions/checkout@8e8c483db84b4bee98b60c0593521ed34d9990e8 # v6.0.1
+  uses: actions/checkout@<v6.0.2-sha> # v6.0.2
```
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
- .github/actions/setup-vllm/action.yml
- .github/workflows/redhat-distro-container.yml
- .github/workflows/vllm-cpu-container.yml
- vllm/Containerfile
- vllm/README.md
🚧 Files skipped from review as they are similar to previous changes (3)
- .github/workflows/redhat-distro-container.yml
- .github/actions/setup-vllm/action.yml
- vllm/README.md
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: build-test-push (linux/amd64)
- GitHub Check: build-test-push (linux/amd64)
- GitHub Check: Summary
🔇 Additional comments (3)
.github/workflows/vllm-cpu-container.yml (3)
83-97: Tests are skipped on `workflow_dispatch`. Both inference and embedding tests are conditioned on `github.event_name != 'workflow_dispatch'`, meaning manual triggers publish images without running smoke tests. If this is intentional (e.g., for quick republishing), consider documenting this behavior. Otherwise, the condition should be removed to ensure all published images are tested.
49-54: Verify tag derivation logic handles model naming variations correctly. The tag extraction uses `${var#*/}` to strip the org prefix and `${var%-*}` to strip the shortest `-*` suffix. For "Qwen/Qwen3-0.6B", this yields "Qwen3". For models with multiple hyphens like "granite-embedding-125m-english", it yields "granite-embedding-125m". This could cause tag collisions if two models share the same prefix (e.g., "org/model-v1" and "org/model-v2" both become "model"). Verify this is acceptable for your model selection, or consider using a hash-based or full-name approach for uniqueness.
99-128: LGTM! Good practice to gather logs and system information on failure, with proper artifact upload and retention policy.
|
CI was passing on a previous run, not sure why it is failing now
Force-pushed 428fbb6 to e00abaf
Force-pushed f28e425 to 2b0424d
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In @.github/workflows/vllm-cpu-container.yml:
- Around line 49-54: The current tag derivation strips everything after the last
dash using the `${INFERENCE_TEMP%-*}` and `${EMBEDDING_TEMP%-*}` expansions
which can remove meaningful parts of model names and create collisions for
custom inputs (via workflow_dispatch); update the logic to either (a) stop
truncating by dash and simply set tags to the final path segment after `/` (use
the already-derived INFERENCE_TEMP/EMBEDDING_TEMP directly), or (b) derive a
compact unique tag by hashing the full model name and appending/including the
last path segment, and document the chosen behavior so users know how tags are
formed (reference the variables INFERENCE_MODEL, EMBEDDING_MODEL,
INFERENCE_TEMP, EMBEDDING_TEMP, INFERENCE_TAG, EMBEDDING_TAG when making the
change).
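Option (b) from the prompt above can be sketched as follows; the model name, variable names, and 8-character digest length are illustrative choices, not part of the workflow:

```shell
model="Qwen/Qwen3-0.6B"

# Keep the last path segment of the repo id...
segment="${model##*/}"                                 # Qwen3-0.6B

# ...and append a short digest of the full repo id so that distinct
# models with similar names can never produce the same tag.
digest=$(printf '%s' "$model" | sha256sum | cut -c1-8)

echo "${segment}-${digest}"
```

Because the digest is derived from the full `org/name` string, `org/model-v1` and `org/model-v2` get different tags even though naive dash-stripping would collapse both to `model`.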
♻️ Duplicate comments (7)
vllm/Containerfile (2)
5-5: Add `--system` flag to `uv pip install`. The `uv pip install` command will fail when not running inside a virtualenv unless `--system` is specified. The vLLM CPU base image doesn't activate a virtualenv by default.

Proposed fix

```diff
-RUN uv pip install "huggingface-hub[cli]"
+RUN uv pip install --system "huggingface-hub[cli]"
```
21-32: Pass HF_TOKEN as environment variable instead of command-line argument. Reading the token into a shell variable and passing via `--token "${HF_TOKEN}"` can expose it in process listings. The `hf download` command honors the `HF_TOKEN` environment variable directly. Proposed fix:

```diff
 RUN --mount=type=secret,id=hf_token \
     for model in "${INFERENCE_MODEL}" "${EMBEDDING_MODEL}"; do \
       model_path="${MODEL_CACHE_DIR}/${model}" && \
       mkdir -p "${model_path}" && \
       if [ -f /run/secrets/hf_token ]; then \
-        HF_TOKEN=$(cat /run/secrets/hf_token) && \
-        hf download "${model}" --local-dir "${model_path}" --token "${HF_TOKEN}"; \
+        HF_TOKEN=$(cat /run/secrets/hf_token) hf download "${model}" --local-dir "${model_path}"; \
       else \
         hf download "${model}" --local-dir "${model_path}"; \
       fi && \
       rm -rf /root/.cache/huggingface "${model_path}/original"; \
     done
```

.github/actions/setup-vllm/action.yml (2)
16-18: Fix model path in legacy mode - missing namespace directory. The legacy mode path `/root/.cache/Qwen3-0.6B` is missing the namespace directory. The Containerfile downloads models to `/root/.cache/<repo_id>` (e.g., `/root/.cache/Qwen/Qwen3-0.6B`). Proposed fix:

```diff
 elif [[ "$VLLM_MODE" == "legacy" ]]; then
-  VLLM_ARGS="--host 0.0.0.0 --port 8000 --enable-auto-tool-choice --tool-call-parser hermes --model /root/.cache/Qwen3-0.6B --served-model-name Qwen/Qwen3-0.6B --max-model-len 8192"
+  VLLM_ARGS="--host 0.0.0.0 --port 8000 --enable-auto-tool-choice --tool-call-parser hermes --model /root/.cache/Qwen/Qwen3-0.6B --served-model-name Qwen/Qwen3-0.6B --max-model-len 8192"
   VLLM_PORT=8000
```
9-22: Add validation for `VLLM_IMAGE` to fail fast with a clear error. The `docker run` command on line 29 uses `$VLLM_IMAGE` without validation. If this environment variable is unset or empty, docker run will fail with a confusing error message. Proposed fix:

```diff
 # Set VLLM_ARGS based on VLLM_MODE
+if [[ -z "${VLLM_IMAGE:-}" ]]; then
+  echo "Error: VLLM_IMAGE environment variable must be set" >&2
+  exit 1
+fi
+
 if [[ "$VLLM_MODE" == "inference" ]]; then
```

.github/workflows/vllm-cpu-container.yml (3)
67-81: Missing `secrets` input for gated model access. The Containerfile uses `RUN --mount=type=secret,id=hf_token` to access the HuggingFace token, but the build step doesn't pass this secret. Builds requiring gated model access will fail silently: the download proceeds without authentication. Proposed fix:

```diff
 - name: Build image
   id: build
   uses: docker/build-push-action@263435318d21b8e681c14492fe198d362a7d2c83 # v6.18.0
   with:
     context: .
     file: vllm/Containerfile
     platforms: ${{ matrix.platform }}
     push: false
     tags: ${{ env.IMAGE_NAME }}:${{ env.INFERENCE_TAG }}-${{ env.EMBEDDING_TAG }}
     load: true # needed to load for smoke test
     cache-from: type=gha
     cache-to: type=gha,mode=max
     build-args: |
       INFERENCE_MODEL=${{ env.INFERENCE_MODEL }}
       EMBEDDING_MODEL=${{ env.EMBEDDING_MODEL }}
+    secrets: |
+      hf_token=${{ secrets.HF_TOKEN }}
```
145-159: Missing `secrets` input in publish step. Same issue as the build step: the publish step also needs the `hf_token` secret for gated model downloads. Proposed fix:

```diff
 - name: Publish image to Quay.io
   id: publish
   if: contains(fromJSON('["push", "workflow_dispatch"]'), github.event_name)
   uses: docker/build-push-action@263435318d21b8e681c14492fe198d362a7d2c83 # v6.18.0
   with:
     context: .
     file: vllm/Containerfile
     platforms: ${{ matrix.platform }}
     push: true
     tags: ${{ env.IMAGE_NAME }}:${{ env.INFERENCE_TAG }}-${{ env.EMBEDDING_TAG }}
     cache-from: type=gha
     cache-to: type=gha,mode=max
     build-args: |
       INFERENCE_MODEL=${{ env.INFERENCE_MODEL }}
       EMBEDDING_MODEL=${{ env.EMBEDDING_MODEL }}
+    secrets: |
+      hf_token=${{ secrets.HF_TOKEN }}
```
46-47: Update checkout action to v6.0.2. The workflow pins `actions/checkout@8e8c483db84b4bee98b60c0593521ed34d9990e8` (v6.0.1), but v6.0.2 is available.
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
- .github/actions/setup-vllm/action.yml
- .github/workflows/redhat-distro-container.yml
- .github/workflows/vllm-cpu-container.yml
- vllm/Containerfile
- vllm/README.md
🚧 Files skipped from review as they are similar to previous changes (2)
- .github/workflows/redhat-distro-container.yml
- vllm/README.md
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: build-test-push (linux/amd64)
- GitHub Check: build-test-push (linux/amd64)
- GitHub Check: Summary
🔇 Additional comments (4)
vllm/Containerfile (1)
7-19: LGTM - Build argument validation. The validation ensures both `INFERENCE_MODEL` and `EMBEDDING_MODEL` are provided at build time, failing fast with clear error messages.
.github/actions/setup-vllm/action.yml (1)
32-36: LGTM - Health check with timeout. The readiness check uses `timeout 900` (15 minutes) with a health endpoint poll, which is appropriate for model loading time.
.github/workflows/vllm-cpu-container.yml (2)
99-128: LGTM - Comprehensive logging and artifact collection. The logging step captures container logs, system info (disk, memory, docker state), and uploads them as artifacts with 7-day retention. The `if: always()` ensures logs are captured even on failure.
83-97: LGTM - Test setup with correct image tags. Both inference and embedding test steps now correctly use the full image tag `${{ env.IMAGE_NAME }}:${{ env.INFERENCE_TAG }}-${{ env.EMBEDDING_TAG }}` matching the built image.
2b0424d to
6404e46
Compare
6404e46 to
3022272
Compare
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In @.github/actions/free-disk-space/action.yml:
- Around line 18-20: The docker rmi invocation is malformed because the -f flag
is inside the command substitution; update the command so the -f flag is passed
to docker rmi (e.g., call docker rmi -f "$(docker image ls -aq)" or place -f
before the command substitution) so the force flag is not treated as part of the
image name; adjust the line containing sudo docker rmi "$(docker image ls -aq)
-f" to move -f outside the quotes and ensure proper quoting of the "$(docker
image ls -aq)" substitution.
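For clarity, a self-contained sketch of the corrected invocation described above; the `remove_all_images` wrapper name is illustrative and not part of the action:

```shell
#!/usr/bin/env bash
# Force-remove all local images. -f is passed as a flag to `docker rmi`,
# outside the command substitution, and an empty image list is skipped
# entirely so docker rmi is never called with no arguments.
remove_all_images() {
  local images
  images="$(docker image ls -aq)"
  if [ -n "${images}" ]; then
    # shellcheck disable=SC2086  # splitting the ID list into words is intended
    sudo docker rmi -f ${images} >/dev/null 2>&1 || true
  fi
}
```

The unquoted `${images}` is deliberate here: each image ID must become a separate argument to `docker rmi`.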
♻️ Duplicate comments (2)
.github/actions/setup-vllm/action.yml (2)
16-18: Legacy mode model path is still missing the namespace directory. The legacy mode uses `/root/.cache/Qwen3-0.6B` but the Containerfile downloads models to `/root/.cache/<namespace>/<model>`. This should be `/root/.cache/Qwen/Qwen3-0.6B` to match the inference mode path. 🐛 Proposed fix:

```diff
 elif [[ "$VLLM_MODE" == "legacy" ]]; then
-  VLLM_ARGS="--host 0.0.0.0 --port 8000 --enable-auto-tool-choice --tool-call-parser hermes --model /root/.cache/Qwen3-0.6B --served-model-name Qwen/Qwen3-0.6B --max-model-len 8192"
+  VLLM_ARGS="--host 0.0.0.0 --port 8000 --enable-auto-tool-choice --tool-call-parser hermes --model /root/.cache/Qwen/Qwen3-0.6B --served-model-name Qwen/Qwen3-0.6B --max-model-len 8192"
   VLLM_PORT=8000
```
9-30: Add validation for `VLLM_IMAGE` environment variable. The `docker run` command uses `$VLLM_IMAGE` without checking if it's set. If unset or empty, docker will fail with the confusing error `"docker run" requires at least 1 argument`. Add an early check alongside the VLLM_MODE validation. 🛠️ Proposed fix:

```diff
 # Set VLLM_ARGS based on VLLM_MODE
+if [[ -z "${VLLM_IMAGE:-}" ]]; then
+  echo "Error: VLLM_IMAGE must be set to a valid Docker image"
+  exit 1
+fi
+
 if [[ "$VLLM_MODE" == "inference" ]]; then
```
🧹 Nitpick comments (2)
.github/actions/free-disk-space/action.yml (2)
70-76: Inconsistent service stop/disable order. For `postfix` (lines 65-66), you call `stop` then `disable`. For `cups`, it's `disable` then `stop`. While both work, stopping before disabling is the conventional order to ensure a clean shutdown before preventing future starts. Suggested fix for consistency:

```diff
 # Remove Cups because we're not printing anything
 cups_packages=$(dnf list installed | grep cups || echo "")
 if [[ ! -z "${cups_packages}" ]]; then
-  sudo systemctl disable cups
   sudo systemctl stop cups
+  sudo systemctl disable cups
   sudo dnf -y remove cups
 fi
```
89-93: Consider broader Xcode cleanup. The pattern `Xcode_15*` only removes Xcode 15.x versions. Future GitHub-hosted runners may have newer versions, reducing the effectiveness of this cleanup. Consider a broader glob or removing multiple known versions. Suggested broader cleanup:

```diff
 - name: Free Disk Space MacOS
   if: runner.os == 'macOS'
   run: |
-    sudo rm -rf /System/Volumes/Data/Applications/Xcode_15*
+    sudo rm -rf /System/Volumes/Data/Applications/Xcode_* || true
   shell: bash
```
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
- .github/actions/free-disk-space/action.yml
- .github/actions/setup-vllm/action.yml
- .github/workflows/redhat-distro-container.yml
- .github/workflows/vllm-cpu-container.yml
- vllm/Containerfile
- vllm/README.md
🚧 Files skipped from review as they are similar to previous changes (4)
- .github/workflows/redhat-distro-container.yml
- vllm/Containerfile
- .github/workflows/vllm-cpu-container.yml
- vllm/README.md
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: build-test-push (linux/amd64)
- GitHub Check: build-test-push (linux/amd64)
- GitHub Check: Summary
🔇 Additional comments (1)
.github/actions/setup-vllm/action.yml (1)
32-36: LGTM! The health check logic correctly uses the dynamic `VLLM_PORT` based on mode, with a reasonable 15-minute timeout for model loading. The `/health` endpoint and retry interval are appropriate.
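The readiness wait described here amounts to a poll loop along these lines; `wait_for_vllm` is an assumed name for illustration, and the action's actual implementation may differ in detail:

```shell
#!/usr/bin/env bash
# Poll the vLLM /health endpoint until it responds, or give up after the
# 15-minute (900s) budget the action allows for model loading.
wait_for_vllm() {
  local port="${1:-8000}"
  local deadline=$(( $(date +%s) + 900 ))
  until curl -fsS "http://localhost:${port}/health" >/dev/null 2>&1; do
    if [ "$(date +%s)" -ge "${deadline}" ]; then
      echo "vLLM did not become ready within 900s" >&2
      return 1
    fi
    sleep 5
  done
}
```

The 5-second retry interval keeps the loop cheap while still detecting readiness promptly once the model finishes loading.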
3022272 to
f7bb982
Compare
f7bb982 to
54121a5
Compare
Actionable comments posted: 3
♻️ Duplicate comments (3)
.github/actions/setup-vllm/action.yml (2)
24-30: ⚠️ Potential issue | 🟠 Major. Code Quality: Add fail-fast validation for `VLLM_IMAGE` before `docker run`.
Line 29 can execute with an empty image reference and fail with a non-actionable Docker error. Validate early and exit with a clear message.
Proposed fix:

```diff
-# Set VLLM_ARGS based on VLLM_MODE
+# Validate required env first
+if [[ -z "${VLLM_IMAGE:-}" ]]; then
+  echo "Error: VLLM_IMAGE must be set to a valid Docker image" >&2
+  exit 1
+fi
+
+# Set VLLM_ARGS based on VLLM_MODE
 if [[ "$VLLM_MODE" == "inference" ]]; then
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.github/actions/setup-vllm/action.yml around lines 24 - 30, Before executing the docker run block, validate the VLLM_IMAGE environment variable (VLLM_IMAGE) and fail fast if it's empty; add a check immediately prior to the docker run block in action.yml that tests for an empty VLLM_IMAGE, writes a clear error message to stderr (including the variable name) and exits with a non‑zero status so the action doesn't attempt docker run with a blank image reference.
16-18: ⚠️ Potential issue | 🔴 Critical. Code Quality: Fix legacy model path to match cached directory layout.
Line 17 still points to `/root/.cache/Qwen3-0.6B`, which does not match the namespace-based cache path used elsewhere and will break legacy startup.
Proposed fix:

```diff
 elif [[ "$VLLM_MODE" == "legacy" ]]; then
-  VLLM_ARGS="--host 0.0.0.0 --port 8000 --enable-auto-tool-choice --tool-call-parser hermes --model /root/.cache/Qwen3-0.6B --served-model-name Qwen/Qwen3-0.6B --max-model-len 8192"
+  VLLM_ARGS="--host 0.0.0.0 --port 8000 --enable-auto-tool-choice --tool-call-parser hermes --model /root/.cache/Qwen/Qwen3-0.6B --served-model-name Qwen/Qwen3-0.6B --max-model-len 8192"
   VLLM_PORT=8000
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.github/actions/setup-vllm/action.yml around lines 16 - 18, The legacy branch for VLLM startup uses an incorrect model path; update the VLLM_ARGS string in the elif [[ "$VLLM_MODE" == "legacy" ]] block so the --model path matches the namespace-based cache layout used elsewhere (make it consistent with the served-model-name), e.g. replace /root/.cache/Qwen3-0.6B with the namespaced cache path such as /root/.cache/Qwen/Qwen3-0.6B so VLLM_ARGS and --served-model-name Qwen/Qwen3-0.6B point to the same cached model..github/actions/free-disk-space/action.yml (1)
18-20: ⚠️ Potential issue | 🟠 Major. Code Quality: `docker rmi` invocation is malformed and does not pass `-f` correctly.
Line 19 places `-f` inside the quoted argument, so it is treated as text rather than a flag.
Proposed fix:

```diff
-if command -v docker 2>&1 >/dev/null ; then
-  sudo docker rmi "$(docker image ls -aq) -f" >/dev/null 2>&1 || true
+if command -v docker >/dev/null 2>&1; then
+  images="$(docker image ls -aq)"
+  if [[ -n "${images}" ]]; then
+    sudo docker rmi -f ${images} >/dev/null 2>&1 || true
+  fi
 fi
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.github/actions/free-disk-space/action.yml around lines 18 - 20, The docker removal command is malformed: the -f flag is placed inside the quoted argument so it’s treated as text; change the invocation so -f is a separate flag before the list of IDs (use docker rmi -f with the output of docker image ls -aq as the argument) and ensure the substitution is not quoted so flags are parsed correctly (target the line invoking sudo docker rmi and the command substitution docker image ls -aq).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.github/actions/free-disk-space/action.yml:
- Around line 83-85: The step that removes DNF caches uses "rm -rf
/var/cache/dnf*" without sudo which can fail on locked-down runners and break
the action; update the branch handling where "sudo dnf clean all" is followed by
the cache deletion to call sudo for the removal (i.e., use sudo for the rm
command), ensuring the runner has permission and the action won't exit due to a
permission error when executing the cache deletion command.
In @.github/workflows/vllm-cpu-container.yml:
- Around line 12-19: The workflow .github/workflows/vllm-cpu-container.yml
currently restricts triggers to only 'vllm/Containerfile', which can skip CI
when the workflow or related composite actions change; update the
push/pull_request path filters (the paths keys in vllm-cpu-container.yml) to
include the workflow and action files (for example add '.github/workflows/**'
and any composite action dirs like '.github/actions/**' and/or remove the paths
filter entirely) so changes to workflow files always trigger the pipeline.
- Around line 131-136: The "Cleanup vllm containers" step currently always runs
and fails when containers don't exist; make it idempotent by changing the run
command in that step so it does not return non-zero if vllm-inference or
vllm-embedding are absent (e.g., check for each container with docker ps -a or
docker container inspect before removing, or append "|| true" to docker rm -f
vllm-inference vllm-embedding) so the step succeeds even when skipped earlier
for workflow_dispatch; update the run block in the "Cleanup vllm containers"
step accordingly.
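The idempotent cleanup suggested in the prompt above can be as small as the following sketch; the `cleanup_vllm` wrapper name is illustrative, not part of the workflow:

```shell
#!/usr/bin/env bash
# Remove the test containers if they exist; succeed either way so the
# cleanup step never fails a run in which the test steps were skipped.
cleanup_vllm() {
  local name
  for name in vllm-inference vllm-embedding; do
    docker rm -f "${name}" >/dev/null 2>&1 || true
  done
}
```

Appending `|| true` per container (rather than once for the whole command) also means one missing container cannot prevent removal of the other.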
---
Duplicate comments:
In @.github/actions/free-disk-space/action.yml:
- Around line 18-20: The docker removal command is malformed: the -f flag is
placed inside the quoted argument so it’s treated as text; change the invocation
so -f is a separate flag before the list of IDs (use docker rmi -f with the
output of docker image ls -aq as the argument) and ensure the substitution is
not quoted so flags are parsed correctly (target the line invoking sudo docker
rmi and the command substitution docker image ls -aq).
In @.github/actions/setup-vllm/action.yml:
- Around line 24-30: Before executing the docker run block, validate the
VLLM_IMAGE environment variable (VLLM_IMAGE) and fail fast if it's empty; add a
check immediately prior to the docker run block in action.yml that tests for an
empty VLLM_IMAGE, writes a clear error message to stderr (including the variable
name) and exits with a non‑zero status so the action doesn't attempt docker run
with a blank image reference.
- Around line 16-18: The legacy branch for VLLM startup uses an incorrect model
path; update the VLLM_ARGS string in the elif [[ "$VLLM_MODE" == "legacy" ]]
block so the --model path matches the namespace-based cache layout used
elsewhere (make it consistent with the served-model-name), e.g. replace
/root/.cache/Qwen3-0.6B with the namespaced cache path such as
/root/.cache/Qwen/Qwen3-0.6B so VLLM_ARGS and --served-model-name
Qwen/Qwen3-0.6B point to the same cached model.
ℹ️ Review info
Configuration used: Central YAML (base), Organization UI (inherited)
Review profile: CHILL
Plan: Pro
Cache: Disabled due to data retention organization setting
Knowledge base: Disabled due to data retention organization setting
📒 Files selected for processing (6)
- .github/actions/free-disk-space/action.yml
- .github/actions/setup-vllm/action.yml
- .github/workflows/redhat-distro-container.yml
- .github/workflows/vllm-cpu-container.yml
- vllm/Containerfile
- vllm/README.md
🚧 Files skipped from review as they are similar to previous changes (3)
- vllm/Containerfile
- vllm/README.md
- .github/workflows/redhat-distro-container.yml
4e5157d to
f63cb27
Compare
ae17eaf to
bcae7fc
Compare
bcae7fc to
7d4ed24
Compare
this commit adds a Containerfile and README to allow users to build vLLM CPU images with pre-downloaded models it also adds a CI action that will publish an official image to the OpenDataHub org on Quay Signed-off-by: Nathan Weinberg <nweinber@redhat.com>
…urce Replace the multi-stage build that compiled vLLM from source with the official vllm/vllm-openai-cpu:v0.17.1 image. This dramatically reduces build time and image complexity while retaining model download support. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: Nathan Weinberg <nweinber@redhat.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: Nathan Weinberg <nweinber@redhat.com>
8763b84 to
3e0425b
Compare
What does this PR do?
adds a Containerfile and README to allow
users to build vLLM CPU images with pre-downloaded
models
Closes #202
Summary by CodeRabbit
New Features
Documentation
Chores