
feat: add Containerfile for building vllm CPU images#203

Open
nathan-weinberg wants to merge 4 commits into opendatahub-io:main from nathan-weinberg:vllm-cpu-factory

Conversation


@nathan-weinberg nathan-weinberg commented Jan 13, 2026

What does this PR do?

Adds a Containerfile and README to allow users to build vLLM CPU images with pre-downloaded models.

Closes #202

Summary by CodeRabbit

  • New Features

    • Configurable vLLM container modes (inference/embedding/legacy) with mode-specific ports, dynamic startup args, and improved readiness checks.
    • New workflow to build, test, and publish vLLM CPU container images; Containerfile supports pre-downloaded models and authenticated model retrieval.
  • Documentation

    • Added vLLM CPU container README with build/run, model caching, and usage examples.
  • Chores

    • CI updates for configurable image/mode and cleanup; added runner action to free disk space.


coderabbitai bot commented Jan 13, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

📝 Walkthrough

Adds a vLLM CPU Containerfile and README, a new GitHub Actions workflow to build/test/publish vLLM CPU images, parameterizes the setup-vllm action to accept VLLM_IMAGE and VLLM_MODE (mode-specific args, container naming, and readiness checks), and introduces a reusable "Free Disk Space" composite action.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 inconclusive)

  • Out of Scope Changes check — ❓ Inconclusive: All changes are directly related to the stated objectives. However, the PR introduces infrastructure changes beyond the core requirement: a free-disk-space GitHub Action and modifications to existing workflows that appear to be supporting utilities rather than core deliverables. Resolution: clarify whether the free-disk-space action and workflow modifications are necessary dependencies for the vLLM CPU images feature or represent separate infrastructure improvements that should be in a separate PR.
✅ Passed checks (4 passed)
  • Description check — ✅ Passed: Check skipped; CodeRabbit’s high-level summary is enabled.
  • Title check — ✅ Passed: The title clearly and accurately summarizes the main objective: adding a Containerfile for building vLLM CPU images with pre-downloaded models.
  • Linked Issues check — ✅ Passed: The PR implements both key requirements from issue #202: a Containerfile for building vLLM CPU images with pre-loaded models and a GitHub Actions workflow for CI builds, along with supporting documentation and infrastructure changes.
  • Docstring Coverage — ✅ Passed: No functions found in the changed files to evaluate docstring coverage; check skipped.


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 7

🤖 Fix all issues with AI agents
In @.github/workflows/vllm-cpu-container.yml:
- Around line 38-39: The workflow defines EMBEDDING_MODEL but the Containerfile
only accepts a single MODEL_NAME so the embedding model is never included;
either remove EMBEDDING_MODEL/EMBEDDING_TAG from the workflow and simplify the
image tag to only use INFERENCE_MODEL, or extend the Containerfile to accept a
second build arg (e.g., ARG EMBEDDING_MODEL) and update the image build step to
pass --build-arg EMBEDDING_MODEL=${{ github.event.inputs.embedding_model }} and
adjust the Containerfile logic that installs or copies models to handle both
MODEL_NAME and EMBEDDING_MODEL (and update the final image tag to reflect both
models if desired) so the workflow variables and the Containerfile are
consistent.
- Around line 66-77: The GitHub Actions build step "Build image" (id: build,
using docker/build-push-action) fails because the Containerfile requires the
MODEL_NAME build-arg; add MODEL_NAME to the step's with -> build-arg list so the
action passes MODEL_NAME (e.g., from the env: ${{ env.MODEL_NAME }}). Ensure the
added build-arg key matches "MODEL_NAME" exactly and is passed alongside
existing inputs (context, file, platforms, tags, etc.) so the Containerfile
validation no longer errors.
- Around line 79-84: The workflow step "Setup vllm for image test" sets
VLLM_IMAGE to `${{ env.IMAGE_NAME }}` which omits the tag and will default to
:latest, causing the action `./.github/actions/setup-vllm` to pull a
non-existent image; change VLLM_IMAGE to include the full tag by combining `${{
env.IMAGE_NAME }}` with `${{ env.INFERENCE_TAG }}` and `${{ env.EMBEDDING_TAG
}}` (e.g. `${{ env.IMAGE_NAME }}:${{ env.INFERENCE_TAG }}-${{ env.EMBEDDING_TAG
}}`) so the setup action uses the actual built image tag.
- Around line 95-106: The publish step for docker/build-push-action is missing
the MODEL_NAME build argument; update the publish job (the step with id
"publish" that uses docker/build-push-action@2634353...) to include build-args
with MODEL_NAME set to the workflow env (e.g., add a build-args entry mapping
MODEL_NAME to ${{ env.MODEL_NAME }} under the with: block) so the image is built
and tagged with the same model configuration as the build step.

In `@vllm/Containerfile`:
- Line 1: Add a short provenance section to vllm/README.md documenting the base
image public.ecr.aws/q9t5s3a7/vllm-cpu-release-repo used in Containerfile: state
what the image contains (OS, preinstalled vLLM and dependencies), why this
specific AWS ECR account/image was chosen (e.g., builds provided by upstream or
verified maintainers, compatibility with our runtime), and how it maps to
official vLLM releases (include release tag or digest and link to upstream vLLM
release or repo). Also mention how to verify the image (digest/checksum or a
link to the ECR repo) and any trust/maintainer contact or audit steps for future
maintainers.

In `@vllm/README.md`:
- Around line 17-23: Update the example Docker build command in vllm/README.md
to use the vllm directory as the build context and explicitly reference its
Containerfile; replace the current `docker build .` invocation with a call that
passes `-f vllm/Containerfile` and `vllm` as the context so the gated model
example uses the same context/path fix as the other examples.
- Around line 7-11: The Docker build in the README uses '.' as the context but
doesn't point to the Containerfile in the vllm directory, which will fail from
the repo root; update the command in README.md to pass the correct context and
Dockerfile path by using the -f vllm/Containerfile flag and vllm as the build
context (or alternatively note to cd into the vllm directory), ensuring the
referenced Containerfile name "vllm/Containerfile" and the build args (e.g.,
MODEL_NAME) remain unchanged.
🧹 Nitpick comments (1)
.github/actions/setup-vllm/action.yml (1)

14-14: Quote the environment variable and add validation.

The $VLLM_IMAGE variable should be quoted to prevent word splitting issues. Additionally, consider adding validation to fail early with a clear error message if the variable is not set.

♻️ Suggested improvement
      run: |
        # Start vllm container
+       if [ -z "${VLLM_IMAGE}" ]; then
+         echo "ERROR: VLLM_IMAGE environment variable is required" >&2
+         exit 1
+       fi
        docker run -d \
          --name vllm \
          --privileged=true \
          --net=host \
-         $VLLM_IMAGE \
+         "${VLLM_IMAGE}" \
          --host 0.0.0.0 \
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between dadba53 and f688455.

📒 Files selected for processing (5)
  • .github/actions/setup-vllm/action.yml
  • .github/workflows/redhat-distro-container.yml
  • .github/workflows/vllm-cpu-container.yml
  • vllm/Containerfile
  • vllm/README.md
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: build-test-push (linux/amd64)
  • GitHub Check: Summary
🔇 Additional comments (3)
.github/workflows/redhat-distro-container.yml (1)

127-128: Consider using the official organization image reference.

The VLLM_IMAGE points to a personal Quay.io account (higginsd). Once the new vllm-cpu-container.yml workflow publishes images to quay.io/opendatahub/vllm-cpu, this reference should be updated to use the official organization image for consistency and maintainability.

Is this a temporary reference for initial testing? If so, consider adding a TODO comment or tracking issue.

vllm/Containerfile (2)

24-26: LGTM!

The ENTRYPOINT pattern is well-designed: uses exec for proper signal handling (PID 1), quotes the variable for safety, and correctly forwards additional arguments via the "$@" with "--" pattern.


16-22: No issue. The hf download command is a documented shorthand alias for huggingface-cli download in the HuggingFace Hub CLI, and the options used (--local-dir, --token) are officially supported. The command syntax is correct.

Likely an incorrect or invalid review comment.

@nathan-weinberg nathan-weinberg added the do-not-merge Apply to PRs that should not be merged (yet) label Jan 14, 2026
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 4

🤖 Fix all issues with AI agents
In @.github/workflows/redhat-distro-container.yml:
- Around line 127-128: The workflow hardcodes VLLM_IMAGE to a personal quay
namespace (VLLM_IMAGE: quay.io/higginsd/vllm-cpu:65393ee064-qwen3); update the
workflow to stop using a personal image by either switching VLLM_IMAGE to the
official registry image (e.g., quay.io/opendatahub/vllm-cpu) or make VLLM_IMAGE
a configurable input/env/ repository variable (expose as a workflow input or use
a repo/organization secret or variable) so the image reference can be updated
without changing code.

In @.github/workflows/vllm-cpu-container.yml:
- Around line 68-79: The Docker build step (action with id "build" using
docker/build-push-action) is missing required build-args INFERENCE_MODEL and
EMBEDDING_MODEL; update the step to pass them via with.build-args (or separate
with entries) e.g. add build-args: INFERENCE_MODEL=${{ env.INFERENCE_MODEL }}
EMBEDDING_MODEL=${{ env.EMBEDDING_MODEL }} (or add individual keys build-arg:
INFERENCE_MODEL and EMBEDDING_MODEL) so the Containerfile validation receives
non-empty values during the build.

In `@vllm/Containerfile`:
- Around line 21-32: The Dockerfile loop that downloads models using
INFERENCE_MODEL and EMBEDDING_MODEL should avoid passing the token via the
--token argument; instead, read the secret from /run/secrets/hf_token and inject
it as the HF_TOKEN environment variable for the hf download invocation (for both
branches that currently call hf download) so the token is not exposed on the
command line — locate the RUN --mount block that references MODEL_CACHE_DIR,
/run/secrets/hf_token and the hf download calls and change them to set HF_TOKEN
from the secret in-line for the hf download command rather than using --token
"${HF_TOKEN}".
♻️ Duplicate comments (2)
vllm/Containerfile (1)

1-1: Document the base image source and provenance.

The base image public.ecr.aws/q9t5s3a7/vllm-cpu-release-repo lacks documentation about its origin, contents, and relationship to official vLLM releases. Add provenance information to vllm/README.md for security transparency and maintainability.

.github/workflows/vllm-cpu-container.yml (1)

118-129: Critical: Missing build arguments in publish step.

Same issue as the build step - the publish step also needs INFERENCE_MODEL and EMBEDDING_MODEL build arguments for the Containerfile.

🐛 Proposed fix
       - name: Publish image to Quay.io
         id: publish
         if: contains(fromJSON('["push", "workflow_dispatch"]'), github.event_name)
         uses: docker/build-push-action@263435318d21b8e681c14492fe198d362a7d2c83 # v6.18.0
         with:
           context: .
           file: vllm/Containerfile
           platforms: ${{ matrix.platform }}
           push: true
           tags: ${{ env.IMAGE_NAME }}:${{ env.INFERENCE_TAG }}-${{ env.EMBEDDING_TAG }}
+          build-args: |
+            INFERENCE_MODEL=${{ env.INFERENCE_MODEL }}
+            EMBEDDING_MODEL=${{ env.EMBEDDING_MODEL }}
           cache-from: type=gha
           cache-to: type=gha,mode=max
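For local reproduction, the same build can be sketched as a CLI dry run (model names are the workflow's example defaults; `echo` prints the command instead of invoking docker):

```shell
INFERENCE_MODEL="Qwen/Qwen3-0.6B"
EMBEDDING_MODEL="ibm-granite/granite-embedding-125m-english"

# Compose the build invocation so both models reach the Containerfile
# as non-empty build args; echo makes this a dry run.
echo docker build . \
    --file vllm/Containerfile \
    --build-arg "INFERENCE_MODEL=${INFERENCE_MODEL}" \
    --build-arg "EMBEDDING_MODEL=${EMBEDDING_MODEL}"
```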
🧹 Nitpick comments (3)
vllm/Containerfile (1)

34-34: ENTRYPOINT lacks default arguments for model serving.

The entrypoint is vllm serve without specifying which model to serve. The container user must provide the model path as a runtime argument. Consider documenting this requirement or providing a default CMD.

💡 Suggested improvement
 ENTRYPOINT ["vllm", "serve"]
+# Note: exec-form CMD does not expand environment variables, so use a
+# literal default path matching the cached model location:
+CMD ["/root/.cache/Qwen/Qwen3-0.6B"]

Alternatively, document in README.md that users must specify the model path when running the container.

.github/workflows/vllm-cpu-container.yml (2)

50-55: Tag derivation logic may produce unexpected results for some model names.

The shell parameter expansion ${VAR#*/} removes everything up to the first /, then ${VAR%-*} removes everything from the last - onward. This works for Qwen/Qwen3-0.6B → Qwen3, but may produce unexpected results for models with different naming patterns (e.g., org/model-name-v2 → model-name).

Consider adding validation or using a more explicit transformation.

💡 Example of more robust tag generation
       - name: Set image tag components
         run: |
-          INFERENCE_TEMP="${INFERENCE_MODEL#*/}"
-          EMBEDDING_TEMP="${EMBEDDING_MODEL#*/}"
-          echo "INFERENCE_TAG=${INFERENCE_TEMP%-*}" >> "$GITHUB_ENV"
-          echo "EMBEDDING_TAG=${EMBEDDING_TEMP%-*}" >> "$GITHUB_ENV"
+          # Extract model name, replace invalid chars for Docker tags
+          INFERENCE_TAG=$(echo "${INFERENCE_MODEL#*/}" | tr '/' '-' | tr '[:upper:]' '[:lower:]')
+          EMBEDDING_TAG=$(echo "${EMBEDDING_MODEL#*/}" | tr '/' '-' | tr '[:upper:]' '[:lower:]')
+          echo "INFERENCE_TAG=${INFERENCE_TAG}" >> "$GITHUB_ENV"
+          echo "EMBEDDING_TAG=${EMBEDDING_TAG}" >> "$GITHUB_ENV"

89-93: Cleanup steps may fail silently if container doesn't exist.

The cleanup steps use docker rm -f which will error if the container doesn't exist (e.g., if the setup step was skipped due to workflow_dispatch). Consider adding || true to prevent workflow failure.

💡 Proposed fix
       - name: Cleanup vllm-inference
         if: always()
         shell: bash
         run: |
-          docker rm -f vllm-inference
+          docker rm -f vllm-inference || true

       - name: Cleanup vllm-embedding
         if: always()
         shell: bash
         run: |
-          docker rm -f vllm-embedding
+          docker rm -f vllm-embedding || true

Also applies to: 103-107

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f688455 and 34b71dd.

📒 Files selected for processing (5)
  • .github/actions/setup-vllm/action.yml
  • .github/workflows/redhat-distro-container.yml
  • .github/workflows/vllm-cpu-container.yml
  • vllm/Containerfile
  • vllm/README.md
🚧 Files skipped from review as they are similar to previous changes (2)
  • vllm/README.md
  • .github/actions/setup-vllm/action.yml
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Summary
🔇 Additional comments (2)
vllm/Containerfile (2)

31-31: No action required. The ${model_path}/original directory removal is safe—it contains only metadata and provenance information from the Hugging Face Hub, not model weights. The actual model weights reside in ${model_path} and remain intact after this cleanup. vLLM's runtime has no dependency on the original/ directory. This is a standard container optimization to reduce image size.

Likely an incorrect or invalid review comment.


10-12: Model paths are correctly configured—vLLM receives model paths via runtime arguments.

The MODEL_CACHE_DIR=/root/.cache is correct. Models download to /root/.cache/{model_name} as expected. The vllm serve ENTRYPOINT passes the full model path at runtime—as shown in vllm/README.md, you provide --model /root/.cache/{model_name} when running the container. The setup is correct and works as designed.

✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.

@nathan-weinberg nathan-weinberg force-pushed the vllm-cpu-factory branch 6 times, most recently from 06199c1 to bd12c1d Compare January 14, 2026 19:31
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
.github/workflows/redhat-distro-container.yml (1)

123-130: Fix vLLM container name mismatch (breaks logs + cleanup)

setup-vllm now starts --name vllm-$VLLM_MODE, but this workflow still references vllm in docker logs / docker rm. With VLLM_MODE: inference, the container name is vllm-inference.

Proposed fix (define once; reuse everywhere)
 jobs:
   build-test-push:
     runs-on: ubuntu-latest
     env:
+      VLLM_MODE: inference
+      VLLM_CONTAINER_NAME: vllm-inference
       VERTEX_AI_PROJECT: ${{ secrets.VERTEX_AI_PROJECT }}
       VERTEX_AI_LOCATION: us-central1
       GCP_WORKLOAD_IDENTITY_PROVIDER: ${{ secrets.GCP_WORKLOAD_IDENTITY_PROVIDER }}
@@
       - name: Setup vllm for image test
         if: github.event_name != 'workflow_dispatch'
         id: vllm
         uses: ./.github/actions/setup-vllm
-        env:
-          VLLM_IMAGE: quay.io/higginsd/vllm-cpu:65393ee064-qwen3
-          VLLM_MODE: inference
+        env:
+          VLLM_IMAGE: ${{ vars.VLLM_CPU_IMAGE || 'quay.io/opendatahub/vllm-cpu:latest' }}
+          VLLM_MODE: ${{ env.VLLM_MODE }}
@@
-          docker logs vllm > logs/vllm.log 2>&1 || echo "Failed to get vllm logs" > logs/vllm.log
+          docker logs "${VLLM_CONTAINER_NAME}" > logs/vllm.log 2>&1 || echo "Failed to get vllm logs" > logs/vllm.log
@@
-          docker rm -f vllm llama-stack postgres
+          docker rm -f "${VLLM_CONTAINER_NAME}" llama-stack postgres

Also applies to: 148-157, 188-192

🤖 Fix all issues with AI agents
In @.github/actions/setup-vllm/action.yml:
- Around line 9-26: The VLLM_ARGS model paths omit the namespace directory used
by the Containerfile cache, causing missing model files; update the VLLM_ARGS
assignments (controlled by VLLM_MODE) so the --model paths include the repo
namespace directory (e.g., change /root/.cache/Qwen3-0.6B to
/root/.cache/Qwen/Qwen3-0.6B and /root/.cache/granite-embedding-125m-english to
/root/.cache/ibm-granite/granite-embedding-125m-english) while preserving the
--served-model-name values and the docker run usage of $VLLM_ARGS and
$VLLM_MODE.
- Around line 9-25: Add a pre-run validation that VLLM_IMAGE is set and
non-empty and fail fast with a clear error before constructing VLLM_ARGS or
calling docker run; check the environment variable VLLM_IMAGE and if empty print
a descriptive error (e.g., "Error: VLLM_IMAGE must be set to a valid Docker
image") and exit with non-zero status so the subsequent docker run --name
vllm-$VLLM_MODE $VLLM_IMAGE $VLLM_ARGS call never executes with a missing image
name.

In `@vllm/Containerfile`:
- Line 5: Update the RUN invocation that currently uses `uv pip install
"huggingface-hub[cli]"` so it succeeds when no virtualenv is active: add the
`--system` flag to the install command (i.e., `uv pip install --system
"huggingface-hub[cli]"`) or, alternatively, detect `uv` availability and fall
back to plain `pip install "huggingface-hub[cli]"` if `uv` is not present;
adjust the `RUN` line that contains `uv pip install "huggingface-hub[cli]"` accordingly.

In `@vllm/README.md`:
- Around line 34-65: The README examples use --model paths without the repo
namespace but the Containerfile stores models under /root/.cache/<repo_id>, so
update the two docker run examples to include the repo namespace in the --model
argument (change --model /root/.cache/Qwen3-0.6B to --model
/root/.cache/Qwen/Qwen3-0.6B and change --model
/root/.cache/granite-embedding-125m-english to --model
/root/.cache/ibm-granite/granite-embedding-125m-english) so the container can
find the downloaded model.
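The mismatch is easy to see by building the cache path the way the Containerfile does (a sketch using the PR's model names):

```shell
# The HF repo id includes the org namespace, so it becomes a directory
# level under the cache dir and must appear in the --model argument.
MODEL_CACHE_DIR=/root/.cache
for model in "Qwen/Qwen3-0.6B" "ibm-granite/granite-embedding-125m-english"; do
    echo "${MODEL_CACHE_DIR}/${model}"
done
# -> /root/.cache/Qwen/Qwen3-0.6B
# -> /root/.cache/ibm-granite/granite-embedding-125m-english
```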
♻️ Duplicate comments (3)
.github/workflows/redhat-distro-container.yml (1)

123-130: Stop hardcoding a personal VLLM_IMAGE reference

This still points at quay.io/higginsd/..., which is brittle for CI and downstream users.

Proposed fix (use repo/organization variable with a safe default)
       - name: Setup vllm for image test
         if: github.event_name != 'workflow_dispatch'
         id: vllm
         uses: ./.github/actions/setup-vllm
         env:
-          VLLM_IMAGE: quay.io/higginsd/vllm-cpu:65393ee064-qwen3
+          VLLM_IMAGE: ${{ vars.VLLM_CPU_IMAGE || 'quay.io/opendatahub/vllm-cpu:latest' }}
           VLLM_MODE: inference
vllm/Containerfile (2)

1-1: Document + pin base image provenance (digest) for supply-chain clarity

This base image origin/provenance still isn’t explained, and it’s not pinned to a digest.


21-32: Avoid passing HF tokens on the command line

Even with BuildKit secrets, --token "${HF_TOKEN}" can leak via process args/logging; hf download supports HF_TOKEN env.

Proposed fix (HF_TOKEN env, optional secret mount)
-RUN --mount=type=secret,id=hf_token \
+RUN --mount=type=secret,id=hf_token,required=false \
     for model in "${INFERENCE_MODEL}" "${EMBEDDING_MODEL}"; do \
         model_path="${MODEL_CACHE_DIR}/${model}" && \
         mkdir -p "${model_path}" && \
         if [ -f /run/secrets/hf_token ]; then \
-            HF_TOKEN=$(cat /run/secrets/hf_token) && \
-            hf download "${model}" --local-dir "${model_path}" --token "${HF_TOKEN}"; \
+            HF_TOKEN="$(cat /run/secrets/hf_token)" hf download "${model}" --local-dir "${model_path}"; \
         else \
             hf download "${model}" --local-dir "${model_path}"; \
         fi && \
         rm -rf /root/.cache/huggingface "${model_path}/original"; \
     done
🧹 Nitpick comments (1)
vllm/README.md (1)

7-13: Consider using vllm/ as build context (faster + clearer)

docker build . --file vllm/Containerfile works, but it sends the whole repo as context; vllm/ is sufficient here.

Proposed tweak
-DOCKER_BUILDKIT=1 docker build . \
+DOCKER_BUILDKIT=1 docker build vllm \
     --build-arg INFERENCE_MODEL="Qwen/Qwen3-0.6B" \
     --build-arg EMBEDDING_MODEL="ibm-granite/granite-embedding-125m-english" \
     --tag opendatahub/vllm-cpu:Qwen3-granite-embedding-125m \
     --file vllm/Containerfile

Also applies to: 19-27

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 34b71dd and bd12c1d.

📒 Files selected for processing (5)
  • .github/actions/setup-vllm/action.yml
  • .github/workflows/redhat-distro-container.yml
  • .github/workflows/vllm-cpu-container.yml
  • vllm/Containerfile
  • vllm/README.md
🚧 Files skipped from review as they are similar to previous changes (1)
  • .github/workflows/vllm-cpu-container.yml
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: build-test-push (linux/amd64)
  • GitHub Check: build-test-push (linux/amd64)
  • GitHub Check: Summary


@nathan-weinberg nathan-weinberg removed the do-not-merge Apply to PRs that should not be merged (yet) label Jan 14, 2026
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

🤖 Fix all issues with AI agents
In @.github/actions/setup-vllm/action.yml:
- Around line 14-15: The legacy VLLM_ARGS assignment for when VLLM_MODE ==
"legacy" uses the wrong model path (/root/.cache/Qwen3-0.6B); change the model
path in the VLLM_ARGS string to include the namespace directory (e.g.,
/root/.cache/Qwen/Qwen3-0.6B) so it matches how models are downloaded; update
the VLLM_ARGS line (the string containing --model and --served-model-name) to
reference /root/.cache/<repo_id>/<model_name> (for example
/root/.cache/Qwen/Qwen3-0.6B).

In @.github/workflows/vllm-cpu-container.yml:
- Around line 46-47: Update the checkout action reference used in the workflow:
replace the pinned SHA value in the uses line (currently
"actions/checkout@8e8c483db84b4bee98b60c0593521ed34d9990e8") with the patch tag
"actions/checkout@v6.0.2" so the workflow uses the latest v6.0.2 release; commit
the change to the workflow file.

In `@vllm/README.md`:
- Around line 1-68: The README ends with a truncated sentence ("Additional vLLM
arguments can be passed directly"); update the vllm/README.md final paragraph to
complete the thought by stating that additional vLLM CLI flags can be appended
when running the container (i.e., pass extra vLLM arguments after the image name
in the docker run command), and ensure the line containing that phrase
("Additional vLLM arguments can be passed directly") is replaced with a clear,
finished sentence referencing the runtime invocation.
♻️ Duplicate comments (3)
vllm/Containerfile (2)

5-5: Add --system flag to uv pip install.

Without --system, uv pip install will refuse to install into the system Python when not running inside a virtualenv. This will cause the build to fail.

🐛 Proposed fix
-RUN uv pip install "huggingface-hub[cli]"
+RUN uv pip install --system "huggingface-hub[cli]"

21-32: Pass HF token via environment variable instead of CLI argument.

Reading the token into a shell variable and passing it via --token "${HF_TOKEN}" can expose the token in process listings. The hf download command honors the HF_TOKEN environment variable directly.

🔒 Proposed fix
 RUN --mount=type=secret,id=hf_token \
     for model in "${INFERENCE_MODEL}" "${EMBEDDING_MODEL}"; do \
         model_path="${MODEL_CACHE_DIR}/${model}" && \
         mkdir -p "${model_path}" && \
         if [ -f /run/secrets/hf_token ]; then \
-            HF_TOKEN=$(cat /run/secrets/hf_token) && \
-            hf download "${model}" --local-dir "${model_path}" --token "${HF_TOKEN}"; \
+            HF_TOKEN=$(cat /run/secrets/hf_token) hf download "${model}" --local-dir "${model_path}"; \
         else \
             hf download "${model}" --local-dir "${model_path}"; \
         fi && \
         rm -rf /root/.cache/huggingface "${model_path}/original"; \
     done
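The reason the env-prefix form is safer: the token travels in the child process's environment rather than its argument vector, so it never appears in `ps` output. A small scoping sketch (no real token; `dummy-value` is a placeholder):

```shell
unset HF_TOKEN   # clean slate for the demo

# The env-prefix scopes the variable to the single command: it is
# visible inside that process, but absent from the parent shell and
# from the command line.
inner=$(HF_TOKEN="dummy-value" sh -c 'printf %s "${HF_TOKEN}"')
outer=$(sh -c 'printf %s "${HF_TOKEN:-unset}"')
echo "$inner"   # -> dummy-value
echo "$outer"   # -> unset
```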
.github/actions/setup-vllm/action.yml (1)

9-27: Add validation for VLLM_IMAGE environment variable.

The docker run command uses $VLLM_IMAGE without validation. If unset or empty, Docker will fail with the confusing error "docker run" requires at least 1 argument. Add an early check for a clear error message.

🐛 Proposed fix
         # Set VLLM_ARGS based on VLLM_MODE
+        if [[ -z "${VLLM_IMAGE:-}" ]]; then
+          echo "Error: VLLM_IMAGE environment variable must be set" >&2
+          exit 1
+        fi
+
         if [[ "$VLLM_MODE" == "inference" ]]; then
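The check above can be exercised standalone (a sketch; `require_vllm_image` and the image reference are illustrative, not part of the action):

```shell
# Fail fast with a clear message when VLLM_IMAGE is unset or empty,
# instead of letting docker run emit a confusing usage error.
require_vllm_image() {
    if [[ -z "${VLLM_IMAGE:-}" ]]; then
        echo "Error: VLLM_IMAGE environment variable must be set" >&2
        return 1
    fi
}

VLLM_IMAGE="quay.io/example/vllm-cpu:latest"   # placeholder reference
require_vllm_image && echo "image OK"
```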
🧹 Nitpick comments (1)
.github/workflows/vllm-cpu-container.yml (1)

49-54: Tag generation produces simplified names - verify this is intentional.

The tag extraction logic removes the org prefix and the trailing hyphen-delimited segment:

  • Qwen/Qwen3-0.6B → tag Qwen3
  • ibm-granite/granite-embedding-125m-english → tag granite-embedding-125m

This creates short, readable tags but loses version specificity. If the same model family with different versions is used, tags could collide. Verify this behavior is acceptable for your use case.
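The derivation can be exercised directly in a shell (a sketch; `tag` is an illustrative helper, not part of the workflow):

```shell
# Derive an image tag from an HF repo id: strip the org prefix,
# then strip the last hyphen-delimited segment.
tag() {
    local t="${1#*/}"   # drop everything through the first '/'
    echo "${t%-*}"      # drop the final '-' segment
}

tag "Qwen/Qwen3-0.6B"                             # -> Qwen3
tag "ibm-granite/granite-embedding-125m-english"  # -> granite-embedding-125m
tag "org/model-name-v2"                           # -> model-name (version lost)
```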

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between bd12c1d and f6a427a.

📒 Files selected for processing (5)
  • .github/actions/setup-vllm/action.yml
  • .github/workflows/redhat-distro-container.yml
  • .github/workflows/vllm-cpu-container.yml
  • vllm/Containerfile
  • vllm/README.md
🚧 Files skipped from review as they are similar to previous changes (1)
  • .github/workflows/redhat-distro-container.yml
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: build-test-push (linux/amd64)
  • GitHub Check: build-test-push (linux/amd64)
  • GitHub Check: Summary
🔇 Additional comments (2)
.github/actions/setup-vllm/action.yml (1)

29-39: Health check logic looks good.

The port determination correctly maps embedding mode to 8001 and others to 8000. The 15-minute timeout is reasonable for CPU-based model loading.
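The mapping can be stated as a tiny helper (a sketch; the action inlines this logic rather than defining a function):

```shell
# Readiness checks probe 8001 for embedding mode, 8000 otherwise.
vllm_port() {
    if [ "$1" = "embedding" ]; then
        echo 8001
    else
        echo 8000
    fi
}

vllm_port embedding   # -> 8001
vllm_port inference   # -> 8000
vllm_port legacy      # -> 8000
```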

.github/workflows/vllm-cpu-container.yml (1)

67-151: Build, test, and publish flow looks well-structured.

  • Build step correctly passes both INFERENCE_MODEL and EMBEDDING_MODEL as build-args
  • Test steps use the correct full image tag ${{ env.IMAGE_NAME }}:${{ env.INFERENCE_TAG }}-${{ env.EMBEDDING_TAG }}
  • Publish step mirrors the build configuration correctly
  • Log gathering with if: always() ensures debugging info is captured on failure

Previous issues regarding missing build-args and image tag mismatches have been addressed.


@nathan-weinberg nathan-weinberg force-pushed the vllm-cpu-factory branch 2 times, most recently from 1ce168f to 428fbb6 Compare January 14, 2026 20:36
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In @.github/workflows/vllm-cpu-container.yml:
- Around line 145-159: The publish step ("Publish image to Quay.io", id:
publish, using docker/build-push-action) is missing the secrets input so gated
model downloads fail; add a secrets block under the step's with: parameters and
pass HF_TOKEN (e.g., secrets: | HF_TOKEN=${{ secrets.HF_TOKEN }}) so the
docker/build-push-action has the HF_TOKEN available during the push/build like
the earlier build step.
- Around line 67-81: The build step ("Build image", id: build) doesn't pass the
secret needed by the Containerfile's RUN --mount=type=secret,id=hf_token, so
gated model downloads will run unauthenticated; update the
docker/build-push-action invocation (uses: docker/build-push-action@...) to
include the secrets input and map the HF token (e.g. add a secrets block under
with that provides hf_token=${{ secrets.HF_TOKEN }}), ensuring the Containerfile
can access the hf_token secret during build.
♻️ Duplicate comments (3)
vllm/Containerfile (2)

5-5: Add --system flag to uv pip install.

Without --system, uv pip install refuses to install into the system Python when no virtualenv is active.

Proposed fix
-RUN uv pip install "huggingface-hub[cli]"
+RUN uv pip install --system "huggingface-hub[cli]"

21-32: Pass HF_TOKEN as environment variable instead of command-line argument.

The --token "${HF_TOKEN}" argument can expose the token in process listings or verbose logs. The hf download command honors the HF_TOKEN environment variable directly.

Proposed fix
 RUN --mount=type=secret,id=hf_token \
     for model in "${INFERENCE_MODEL}" "${EMBEDDING_MODEL}"; do \
         model_path="${MODEL_CACHE_DIR}/${model}" && \
         mkdir -p "${model_path}" && \
         if [ -f /run/secrets/hf_token ]; then \
-            HF_TOKEN=$(cat /run/secrets/hf_token) && \
-            hf download "${model}" --local-dir "${model_path}" --token "${HF_TOKEN}"; \
+            HF_TOKEN=$(cat /run/secrets/hf_token) hf download "${model}" --local-dir "${model_path}"; \
         else \
             hf download "${model}" --local-dir "${model_path}"; \
         fi && \
         rm -rf /root/.cache/huggingface "${model_path}/original"; \
     done
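The security property the fix relies on can be demonstrated in isolation: a leading per-command environment assignment is inherited by the child process but never appears in its argument list. The secret value below is a dummy:

```shell
#!/bin/sh
# A leading VAR=value assignment scopes the variable to that one command:
# the child inherits it via the environment, not via argv, so it does not
# show up in `ps` output, and later commands never see it.
HF_TOKEN=dummy-secret sh -c 'echo "child sees: ${HF_TOKEN:-<unset>}"'
sh -c 'echo "next command sees: ${HF_TOKEN:-<unset>}"'   # <unset>, assuming HF_TOKEN was not already exported
```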
.github/workflows/vllm-cpu-container.yml (1)

46-47: Update checkout action to latest patch version.

The workflow uses actions/checkout@8e8c483db84b4bee98b60c0593521ed34d9990e8 (v6.0.1), but v6.0.2 was released on Jan 9, 2026.

Proposed fix
       - name: Checkout repository
-        uses: actions/checkout@8e8c483db84b4bee98b60c0593521ed34d9990e8 # v6.0.1
+        uses: actions/checkout@<v6.0.2-sha> # v6.0.2
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f6a427a and 428fbb6.

📒 Files selected for processing (5)
  • .github/actions/setup-vllm/action.yml
  • .github/workflows/redhat-distro-container.yml
  • .github/workflows/vllm-cpu-container.yml
  • vllm/Containerfile
  • vllm/README.md
🚧 Files skipped from review as they are similar to previous changes (3)
  • .github/workflows/redhat-distro-container.yml
  • .github/actions/setup-vllm/action.yml
  • vllm/README.md
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: build-test-push (linux/amd64)
  • GitHub Check: build-test-push (linux/amd64)
  • GitHub Check: Summary
🔇 Additional comments (3)
.github/workflows/vllm-cpu-container.yml (3)

83-97: Tests are skipped on workflow_dispatch.

Both inference and embedding tests are conditioned on github.event_name != 'workflow_dispatch', meaning manual triggers publish images without running smoke tests. If this is intentional (e.g., for quick republishing), consider documenting this behavior. Otherwise, the condition should be removed to ensure all published images are tested.


49-54: Verify tag derivation logic handles model naming variations correctly.

The tag extraction uses ${var#*/} to strip the org prefix and ${var%-*} to strip the shortest -* suffix. For "Qwen/Qwen3-0.6B", this yields "Qwen3". For models with multiple hyphens like "granite-embedding-125m-english", it yields "granite-embedding-125m".

This could cause tag collisions if two models share the same prefix (e.g., "org/model-v1" and "org/model-v2" both become "model"). Verify this is acceptable for your model selection, or consider using a hash-based or full-name approach for uniqueness.
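The two expansions can be checked directly in a scratch shell. The helper name and the ibm-granite repo id below are illustrative, not taken from the workflow:

```shell
#!/bin/sh
# Reproduce the workflow's tag derivation:
#   ${var#*/}  strips everything up to the first "/" (the org prefix)
#   ${var%-*}  strips the shortest trailing "-..." segment
derive_tag() {
    temp="${1#*/}"
    echo "${temp%-*}"
}

derive_tag "Qwen/Qwen3-0.6B"                            # Qwen3
derive_tag "ibm-granite/granite-embedding-125m-english" # granite-embedding-125m
derive_tag "org/model-v1"                               # model (collides with org/model-v2)
```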


99-128: LGTM!

Good practice to gather logs and system information on failure, with proper artifact upload and retention policy.


@nathan-weinberg
Collaborator Author

CI was passing on a previous run, not sure why it is failing now

@nathan-weinberg nathan-weinberg force-pushed the vllm-cpu-factory branch 2 times, most recently from f28e425 to 2b0424d on January 15, 2026 at 15:39
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In @.github/workflows/vllm-cpu-container.yml:
- Around line 49-54: The current tag derivation strips everything after the last
dash using the `${INFERENCE_TEMP%-*}` and `${EMBEDDING_TEMP%-*}` expansions
which can remove meaningful parts of model names and create collisions for
custom inputs (via workflow_dispatch); update the logic to either (a) stop
truncating by dash and simply set tags to the final path segment after `/` (use
the already-derived INFERENCE_TEMP/EMBEDDING_TEMP directly), or (b) derive a
compact unique tag by hashing the full model name and appending/including the
last path segment, and document the chosen behavior so users know how tags are
formed (reference the variables INFERENCE_MODEL, EMBEDDING_MODEL,
INFERENCE_TEMP, EMBEDDING_TEMP, INFERENCE_TAG, EMBEDDING_TAG when making the
change).
♻️ Duplicate comments (7)
vllm/Containerfile (2)

5-5: Add --system flag to uv pip install.

The uv pip install command will fail when not running inside a virtualenv unless --system is specified. The vLLM CPU base image doesn't activate a virtualenv by default.

Proposed fix
-RUN uv pip install "huggingface-hub[cli]"
+RUN uv pip install --system "huggingface-hub[cli]"

21-32: Pass HF_TOKEN as environment variable instead of command-line argument.

Reading the token into a shell variable and passing via --token "${HF_TOKEN}" can expose it in process listings. The hf download command honors the HF_TOKEN environment variable directly.

Proposed fix
 RUN --mount=type=secret,id=hf_token \
     for model in "${INFERENCE_MODEL}" "${EMBEDDING_MODEL}"; do \
         model_path="${MODEL_CACHE_DIR}/${model}" && \
         mkdir -p "${model_path}" && \
         if [ -f /run/secrets/hf_token ]; then \
-            HF_TOKEN=$(cat /run/secrets/hf_token) && \
-            hf download "${model}" --local-dir "${model_path}" --token "${HF_TOKEN}"; \
+            HF_TOKEN=$(cat /run/secrets/hf_token) hf download "${model}" --local-dir "${model_path}"; \
         else \
             hf download "${model}" --local-dir "${model_path}"; \
         fi && \
         rm -rf /root/.cache/huggingface "${model_path}/original"; \
     done
.github/actions/setup-vllm/action.yml (2)

16-18: Fix model path in legacy mode - missing namespace directory.

The legacy mode path /root/.cache/Qwen3-0.6B is missing the namespace directory. The Containerfile downloads models to /root/.cache/<repo_id> (e.g., /root/.cache/Qwen/Qwen3-0.6B).

Proposed fix
         elif [[ "$VLLM_MODE" == "legacy" ]]; then
-          VLLM_ARGS="--host 0.0.0.0 --port 8000 --enable-auto-tool-choice --tool-call-parser hermes --model /root/.cache/Qwen3-0.6B --served-model-name Qwen/Qwen3-0.6B --max-model-len 8192"
+          VLLM_ARGS="--host 0.0.0.0 --port 8000 --enable-auto-tool-choice --tool-call-parser hermes --model /root/.cache/Qwen/Qwen3-0.6B --served-model-name Qwen/Qwen3-0.6B --max-model-len 8192"
           VLLM_PORT=8000

9-22: Add validation for VLLM_IMAGE to fail fast with a clear error.

The docker run command on line 29 uses $VLLM_IMAGE without validation. If this environment variable is unset or empty, docker run will fail with a confusing error message.

Proposed fix
         # Set VLLM_ARGS based on VLLM_MODE
+        if [[ -z "${VLLM_IMAGE:-}" ]]; then
+          echo "Error: VLLM_IMAGE environment variable must be set" >&2
+          exit 1
+        fi
+
         if [[ "$VLLM_MODE" == "inference" ]]; then
.github/workflows/vllm-cpu-container.yml (3)

67-81: Missing secrets input for gated model access.

The Containerfile uses RUN --mount=type=secret,id=hf_token to access the HuggingFace token, but the build step doesn't pass this secret. Builds requiring gated model access will fail silently—the download proceeds without authentication.

Proposed fix
       - name: Build image
         id: build
         uses: docker/build-push-action@263435318d21b8e681c14492fe198d362a7d2c83 # v6.18.0
         with:
           context: .
           file: vllm/Containerfile
           platforms: ${{ matrix.platform }}
           push: false
           tags: ${{ env.IMAGE_NAME }}:${{ env.INFERENCE_TAG }}-${{ env.EMBEDDING_TAG }}
           load: true  # needed to load for smoke test
           cache-from: type=gha
           cache-to: type=gha,mode=max
           build-args: |
             INFERENCE_MODEL=${{ env.INFERENCE_MODEL }}
             EMBEDDING_MODEL=${{ env.EMBEDDING_MODEL }}
+          secrets: |
+            hf_token=${{ secrets.HF_TOKEN }}

145-159: Missing secrets input in publish step.

Same issue as the build step—the publish step also needs the hf_token secret for gated model downloads.

Proposed fix
       - name: Publish image to Quay.io
         id: publish
         if: contains(fromJSON('["push", "workflow_dispatch"]'), github.event_name)
         uses: docker/build-push-action@263435318d21b8e681c14492fe198d362a7d2c83 # v6.18.0
         with:
           context: .
           file: vllm/Containerfile
           platforms: ${{ matrix.platform }}
           push: true
           tags: ${{ env.IMAGE_NAME }}:${{ env.INFERENCE_TAG }}-${{ env.EMBEDDING_TAG }}
           cache-from: type=gha
           cache-to: type=gha,mode=max
           build-args: |
             INFERENCE_MODEL=${{ env.INFERENCE_MODEL }}
             EMBEDDING_MODEL=${{ env.EMBEDDING_MODEL }}
+          secrets: |
+            hf_token=${{ secrets.HF_TOKEN }}

46-47: Update checkout action to v6.0.2.

The workflow pins actions/checkout@8e8c483db84b4bee98b60c0593521ed34d9990e8 (v6.0.1), but v6.0.2 is available.

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 428fbb6 and f28e425.

📒 Files selected for processing (5)
  • .github/actions/setup-vllm/action.yml
  • .github/workflows/redhat-distro-container.yml
  • .github/workflows/vllm-cpu-container.yml
  • vllm/Containerfile
  • vllm/README.md
🚧 Files skipped from review as they are similar to previous changes (2)
  • .github/workflows/redhat-distro-container.yml
  • vllm/README.md
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: build-test-push (linux/amd64)
  • GitHub Check: build-test-push (linux/amd64)
  • GitHub Check: Summary
🔇 Additional comments (4)
vllm/Containerfile (1)

7-19: LGTM - Build argument validation.

The validation ensures both INFERENCE_MODEL and EMBEDDING_MODEL are provided at build time, failing fast with clear error messages.

.github/actions/setup-vllm/action.yml (1)

32-36: LGTM - Health check with timeout.

The readiness check uses timeout 900 (15 minutes) with a health endpoint poll, which is appropriate for model loading time.

.github/workflows/vllm-cpu-container.yml (2)

99-128: LGTM - Comprehensive logging and artifact collection.

The logging step captures container logs, system info (disk, memory, docker state), and uploads them as artifacts with 7-day retention. The if: always() ensures logs are captured even on failure.


83-97: LGTM - Test setup with correct image tags.

Both inference and embedding test steps now correctly use the full image tag ${{ env.IMAGE_NAME }}:${{ env.INFERENCE_TAG }}-${{ env.EMBEDDING_TAG }} matching the built image.


Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In @.github/actions/free-disk-space/action.yml:
- Around line 18-20: The docker rmi invocation is malformed because the -f flag
is inside the command substitution; update the command so the -f flag is passed
to docker rmi (e.g., call docker rmi -f "$(docker image ls -aq)" or place -f
before the command substitution) so the force flag is not treated as part of the
image name; adjust the line containing sudo docker rmi "$(docker image ls -aq)
-f" to move -f outside the quotes and ensure proper quoting of the "$(docker
image ls -aq)" substitution.
♻️ Duplicate comments (2)
.github/actions/setup-vllm/action.yml (2)

16-18: Legacy mode model path is still missing the namespace directory.

The legacy mode uses /root/.cache/Qwen3-0.6B but the Containerfile downloads models to /root/.cache/<namespace>/<model>. This should be /root/.cache/Qwen/Qwen3-0.6B to match the inference mode path.

🐛 Proposed fix
         elif [[ "$VLLM_MODE" == "legacy" ]]; then
-          VLLM_ARGS="--host 0.0.0.0 --port 8000 --enable-auto-tool-choice --tool-call-parser hermes --model /root/.cache/Qwen3-0.6B --served-model-name Qwen/Qwen3-0.6B --max-model-len 8192"
+          VLLM_ARGS="--host 0.0.0.0 --port 8000 --enable-auto-tool-choice --tool-call-parser hermes --model /root/.cache/Qwen/Qwen3-0.6B --served-model-name Qwen/Qwen3-0.6B --max-model-len 8192"
           VLLM_PORT=8000

9-30: Add validation for VLLM_IMAGE environment variable.

The docker run command uses $VLLM_IMAGE without checking if it's set. If unset or empty, docker will fail with the confusing error "docker run" requires at least 1 argument. Add an early check alongside the VLLM_MODE validation.

🛠️ Proposed fix
         # Set VLLM_ARGS based on VLLM_MODE
+        if [[ -z "${VLLM_IMAGE:-}" ]]; then
+          echo "Error: VLLM_IMAGE must be set to a valid Docker image"
+          exit 1
+        fi
+
         if [[ "$VLLM_MODE" == "inference" ]]; then
🧹 Nitpick comments (2)
.github/actions/free-disk-space/action.yml (2)

70-76: Inconsistent service stop/disable order.

For postfix (lines 65-66), you call stop then disable. For cups, it's disable then stop. While both work, stopping before disabling is the conventional order to ensure a clean shutdown before preventing future starts.

Suggested fix for consistency
          # Remove Cups because we're not printing anything
          cups_packages=$(dnf list installed | grep cups || echo "")
          if [[ ! -z "${cups_packages}" ]]; then
-              sudo systemctl disable cups
               sudo systemctl stop cups
+              sudo systemctl disable cups
               sudo dnf -y remove cups
          fi

89-93: Consider broader Xcode cleanup.

The pattern Xcode_15* only removes Xcode 15.x versions. Future GitHub-hosted runners may have newer versions, reducing the effectiveness of this cleanup. Consider a broader glob or removing multiple known versions.

Suggested broader cleanup
     - name: Free Disk Space MacOS
       if: runner.os == 'macOS'
       run: |
-        sudo rm -rf /System/Volumes/Data/Applications/Xcode_15*
+        sudo rm -rf /System/Volumes/Data/Applications/Xcode_* || true
       shell: bash
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f28e425 and 3022272.

📒 Files selected for processing (6)
  • .github/actions/free-disk-space/action.yml
  • .github/actions/setup-vllm/action.yml
  • .github/workflows/redhat-distro-container.yml
  • .github/workflows/vllm-cpu-container.yml
  • vllm/Containerfile
  • vllm/README.md
🚧 Files skipped from review as they are similar to previous changes (4)
  • .github/workflows/redhat-distro-container.yml
  • vllm/Containerfile
  • .github/workflows/vllm-cpu-container.yml
  • vllm/README.md
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: build-test-push (linux/amd64)
  • GitHub Check: build-test-push (linux/amd64)
  • GitHub Check: Summary
🔇 Additional comments (1)
.github/actions/setup-vllm/action.yml (1)

32-36: LGTM!

The health check logic correctly uses the dynamic VLLM_PORT based on mode, with a reasonable 15-minute timeout for model loading. The /health endpoint and retry interval are appropriate.


Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

♻️ Duplicate comments (3)
.github/actions/setup-vllm/action.yml (2)

24-30: ⚠️ Potential issue | 🟠 Major

Code Quality: Add fail-fast validation for VLLM_IMAGE before docker run.

Line 29 can execute with an empty image reference and fail with a non-actionable Docker error. Validate early and exit with a clear message.

Proposed fix
-        # Set VLLM_ARGS based on VLLM_MODE
+        # Validate required env first
+        if [[ -z "${VLLM_IMAGE:-}" ]]; then
+          echo "Error: VLLM_IMAGE must be set to a valid Docker image" >&2
+          exit 1
+        fi
+
+        # Set VLLM_ARGS based on VLLM_MODE
         if [[ "$VLLM_MODE" == "inference" ]]; then
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/actions/setup-vllm/action.yml around lines 24 - 30, Before executing
the docker run block, validate the VLLM_IMAGE environment variable (VLLM_IMAGE)
and fail fast if it's empty; add a check immediately prior to the docker run
block in action.yml that tests for an empty VLLM_IMAGE, writes a clear error
message to stderr (including the variable name) and exits with a non‑zero status
so the action doesn't attempt docker run with a blank image reference.

16-18: ⚠️ Potential issue | 🔴 Critical

Code Quality: Fix legacy model path to match cached directory layout.

Line 17 still points to /root/.cache/Qwen3-0.6B, which does not match the namespace-based cache path used elsewhere and will break legacy startup.

Proposed fix
         elif [[ "$VLLM_MODE" == "legacy" ]]; then
-          VLLM_ARGS="--host 0.0.0.0 --port 8000 --enable-auto-tool-choice --tool-call-parser hermes --model /root/.cache/Qwen3-0.6B --served-model-name Qwen/Qwen3-0.6B --max-model-len 8192"
+          VLLM_ARGS="--host 0.0.0.0 --port 8000 --enable-auto-tool-choice --tool-call-parser hermes --model /root/.cache/Qwen/Qwen3-0.6B --served-model-name Qwen/Qwen3-0.6B --max-model-len 8192"
           VLLM_PORT=8000
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/actions/setup-vllm/action.yml around lines 16 - 18, The legacy
branch for VLLM startup uses an incorrect model path; update the VLLM_ARGS
string in the elif [[ "$VLLM_MODE" == "legacy" ]] block so the --model path
matches the namespace-based cache layout used elsewhere (make it consistent with
the served-model-name), e.g. replace /root/.cache/Qwen3-0.6B with the namespaced
cache path such as /root/.cache/Qwen/Qwen3-0.6B so VLLM_ARGS and
--served-model-name Qwen/Qwen3-0.6B point to the same cached model.
.github/actions/free-disk-space/action.yml (1)

18-20: ⚠️ Potential issue | 🟠 Major

Code Quality: docker rmi invocation is malformed and does not pass -f correctly.

Line 19 places -f inside the quoted argument, so it is treated as text rather than a flag.

Proposed fix
-        if command -v docker 2>&1 >/dev/null ; then
-          sudo docker rmi "$(docker image ls -aq) -f" >/dev/null 2>&1 || true
+        if command -v docker >/dev/null 2>&1; then
+          images="$(docker image ls -aq)"
+          if [[ -n "${images}" ]]; then
+            sudo docker rmi -f ${images} >/dev/null 2>&1 || true
+          fi
         fi
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/actions/free-disk-space/action.yml around lines 18 - 20, The docker
removal command is malformed: the -f flag is placed inside the quoted argument
so it’s treated as text; change the invocation so -f is a separate flag before
the list of IDs (use docker rmi -f with the output of docker image ls -aq as the
argument) and ensure the substitution is not quoted so flags are parsed
correctly (target the line invoking sudo docker rmi and the command substitution
docker image ls -aq).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.github/actions/free-disk-space/action.yml:
- Around line 83-85: The step that removes DNF caches uses "rm -rf
/var/cache/dnf*" without sudo which can fail on locked-down runners and break
the action; update the branch handling where "sudo dnf clean all" is followed by
the cache deletion to call sudo for the removal (i.e., use sudo for the rm
command), ensuring the runner has permission and the action won't exit due to a
permission error when executing the cache deletion command.

In @.github/workflows/vllm-cpu-container.yml:
- Around line 12-19: The workflow .github/workflows/vllm-cpu-container.yml
currently restricts triggers to only 'vllm/Containerfile', which can skip CI
when the workflow or related composite actions change; update the
push/pull_request path filters (the paths keys in vllm-cpu-container.yml) to
include the workflow and action files (for example add '.github/workflows/**'
and any composite action dirs like '.github/actions/**' and/or remove the paths
filter entirely) so changes to workflow files always trigger the pipeline.
- Around line 131-136: The "Cleanup vllm containers" step currently always runs
and fails when containers don't exist; make it idempotent by changing the run
command in that step so it does not return non-zero if vllm-inference or
vllm-embedding are absent (e.g., check for each container with docker ps -a or
docker container inspect before removing, or append "|| true" to docker rm -f
vllm-inference vllm-embedding) so the step succeeds even when skipped earlier
for workflow_dispatch; update the run block in the "Cleanup vllm containers"
step accordingly.
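A minimal sketch of the idempotent form (the function wrapper is illustrative; it exits 0 whether or not the containers, or even Docker itself, are present):

```shell
#!/bin/sh
# "docker rm -f" exits non-zero for missing containers; "|| true" swallows
# that so a cleanup step cannot fail the job. Output and errors are
# discarded because only the exit status matters here.
cleanup_containers() {
    docker rm -f "$@" >/dev/null 2>&1 || true
}

cleanup_containers vllm-inference vllm-embedding
echo "cleanup exit status: $?"   # always 0
```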

---

Duplicate comments:
In @.github/actions/free-disk-space/action.yml:
- Around line 18-20: The docker removal command is malformed: the -f flag is
placed inside the quoted argument so it’s treated as text; change the invocation
so -f is a separate flag before the list of IDs (use docker rmi -f with the
output of docker image ls -aq as the argument) and ensure the substitution is
not quoted so flags are parsed correctly (target the line invoking sudo docker
rmi and the command substitution docker image ls -aq).

In @.github/actions/setup-vllm/action.yml:
- Around line 24-30: Before executing the docker run block, validate the
VLLM_IMAGE environment variable (VLLM_IMAGE) and fail fast if it's empty; add a
check immediately prior to the docker run block in action.yml that tests for an
empty VLLM_IMAGE, writes a clear error message to stderr (including the variable
name) and exits with a non‑zero status so the action doesn't attempt docker run
with a blank image reference.
- Around line 16-18: The legacy branch for VLLM startup uses an incorrect model
path; update the VLLM_ARGS string in the elif [[ "$VLLM_MODE" == "legacy" ]]
block so the --model path matches the namespace-based cache layout used
elsewhere (make it consistent with the served-model-name), e.g. replace
/root/.cache/Qwen3-0.6B with the namespaced cache path such as
/root/.cache/Qwen/Qwen3-0.6B so VLLM_ARGS and --served-model-name
Qwen/Qwen3-0.6B point to the same cached model.

ℹ️ Review info

Configuration used: Central YAML (base), Organization UI (inherited)

Review profile: CHILL

Plan: Pro

Cache: Disabled due to data retention organization setting

Knowledge base: Disabled due to data retention organization setting

📥 Commits

Reviewing files that changed from the base of the PR and between f7bb982 and 54121a5.

📒 Files selected for processing (6)
  • .github/actions/free-disk-space/action.yml
  • .github/actions/setup-vllm/action.yml
  • .github/workflows/redhat-distro-container.yml
  • .github/workflows/vllm-cpu-container.yml
  • vllm/Containerfile
  • vllm/README.md
🚧 Files skipped from review as they are similar to previous changes (3)
  • vllm/Containerfile
  • vllm/README.md
  • .github/workflows/redhat-distro-container.yml

Collaborator

@skamenan7 skamenan7 left a comment

LGTM, thanks

nathan-weinberg and others added 3 commits March 13, 2026 11:25
this commit adds a Containerfile and README to allow
users to build vLLM CPU images with pre-downloaded
models

it also adds a CI action that will publish an official
image to the OpenDataHub org on Quay

Signed-off-by: Nathan Weinberg <nweinber@redhat.com>
…urce

Replace the multi-stage build that compiled vLLM from source with the
official vllm/vllm-openai-cpu:v0.17.1 image. This dramatically reduces
build time and image complexity while retaining model download support.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Nathan Weinberg <nweinber@redhat.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Nathan Weinberg <nweinber@redhat.com>


Development

Successfully merging this pull request may close these issues.

Provide Containerfile and GHA for building new vLLM CPU images

5 participants