ci: add multi-arch matrix with MaaS vLLM for ARM64 testing #287
Artemon-line wants to merge 3 commits into main
Conversation
📝 Walkthrough

The PR adds multi-architecture CI for amd64 and arm64 using a matrix-driven GitHub Actions workflow, with per-arch runners, Buildx platform selection, and per-arch cache scopes. vLLM/Containerfile generation and publishing steps are gated to amd64; MaaS-based vLLM/embedding endpoints and tokens can override local vLLM settings via environment writes. Smoke and integration test scripts were updated to conditionally inject vLLM/embedding envs and to skip specific integration tests. README documents ARM64 support and differences in fork vs origin testing. Docker cleanup and artifact naming were made arch-aware.

Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes

Security Review

- CWE-532 (Insertion of Sensitive Information into Log Files)
- CWE-798 (Use of Hard-coded Credentials) / incomplete validation of sensitive inputs
- CWE-200 (Exposure of Sensitive Information to an Unauthorized Actor)

Action items (must be implemented or documented):

No other immediate CWE/CVE items were identified in the diff; focus review on ensuring secrets handling and conditional gating are implemented exactly as described above.

🚥 Pre-merge checks | ✅ Passed checks (2 passed)

✏️ Tip: You can configure your own custom pre-merge checks in the settings. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
tests/smoke.sh (1)
Line 3: ⚠️ Potential issue | 🟡 Minor

Missing -e flag in shell options (CWE-755: Improper Handling of Exceptional Conditions). Per coding guidelines, shell scripts should use set -euo pipefail. Without -e, commands can fail silently and the script continues, potentially masking errors.

Proposed fix:

```diff
-set -uo pipefail
+set -euo pipefail
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/smoke.sh` at line 3, The script currently uses the shell options token "set -uo pipefail" which omits the -e (exit-on-error) flag; update the options token to include -e so it reads "set -euo pipefail" (i.e., modify the "set -uo pipefail" line) to ensure the script exits immediately on any command failure.
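To illustrate the behavioral difference this finding is about, here is a small standalone sketch; none of it is the repo's actual smoke.sh, it only demonstrates what -e changes:

```shell
#!/usr/bin/env bash
# Standalone demo of why -e matters.

# Without -e: the failing command is ignored and execution keeps going.
out1=$(bash -c 'set -uo pipefail; false; echo reached')

# With -e: the same failure aborts the subshell before the echo runs.
out2=$(bash -c 'set -euo pipefail; false; echo reached' || echo "aborted early")

echo "without -e: $out1"   # without -e: reached
echo "with -e:    $out2"   # with -e:    aborted early
```

Note that -e is not a complete safety net (it is suppressed inside conditionals and on the left side of `||`), but it catches exactly the silent mid-script failures described above.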
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.github/workflows/redhat-distro-container.yml:
- Around line 153-166: The run block "Configure MaaS vLLM endpoints" currently
interpolates secrets directly via ${{ env.MAAS_* }}, which can allow shell
metacharacter injection; instead expose those secrets via the job/step env and
write them into GITHUB_ENV using the shell environment variables (e.g.
MAAS_VLLM_URL, MAAS_VLLM_API_TOKEN, MAAS_EMBEDDING_URL,
MAAS_EMBEDDING_API_TOKEN) so the secret values are not expanded in the workflow
YAML; in the step, reference the runtime env vars and append each key/value to
$GITHUB_ENV using a safe, quoted write (e.g. printf or echo with quoting) for
VLLM_URL, VLLM_API_TOKEN, VLLM_EMBEDDING_URL, VLLM_EMBEDDING_API_TOKEN and the
fixed EMBEDDING/INFERENCE model entries to avoid any direct workflow
interpolation.
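The safe-write pattern that comment describes can be sketched as follows. The MAAS_* names come from the comment itself and are assumed to arrive via the step-level `env:` block; the fallback file path exists only so the sketch runs outside a real runner:

```shell
#!/usr/bin/env bash
# Sketch: append env vars to $GITHUB_ENV from shell variables, so the
# workflow YAML never interpolates the secret values directly.
set -euo pipefail
GITHUB_ENV="${GITHUB_ENV:-/tmp/github_env_sketch}"   # real runners set this

write_env() {
  # One name=value record per line; GitHub parses the value literally,
  # so shell metacharacters inside a secret are never re-evaluated.
  printf '%s=%s\n' "$1" "$2" >> "$GITHUB_ENV"
}

write_env VLLM_URL                 "${MAAS_VLLM_URL:-}"
write_env VLLM_API_TOKEN           "${MAAS_VLLM_API_TOKEN:-}"
write_env VLLM_EMBEDDING_URL       "${MAAS_EMBEDDING_URL:-}"
write_env VLLM_EMBEDDING_API_TOKEN "${MAAS_EMBEDDING_API_TOKEN:-}"
```

Values containing newlines would instead need the `name<<DELIMITER` heredoc form that GITHUB_ENV also supports.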
---
Outside diff comments:
In `@tests/smoke.sh`:
- Line 3: The script currently uses the shell options token "set -uo pipefail"
which omits the -e (exit-on-error) flag; update the options token to include -e
so it reads "set -euo pipefail" (i.e., modify the "set -uo pipefail" line) to
ensure the script exits immediately on any command failure.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Central YAML (base), Organization UI (inherited)
Review profile: CHILL
Plan: Pro
Run ID: 47b922af-9c83-45dd-bd6d-7d35500138ef
📒 Files selected for processing (3)
- .github/workflows/redhat-distro-container.yml
- README.md
- tests/smoke.sh
Add matrix strategy for [amd64, arm64] with native GitHub runners. Use MaaS vLLM endpoints for inference/embedding tests on origin PRs, keeping local vllm-cpu fallback for fork PRs on amd64. Fork arm64 PRs are build-only (no runtime tests, since MaaS secrets unavailable).

Changes:
- Add matrix strategy with ubuntu-latest (amd64) and ubuntu-24.04-arm (arm64)
- Add MaaS vLLM/embedding endpoint secrets for remote inference
- Make smoke.sh env vars conditional (MaaS vs local vllm vs sentence-transformers)
- Gate publish, Slack, and custom build steps on amd64 only
- Add per-arch cache scoping and artifact names
- Add ARM64 support section to README with fork PR coverage gap note
- Map secrets to intermediate env vars in step-level env blocks (security)
- Fix Vertex AI model path (publishers/google/models/gemini-2.0-flash)
- Fix Vertex AI location (global)

Supersedes PR #252. Implements RHAIENG-3509.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Artemy <ahladenk@redhat.com>
Force-pushed from 4d0ecc5 to e883da9
- test_openai_embeddings_with_encoding_format_base64: nomic-embed-text-v1-5 doesn't support the dimensions param (always returns 768)
- test_openai_completion_logprobs: upstream /v1/completions validates logprobs as bool instead of int per the OpenAI spec

Verified locally: vLLM 21 passed, OpenAI 40 passed, 0 failed.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Artemy <ahladenk@redhat.com>
Force-pushed from e883da9 to e705ffd
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
.github/workflows/redhat-distro-container.yml (2)
Lines 108-123: ⚠️ Potential issue | 🟠 Major

Scheduled arm64 runs are validating the wrong source revision. On schedule, lines 186-200 still run smoke/integration on arm64 when MaaS is set, but line 109 regenerates distribution/Containerfile only in the amd64 leg. The arm64 nightly lane therefore tests the checked-in Containerfile instead of main. Run this step on arm64 for schedule, or skip the scheduled arm64 leg entirely.

Suggested fix:

```diff
-        if: matrix.arch == 'amd64' && contains(fromJSON('["workflow_dispatch", "schedule"]'), github.event_name)
+        if: github.event_name == 'schedule' || (matrix.arch == 'amd64' && github.event_name == 'workflow_dispatch')
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.github/workflows/redhat-distro-container.yml around lines 108 - 123, The step "Generate Containerfile to build an image from an arbitrary llama-stack commit (workflow_dispatch/schedule)" is gated by if: matrix.arch == 'amd64' && contains(...), so scheduled arm64 runs use the checked-in Containerfile; remove the arch restriction so the Containerfile regeneration (the python3 distribution/build.py and sed -i removal of the pip install line) runs for schedule on arm64 too — change the step's if to only check the event (e.g., if: contains(fromJSON('["workflow_dispatch","schedule"]'), github.event_name)) or otherwise add logic to run the step for arm64 when github.event_name == 'schedule'.
Lines 290-298: ⚠️ Potential issue | 🟠 Major

Validate the llama_stack_commit_sha input before interpolating it in shell commands (CWE-94). LLAMA_STACK_COMMIT_SHA originates from github.event.inputs.llama_stack_commit_sha (a user-supplied workflow_dispatch input) and is used at lines 294–298 within a run: block. Although the values are quoted, they lack explicit validation. An attacker can inject shell commands via input like `main"; echo pwned"`. The job has access to secrets (OPENAI_API_KEY, MAAS_VLLM_API_TOKEN, etc.), making this exploitable. Validate the input format (reject anything except main or a valid 7–40 character hex commit SHA) before use.

Suggested fix:

```diff
       - name: Output custom build information
         if: matrix.arch == 'amd64' && contains(fromJSON('["workflow_dispatch", "schedule"]'), github.event_name)
+        env:
+          _LLAMA_STACK_COMMIT_SHA: ${{ env.LLAMA_STACK_COMMIT_SHA }}
         run: |
+          if [[ "$_LLAMA_STACK_COMMIT_SHA" != "main" && ! "$_LLAMA_STACK_COMMIT_SHA" =~ ^[0-9a-fA-F]{7,40}$ ]]; then
+            echo "Invalid llama_stack_commit_sha: must be 'main' or a 7-40 character hex SHA" >&2
+            exit 1
+          fi
           echo "✅ Custom container image built successfully!"
-          echo "📦 Image: ${{ env.IMAGE_NAME }}:source-${{ env.LLAMA_STACK_COMMIT_SHA }}"
-          echo "🔗 Llama Stack commit: ${{ env.LLAMA_STACK_COMMIT_SHA }}"
+          echo "📦 Image: $IMAGE_NAME:source-$_LLAMA_STACK_COMMIT_SHA"
+          echo "🔗 Llama Stack commit: $_LLAMA_STACK_COMMIT_SHA"
           echo ""
           echo "You can pull this image using:"
-          echo "docker pull ${{ env.IMAGE_NAME }}:source-${{ env.LLAMA_STACK_COMMIT_SHA }}"
+          echo "docker pull $IMAGE_NAME:source-$_LLAMA_STACK_COMMIT_SHA"
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.github/workflows/redhat-distro-container.yml around lines 290 - 298, Validate and sanitize the user-supplied LLAMA_STACK_COMMIT_SHA (github.event.inputs.llama_stack_commit_sha) before interpolating it into the run block that echoes/pulls the image (where env.LLAMA_STACK_COMMIT_SHA and env.IMAGE_NAME are used); add an earlier step that checks the input against a whitelist (allow only the literal "main" or a 7–40 character hex SHA) and fail the job if it doesn't match, or set a safe default, then export the validated value into env.LLAMA_STACK_COMMIT_SHA for subsequent steps so the echo/docker pull lines use a vetted value and cannot be used for shell injection.
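The validation rule in that suggested fix can be exercised on its own. A standalone sketch (bash, since the check relies on `[[ =~ ]]`):

```shell
#!/usr/bin/env bash
# Accept only the literal "main" or a 7-40 character hex commit SHA;
# everything else (including injection payloads) is rejected.
is_valid_commit_ref() {
  local ref="$1"
  [[ "$ref" == "main" || "$ref" =~ ^[0-9a-fA-F]{7,40}$ ]]
}

for ref in main e705ffd 'main"; echo pwned"' abc; do
  if is_valid_commit_ref "$ref"; then
    echo "accept: $ref"
  else
    echo "reject: $ref"
  fi
done
```

Here e705ffd (a short SHA from this PR) passes, while the injection payload from the comment and the too-short `abc` are rejected.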
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.github/workflows/redhat-distro-container.yml:
- Around line 153-171: The workflow step unconditionally sets
EMBEDDING_PROVIDER/EMBEDDING_MODEL to vllm-embedding whenever MAAS_VLLM_URL is
present, causing mixed config if MAAS_EMBEDDING_URL is empty; update the step so
it validates MAAS_EMBEDDING_URL before writing VLLM_EMBEDDING_URL,
VLLM_EMBEDDING_API_TOKEN, EMBEDDING_PROVIDER, EMBEDDING_MODEL and
EMBEDDING_PROVIDER_MODEL_ID into GITHUB_ENV — i.e., only append those
embedding-related env entries when _MAAS_EMBEDDING_URL is non-empty (or else
emit a clear failure), while still setting VLLM_URL, VLLM_API_TOKEN and
VLLM_INFERENCE_MODEL when _MAAS_VLLM_URL exists.
- Around line 60-64: The MAAS secrets (MAAS_VLLM_URL, MAAS_VLLM_API_TOKEN,
MAAS_EMBEDDING_URL, MAAS_EMBEDDING_API_TOKEN) are currently defined at job-level
env and therefore exposed to every step; remove these four variables from the
job-level env block and instead add them only to the "Configure MaaS vLLM
endpoints" step by setting its step-level env to the corresponding secrets
(e.g., MAAS_VLLM_URL: ${{ secrets.MAAS_VLLM_URL }}, etc.); ensure no other steps
reference or inherit these vars so the secrets are only available during that
single step.
- Around line 275-281: The Slack notification is currently evaluated inside the
amd64 matrix leg (the "Set Slack notify status" / "Notify Slack" steps) so
arm64-only failures are missed; instead add a new single fan-in job (e.g.,
"notify-slack") that uses needs to depend on the full matrix job (the job that
produces steps.distribution-changed.outputs.changed) and remove the matrix.arch
gating from the notification logic; in that new job inspect needs.*.result (and
the distribution-changed output via
needs.<matrixJob>.outputs.distribution-changed) to detect any failed
architecture or a successful publish and then run the Slack notify step once
based on those aggregated results rather than using failure() inside a single
matrix leg.
In `@tests/run_integration_tests.sh`:
- Around line 66-68: Remove test_openai_embeddings_with_encoding_format_base64
from the unconditional SKIP_TESTS string and instead append that single test
name to SKIP_TESTS only when the run is actually using the MaaS embedding path;
update the logic around the SKIP_TESTS variable in
tests/run_integration_tests.sh to detect the MaaS run (e.g., check the existing
env/config flags used elsewhere in the script such as EMBEDDING_PROVIDER,
EMBEDDING_ENDPOINT, EMBEDDING_MODEL or any MAAS-specific boolean) and only add
"test_openai_embeddings_with_encoding_format_base64" to SKIP_TESTS in that
MaaS-specific branch so other lanes keep coverage.
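The fan-in Slack notification requested in the workflow comments above could take roughly this shape; the `notify-slack` job id and the `build` matrix-job name are assumptions, not the workflow's actual identifiers:

```yaml
# Hypothetical fan-in job; "build" stands in for the matrix job's real id.
notify-slack:
  needs: [build]
  if: always()            # run even when a matrix leg failed
  runs-on: ubuntu-latest
  steps:
    - name: Notify Slack once for all architectures
      run: |
        # needs.build.result aggregates the matrix: 'failure' if any leg failed
        echo "aggregate result: ${{ needs.build.result }}"
        # ...invoke the Slack webhook here based on that single value
```

Because the aggregated `needs.build.result` covers every matrix leg, an arm64-only failure now triggers the notification, which the amd64-gated step could not do.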
---
Outside diff comments:
In @.github/workflows/redhat-distro-container.yml:
- Around line 108-123: The step "Generate Containerfile to build an image from
an arbitrary llama-stack commit (workflow_dispatch/schedule)" is gated by if:
matrix.arch == 'amd64' && contains(...), so scheduled arm64 runs use the
checked-in Containerfile; remove the arch restriction so the Containerfile
regeneration (the python3 distribution/build.py and sed -i removal of the pip
install line) runs for schedule on arm64 too — change the step's if to only
check the event (e.g., if:
contains(fromJSON('["workflow_dispatch","schedule"]'), github.event_name)) or
otherwise add logic to run the step for arm64 when github.event_name ==
'schedule'.
- Around line 290-298: Validate and sanitize the user-supplied
LLAMA_STACK_COMMIT_SHA (github.event.inputs.llama_stack_commit_sha) before
interpolating it into the run block that echoes/pulls the image (where
env.LLAMA_STACK_COMMIT_SHA and env.IMAGE_NAME are used); add an earlier step
that checks the input against a whitelist (allow only the literal "main" or a
7–40 character hex SHA) and fail the job if it doesn't match, or set a safe
default, then export the validated value into env.LLAMA_STACK_COMMIT_SHA for
subsequent steps so the echo/docker pull lines use a vetted value and cannot be
used for shell injection.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Central YAML (base), Organization UI (inherited)
Review profile: CHILL
Plan: Pro
Run ID: f0604127-4448-46d2-86b9-134827a253ef
📒 Files selected for processing (4)
- .github/workflows/redhat-distro-container.yml
- README.md
- tests/run_integration_tests.sh
- tests/smoke.sh
✅ Files skipped from review due to trivial changes (1)
- README.md
🚧 Files skipped from review as they are similar to previous changes (1)
- tests/smoke.sh
```yaml
      # MaaS vLLM endpoints (used for both arches on origin PRs)
      MAAS_VLLM_URL: ${{ secrets.MAAS_VLLM_URL }}
      MAAS_VLLM_API_TOKEN: ${{ secrets.MAAS_VLLM_API_TOKEN }}
      MAAS_EMBEDDING_URL: ${{ secrets.MAAS_EMBEDDING_URL }}
      MAAS_EMBEDDING_API_TOKEN: ${{ secrets.MAAS_EMBEDDING_API_TOKEN }}
```
🧩 Analysis chain

🏁 Script executed:

```shell
# First, locate and examine the workflow file
fd "redhat-distro-container.yml" -t f
```

Repository: opendatahub-io/llama-stack-distribution
Length of output: 65

🏁 Script executed:

```shell
# Get the file and examine the lines in question
if [ -f ".github/workflows/redhat-distro-container.yml" ]; then
  echo "=== Lines 50-75 (around the secrets segment) ===" && \
  sed -n '50,75p' .github/workflows/redhat-distro-container.yml && \
  echo -e "\n=== Lines 105-125 (around the custom-source segment) ===" && \
  sed -n '105,125p' .github/workflows/redhat-distro-container.yml && \
  echo -e "\n=== Full file structure overview ===" && \
  cat -n .github/workflows/redhat-distro-container.yml | head -200
fi
```

Repository: opendatahub-io/llama-stack-distribution
Length of output: 12304
CWE-668: Move MaaS secrets from job scope to step scope.
Lines 60-64 expose MAAS_VLLM_URL, MAAS_VLLM_API_TOKEN, MAAS_EMBEDDING_URL, and MAAS_EMBEDDING_API_TOKEN to every step in the job, including the custom-source build at lines 108-123 that runs pip install -e . from an external repository. An attacker controlling the cloned llama-stack repository can inject install hooks that exfiltrate these tokens before test execution. Remove the MaaS secrets from job-level env and inject them only in the "Configure MaaS vLLM endpoints" step (line 153).
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In @.github/workflows/redhat-distro-container.yml around lines 60 - 64, The MAAS
secrets (MAAS_VLLM_URL, MAAS_VLLM_API_TOKEN, MAAS_EMBEDDING_URL,
MAAS_EMBEDDING_API_TOKEN) are currently defined at job-level env and therefore
exposed to every step; remove these four variables from the job-level env block
and instead add them only to the "Configure MaaS vLLM endpoints" step by setting
its step-level env to the corresponding secrets (e.g., MAAS_VLLM_URL: ${{
secrets.MAAS_VLLM_URL }}, etc.); ensure no other steps reference or inherit
these vars so the secrets are only available during that single step.
These aren't raw secrets in env; they're ${{ secrets.* }} mapped to env vars, which is the standard GitHub Actions pattern (same as VERTEX_AI_PROJECT and OPENAI_API_KEY on lines 56 and 59).
@Artemon-line, you're correct that ${{ secrets.* }} in job-level env is the standard GitHub Actions pattern — my concern wasn't about the syntax, it was about scope (CWE-668): all job-level env vars, regardless of how they're populated, are visible to every step in the job, including the step that clones github.com/opendatahub-io/llama-stack and runs pip install -e .. A compromised or hijacked install hook in that external repo could read the process environment and exfiltrate any token present at that point.
That said, your consistency argument is fair — VERTEX_AI_PROJECT, GCP_WORKLOAD_IDENTITY_PROVIDER, and OPENAI_API_KEY are already at job-level with the same exposure, and the cloned repo is within your own org (opendatahub-io), which significantly reduces the supply-chain risk compared to a third-party dependency. Since the existing secrets already follow this pattern and the external repo is org-controlled, I'll defer to your judgment here.
@Artemon-line are you sure that this comment is a false positive? I mean, consistently insecure is still insecure. Just saying, these API tokens might be very valuable and interesting to people. I know the risk is probably not so large for this particular repo, but in general it's important to keep an eye on such honeypots.
Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!
Thank you @rhuss , that’s a very valuable point.
I’d like to keep this PR as a PoC (Proof of Concept) for now, so I’ve added the do-not-merge and test labels.
My intention is to demonstrate how this vLLM-on-MaaS solution could work; however, some notable downsides include secret management and MaaS availability.
thanks @Artemon-line , by no means I want to block anything that is useful (and this PR is definitely helpful), just want to remind on a general issue with API keys in CI. :old-man-yelling-at-the-cloud:
CodeRabbit is pointing at a best-practice here, and best-practices are certainly easier for humans (and agents) to reason about. It's worth considering.
```diff
         env:
           VERTEX_AI_PROJECT: ${{ secrets.VERTEX_AI_PROJECT }}
-          VERTEX_AI_LOCATION: us-central1
+          VERTEX_AI_LOCATION: global
```
This PR has a lot of changes in it. This is one example.
Thank you @ktdreyer for pointing this out; I will address this in another PR.
The feature PoC looks good, and will unblock #252. The secret / token topic has already been discussed, but once that's addressed, I think this will really help. Thanks!
Summary

- Add matrix strategy for [amd64, arm64] with native GitHub runners (ubuntu-latest / ubuntu-24.04-arm)
- Skip upstream tests (test_openai_completion_logprobs, test_openai_embeddings_with_encoding_format_base64)

Supersedes #252
Implements RHAIENG-3509
How publish works

The publish step is gated on matrix.arch == 'amd64' so it runs only once (not duplicated across matrix legs). It does a fresh multi-arch build with platforms: linux/amd64,linux/arm64, leveraging GHA cache from both the amd64 and arm64 build steps, and pushes a single multi-arch manifest. Both architectures are published in one operation.

Test plan
- MaaS vLLM inference (llama-3-2-3b) and embedding (nomic-embed-text-v1-5)

Skipped upstream tests
- test_openai_embeddings_with_encoding_format_base64: nomic-embed-text-v1-5 doesn't support the dimensions param (always returns 768)
- test_openai_completion_logprobs: upstream /v1/completions validates logprobs as bool instead of int per the OpenAI spec
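Per the review suggestion, the base64 skip could be scoped to the MaaS lane only. A minimal sketch; the EMBEDDING_PROVIDER check and the "or"-joined skip expression are assumptions about run_integration_tests.sh, not its actual contents:

```shell
# Hypothetical sketch: always skip the logprobs test, but add the base64
# embedding test only when the run uses the MaaS embedding provider.
SKIP_TESTS="test_openai_completion_logprobs"

if [ "${EMBEDDING_PROVIDER:-}" = "vllm-embedding" ]; then
  # nomic-embed-text-v1-5 on MaaS always returns 768 dimensions, so the
  # base64/dimensions test only fails on this lane; other lanes keep it.
  SKIP_TESTS="$SKIP_TESTS or test_openai_embeddings_with_encoding_format_base64"
fi

echo "deselecting: $SKIP_TESTS"
```

With EMBEDDING_PROVIDER unset (the local sentence-transformers lane), only the logprobs test is deselected, preserving base64 coverage there.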