ci: Add integration tests to CI workflow#26
Conversation
WalkthroughAdds job-level environment vars and an Install uv step to the Red Hat distro workflow, renames the smoke step, adds an integration-tests step that runs a new version-locked runner script, adds an unconditional cleanup step, introduces Changes
Sequence Diagram(s)sequenceDiagram
autonumber
actor GHRunner as GitHub Actions Runner
participant Job as Workflow Job
participant Docker as Docker Engine
participant Script as tests/run_integration_tests.sh
participant Git as Remote Repo (llama-stack)
participant Pytest as Pytest
Note right of Job #EFEFEF: Job env set: INFERENCE_MODEL, VLLM_URL
GHRunner->>Job: start job
Job->>Docker: start distro image & run smoke test
Job->>Script: run ./tests/run_integration_tests.sh
Script->>Script: parse llama-stack version from ../distribution/Containerfile.in
Script->>Git: clone/fetch repo and checkout tag vX.Y.Z
Script->>Script: validate STACK_CONFIG_PATH and prepare env
Script->>Pytest: run integration tests (--stack-config=server:"$STACK_CONFIG_PATH", --text-model=vllm-inference/"$INFERENCE_MODEL") with skip filter
Pytest-->>Script: results
Script-->>Job: exit code
Job->>Docker: cleanup containers (always) — `docker rm -f vllm llama-stack`
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Poem
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests
Tip 👮 Agentic pre-merge checks are now available in preview!Pro plan users can now enable pre-merge checks in their settings to enforce checklists before merging PRs.
Please see the documentation for more information. Example: reviews:
pre_merge_checks:
custom_checks:
- name: "Undocumented Breaking Changes"
mode: "warning"
instructions: |
Pass/fail criteria: All breaking changes to public APIs, CLI flags, environment variables, configuration keys, database schemas, or HTTP/GraphQL endpoints must be documented in the "Breaking Change" section of the PR description and in CHANGELOG.md. Exclude purely internal or private changes (e.g., code not exported from package entry points or explicitly marked as internal).Please share your feedback with us on this Discord post. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
f38734e to
7ca08af
Compare
nathan-weinberg
left a comment
There was a problem hiding this comment.
A few nits but overall this is looking great!
7ca08af to
2f25bfb
Compare
There was a problem hiding this comment.
Actionable comments posted: 2
♻️ Duplicate comments (2)
tests/run_integration_tests.sh (2)
63-63: Past suggestion applied: quoting INFERENCE_MODEL in --text-model arg. Good.
87-88: Past suggestion applied: call main "$@" and exit 0.
🧹 Nitpick comments (3)
tests/run_integration_tests.sh (3)
41-48: Modernize venv bootstrap; install test extras if available.Upgrading pip tooling avoids build quirks; prefer installing test extras.
# shellcheck disable=SC1091 source venv/bin/activate - pip install -e . - pip install pytest pytest-asyncio + python -m pip install -U pip setuptools wheel + pip install -e '.[test]' || pip install -e . + pip install -U pytest pytest-asyncio
58-65: Validate INFERENCE_MODEL explicitly; avoid opaque “unbound variable” failures.Provide a clear error before pytest invocation; optional nit to avoid double-specifying the venv.
# shellcheck disable=SC1091 source venv/bin/activate # Test to skip SKIP_TESTS="test_text_chat_completion_tool_calling_tools_not_in_request or test_inference_store_tool_calls" - ./venv/bin/pytest -s -v tests/integration/inference/ \ + : "${INFERENCE_MODEL:?Environment variable INFERENCE_MODEL must be set (e.g., Llama-3.1-8B-Instruct)}" + pytest -s -v tests/integration/inference/ \ --stack-config=server:ci-tests \ --text-model=vllm-inference/"$INFERENCE_MODEL" \ -k "not ($SKIP_TESTS)"
7-8: Use a unique WORK_DIR to avoid collisions in parallel CI runs.Prevents cross-job interference when the workflow uses a matrix or retries.
-WORK_DIR="/tmp/llama-stack-integration-tests" +WORK_DIR="${WORK_DIR:-$(mktemp -d -t llama-stack-it-XXXXXX)}"
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
.github/workflows/redhat-distro-container.yml(2 hunks)tests/run_integration_tests.sh(1 hunks)tests/smoke.sh(0 hunks)
💤 Files with no reviewable changes (1)
- tests/smoke.sh
🚧 Files skipped from review as they are similar to previous changes (1)
- .github/workflows/redhat-distro-container.yml
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: build-test-push (linux/amd64)
nathan-weinberg
left a comment
There was a problem hiding this comment.
I'd like the QE guys to look but from my POV this LGTM
2f25bfb to
4262ef1
Compare
4262ef1 to
9b8799b
Compare
There was a problem hiding this comment.
Actionable comments posted: 0
♻️ Duplicate comments (2)
tests/run_integration_tests.sh (2)
12-18: Harden version extraction (file check, single match, prerelease-safe).
Prevents silent failures and clarifies errors.-CONTAINERFILE_IN="$SCRIPT_DIR/../distribution/Containerfile.in" -LLAMA_STACK_VERSION=$(grep -o 'llama-stack==[0-9]\+\.[0-9]\+\.[0-9]\+' "$CONTAINERFILE_IN" | cut -d'=' -f3) -if [ -z "$LLAMA_STACK_VERSION" ]; then - echo "Error: Could not extract llama-stack version from Containerfile.in" - exit 1 -fi +CONTAINERFILE_IN="$SCRIPT_DIR/../distribution/Containerfile.in" +[ -f "$CONTAINERFILE_IN" ] || { echo "Error: Containerfile not found at: $CONTAINERFILE_IN"; exit 1; } +# Accept X.Y.Z with optional pre/rc/build suffixes; ensure exactly one match +mapfile -t _matches < <(grep -Eo 'llama-stack==[0-9]+\.[0-9]+\.[0-9]+([.-][A-Za-z0-9]+([.-][A-Za-z0-9]+)*)?' "$CONTAINERFILE_IN" \ + | awk -F= '{print $3}' | tr -d '[:space:]' | sort -u) +case "${#_matches[@]}" in + 0) echo "Error: No llama-stack version found in $CONTAINERFILE_IN"; exit 1;; + 1) LLAMA_STACK_VERSION="${_matches[0]}";; + *) echo "Error: Multiple llama-stack versions found in $CONTAINERFILE_IN: ${_matches[*]}"; exit 1;; +esac +unset _matches
21-25: Make clone/checkout robust (handle non-git dir, shallow-by-tag, fetch tags, better diagnostics).
Prevents stale origins and missing tags; documents local re-run rationale.function clone_llama_stack() { - # Clone the repository if it doesn't exist - if [ ! -d "$WORK_DIR" ]; then - git clone "$LLAMA_STACK_REPO" "$WORK_DIR" - fi - - # Checkout the specific tag - cd "$WORK_DIR" - # fetch origin incase we didn't clone a fresh repo - git fetch origin - if ! git checkout "v$LLAMA_STACK_VERSION"; then - echo "Error: Could not checkout tag v$LLAMA_STACK_VERSION" - echo "Available tags:" - git tag | grep "^v" | tail -10 - exit 1 - fi + # Allow re-running locally: reuse existing checkout if it's a git repo; otherwise clone fresh. + if [ ! -d "$WORK_DIR/.git" ]; then + rm -rf "$WORK_DIR" + if ! git clone --depth 1 --branch "v$LLAMA_STACK_VERSION" --single-branch "$LLAMA_STACK_REPO" "$WORK_DIR"; then + echo "Warn: shallow clone by tag failed; falling back to full clone." + git clone "$LLAMA_STACK_REPO" "$WORK_DIR" + fi + fi + + # Checkout the specific tag + cd "$WORK_DIR" + # Fetch tags in case we didn't clone a fresh repo + git remote set-url origin "$LLAMA_STACK_REPO" >/dev/null 2>&1 || true + git fetch --tags --force --prune origin + if ! git -c advice.detachedHead=false checkout --force "v$LLAMA_STACK_VERSION"; then + echo "Error: Could not checkout tag v$LLAMA_STACK_VERSION" + echo "Available recent tags:" + git tag --sort=-creatordate | grep -E "^v" | head -10 + exit 1 + fi }Also applies to: 26-35
🧹 Nitpick comments (3)
tests/run_integration_tests.sh (3)
6-9: Let repo and workdir be overridable via env.
Improves local/dev usability without changing defaults.-LLAMA_STACK_REPO="https://github.com/meta-llama/llama-stack.git" -WORK_DIR="/tmp/llama-stack-integration-tests" -INFERENCE_MODEL="${INFERENCE_MODEL:-meta-llama/Llama-3.2-1B-Instruct}" +LLAMA_STACK_REPO="${LLAMA_STACK_REPO:-https://github.com/meta-llama/llama-stack.git}" +WORK_DIR="${WORK_DIR:-/tmp/llama-stack-integration-tests}" +INFERENCE_MODEL="${INFERENCE_MODEL:-meta-llama/Llama-3.2-1B-Instruct}"
28-29: Tiny wording fix in comment.
“incase” → “in case”.- # fetch origin incase we didn't clone a fresh repo + # Fetch origin in case we didn't clone a fresh repo
38-57: Pre-flight check for uv; keep pytest invocation as-is.
Avoids confusing failures when running locally without uv.function run_integration_tests() { echo "Running integration tests..." cd "$WORK_DIR" + if ! command -v uv >/dev/null 2>&1; then + echo "Error: 'uv' is required but not found in PATH." + exit 1 + fi + # Test to skip SKIP_TESTS="test_text_chat_completion_tool_calling_tools_not_in_request or test_inference_store_tool_calls" # Dynamically determine the path to run.yaml from the original script directory STACK_CONFIG_PATH="$SCRIPT_DIR/../distribution/run.yaml"
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
.github/workflows/redhat-distro-container.yml(3 hunks)tests/run_integration_tests.sh(1 hunks)tests/smoke.sh(0 hunks)
💤 Files with no reviewable changes (1)
- tests/smoke.sh
🚧 Files skipped from review as they are similar to previous changes (1)
- .github/workflows/redhat-distro-container.yml
🧰 Additional context used
🧬 Code graph analysis (1)
tests/run_integration_tests.sh (1)
tests/smoke.sh (1)
main(64-70)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: build-test-push (linux/amd64)
🔇 Additional comments (3)
tests/run_integration_tests.sh (3)
1-4: Strict mode + shebang: good call.
Solid baseline for predictable failures.
47-51: run.yaml path validation: looks good.
Clear error on missing config.
74-75: Explicit main + exit 0: fine.
Matches prior review suggestion and ensures a clean exit status.
9b8799b to
612d771
Compare
There was a problem hiding this comment.
Actionable comments posted: 0
♻️ Duplicate comments (1)
tests/run_integration_tests.sh (1)
12-18: Harden version extraction (file check, prerelease-safe, single match).Current grep can silently exit early on missing file (due to -e/pipefail), misses prereleases, and doesn’t guard against multiple matches.
Apply this diff:
-# Get version dynamically from Containerfile.in (look in parent directory) -CONTAINERFILE_IN="$SCRIPT_DIR/../distribution/Containerfile.in" -LLAMA_STACK_VERSION=$(grep -o 'llama-stack==[0-9]\+\.[0-9]\+\.[0-9]\+' "$CONTAINERFILE_IN" | cut -d'=' -f3) -if [ -z "$LLAMA_STACK_VERSION" ]; then - echo "Error: Could not extract llama-stack version from Containerfile.in" - exit 1 -fi +# Get version dynamically from Containerfile.in (look in parent directory) +CONTAINERFILE_IN="$SCRIPT_DIR/../distribution/Containerfile.in" +if [ ! -r "$CONTAINERFILE_IN" ]; then + echo "Error: Containerfile not found or unreadable at: $CONTAINERFILE_IN" + exit 1 +fi +# Accept X.Y.Z with optional suffixes (e.g., -rc1, .post1); ensure a single unique match +mapfile -t _matches < <(grep -Eo 'llama-stack==[0-9]+\.[0-9]+\.[0-9]+([.-][A-Za-z0-9]+)?' "$CONTAINERFILE_IN" \ + | cut -d'=' -f3 | tr -d '[:space:]' | sort -u) +case "${#_matches[@]}" in + 0) echo "Error: No llama-stack version found in $CONTAINERFILE_IN"; exit 1 ;; + 1) LLAMA_STACK_VERSION="${_matches[0]}" ;; + *) echo "Error: Multiple versions found in $CONTAINERFILE_IN: ${_matches[*]}"; exit 1 ;; +esac +unset _matches
🧹 Nitpick comments (3)
tests/run_integration_tests.sh (3)
7-7: Make WORK_DIR overridable via env.Allow local/custom paths without editing the script.
Apply this diff:
-WORK_DIR="/tmp/llama-stack-integration-tests" +WORK_DIR="${WORK_DIR:-/tmp/llama-stack-integration-tests}"
21-35: Clone/checkout robustness: handle non-git dirs, shallow-by-tag, fetch tags; fix comment typo.Prevents failures on re-runs, speeds CI, and ensures tags are present. Also corrects “incase” → “in case”.
Apply this diff:
function clone_llama_stack() { - # Clone the repository if it doesn't exist - if [ ! -d "$WORK_DIR" ]; then - git clone "$LLAMA_STACK_REPO" "$WORK_DIR" - fi + # Clone the repository if it doesn't exist (or isn't a git repo) + if [ ! -d "$WORK_DIR/.git" ]; then + rm -rf "$WORK_DIR" + if ! git clone --depth 1 --branch "v$LLAMA_STACK_VERSION" --single-branch "$LLAMA_STACK_REPO" "$WORK_DIR"; then + echo "Warn: shallow clone by tag failed; falling back to full clone." + git clone "$LLAMA_STACK_REPO" "$WORK_DIR" + fi + fi # Checkout the specific tag cd "$WORK_DIR" - # fetch origin incase we didn't clone a fresh repo - git fetch origin - if ! git checkout "v$LLAMA_STACK_VERSION"; then + # fetch origin in case we didn't clone a fresh repo + git remote set-url origin "$LLAMA_STACK_REPO" >/dev/null 2>&1 || true + git fetch --tags --force --prune origin + if ! git -c advice.detachedHead=false checkout -q "v$LLAMA_STACK_VERSION"; then echo "Error: Could not checkout tag v$LLAMA_STACK_VERSION" echo "Available tags:" - git tag | grep "^v" | tail -10 + git tag --sort=-version:refname | grep -E "^v" | head -10 exit 1 fi }
53-56: Pre-flight: assert required tools exist (git, uv).Fail fast with a clear message on local runs without CI provisioning.
Apply this diff to add a helper and checks:
} function run_integration_tests() { @@ -k "not ($SKIP_TESTS)" } +require_cmd() { + command -v "$1" >/dev/null 2>&1 || { echo "Error: required command not found: $1"; exit 1; } +} + function main() { + require_cmd git + require_cmd uv echo "Starting llama-stack integration tests"
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
.github/workflows/redhat-distro-container.yml(3 hunks)tests/run_integration_tests.sh(1 hunks)tests/smoke.sh(0 hunks)
💤 Files with no reviewable changes (1)
- tests/smoke.sh
🚧 Files skipped from review as they are similar to previous changes (1)
- .github/workflows/redhat-distro-container.yml
🧰 Additional context used
🧬 Code graph analysis (1)
tests/run_integration_tests.sh (1)
tests/smoke.sh (1)
main(64-70)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: build-test-push (linux/amd64)
🔇 Additional comments (5)
tests/run_integration_tests.sh (5)
1-1: Good: added portable Bash shebang.Portable and explicit.
3-3: Good: strict mode enabled.
set -euo pipefailis appropriate for CI scripts.
46-51: Good: explicit validation of stack config path.Early, clear failure is helpful.
55-55: Good: quoted model selector.Prevents word-splitting issues with slashes/spaces in model name.
74-75: Good: explicit main invocation and clean exit.Works fine with
set -euo pipefail.
o Added new integration test step that runs llama-stack pytest tests o Modified smoke test workflow to include cleanup step o Created run_integration_tests.sh script to clone upstream repo and run tests o Moved docker cleanup from smoke.sh to workflow cleanup step Signed-off-by: Derek Higgins <derekh@redhat.com>
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (6)
tests/run_integration_tests.sh (6)
13-18: Harden version extraction (existence check, ERE, prerelease support, single match).Avoid silent failures and ambiguity; enforce exactly one match and support rc/pre tags.
Apply:
-CONTAINERFILE_IN="$SCRIPT_DIR/../distribution/Containerfile.in" -LLAMA_STACK_VERSION=$(grep -o 'llama-stack==[0-9]\+\.[0-9]\+\.[0-9]\+' "$CONTAINERFILE_IN" | cut -d'=' -f3) -if [ -z "$LLAMA_STACK_VERSION" ]; then - echo "Error: Could not extract llama-stack version from Containerfile.in" - exit 1 -fi +CONTAINERFILE_IN="$SCRIPT_DIR/../distribution/Containerfile.in" +if [ ! -f "$CONTAINERFILE_IN" ]; then + echo "Error: Containerfile not found at: $CONTAINERFILE_IN" + exit 1 +fi +# Accept X.Y.Z and optional suffix (e.g., -rc1, .post1); enforce a single match +mapfile -t _matches < <(grep -Eo 'llama-stack==[0-9]+\.[0-9]+\.[0-9]+([.-][A-Za-z0-9]+)?' "$CONTAINERFILE_IN" \ + | cut -d'=' -f3 | tr -d '[:space:]' | sort -u) +case "${#_matches[@]}" in + 0) echo "Error: No llama-stack version found in $CONTAINERFILE_IN"; exit 1 ;; + 1) LLAMA_STACK_VERSION="${_matches[0]}" ;; + *) echo "Error: Multiple llama-stack versions found in $CONTAINERFILE_IN: ${_matches[*]}"; exit 1 ;; +esac
6-9: Make repo and work dir overridable via env for local runs.Keeps CI defaults while enabling customization.
-LLAMA_STACK_REPO="https://github.com/meta-llama/llama-stack.git" -WORK_DIR="/tmp/llama-stack-integration-tests" +LLAMA_STACK_REPO="${LLAMA_STACK_REPO:-https://github.com/meta-llama/llama-stack.git}" +WORK_DIR="${WORK_DIR:-/tmp/llama-stack-integration-tests}"
21-35: Handle non‑git dirs, stale remotes; prefer shallow tag clone; show newest tags on failure.Current logic fails if WORK_DIR exists but isn’t a git repo and lists tags alphabetically.
function clone_llama_stack() { # Clone the repository if it doesn't exist - if [ ! -d "$WORK_DIR" ]; then - git clone "$LLAMA_STACK_REPO" "$WORK_DIR" - fi + if [ ! -d "$WORK_DIR/.git" ]; then + rm -rf "$WORK_DIR" + if ! git clone --depth 1 --branch "v$LLAMA_STACK_VERSION" --single-branch "$LLAMA_STACK_REPO" "$WORK_DIR"; then + echo "Warn: shallow clone by tag failed; falling back to full clone." + git clone "$LLAMA_STACK_REPO" "$WORK_DIR" + fi + fi # Checkout the specific tag cd "$WORK_DIR" - # fetch origin incase we didn't clone a fresh repo - git fetch origin - if ! git checkout "v$LLAMA_STACK_VERSION"; then + # fetch origin in case we didn't clone a fresh repo + git remote set-url origin "$LLAMA_STACK_REPO" >/dev/null 2>&1 || true + git fetch --tags --force --prune origin + if ! git -c advice.detachedHead=false checkout "v$LLAMA_STACK_VERSION"; then echo "Error: Could not checkout tag v$LLAMA_STACK_VERSION" echo "Available tags:" - git tag | grep "^v" | tail -10 + git tag --sort=-version:refname | grep -E "^v" | head -10 exit 1 fi }
28-29: Nit: fix typo in comment.“incase” → “in case”.
- # fetch origin incase we didn't clone a fresh repo + # fetch origin in case we didn't clone a fresh repo
43-46: Allow overriding skipped tests via env.Keeps defaults but lets CI/jobs toggle without file edits.
- # Test to skip - SKIP_TESTS="test_text_chat_completion_tool_calling_tools_not_in_request or test_inference_store_tool_calls" + # Tests to skip (overridable via env SKIP_TESTS) + SKIP_TESTS="${SKIP_TESTS:-test_text_chat_completion_tool_calling_tools_not_in_request or test_inference_store_tool_calls}" + echo " SKIP_TESTS: $SKIP_TESTS"
53-56: Preflight check for ‘uv’ to fail fast with a helpful message.Avoids cryptic “command not found” locally.
- uv run pytest -s -v tests/integration/inference/ \ + command -v uv >/dev/null 2>&1 || { echo "Error: 'uv' not found on PATH. Install uv or ensure CI step installs it."; exit 127; } + uv run pytest -s -v tests/integration/inference/ \ --stack-config=server:"$STACK_CONFIG_PATH" \ --text-model=vllm-inference/"$INFERENCE_MODEL" \ -k "not ($SKIP_TESTS)"
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
.github/workflows/redhat-distro-container.yml(3 hunks)tests/run_integration_tests.sh(1 hunks)tests/smoke.sh(0 hunks)
💤 Files with no reviewable changes (1)
- tests/smoke.sh
🚧 Files skipped from review as they are similar to previous changes (1)
- .github/workflows/redhat-distro-container.yml
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: build-test-push (linux/amd64)
🔇 Additional comments (2)
tests/run_integration_tests.sh (2)
1-3: Good: strict Bash with sane defaults.Shebang and
set -euo pipefailare in place.
59-71: Approve — verified llama-stack pin and upstream tagLGTM — distribution/Containerfile.in pins llama-stack==0.2.21 and upstream tag v0.2.21 exists.
…_version build: Update wheels version to 3.2.808
o Added new integration test step that runs llama-stack pytest tests
o Modified smoke test workflow to include cleanup step
o Created run_integration_tests.sh script to clone upstream repo and run tests
o Moved docker cleanup from smoke.sh to workflow cleanup step
Summary by CodeRabbit
Tests
Chores
No user-facing changes.