
fix: build on midstream main #115

Merged
derekhiggins merged 8 commits into opendatahub-io:main from leseb:fixxx
Dec 1, 2025

Conversation

@leseb
Collaborator

@leseb leseb commented Nov 17, 2025

What does this PR do?

The main branch of this repo will now use the midstream main branch to install llama-stack.
The llama-stack pyproject expects "uv" as the package manager.

This includes fixes to the build/run files that remove the old telemetry API.

Test Plan

Build test.

Summary by CodeRabbit

  • New Features

    • Added a batch API entry and expanded inference backends (Bedrock, OpenAI, Vertex AI, vLLM, WatsonX).
  • Improvements

    • Unified persistence/storage with new default stores; telemetry moved to top-level and enabled; several provider versions bumped; default embedding model updated.
  • Documentation

    • Docs now resolve and link main-branch commits or release tags and show repo owner/version clearly.
  • Chores

    • Consolidated install flow to a new installer invocation; pre-commit requirement updated; CI clone source switched.
  • Tests

    • Test scripts updated routes, model handling, and enhanced debug output.


@coderabbitai
Contributor

coderabbitai bot commented Nov 17, 2025

Walkthrough

Switched to using the uv pip wrapper across build and Containerfile flows; added an install-from-source path for llama-stack (defaulting to main/commit); reworked dependency parsing and Containerfile emission in distribution/build.py; migrated runtime configs to persistence-backed stores; enhanced docs tooling to resolve main commit hashes.

Changes

Cohort / File(s) Summary
Pre-commit config
\.pre-commit-config.yaml
Local hook "Distribution Build" dependency changed from a git-based llama-stack pre-release to uv>=0.9.0.
Containerfile templates
distribution/Containerfile, distribution/Containerfile.in
Added pip install uv and replaced subsequent pip install with uv pip install; consolidated many packages into grouped uv pip install blocks; adjusted torch/torchao installs to use extra-index and CPU wheel; moved llama-stack source usage toward main/commit.
Build automation & parsing
distribution/build.py
Added shlex; renamed check_llama_installed() → check_package_installed(package_name); added install_llama_stack_from_source(llama_stack_version); require uv; switched dependency retrieval to parsing llama stack list-deps output (handles uv pip lines and plain packages), with tokenization/classification (torch / no-deps / no-cache / standard), normalization/deduplication, and emission of grouped uv pip install commands; improved error messages.
Distribution manifests
distribution/build.yaml, distribution/run.yaml
Bumped provider package versions; removed provider-scoped telemetry block and moved telemetry to top-level; added files, additional_pip_packages, image_type/image_name; replaced inline kvstore blocks with persistence-backed stores (kv/sql) and namespaces; added batch API entry and reorganized providers, storage, and resources.
Docs & generator
distribution/README.md, scripts/gen_distro_docs.py
README updated to reference commit/main and new APIs; gen_distro_docs.py adds resolve_main_commit_hash(repo_url), changes extract_llama_stack_version() to return (version, repo_owner), and builds commit vs tag links with short-hash displays.
Container build layout (install blocks)
distribution/Containerfile (install sections)
Consolidated and reordered uv pip install blocks; ensured pinned dependencies use uv pip install --upgrade; adjusted torch/torchao installation steps to use extra index URLs and CPU wheel selection; replaced git+llama-stack pre-release reference with main/commit usage (no-deps/no-cache handling).
Tests & CI scripts
tests/smoke.sh, tests/run_integration_tests.sh, .github/workflows/redhat-distro-container.yml
tests/smoke.sh: iterate INFERENCE_MODEL and EMBEDDING_MODEL, updated OpenAI inference endpoint path and model naming, added debug response logs. tests/run_integration_tests.sh: EMBEDDING_MODEL default changed, checkout logic supports main or version tags, switched to venv-based install for llama-stack-client, adjusted embedding model handling. CI workflow: clone source changed to opendatahub-io/llama-stack.git and EMBEDDING_MODEL updated.
Other (error handling & imports)
distribution/build.py, scripts/gen_distro_docs.py
Extended error reporting for missing executables and install failures; added shlex usage and clearer failure messages; gen_distro_docs now handles repo owner and commit resolution.
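The list-deps parsing and classification summarized above can be sketched as follows. This is a hypothetical simplification for illustration only; the function name, group names, and exact flag handling are not taken from distribution/build.py:

```python
import shlex

def classify_deps(lines):
    """Bucket dependency lines into torch / no-deps / standard groups.

    Sketch of the approach described above: tokenize each
    `llama stack list-deps` line with shlex, strip a legacy
    "uv pip install" prefix, skip flags (keeping --extra-index-url's
    value paired with its flag), and group the remaining package tokens.
    """
    torch_deps, no_deps_pkgs, standard = [], [], []
    for line in lines:
        tokens = shlex.split(line)
        if tokens[:3] == ["uv", "pip", "install"]:
            tokens = tokens[3:]
        use_no_deps = "--no-deps" in tokens
        i = 0
        while i < len(tokens):
            tok = tokens[i]
            if tok == "--extra-index-url":
                i += 2  # skip the flag and its URL argument together
                continue
            if tok.startswith("--"):
                i += 1
                continue
            if tok.startswith("torch"):
                torch_deps.append(tok)
            elif use_no_deps:
                no_deps_pkgs.append(tok)
            else:
                standard.append(tok)
            i += 1
    # Deduplicate and sort so the emitted Containerfile is reproducible
    return sorted(set(torch_deps)), sorted(set(no_deps_pkgs)), sorted(set(standard))

print(classify_deps([
    "uv pip install --no-deps some-pinned-pkg",
    "torch --extra-index-url https://download.pytorch.org/whl/cpu",
    "fastapi uvicorn",
]))
```

Each group would then be emitted as its own grouped `uv pip install` RUN line, with the torch group carrying the extra index URL.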

Sequence Diagram(s)

sequenceDiagram
    autonumber
    participant Dev as Developer / CI
    participant Build as distribution/build.py
    participant Git as Git (repo)
    participant LS as "llama stack" CLI
    participant UV as uv (pip wrapper)
    participant CF as Containerfile generator

    Dev->>Build: invoke build script (main)
    Build->>Build: check_package_installed("uv")
    alt uv missing
        Build->>UV: pip install uv
        UV-->>Build: uv available
    end
    Build->>Build: install_llama_stack_from_source(LLAMA_STACK_VERSION)
    Build->>Git: git+<repo>@<ref> (via `uv pip install`)
    Git-->>Build: package installed / source fetched
    Build->>LS: run "llama stack list-deps"
    LS-->>Build: dependency list (plain packages or `uv pip` lines)
    Build->>Build: shlex-tokenize & classify deps (torch / no-deps / no-cache / standard)
    Build->>UV: emit grouped `uv pip install` commands (handle extra-index for torch)
    UV-->>Build: install results
    Build->>CF: generate Containerfile with ordered `uv pip install` blocks
    CF-->>Dev: produced Containerfile artifact

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45–75 minutes

Points needing extra attention:

  • distribution/build.py: shlex parsing, dependency classification/normalization/deduplication, install-from-source flow, and updated error messages.
  • distribution/Containerfile & distribution/Containerfile.in: correctness and ordering of uv pip install blocks and torch/torchao extra-index usage.
  • distribution/run.yaml & distribution/build.yaml: persistence migration, store/table mappings, telemetry relocation, and provider version bumps.
  • scripts/gen_distro_docs.py: commit-hash resolution, changed return shape for extract_llama_stack_version(), and link generation.

Poem

🐇 I hopped from tag to main with glee,

uv now carries pip across the lea,
I parsed the deps with nimble paws,
Swapped kv for persistent laws,
Builds hum softly — a rabbit's tiny chee!

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name | Status | Explanation | Resolution
Docstring Coverage | ⚠️ Warning | Docstring coverage is 43.75%, which is below the required threshold of 80.00%. | Run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name | Status | Explanation
Description Check | ✅ Passed | Check skipped—CodeRabbit's high-level summary is enabled.
Title check | ✅ Passed | The title 'fix: build on midstream main' directly aligns with the PR's primary objective: switching the llama-stack dependency from a pre-release tag to the midstream main branch and fixing uv package-manager compatibility.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
distribution/build.py (1)

17-35: Default version "main" combined with @v{llama_stack_version} produces an invalid or inconsistent ref

The review concern is confirmed. Here's what the code shows:

  • .pre-commit-config.yaml installs from git+https://github.com/opendatahub-io/llama-stack.git@main
  • distribution/build.py with CURRENT_LLAMA_STACK_VERSION = "main" and the template @v{llama_stack_version} generates @vmain
  • distribution/Containerfile line 61 already shows the generated output: @vmain

Since "main" has no dots, is_install_from_source("main") returns True, causing the template to substitute llama_stack_version="main", yielding @vmain.

This creates a version mismatch between:

  • Pre-commit hooks (which pin @main)
  • Generated Containerfile (which uses @vmain)

The suggested resolutions remain valid:

  • Remove the v prefix from source_install_command, or
  • Set CURRENT_LLAMA_STACK_VERSION = "vmain" and use main consistently elsewhere
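The mismatch is easy to reproduce with a small sketch of the logic this comment describes. The template string below is paraphrased from the review, not copied from build.py:

```python
def is_install_from_source(version: str) -> bool:
    # Heuristic described above: refs without dots (branch names, commit
    # hashes) are treated as source installs; dotted release versions are not.
    return "." not in version

# Paraphrase of the flagged template, with the "v" prefix baked in
source_install_command = (
    "RUN uv pip install --no-deps "
    "git+https://github.com/opendatahub-io/llama-stack.git@v{llama_stack_version}"
)

version = "main"
assert is_install_from_source(version)
cmd = source_install_command.format(llama_stack_version=version)
print(cmd)  # the generated ref ends in "@vmain", which is not a valid ref
```

Dropping the `v` from the template (and only adding it for dotted release tags) would make both source and tag installs produce valid refs.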
🧹 Nitpick comments (4)
.pre-commit-config.yaml (1)

58-59: Align pre-commit llama-stack ref with image build ref

Using git+...llama-stack.git@main here while distribution/build.py generates git+...llama-stack.git@v{LLAMA_STACK_VERSION} (defaulting to vmain) means pre-commit and the built image may pull different refs, and vmain may not exist. I’d recommend deciding on a single convention (main, a tag, or a pinned commit) and using it consistently across .pre-commit-config.yaml, CURRENT_LLAMA_STACK_VERSION, and source_install_command.

distribution/Containerfile.in (1)

4-4: Confirm uv availability in UBI and consider standardizing installs

Installing uv via dnf here makes sense given the rest of the build now uses uv pip, but it’s worth confirming uv is available in the default UBI9 repos for all target build environments. You may also want to (a) switch the subsequent pip install sqlalchemy to uv pip install for consistency and (b) optionally add a dnf clean all to keep the image slim.

distribution/Containerfile (1)

5-15: uv-based dependency installation looks good; consider moving sqlalchemy to uv as well

The switch to uv pip install for pinned and general dependencies is consistent with the rest of the PR. The one outlier is RUN pip install sqlalchemy, which still uses the plain pip in the base image. For consistency and to keep all Python deps under uv’s control, it would be cleaner to convert this line to RUN uv pip install sqlalchemy (or similar) unless there’s a specific reason to keep it separate. You might also consider a dnf clean all after installing uv if image size becomes a concern.

distribution/build.py (1)

91-199: Dependency parsing is solid overall; tighten legacy uv pip handling to avoid regressions

The new get_dependencies() logic is a nice improvement (shlex-based parsing, flag handling, pymilvus normalization, sorting/dedup), but the legacy "uv pip" branch currently downgrades to plain pip and may surprise you if llama stack list-deps still emits that format:

if line.startswith("uv pip"):
    # Legacy format: "uv pip install ..."
    line = line.replace("uv ", "RUN ", 1)
    parts = line.split(" ", 3)
    if len(parts) >= 4:
        cmd_parts = parts[:3]  # "RUN pip install"
        packages_str = parts[3]

This produces RUN pip install ... instead of RUN uv pip install ..., which is inconsistent with the rest of the build and defeats the goal of standardizing on uv for those lines.

You can keep the legacy support but still funnel everything through uv with a small tweak like:

@@ def get_dependencies():
-            if line.startswith("uv pip"):
-                # Legacy format: "uv pip install ..."
-                line = line.replace("uv ", "RUN ", 1)
-                parts = line.split(" ", 3)
-                if len(parts) >= 4:  # We have packages to sort
-                    cmd_parts = parts[:3]  # "RUN pip install"
-                    packages_str = parts[3]
-                else:
-                    standard_deps.append(" ".join(parts))
-                    continue
-            else:
-                # New format: just packages, possibly with flags
-                cmd_parts = ["RUN", "uv", "pip", "install"]
-                packages_str = line
+            if line.startswith("uv pip"):
+                # Legacy format: "uv pip install ..."
+                parts = line.split(" ", 3)
+                if len(parts) >= 4:  # We have packages to sort
+                    # Always normalize to RUN uv pip install
+                    cmd_parts = ["RUN", "uv", "pip", "install"]
+                    packages_str = parts[3]
+                else:
+                    # Fall back to emitting the original line if it’s somehow malformed
+                    standard_deps.append(line)
+                    continue
+            else:
+                # New format: just packages, possibly with flags
+                cmd_parts = ["RUN", "uv", "pip", "install"]
+                packages_str = line

This keeps the new behavior for direct package lists, while ensuring any legacy uv pip lines remain uv-backed in the generated Containerfile.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ec0b301 and 56646ac.

📒 Files selected for processing (6)
  • .pre-commit-config.yaml (1 hunks)
  • distribution/Containerfile (2 hunks)
  • distribution/Containerfile.in (1 hunks)
  • distribution/build.py (4 hunks)
  • distribution/build.yaml (0 hunks)
  • distribution/run.yaml (1 hunks)
💤 Files with no reviewable changes (1)
  • distribution/build.yaml
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Summary
🔇 Additional comments (1)
distribution/run.yaml (1)

227-228: Global telemetry flag looks consistent with the new layout

The new top-level telemetry: enabled: true aligns with moving away from provider-scoped telemetry. Structure/indentation looks fine; just make sure this matches the expected schema for the current llama stack main so that telemetry is actually honored.

@leseb leseb force-pushed the fixxx branch 3 times, most recently from bedcc4b to 7be5044 on November 17, 2025 at 10:50
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (2)
distribution/build.py (2)

93-182: Dependency parsing & classification are generally solid, but legacy uv pip lines lose uv

The new get_dependencies() flow (using llama stack list-deps, shlex.split, flag handling, sorting/deduping, and pymilvus→pymilvus[milvus-lite]) is well structured and should produce stable, reproducible Dockerfile RUN lines.

One subtle inconsistency: in the legacy branch

if line.startswith("uv pip"):
    line = line.replace("uv ", "RUN ", 1)
    parts = line.split(" ", 3)
    cmd_parts = parts[:3]  # "RUN pip install"

this transforms a uv pip install ... line into RUN pip install ..., dropping uv. Everywhere else (new-format lines and pinned deps) you explicitly emit RUN uv pip install .... If the intent is that all dependency installs go through uv, consider preserving uv here, e.g. by prefixing with "RUN " instead of replacing "uv ".

Functionally this probably still builds, but it introduces a small behavioral and tooling inconsistency between legacy and new formats.


148-165: Package quoting and pymilvus[milvus-lite] rewrite are careful but a bit aggressive

The combination of:

  • packages = sorted(set(packages)) before and after the pymilvus rewrite, and
  • quoting any token containing < or > to avoid shell redirection, and
  • replacing any occurrence of "pymilvus" with "pymilvus[milvus-lite]" when the extra isn’t already present,

should work for the expected dependency shapes and avoids common shell pitfalls.

Minor caveats to be aware of (not blockers):

  • The substring-based replace("pymilvus", "pymilvus[milvus-lite]") will also trigger if "pymilvus" appears in an unexpected context inside a token.
  • Any future flags or markers that legitimately contain </> would also get quoted and treated as a “package”.

If you’re confident about the shape of list-deps output, this is acceptable; otherwise, consider tightening the pymilvus match (e.g., token-level equality or startswith) in a follow-up.
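A token-level match along the lines suggested could compare the requirement name rather than a substring. This is a sketch of the tightened check, not the project's code:

```python
import re

def add_milvus_lite_extra(packages):
    """Add the [milvus-lite] extra only when the requirement *name* is
    exactly pymilvus (sketch of the tightened match suggested above)."""
    out = []
    for pkg in packages:
        # Split the requirement name off any extras / version specifier
        name = re.split(r"[\[<>=!~;]", pkg, maxsplit=1)[0].strip()
        if name == "pymilvus" and "[milvus-lite]" not in pkg:
            pkg = pkg.replace("pymilvus", "pymilvus[milvus-lite]", 1)
        out.append(pkg)
    return out

print(add_milvus_lite_extra(
    ["pymilvus>=2.4", "not-pymilvus-tool", "pymilvus[milvus-lite]"]
))
```

Unlike the substring replace, this leaves tokens that merely contain "pymilvus" (such as a hypothetical not-pymilvus-tool) untouched.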

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 56646ac and 7be5044.

📒 Files selected for processing (6)
  • .pre-commit-config.yaml (1 hunks)
  • distribution/Containerfile (2 hunks)
  • distribution/Containerfile.in (1 hunks)
  • distribution/build.py (4 hunks)
  • distribution/build.yaml (0 hunks)
  • distribution/run.yaml (1 hunks)
💤 Files with no reviewable changes (1)
  • distribution/build.yaml
🚧 Files skipped from review as they are similar to previous changes (3)
  • distribution/run.yaml
  • .pre-commit-config.yaml
  • distribution/Containerfile
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Summary
🔇 Additional comments (2)
distribution/build.py (1)

17-18: LLAMA_STACK_VERSION default + uv-based source/pinned installs look consistent

Setting CURRENT_LLAMA_STACK_VERSION = "main" still satisfies is_install_from_source (no dots), so source installs for the default behave the same as before, just targeting the main branch. Likewise, switching source_install_command and pinned deps to RUN uv pip install keeps the flow consistent with the Containerfile template that pre-installs uv.

No correctness issues stand out here; behavior is coherent with the PR goal of using uv for installs.

Also applies to: 34-35, 187-190

distribution/Containerfile.in (1)

4-7: Installing uv before injecting RUN uv pip ... dependencies is correct

Placing RUN pip install uv ahead of {dependencies} and {llama_stack_install_source} aligns with the build script’s generation of RUN uv pip install ... commands, so the resulting image will have uv available before those RUN steps execute.

No issues here; this matches the PR’s intent to move to uv as the package manager.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

🧹 Nitpick comments (1)
distribution/build.py (1)

110-124: Document the two supported output formats.

The code handles both legacy "uv pip install ..." format and a new direct package list format. This dual-format support suggests the command output may vary based on configuration or version. Consider:

  • Documenting which conditions produce which format
  • Adding log statements to indicate which format is being parsed
  • Validating that the format detection logic is robust

This will help with debugging and maintenance when the command behavior changes.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7be5044 and b975965.

📒 Files selected for processing (6)
  • .pre-commit-config.yaml (1 hunks)
  • distribution/Containerfile (2 hunks)
  • distribution/Containerfile.in (1 hunks)
  • distribution/build.py (4 hunks)
  • distribution/build.yaml (0 hunks)
  • distribution/run.yaml (6 hunks)
💤 Files with no reviewable changes (1)
  • distribution/build.yaml
🚧 Files skipped from review as they are similar to previous changes (3)
  • .pre-commit-config.yaml
  • distribution/Containerfile
  • distribution/Containerfile.in
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: build-test-push (linux/amd64)
  • GitHub Check: Summary
🔇 Additional comments (9)
distribution/run.yaml (4)

65-68: Verify whether both db_path and persistence backends are needed for milvus.

The inline milvus provider (line 65) retains db_path: /opt/app-root/src/.llama/distributions/rh/milvus.db and also adds a new persistence block (lines 66–68) referencing the kv_milvus_inline backend. Clarify whether both configurations coexist intentionally or if db_path is now redundant.

If the new persistence system supersedes the inline db_path, remove the legacy field. If both are needed (e.g., for backward compatibility or dual-write), add an explanatory comment.


91-91: Cannot definitively verify provider versions—manual confirmation required.

Web searches for llama-stack-provider-lmeval showed the latest version is 0.3.1, but version 0.2.4 was not explicitly found in release histories. Similarly, version 0.2.2 for llama-stack-provider-trustyai-fms could not be confirmed (only 0.2.0 was located). Both packages are TrustyAI-maintained external providers with limited version information accessible via standard PyPI queries.

Verify these versions by:

  • Attempting pip install llama-stack-provider-trustyai-fms==0.2.2 and pip install llama-stack-provider-lmeval==0.2.4
  • Checking the GitHub repositories (trustyai-explainability/llama-stack-provider-trustyai-fms and trustyai-explainability/llama-stack-provider-lmeval) for release tags matching 0.2.2 and 0.2.4
  • Confirming these versions are compatible with the rest of the Llama Stack components in use

183-223: All verification points confirmed—no issues found.

The storage section is complete and correct:

  • All 10 backends referenced in persistence blocks are defined (kv_metadata, kv_agents, kv_faiss, kv_milvus_inline, kv_milvus_remote, kv_datasetio_huggingface, kv_datasetio_localfs, sql_inference, sql_agents, sql_files)
  • All db_path values are consistent and point to /opt/app-root/src/.llama/distributions/rh/
  • The stores subsection aligns with downstream usage: metadata store uses kv_metadata backend, inference store uses sql_inference backend, and both are properly referenced in persistence blocks

100-108: Configuration wiring verified; manual verification required for deployment parameters and agent lifecycle methods.

The backend mapping is confirmed correct:

  • kv_agents (kv_sqlite) is properly defined at line 188 and wired to agent_state at line 101
  • sql_agents (sql_sqlite) is properly defined at line 209 and wired to responses at line 104

However, the repository does not contain agent lifecycle implementation code (checks are external/library-based). You must manually verify:

  1. The queue and writer parameters (max_write_queue_size: 10000, num_writers: 4) are appropriate for your deployment scale
  2. Your agents library/framework correctly reads from both agent_state and responses backends during its lifecycle methods
distribution/build.py (5)

17-17: Verify reproducibility implications of using "main" as version.

Using "main" as the version string means builds will target a moving branch rather than a pinned release, which can lead to non-reproducible builds across different times or environments. While this aligns with the PR objectives to use the midstream main branch, consider:

  • Documenting this behavior and its implications for build reproducibility
  • Using commit SHAs in CI/CD pipelines for reproducible builds
  • Implementing version pinning strategies for stable deployments

Do you want to add documentation or comments explaining when to use "main" vs. pinned versions?


151-155: Good defensive quoting for shell metacharacters.

The code properly quotes packages containing > or < operators to prevent bash redirection issues. Consider also checking for other shell metacharacters like &, |, ;, $, etc., though these are less likely in package specifications.
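For broader coverage than a manual `<`/`>` check, Python's standard library offers shlex.quote, which escapes all shell metacharacters. This is a general alternative sketch, not a claim about what build.py currently does; the package names are illustrative:

```python
import shlex

packages = ["uvicorn>=0.23", "pkg&evil", "plain-pkg"]
# shlex.quote wraps anything containing shell-unsafe characters in
# single quotes and leaves already-safe tokens untouched
quoted = [shlex.quote(p) for p in packages]
print("RUN uv pip install " + " ".join(quoted))
```

This guards against `&`, `|`, `;`, `$`, and friends in one step instead of enumerating operators.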


126-146: Excellent use of shlex for robust parsing.

The implementation correctly uses shlex.split() to handle quoted package names and flags, properly tracking multi-word arguments like --extra-index-url values. The manual iteration with index tracking ensures flags and their values are captured together.


189-189: LGTM - Consistent with uv adoption.

The change from pip install to uv pip install aligns with the PR's objective to use uv as the package manager.


34-34: --no-deps flag is intentional and dependencies are properly handled.

The --no-deps flag in line 34 is not a bug. The build process deliberately uses a two-step approach:

  1. Install llama-stack source without its dependencies (line 34)
  2. Fetch dependencies via llama stack list-deps distribution/build.yaml (line 93)
  3. Install categorized dependencies separately (lines 98-194)

This architecture allows curated dependency management through list-deps rather than relying on pip's automatic resolution. Dependencies are not skipped—they are installed in the correct order (standard packages, PyTorch-specific packages, then no-deps packages) after the source installation.

def get_dependencies():
"""Execute the llama stack build command and capture dependencies."""
-cmd = "llama stack build --config distribution/build.yaml --print-deps-only"
+cmd = "llama stack list-deps distribution/build.yaml"
Contributor


⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Verify the new command syntax works correctly.

The command changed from llama stack build --config ... --print-deps-only to llama stack list-deps ..., removing the --config flag. Ensure this new command:

  • Exists in the version being used
  • Accepts the config path as a positional argument
  • Returns the expected output format

🏁 Script executed:

#!/bin/bash
# Description: Verify the new list-deps command exists and check its output format

# Check if the command is available and what format it returns
llama stack list-deps --help || echo "Error: list-deps command not found"

# Try running the command to see the output format
if [ -f "distribution/build.yaml" ]; then
  echo "=== Testing list-deps command output ==="
  llama stack list-deps distribution/build.yaml | head -20
fi

Length of output: 164


The llama stack list-deps command does exist in the official Llama Stack documentation; the open question is whether it accepts a config file path as a positional argument, which the code context below clarifies.


🏁 Script executed:

#!/bin/bash
# Find and examine the get_dependencies function in distribution/build.py

rg -n "get_dependencies" --type=py -A 10 -B 5

Length of output: 2004


Fix the command syntax: llama stack list-deps requires a distribution name, not a file path.

Official documentation shows llama stack list-deps takes a distribution name (e.g., <distro-name>), not a file path. Examples from PyPI show usage as llama stack list-deps meta-reference-gpu or llama stack list-deps starter.

The current code at line 93 uses llama stack list-deps distribution/build.yaml, which will fail. Either:

  • Pass a distribution name instead of a file path, or
  • Revert to the original llama stack build --config distribution/build.yaml --print-deps-only syntax if config file support is needed

Also update the docstring on line 92 from "Execute the llama stack build command" to reflect the new command being used.

Comment on lines +157 to +164
# Modify pymilvus package to include milvus-lite extra
packages = [
package.replace("pymilvus", "pymilvus[milvus-lite]")
if "pymilvus" in package and "[milvus-lite]" not in package
else package
for package in packages
]
packages = sorted(set(packages))
Contributor


⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

Verify and document the pymilvus package modification.

The code automatically replaces pymilvus with pymilvus[milvus-lite], which is a significant functional change that:

  • Adds the milvus-lite extra to all pymilvus installations
  • Could affect package size and dependencies
  • May not be necessary or desired in all deployment scenarios

Additionally, note that line 164 sorts the packages again after they were already sorted on line 149, which is redundant.

Consider:

  • Documenting why this extra is required
  • Making this behavior configurable if different deployment targets need different extras
  • Removing the redundant sort on line 164

🌐 Web query:

Is pymilvus[milvus-lite] the recommended way to install pymilvus, and what are the differences from the base package?

💡 Result:

Short answer: Yes — the Milvus docs recommend installing pymilvus (with the milvus‑lite extra or the bundled milvus‑lite) for a local/single‑file "Milvus Lite" experience. Use either pip install -U pymilvus (recent pymilvus releases include milvus‑lite) or pip install -U "pymilvus[milvus-lite]" to explicitly request the extra. [1][2][3]

What changes vs. the base package

  • pymilvus (base): the Python SDK that talks to a remote Milvus server (Standalone/Distributed/Cloud). If you install plain pymilvus versions before milvus‑lite was bundled, milvus‑lite may not be present. [2]
  • pymilvus[milvus-lite] (extra) or pymilvus versions >= 2.4.2: also installs the milvus‑lite implementation (an embedded, SQLite‑backed local Milvus server), its CLI/tools (milvus-lite), and additional dependencies needed to run Milvus Lite locally (suitable for prototyping, demos, notebooks). Milvus docs note Milvus Lite is not recommended for production. [1][2][3]

Recommendation: For local development/prototyping, install either pip install -U pymilvus (if using a recent release that bundles milvus‑lite) or pip install -U "pymilvus[milvus-lite]" to be explicit. For production or large‑scale deployments use a standalone/distributed Milvus server instead. [1][2][3]

Sources:
[1] Milvus docs — Run Milvus Lite locally (installation notes).
[2] Milvus docs (v2.4.x) — milvus‑lite included in pymilvus >= 2.4.2.
[3] milvus‑lite PyPI (usage and CLI info).


The search results confirm pymilvus[milvus-lite] is a recommended install path; what remains is checking the redundant-sort claim and the deployment context in distribution/build.py.


---

🏁 Script executed:

#!/bin/bash
# View lines 140-170 of distribution/build.py to see sorting context
fd -t f "build.py" --path "*/distribution/*" -x cat {} | head -170 | tail -40

Length of output: 233



---

🏁 Script executed:

#!/bin/bash
# Find and view distribution/build.py around lines 140-170
find . -name "build.py" -path "*/distribution/*" -exec sed -n '140,170p' {} \;

Length of output: 1237



---

🏁 Script executed:

#!/bin/bash
# View broader context of distribution/build.py (first 80 lines + lines around 140-165)
sed -n '1,80p' distribution/build.py && echo "---" && sed -n '130,170p' distribution/build.py

Length of output: 4430



---

🏁 Script executed:

#!/bin/bash
# Search for environment variables, deployment flags, or scenario logic in build.py
grep -n "environment\|deploy\|production\|dev\|demo\|scenario\|ENV\|getenv" distribution/build.py | head -20

Length of output: 86


Remove redundant sort on line 164 and document the pymilvus modification.

The pymilvus[milvus-lite] extra is indeed recommended for local development and prototyping, but Milvus Lite is not recommended for production. Given this unconditional modification in a build script with no deployment-specific logic, consider:

  • Remove the redundant packages = sorted(set(packages)) on line 164 (packages were already sorted on line 149; the quote-adding and replacement operations don't create new duplicates)
  • Add a comment explaining why pymilvus[milvus-lite] is required (e.g., "for local development/demos")
  • If this script supports production deployments, make the pymilvus modification conditional or remove it entirely in favor of a standalone Milvus server
🤖 Prompt for AI Agents
In distribution/build.py around lines 157 to 164, remove the redundant final
sorting call (delete the packages = sorted(set(packages)) on line 164) because
packages were already sorted earlier and the replacement won't introduce
duplicates; add a short inline comment above the pymilvus replacement explaining
the reason (e.g., "# Add pymilvus[milvus-lite] for local development/demos; not
recommended for production"); and if this build script may be used for
production, make the pymilvus modification conditional (e.g., guard it with an
explicit flag or environment variable like USE_MILVUS_LITE) so production builds
keep the plain pymilvus package.
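The cleanup suggested above can be sketched in plain Python. The helper name is hypothetical; the list comprehension mirrors the one discussed in distribution/build.py, with the single sorted(set(...)) pass moved before the replacement so no second sort is needed:

```python
# Hypothetical helper mirroring the pymilvus rewrite discussed above.
# One sorted(set(...)) pass both deduplicates and orders the list; the
# string replacement cannot introduce duplicates, so no re-sort follows.
def add_milvus_lite_extra(packages: list[str]) -> list[str]:
    # Add pymilvus[milvus-lite] for local development/demos;
    # Milvus Lite is not recommended for production.
    packages = sorted(set(packages))
    return [
        pkg.replace("pymilvus", "pymilvus[milvus-lite]")
        if "pymilvus" in pkg and "[milvus-lite]" not in pkg
        else pkg
        for pkg in packages
    ]


print(add_milvus_lite_extra(["pymilvus", "numpy", "pymilvus"]))
# → ['numpy', 'pymilvus[milvus-lite]']
```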

Comment on lines +226 to +227
```yaml
model_id: ${env.INFERENCE_MODEL}
provider_id: vllm-inference
```

⚠️ Potential issue | 🔴 Critical

Provide a default value for model_id environment variable.

The model_id is now bound to ${env.INFERENCE_MODEL} without a fallback default. If this environment variable is undefined at runtime, the configuration will fail to load or the model will be missing, breaking the inference service.

Apply this diff to add a safe default:

```diff
- model_id: ${env.INFERENCE_MODEL}
+ model_id: ${env.INFERENCE_MODEL:=llama-2-7b}
```

Consider what default model makes sense for your distribution, or explicitly require this env var to be set in deployment documentation.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
```diff
- model_id: ${env.INFERENCE_MODEL}
- provider_id: vllm-inference
+ model_id: ${env.INFERENCE_MODEL:=llama-2-7b}
+ provider_id: vllm-inference
```
🤖 Prompt for AI Agents
In distribution/run.yaml around lines 226-227, model_id is set directly to
${env.INFERENCE_MODEL} with no fallback; update the value to provide a safe
default or validate the env var at startup — e.g., replace the direct binding
with a fallback pattern (use your config/templating syntax to default to a
sensible model like "llama-2-13b" or your distro's preferred model) or add
startup validation that exits with a clear error if INFERENCE_MODEL is unset,
and document the required env var in deployment docs.
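The `${env.VAR:=default}` fallback proposed above behaves like the following sketch. The real Llama Stack config templating is more featureful; this hypothetical resolver only illustrates the fallback semantics being relied on:

```python
import os
import re

# Resolve "${env.NAME:=default}" placeholders: use the environment
# variable when set, fall back to the default otherwise, and fail
# loudly when a placeholder has no default and the variable is unset.
_PATTERN = re.compile(r"\$\{env\.([A-Z_][A-Z0-9_]*)(?::=([^}]*))?\}")


def resolve(value: str) -> str:
    def sub(m: re.Match) -> str:
        name, default = m.group(1), m.group(2)
        env = os.environ.get(name)
        if env is not None:
            return env
        if default is not None:
            return default
        raise KeyError(f"required environment variable {name} is unset")

    return _PATTERN.sub(sub, value)


os.environ.pop("INFERENCE_MODEL", None)
print(resolve("${env.INFERENCE_MODEL:=llama-2-7b}"))  # → llama-2-7b
```

With a default in place, an unset variable degrades to a working local configuration instead of an unloadable one.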

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

♻️ Duplicate comments (4)
distribution/build.py (2)

93-93: Verify the command syntax: llama stack list-deps may require a distribution name, not a file path.

A previous review flagged that the official documentation shows llama stack list-deps takes a distribution name (e.g., meta-reference-gpu), not a file path. The current code passes distribution/build.yaml, which may fail.

Run this script to verify the command works as expected:

```shell
#!/bin/bash
# Verify that llama stack list-deps accepts a file path argument
llama stack list-deps --help

# Test with the actual file if available
if [ -f "distribution/build.yaml" ]; then
  echo "=== Testing list-deps with file path ==="
  llama stack list-deps distribution/build.yaml 2>&1 | head -20
fi
```

157-164: Remove redundant sort and document the pymilvus modification.

As flagged in a previous review:

  1. Line 164 performs a redundant sort—packages were already sorted on line 149, and the pymilvus replacement doesn't introduce new duplicates.
  2. The unconditional addition of [milvus-lite] to pymilvus is appropriate for local development but not recommended for production deployments.

Apply this diff:

```diff
     # Modify pymilvus package to include milvus-lite extra
+    # Note: milvus-lite is for local development/demos; not recommended for production
     packages = [
         package.replace("pymilvus", "pymilvus[milvus-lite]")
         if "pymilvus" in package and "[milvus-lite]" not in package
         else package
         for package in packages
     ]
-    packages = sorted(set(packages))
```
distribution/run.yaml (2)

115-115: Verify the hardcoded base_url default for trustyai_lmeval.

As flagged in a previous review, the base_url default changed to http://localhost:8000/v1, which assumes a local VLLM service. If deployments use remote VLLM instances or the service isn't available at this address, requests will be misdirected.

Consider requiring the VLLM_URL environment variable explicitly or documenting that this default must be overridden for non-local deployments.


235-236: Provide a fallback for the model_id environment variable.

As flagged in a previous review, model_id is bound directly to ${env.INFERENCE_MODEL} without a default. If this environment variable is undefined at runtime, the configuration will fail or the inference service will break.

Apply this diff to add a safe default:

```diff
- model_id: ${env.INFERENCE_MODEL}
+ model_id: ${env.INFERENCE_MODEL:=llama-3-8b-instruct}
```

Choose an appropriate default model for your distribution, or explicitly document that this environment variable must be set.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b975965 and 7bb9abd.

📒 Files selected for processing (6)
  • .pre-commit-config.yaml (1 hunks)
  • distribution/Containerfile (2 hunks)
  • distribution/Containerfile.in (1 hunks)
  • distribution/build.py (4 hunks)
  • distribution/build.yaml (0 hunks)
  • distribution/run.yaml (6 hunks)
💤 Files with no reviewable changes (1)
  • distribution/build.yaml
🚧 Files skipped from review as they are similar to previous changes (1)
  • distribution/Containerfile.in
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: build-test-push (linux/amd64)
  • GitHub Check: Summary
🔇 Additional comments (15)
.pre-commit-config.yaml (1)

59-59: LGTM! Aligned with PR objectives.

The change from pinned version v0.3.0rc3+rhai0 to @main correctly aligns with the PR's goal to use the midstream main branch for llama-stack installation.

distribution/build.py (4)

14-14: LGTM!

The shlex import is appropriate for the new parsing logic that handles quoted package names in the dependency parsing function.


17-17: LGTM! Aligned with PR objectives.

Updating the version constant from "0.3.0rc3+rhai0" to "main" correctly implements the PR's goal to use the midstream main branch.


34-34: LGTM! Consistent with uv migration.

The source install command now uses uv pip install, aligning with the PR's objective to adopt uv as the package manager.


189-189: LGTM! Consistent with uv migration.

The pinned dependencies installation now uses uv pip install, maintaining consistency with the broader migration to the uv package manager.

distribution/run.yaml (7)

66-74: LGTM! Consistent persistence structure.

The new persistence blocks for milvus and faiss providers follow a consistent pattern with proper backend references and namespacing. This aligns with the broader migration from kvstore to persistence backends.


85-87: LGTM! Consistent with persistence migration.

The persistence block for milvus-remote follows the same pattern as other vector_io providers, maintaining consistency across the configuration.


100-108: LGTM! Well-structured agent persistence.

The new persistence structure for the meta-reference agent provider properly separates:

  • agent_state using a key-value backend (kv_agents)
  • responses using a SQL backend (sql_agents) with appropriate queue configuration

This separation is architecturally sound for handling state vs. response data.


138-146: LGTM! Consistent datasetio persistence.

The persistence blocks for huggingface and localfs datasetio providers follow the established pattern with appropriate backend references and namespacing.


180-182: LGTM! Clean metadata store structure.

The new metadata_store configuration for files properly references the sql_files backend and specifies the table name, maintaining consistency with the broader persistence migration.


183-232: LGTM! Well-organized storage architecture.

The new storage structure provides clear separation of concerns:

  • Backends: Define the storage types and paths (kv_sqlite, sql_sqlite)
  • Stores: Map logical stores to backends with proper namespacing and configuration

The architecture supports multiple specialized backends (agents, files, inference, datasetio) while maintaining a consistent structure. The queue configurations for inference and stores are appropriately sized.


254-255: LGTM! Centralized telemetry configuration.

Moving telemetry to the top level (rather than provider-scoped) is a cleaner architecture that aligns with the PR's objective to remove the old telemetry API.

distribution/Containerfile (3)

5-5: LGTM! Bootstrap step for uv migration.

Installing uv before other dependencies is the correct bootstrap step for the migration to the uv package manager.


7-58: LGTM! Consistent uv wrapper adoption.

All pip install commands have been successfully migrated to use the uv pip install wrapper, maintaining consistency throughout the Containerfile. The migration preserves:

  • Install flags (--upgrade, --no-deps, --no-cache, --extra-index-url)
  • Package specifications and version constraints
  • Multi-line formatting for readability

This aligns with the PR's objective to adopt uv as the package manager.


59-61: LGTM! Correctly uses @main branch.

The final llama-stack installation correctly:

  1. Uses the uv pip install wrapper (line 61)
  2. References @main branch instead of the non-existent @vmain (addressed in commit 7be5044)
  3. Uses appropriate flags (--no-cache --no-deps) for git installations

This aligns with changes in .pre-commit-config.yaml and distribution/build.py.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

♻️ Duplicate comments (4)
distribution/run.yaml (2)

115-115: Verify the base_url default is appropriate for your deployment.

The base_url now defaults to http://localhost:8000/v1 when VLLM_URL is not set. While a past review marked this as addressed, ensure this default makes sense for your deployment scenario. If VLLM typically runs remotely, consider requiring the environment variable explicitly.


235-236: Environment variable INFERENCE_MODEL lacks a default value.

The model_id is bound to ${env.INFERENCE_MODEL} without a fallback. If this variable is undefined at runtime, the configuration will fail to load or result in an empty model_id.

Consider adding a sensible default:

```diff
- model_id: ${env.INFERENCE_MODEL}
+ model_id: ${env.INFERENCE_MODEL:=llama-3-8b}
```

Alternatively, document that INFERENCE_MODEL is a required environment variable in your deployment documentation.

distribution/build.py (2)

116-116: Verify that "llama stack list-deps" accepts a file path argument.

The command changed from llama stack build --config ... --print-deps-only to llama stack list-deps distribution/build.yaml. Based on past review comments, the list-deps command may expect a distribution name rather than a file path.

Run this script to verify the command syntax:

```shell
#!/bin/bash
# Check if list-deps command exists and its usage
echo "=== Checking llama stack list-deps help ==="
llama stack list-deps --help 2>&1 | head -20

echo ""
echo "=== Attempting to run with file path ==="
llama stack list-deps distribution/build.yaml 2>&1 | head -10
```

180-187: Remove redundant sort and document pymilvus modification.

Two issues here:

  1. Line 187: The second packages = sorted(set(packages)) is redundant since packages were already sorted and deduplicated at line 172. The pymilvus string replacement doesn't create duplicates.

  2. Lines 180-186: The unconditional replacement of pymilvus with pymilvus[milvus-lite] adds the embedded Milvus implementation suitable for local development. Per web search results, Milvus Lite is not recommended for production deployments.

Apply this diff:

```diff
         ]
         packages = [
             package.replace("pymilvus", "pymilvus[milvus-lite]")
             if "pymilvus" in package and "[milvus-lite]" not in package
             else package
             for package in packages
         ]
-        packages = sorted(set(packages))
```

Consider adding a comment explaining the pymilvus modification:

```diff
+        # Add pymilvus[milvus-lite] for local development/demos
+        # Note: Milvus Lite is not recommended for production
         packages = [
             package.replace("pymilvus", "pymilvus[milvus-lite]")
```
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7bb9abd and c3ef94a.

📒 Files selected for processing (6)
  • .pre-commit-config.yaml (1 hunks)
  • distribution/Containerfile (2 hunks)
  • distribution/Containerfile.in (1 hunks)
  • distribution/build.py (6 hunks)
  • distribution/build.yaml (0 hunks)
  • distribution/run.yaml (6 hunks)
💤 Files with no reviewable changes (1)
  • distribution/build.yaml
🚧 Files skipped from review as they are similar to previous changes (1)
  • .pre-commit-config.yaml
🧰 Additional context used
🪛 GitHub Actions: Pre-commit
distribution/Containerfile.in

[error] 1-1: Distribution Documentation hook failed. Could not find llama-stack version in Containerfile.

distribution/Containerfile

[error] 1-1: Distribution Documentation hook failed. Could not find llama-stack version in Containerfile.

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: build-test-push (linux/amd64)
  • GitHub Check: Summary
🔇 Additional comments (15)
distribution/Containerfile.in (2)

4-4: LGTM! UV wrapper installation added correctly.

The addition of RUN pip install uv aligns with the PR's objective to use uv as the package manager. This ensures uv is available before subsequent dependency installations that use uv pip install.


1-18: Pipeline failure confirmed: The pre-commit hook cannot extract a version from the generated Containerfile when using @main git ref.

The hook gen_distro_docs.py expects semantic versions (e.g., 0.3.0, 0.3.0rc3+rhai0) in either:

  • Direct pip install: llama-stack==X.Y.Z
  • Git URL: git+https://...@vX.Y.Z

Since build.py generates a Containerfile with git+https://github.com/opendatahub-io/llama-stack.git@main, neither pattern matches, causing the script to fail with "Could not find llama-stack version in Containerfile".

To resolve, either:

  1. Update gen_distro_docs.py to handle non-semantic git refs (e.g., @main, @develop), or
  2. Keep a semantic version reference in the Containerfile (e.g., in a comment or environment variable) that the hook can extract
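A two-pattern extractor is one way to make the hook tolerant of both forms. This is a sketch under the assumption that gen_distro_docs.py only needs the version or git ref string; the function name is hypothetical:

```python
import re


# Extract the llama-stack version or git ref from Containerfile text,
# accepting either "llama-stack==X.Y.Z" (direct pip install) or
# "git+https://...llama-stack.git@<ref>" (VCS install).
def extract_llama_stack_ref(containerfile: str) -> str:
    m = re.search(r"llama-stack==([\w.+-]+)", containerfile)
    if m:
        return m.group(1)
    m = re.search(r"llama-stack\.git@([\w./-]+)", containerfile)
    if m:
        return m.group(1)
    raise ValueError("Could not find llama-stack version in Containerfile")


print(extract_llama_stack_ref(
    "RUN uv pip install git+https://github.com/opendatahub-io/llama-stack.git@main"
))
```

This would accept semantic tags (`@v0.3.0`), branch names (`@main`), and commit SHAs alike, resolving the pre-commit failure without reintroducing a pinned version string.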
distribution/Containerfile (2)

1-2: Auto-generated file - changes look correct.

The warning header correctly indicates this file is generated by distribution/build.py. The uv wrapper usage throughout is consistent with the PR objectives.


61-61: Git ref correctly updated to @main.

The previous issue with @vmain has been resolved. Line 61 now correctly uses @main, which aligns with .pre-commit-config.yaml and distribution/build.py.

distribution/run.yaml (3)

183-232: Storage architecture looks well-structured.

The new storage section with dedicated backends and stores provides good separation of concerns. Each provider has its own backend (e.g., kv_agents, sql_agents), which should prevent configuration conflicts.


254-255: Telemetry configuration simplified and moved to root level.

Moving telemetry from provider-scoped to root-level configuration is a good refactor. This centralized approach is cleaner and easier to manage.


66-68: Backend reference verification confirms all configurations are correct.

The kv_milvus_inline backend referenced in the milvus provider's persistence section (line 67) is properly defined in the storage.backends section (line 194) with complete configuration including db_path. All 11 backend references across the configuration file are consistently defined and have no undefined references.

distribution/build.py (8)

14-14: LGTM! shlex import added for proper quoted argument parsing.

The addition of shlex is necessary for the new dependency parsing logic that handles quoted package names and arguments correctly.


34-34: LGTM! Source install command updated to use uv wrapper.

The source_install_command correctly uses RUN uv pip install instead of RUN pip install, consistent with the PR's migration to uv.


51-55: LGTM! Function generalized from check_llama_installed to check_package_installed.

Renaming and adding a package_name parameter makes this function reusable for checking both uv and llama binaries. Good refactoring.


91-111: LGTM! New function to install llama-stack from source using uv.

The install_llama_stack_from_source function properly uses uv for git-based installation and includes good error handling with stdout/stderr output on failure.


127-178: Parsing logic handles dual formats correctly with good defensive coding.

The new parsing logic successfully handles both legacy "uv pip install ..." format and direct package lists. Key strengths:

  • Uses shlex.split() for proper quoted argument handling
  • Sorts and deduplicates packages
  • Adds quotes around packages with > or < to prevent shell redirection issues
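A minimal sketch of that parsing approach (the helper is hypothetical; the real get_dependencies() in distribution/build.py handles more flags and the flag bucketing described below):

```python
import shlex


# Parse one dependency line, handling both the legacy
# "uv pip install <pkgs...>" form and a bare package list.
def parse_dep_line(line: str) -> list[str]:
    tokens = shlex.split(line)  # respects quoted specs like "numpy>=1.26"
    if tokens[:3] == ["uv", "pip", "install"]:
        tokens = tokens[3:]
    # Drop flags, keep package specs; quote specs containing < or >
    # so they survive later shell interpolation in the Containerfile.
    packages = [t for t in tokens if not t.startswith("-")]
    return sorted(
        {f"'{p}'" if (">" in p or "<" in p) else p for p in packages}
    )


print(parse_dep_line('uv pip install --no-cache "numpy>=1.26" torch'))
```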

189-204: Command classification logic is well-structured.

The categorization of dependencies into torch_deps, no_deps, no_cache, and standard_deps based on flags is logical and maintains proper grouping for the generated Containerfile.


212-212: LGTM! Pinned dependencies updated to use uv wrapper.

The pinned dependencies command correctly uses RUN uv pip install --upgrade, consistent with the uv migration.


265-269: LGTM! Main function properly checks for uv and installs from source.

The main function correctly:

  1. Checks for uv binary availability
  2. Installs llama-stack from source using the new function
  3. Verifies llama binary is available

Comment on lines +17 to 18
```python
CURRENT_LLAMA_STACK_VERSION = "main"
LLAMA_STACK_VERSION = os.getenv("LLAMA_STACK_VERSION", CURRENT_LLAMA_STACK_VERSION)
```

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

Verify version change from pre-release to "main" branch.

CURRENT_LLAMA_STACK_VERSION changed from "0.3.0rc3+rhai0" to "main". This switches from a specific pre-release version to tracking the main branch, which could introduce instability if the upstream main branch has breaking changes.

Consider:

  1. Is your team prepared to handle potential breaking changes from upstream main?
  2. Should you use a specific commit SHA instead for reproducible builds?
  3. Update documentation to reflect this tracking strategy

🌐 Web query:

What are the best practices for pinning dependencies to git branches vs tags in Python projects

💡 Result:

Short answer: pin immutable refs (tags or commit hashes) for production / reproducible builds; use branches only for tracking live development or when you deliberately want moving targets — but validate them with CI and lockfiles. Key practices:

  • Prefer tags (semantic-version tags) or exact commit SHAs for reproducible builds — tags are stable release markers, SHAs are completely immutable. Use these in git: URLs (or Poetry/Pipenv rev/tag) when installing from VCS. [1][2][3]
  • Only use branches when you need a moving target (e.g., testing against a library’s main/next branch). Treat branch pins as risky for production because they change under you. If you must use a branch, pin it in CI and test frequently. [2][4]
  • Always maintain and commit a lock file (poetry.lock, Pipfile.lock, requirements.txt with hashes) so CI and colleagues reproduce identical dependency trees even if manifests point at ranges or branches. Lockfiles + VCS-tagged deps give best reproducibility. [5][1]
  • Prefer semantic version tags for published packages and keep a clear tagging policy (vMAJOR.MINOR.PATCH, annotated tags, changelog). This supports automation and traceability. [4][3]
  • Automate pin maintenance and security updates (Dependabot, Renovate, CI jobs). Configure rules to update pinned git-tag or SHA references (or open PRs when a new tag appears) and always run your test suite before merging. [6][7]
  • When referencing VCS deps in manifests, be explicit: use @tag or @<commit-sha> (or Poetry's tag/rev keys) rather than leaving the default branch unspecified. Example: requests @ git+https://github.com/owner/requests.git@v1.2.3 or @38eb5d3b. [1][2]
  • Record provenance and release notes: annotate tags and keep changelogs so consumers know what each pinned tag/sha represents. Use CI to publish artifacts from tags so releases are reproducible. [4][3]
  • Policy checklist for projects:
    • Production packages: pin to tag or commit SHA + commit lockfile. [1][2]
    • Development/test only: branch pins acceptable but run frequent CI + automated update checks. [2][6]
    • Use automated tools (Dependabot/Renovate) to surface updates and security fixes; review & test before updating pins. [6][7]

Selected sources:

  • Poetry git dependency docs (how to pin branch/tag/rev). [1]
  • Pipenv best-practices / lockfile guidance. [5]
  • Why use git tags / immutability and SemVer guidance. [4][3]
  • General recommendation to pin for reproducibility (community discussion). [8]
  • GitHub Dependabot / maintaining dependencies guidance (automation & update policy). [6]

If you want, I can: (a) show concrete manifest examples for pip/Poetry/Pipenv, or (b) draft a short CI job that enforces and tests git-pinned deps.


I'll investigate the codebase context to understand the implications of this version change and whether there are existing mitigations (lockfiles, CI validation, etc.).


🏁 Script executed:

```shell
#!/bin/bash
# Investigate build.py and version usage patterns

# 1. Check build.py context around lines 17-18
echo "=== build.py context (lines 10-25) ==="
head -30 distribution/build.py | tail -20

# 2. Check for lockfiles
echo -e "\n=== Looking for lockfiles ==="
find . -maxdepth 2 -type f \( -name "*.lock" -o -name "Pipfile*" -o -name "poetry.lock" \) 2>/dev/null | head -10

# 3. Check for CI configuration
echo -e "\n=== CI configuration files ==="
find . -maxdepth 2 -type f \( -name ".github" -o -name "*.yml" -o -name "*.yaml" \) 2>/dev/null | grep -E "(workflows|\.github)" | head -10

# 4. Search for LLAMA_STACK_VERSION usage
echo -e "\n=== LLAMA_STACK_VERSION usage ==="
grep -r "LLAMA_STACK_VERSION" --include="*.py" 2>/dev/null | head -15

# 5. Check README or docs for version strategy
echo -e "\n=== README mention of versions ==="
grep -i "version\|main\|branch" README.md 2>/dev/null | head -10
```

Length of output: 1761



🏁 Script executed:

```shell
#!/bin/bash
# Check how LLAMA_STACK_VERSION is actually used in installation logic

# 1. Find and show install_llama_stack_from_source function
echo "=== install_llama_stack_from_source implementation ==="
grep -n -A 20 "def install_llama_stack_from_source" distribution/build.py

# 2. Check is_install_from_source
echo -e "\n=== is_install_from_source implementation ==="
grep -n -A 10 "def is_install_from_source" distribution/build.py

# 3. Look for any git clone or VCS-based installation
echo -e "\n=== Git/VCS installation patterns ==="
grep -n "git\|clone\|vcs\|@git" distribution/build.py | head -20

# 4. Check if there's a full requirements file or setup.py
echo -e "\n=== Requirements files ==="
find distribution -name "requirements*.txt" -o -name "setup.py" -o -name "setup.cfg"

# 5. Show full BASE_REQUIREMENTS and PINNED_DEPENDENCIES
echo -e "\n=== Full BASE_REQUIREMENTS and PINNED_DEPENDENCIES ==="
grep -n -A 50 "BASE_REQUIREMENTS" distribution/build.py | head -60
```

Length of output: 4378




Code correctly routes "main" to git-based installation; review concerns remain valid but not critical.

The code detects that "main" is not a version string (has no dots) and routes it to git installation: git+https://github.com/opendatahub-io/llama-stack.git@main. This works correctly, so the invalid pip syntax in BASE_REQUIREMENTS is bypassed.

However, the review comment's concerns are valid:

  • No lockfile or commit-level pinning: Each build pulls the latest from main, creating reproducibility gaps compared to pinning specific commit SHAs (per best practices: immutable refs (tags or commit hashes) are preferred for reproducible builds; branches should only be used when deliberately tracking a moving target).
  • Breaking changes risk: Upstream main can introduce breaking changes between builds.
  • Documentation needed: The build strategy (main branch tracking) is not documented; the README mentions SHA-based image tags but doesn't clarify version pinning strategy.

The three original suggestions remain sound:

  1. Breaking change preparation: Ensure the team monitors upstream main regularly.
  2. Consider commit SHA pinning: More reproducible than branch tracking; aligns with distribution best practices.
  3. Update documentation: Clarify version strategy in README or build docs.
🤖 Prompt for AI Agents
In distribution/build.py around lines 17-18, CURRENT_LLAMA_STACK_VERSION is set
to "main" which causes builds to pull the moving branch (git+...@main) and
creates non-reproducible images and risk of breakages; change the build flow to
default to a pinned immutable ref (commit SHA or tag) instead of "main", add an
environment/CI mechanism to inject a specific commit SHA for
LLAMA_STACK_VERSION, update README/build docs to state whether builds track
branch vs pinned SHAs, and add a CI/alerting step or checklist to regularly
monitor upstream main if branch tracking is intentionally used.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

♻️ Duplicate comments (1)
distribution/build.py (1)

114-221: llama stack list-deps integration and dependency parsing look correct

The switch to llama stack list-deps distribution/build.yaml plus the dual‑format parsing (legacy uv pip ... vs. plain package lists) is consistent with upstream CLI behavior and examples, and the shlex‑based parsing with flag bucketing/deduping is careful. Given this, I don’t see remaining functional issues in get_dependencies() for current Llama Stack versions.

Does the current `llama stack list-deps` CLI support passing either a distro name or a config path (e.g., `distribution/build.yaml`) as its argument, and does it still emit `uv pip install ...` lines by default?
🧹 Nitpick comments (2)
distribution/build.py (2)

51-55: Docstring for check_package_installed is now misleading

This helper is generic (package_name can be "uv" or "llama"), but the docstring still says “Check if llama binary is installed and accessible.” Consider updating the docstring/message to reflect the generic behavior (e.g., “Check if a CLI named package_name is on PATH.”).
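A generic version with an accurate docstring could look like this sketch, assuming the helper only needs a PATH lookup (e.g. via shutil.which; the exact error handling in build.py may differ):

```python
import shutil
import sys


def check_package_installed(package_name: str) -> bool:
    """Check whether a CLI named *package_name* is on PATH."""
    if shutil.which(package_name) is None:
        print(f"Error: '{package_name}' not found on PATH", file=sys.stderr)
        return False
    return True


# "uv" and "llama" are the two binaries build.py checks for.
print(check_package_installed("sh"))
```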


91-112: Avoid shell=True when installing llama-stack from source (line 97)

The install_llama_stack_from_source() function constructs a shell command containing llama_stack_version and calls subprocess.run(..., shell=True, ...). Passing command and arguments as a sequence with shell=False (the default) when building commands from variables avoids invoking a shell and therefore prevents shell metacharacters from being interpreted. Refactor to use an argument list instead:

```diff
-        result = subprocess.run(
-            f"uv pip install git+https://github.com/opendatahub-io/llama-stack.git@{llama_stack_version}",
-            shell=True,
-            check=True,
-            capture_output=True,
-            text=True,
-        )
+        result = subprocess.run(
+            [
+                "uv",
+                "pip",
+                "install",
+                f"git+https://github.com/opendatahub-io/llama-stack.git@{llama_stack_version}",
+            ],
+            check=True,
+            capture_output=True,
+            text=True,
+        )
```

This removes shell parsing from the equation while keeping behavior the same.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c3ef94a and e3cf800.

📒 Files selected for processing (8)
  • .pre-commit-config.yaml (1 hunks)
  • distribution/Containerfile (2 hunks)
  • distribution/Containerfile.in (1 hunks)
  • distribution/README.md (1 hunks)
  • distribution/build.py (6 hunks)
  • distribution/build.yaml (0 hunks)
  • distribution/run.yaml (6 hunks)
  • scripts/gen_distro_docs.py (4 hunks)
💤 Files with no reviewable changes (1)
  • distribution/build.yaml
🚧 Files skipped from review as they are similar to previous changes (2)
  • .pre-commit-config.yaml
  • distribution/Containerfile.in
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: build-test-push (linux/amd64)
  • GitHub Check: Summary
🔇 Additional comments (5)
distribution/Containerfile (1)

5-15: Containerfile uv adoption and git@main install look consistent with the new build flow

The switch to uv pip install for dependencies and providers, plus the final git@main install for llama-stack, aligns with distribution/build.py and the PR intent. I don’t see functional issues in this generated Dockerfile segment.

Also applies to: 51-61

scripts/gen_distro_docs.py (1)

12-34: Commit/main/tag resolution for docs is solid and matches the generated README

The new resolve_main_commit_hash() helper and the tuple‑returning extract_llama_stack_version() cleanly handle git‑main, tagged, and SHA‑pinned installs, and gen_distro_docs() uses them to generate the correct link + short display (as seen in distribution/README.md). This looks correct and robust for current use.

Also applies to: 36-41, 52-65, 71-73, 75-80, 213-235, 243-247

distribution/README.md (1)

7-7: README versioning matches the generator, but highlights a trustyai_fms mismatch

The header commit link and short hash are consistent with the new generator logic. Note that this README still reports trustyai_fms as 0.2.3, while distribution/run.yaml configures module: llama_stack_provider_trustyai_fms==0.2.2; the fix should happen in the config/build files rather than editing this generated README directly.

Also applies to: 27-27

distribution/run.yaml (2)

66-74: New persistence/storage wiring aligns with upstream patterns

The added persistence blocks for vector_io, agents, datasetio, and files plus the new storage.backends/storage.stores layout mirror upstream Llama Stack configs (kv_* + sql_* backends, metadata/inference/conversations/prompts stores), and the top‑level telemetry: enabled: true matches the newer telemetry model. Structurally this looks correct and should make it much easier to swap storage backends centrally.

Also applies to: 85-87, 100-108, 138-146, 181-217, 219-232, 254-255


112-116: INFERENCE_MODEL binding and trustyai_lmeval base_url behavior confirmed aligned with upstream

Verification confirms your configuration matches TrustyAI LMEval and OpenShift AI guidance:

  1. model_id: ${env.INFERENCE_MODEL} with no default aligns with upstream Llama Stack examples and deployment manifests that set it via Kubernetes secrets/CRs—reasonable as long as your deployment manifests always provide it.

  2. base_url: ${env.VLLM_URL:=http://localhost:8000/v1} for trustyai_lmeval provides a sane local-dev default (vLLM on localhost) while allowing real deployments to override via VLLM_URL. Ensure your external docs clearly state that VLLM_URL must be set for non-local setups so users don't unintentionally hit localhost.

This section is consistent with TrustyAI LMEval, Red Hat OpenShift AI, and upstream Llama Stack guidance.

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
distribution/run.yaml (1)

15-15: Inconsistency: vllm-inference provider binding.

Line 15 conditionally includes the vllm-inference provider only when VLLM_URL is set (${env.VLLM_URL:+vllm-inference}), but line 236 unconditionally references it as provider_id: vllm-inference in the models section. If VLLM_URL is undefined, the provider won't be loaded but the model will still reference it, causing a configuration failure.

Align the conditional bindings. Either:

  1. Keep the provider conditional and make the model reference conditional, or
  2. Make the provider unconditional if it should always be available.

Recommended approach if vllm-inference is the primary inference target:

- provider_id: ${env.VLLM_URL:+vllm-inference}
+ provider_id: vllm-inference

And ensure the model configuration is only loaded when the provider is available, or update deployment docs to require VLLM_URL to be set.

Also applies to: 236-236

♻️ Duplicate comments (5)
distribution/build.py (3)

17-21: Tracking llama-stack from main reduces build reproducibility; confirm this is intentional.

Using "main" as CURRENT_LLAMA_STACK_VERSION means every build pulls whatever is on the branch at that time (via git), not an immutable tag/SHA. That’s fine if this repo is explicitly meant to track midstream main, but it does mean:

  • Non-reproducible images over time.
  • Higher risk of surprise breakages from upstream.

If this is the long‑term plan, consider documenting the strategy and/or supporting an override to pin to a specific commit in CI while keeping main for local/dev builds.
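One way to support such an override is a small environment-variable fallback in build.py; the variable name `LLAMA_STACK_PIN` below is purely illustrative, not an existing knob:

```python
import os

# Hypothetical override: CI could export LLAMA_STACK_PIN with a commit SHA
# to get reproducible builds, while local/dev builds keep tracking "main".
CURRENT_LLAMA_STACK_VERSION = os.environ.get("LLAMA_STACK_PIN", "main")
```

With nothing exported, behavior is unchanged (builds track `main`); CI pipelines could export a SHA captured at the start of the pipeline.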


115-121: Verify llama stack list-deps distribution/build.yaml is valid for your CLI version and update the docstring.

The code switched from a build --config ... --print-deps-only style to llama stack list-deps distribution/build.yaml. Earlier CLI docs showed list-deps taking a distribution name, not a YAML path; if that’s still true, this command may fail at runtime.

Actions to consider:

  • Confirm current llama stack docs / --help output that distribution/build.yaml as a positional argument is supported.
  • If the CLI expects a distribution name instead, pass that name (and keep the config path in your build.yaml) or revert to the previous build --config ... --print-deps-only style if still supported.
  • Update the function docstring to mention list-deps rather than “build command”.

Example verification:

#!/bin/bash
set -euo pipefail

# Inspect list-deps usage
llama stack list-deps --help || echo "list-deps not available"

# Try running with your current argument to confirm it works
llama stack list-deps distribution/build.yaml | head -20

127-188: Dependency parsing looks solid overall; consider removing redundant sort and documenting pymilvus[milvus-lite].

The shlex-based parsing, flag separation, and classification into standard_deps / torch_deps / no_deps / no_cache all look correct and robust to mixed formats and quoting.

Two nits:

  1. packages is deduped/sorted twice:

    • Line 172: packages = sorted(set(packages))
    • Line 187: packages = sorted(set(packages)) again after the pymilvus replacement.

    The second call is redundant because the replacement doesn’t introduce new entries, only modifies existing ones.

  2. The unconditional replacement of pymilvus with pymilvus[milvus-lite] is a behavioral choice that may not be desired in all environments (Milvus Lite isn’t typically recommended for production). It’s worth:

    • Adding a brief comment explaining the intent (e.g., “use milvus-lite for local/demo builds”), and/or
    • Guarding this behavior behind a flag/env var if this script is also used for production images.

Suggested tweak:

-            # Modify pymilvus package to include milvus-lite extra
-            packages = [
-                package.replace("pymilvus", "pymilvus[milvus-lite]")
-                if "pymilvus" in package and "[milvus-lite]" not in package
-                else package
-                for package in packages
-            ]
-            packages = sorted(set(packages))
+            # Modify pymilvus package to include milvus-lite extra
+            packages = [
+                package.replace("pymilvus", "pymilvus[milvus-lite]")
+                if "pymilvus" in package and "[milvus-lite]" not in package
+                else package
+                for package in packages
+            ]

(and optionally gate the replacement behind a config/env check).

distribution/run.yaml (2)

115-115: [Duplicate] Verify base_url default change for trustyai_lmeval.

Line 115 hardcodes a localhost default: ${env.VLLM_URL:=http://localhost:8000/v1}. This assumes a local VLLM service on port 8000 and silently misdirects requests if the service isn't available or if a remote VLLM instance is intended.

This issue was flagged in a previous review and remains unresolved.

Consider either:

  1. Remove the localhost fallback: base_url: ${env.VLLM_URL:=} and require VLLM_URL to be explicitly set in deployment docs.
  2. Keep a default but add documentation specifying that VLLM_URL must be overridden for non-local deployments.

235-236: [Duplicate] Provide a default value for INFERENCE_MODEL environment variable.

Line 235 binds model_id directly to ${env.INFERENCE_MODEL} without a fallback. If this environment variable is undefined at runtime, the configuration will fail to load, breaking the inference service.

This critical issue was flagged in a previous review and remains unresolved.

Apply this diff to add a safe default:

- model_id: ${env.INFERENCE_MODEL}
+ model_id: ${env.INFERENCE_MODEL:=llama-2-7b}

Or explicitly require this env var in deployment documentation and add startup validation to fail fast with a clear error message if it's unset.
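A fail-fast check along these lines could run at startup; the helper name and message are illustrative, not part of llama-stack:

```python
import os
import sys

def require_env(name: str) -> str:
    """Return the value of a required environment variable, or exit loudly."""
    value = os.environ.get(name)
    if not value:
        sys.exit(f"error: required environment variable {name} is not set")
    return value

# e.g. model_id = require_env("INFERENCE_MODEL")
```

This turns a confusing config-load failure into a one-line error naming the missing variable.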

🧹 Nitpick comments (5)
distribution/build.py (4)

34-49: Keep source_install_command and is_install_from_source in sync with actual usage.

source_install_command and is_install_from_source() are now the only pieces deciding whether the generated Containerfile uses a git-based install, while install_llama_stack_from_source() is always called at runtime regardless of is_install_from_source(). If you later introduce non‑source installs again, this split may get confusing.

Consider either:

  • Reusing source_install_command in install_llama_stack_from_source() (single source of truth for the git URL/options), or
  • Making install_llama_stack_from_source() conditional on is_install_from_source() so both code paths agree on what “from source” means.

51-55: Docstring for check_package_installed is now misleading.

The function is generic (package_name param) but the docstring still says “Check if llama binary…”. It’s now used for both uv and llama.

Update the docstring to something like “Check if a binary is installed and accessible on PATH.”


91-112: Unconditional uv pip install of llama-stack may be expensive and uses shell interpolation.

main() now calls install_llama_stack_from_source(LLAMA_STACK_VERSION) on every run, even if:

  • llama is already installed and at the desired version, or
  • LLAMA_STACK_VERSION is a plain version string rather than a commit/main ref.

Additionally, install_llama_stack_from_source() builds the command via f-string with shell=True, so a malicious/accidental value in LLAMA_STACK_VERSION (from the environment) could inject shell syntax.

Consider:

  • Guarding the install behind is_install_from_source() or a “force install” flag, and/or skipping the install when llama is already present at the right version.
  • Switching to subprocess.run([...], shell=False) with explicit args, or at least validating llama_stack_version to be a safe ref (commit/tag/branch) before using it in a shell command.

Also applies to: 264-270
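The validation-plus-argument-list hardening suggested above could look roughly like this; `GIT_REF_RE` and the function are a sketch, not the current build.py code:

```python
import re
import subprocess

# Conservative allowlist for branch/tag/SHA refs; rejects shell metacharacters.
GIT_REF_RE = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._/-]*$")

def install_llama_stack(ref: str) -> None:
    """Install llama-stack from a git ref after validating the ref string."""
    if not GIT_REF_RE.fullmatch(ref):
        raise ValueError(f"refusing suspicious git ref: {ref!r}")
    subprocess.run(
        [
            "uv",
            "pip",
            "install",
            f"git+https://github.com/opendatahub-io/llama-stack.git@{ref}",
        ],
        check=True,
        capture_output=True,
        text=True,
    )
```

With the argument-list form there is no shell to inject into, and the regex rejects values like `main; rm -rf /` before any subprocess is spawned.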


264-283: Startup sequence is sensible; minor opportunity to avoid redundant checks.

main() now:

  • Ensures uv is present,
  • Installs llama-stack from source,
  • Checks llama is on PATH,
  • Skips version check when using “from source”.

That sequencing matches the new design. If you implement the earlier suggestion to make the install conditional, you could also avoid re-running it in environments where llama is already installed and LLAMA_STACK_VERSION hasn’t changed, but that’s an optimization rather than a correctness issue.

distribution/run.yaml (1)

254-255: Top-level telemetry enabled by default.

Lines 254–255 add a top-level telemetry: {enabled: true} block, as mentioned in the PR summary ("removed old telemetry API" and "reintroduced telemetry at the root level").

Confirm:

  1. Telemetry is intended to be enabled by default in this distribution.
  2. If telemetry should be opt-in or configurable via environment variable, add a fallback like enabled: ${env.TELEMETRY_ENABLED:=true} or enabled: ${env.TELEMETRY_ENABLED:=false}.
  3. Validate that the removal of per-provider telemetry blocks (mentioned in the summary) does not break downstream telemetry consumers.

If telemetry should be configurable, consider:

- telemetry:
-   enabled: true
+ telemetry:
+   enabled: ${env.TELEMETRY_ENABLED:=true}
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c3ef94a and e3cf800.

📒 Files selected for processing (8)
  • .pre-commit-config.yaml (1 hunks)
  • distribution/Containerfile (2 hunks)
  • distribution/Containerfile.in (1 hunks)
  • distribution/README.md (1 hunks)
  • distribution/build.py (6 hunks)
  • distribution/build.yaml (0 hunks)
  • distribution/run.yaml (6 hunks)
  • scripts/gen_distro_docs.py (4 hunks)
💤 Files with no reviewable changes (1)
  • distribution/build.yaml
🚧 Files skipped from review as they are similar to previous changes (1)
  • distribution/Containerfile.in
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Summary
🔇 Additional comments (9)
distribution/build.py (1)

209-219: Pinned dependency block with uv pip install --upgrade looks consistent.

The pinned deps are installed first with a single RUN uv pip install --upgrade ..., which should give deterministic versions and a clean layering story. This aligns well with the new uv-based flow.

No changes needed here.

distribution/README.md (1)

7-7: Commit-based version reference in README matches new docs tooling.

Using the short commit ([a73f4cc](…)) is clear and aligns with scripts/gen_distro_docs.py’s logic (commit hash + link to full commit). Looks good.

.pre-commit-config.yaml (1)

48-60: Pre-commit hook dependency switch to uv>=0.9.0 is aligned with build.py.

The Distribution Build hook now depends on uv>=0.9.0, which matches the new check_package_installed("uv") and uv pip usage in distribution/build.py. This should ensure uv is available inside the hook’s venv without pinning llama-stack itself.

As a follow-up, just verify that the chosen lower bound (>=0.9.0) is compatible with the llama-stack tooling version you’re targeting.

distribution/Containerfile (1)

5-61: Containerfile uv adoption and llama-stack @main source install are consistent; consider pinning uv.

Highlights:

  • RUN pip install uv followed by RUN uv pip install ... across the rest of the file matches the new uv-centric flow in distribution/build.py.
  • The llama-stack install uses git+https://github.com/opendatahub-io/llama-stack.git@main, which is consistent with CURRENT_LLAMA_STACK_VERSION = "main" and the docs tooling.

Two minor suggestions:

  • For reproducibility, consider pinning uv to a specific version (or at least a constrained range) in the Containerfile, similar to the pre-commit hook.
  • If you later move away from tracking main, this line and the build.py constant should be updated together to avoid drift.

Otherwise the changes look coherent.

scripts/gen_distro_docs.py (3)

12-33: resolve_main_commit_hash implementation is straightforward and failure-safe.

Using git ls-remote <repo_url> main with a 10s timeout and catching CalledProcessError/TimeoutExpired/generic exceptions is a reasonable way to derive the commit hash. Falling back to None and letting callers handle the “main” fallback is also sensible.

No changes needed here; just be aware this introduces a network + git dependency when generating docs.


36-83: Version and repo-owner extraction cover main/tagged cases well; patterns look correct.

Notable points:

  • The main_pattern and git_pattern regexes correctly handle:
    • git+https://github.com/<owner>/llama-stack.git@main
    • git+https://github.com/<owner>/llama-stack.git@v0.3.0... or @0.3.0...
  • When @main is detected, you resolve the commit hash and return (commit_hash, repo_owner), falling back to ("main", repo_owner) if resolution fails.
  • For pinned wheel installs (llama-stack==...), you default the owner to "opendatahub-io", which is fine given the current usage.

This gives a clean (version, repo_owner) interface for downstream code.


213-244: Commit-vs-tag link generation and README header formatting are clear.

The is_commit_hash heuristic (7–40 hex chars) and the branching:

  • commit → link to /commit/<hash> with short 7-char display,
  • main → link to /tree/main,
  • otherwise → link to /releases/tag/v<version>,

produce intuitive, stable links in the README header. The resulting line:

The image is currently shipping with the Open Data Hub version of Llama Stack version [{version_display}]({version_link})

matches the updated README content.

Looks good.
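The heuristic reduces to a short predicate, sketched here for illustration (a 7-40 character all-hex string is treated as a commit hash):

```python
import string

def is_commit_hash(version: str) -> bool:
    """Heuristic: 7-40 hex characters looks like an abbreviated or full SHA."""
    return 7 <= len(version) <= 40 and all(c in string.hexdigits for c in version)
```

Note it is only a heuristic: a purely numeric string of seven or more digits would also match, which is acceptable for version strings like `0.3.0` that contain dots.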

distribution/run.yaml (2)

66-87: Storage backend and persistence refactoring—verify semantic equivalence and migration safety.

The file replaces inline kvstore blocks with a centralized storage section (lines 183–232) featuring explicit backends and namespace mappings. This is a significant structural change across:

  • Vector IO providers (Milvus, FAISS): lines 66–87
  • Agents (meta-reference): lines 100–108
  • DatasetIO (HuggingFace, LocalFS): lines 138–146
  • Files provider: lines 181–182

The new storage architecture introduces separate database files per backend (e.g., milvus_inline_registry.db, agents_store.db, faiss.db) and unified namespace conventions. Verify:

  1. Data migration: If this is deployed over an existing instance, confirm whether old inline kvstore data migrates to new backends or if a data import step is required.
  2. Namespace collisions: Ensure namespace paths (e.g., vector_io::milvus, agents::meta_reference, datasetio::huggingface) don't conflict with other config or runtime systems.
  3. Storage backend types: All backends use kv_sqlite or sql_sqlite. Confirm these types are supported by the runtime and match the provider expectations.
  4. Backward compatibility: If this config is used to upgrade existing deployments, ensure the storage layer can read or migrate from the old kvstore format.

Also applies to: 100-108, 138-146, 181-182, 183-232


91-91: I need to verify whether the specific older versions pinned in the PR actually exist on PyPI and understand the reason for not using the latest versions.

I need to search more specifically for version history to confirm whether these specific pinned versions exist on PyPI.

Use latest available versions—0.2.3 (trustyai_fms) and 0.3.1 (lmeval) released in October 2025.

The latest versions on PyPI are llama_stack_provider_trustyai_fms 0.2.3 (Oct 7, 2025) and llama_stack_provider_lmeval 0.3.1 (Oct 17, 2025). The PR pins older versions (0.2.2 and 0.2.4 respectively). While the pinned versions likely exist, I cannot confirm their precise availability on PyPI through web search.

Verify that the pinned versions:

  • Exist and are installable on PyPI
  • Are intentionally chosen for compatibility/stability over the latest releases
  • Address any telemetry API removal objectives mentioned in the PR

Update to the latest versions if no breaking changes exist or if telemetry API fixes are included in the newer releases.

@leseb
Collaborator Author

leseb commented Nov 17, 2025

Current CI failure is from trusty:

  File "/opt/app-root/lib64/python3.12/site-packages/llama_stack_provider_trustyai_fms/__init__.py", line 5, in <module>
    from llama_stack.apis.safety import Safety
ModuleNotFoundError: No module named 'llama_stack.apis'

@ruivieira PTAL

return None


def extract_llama_stack_version():
Collaborator

maybe add a type hint here since there's a specific return type mentioned in the docstring?

@Elbehery
Collaborator

@leseb can u rebase against main branch plz ?

@ruivieira
Member

The following provider versions should fix the issue in #115 (comment)

The main branch of this repo will now use the midstream main branch to
install llama-stack.
llama-stack pyproject expects "uv" as the package manager.

This includes fixes to the build/run files to remove the old telemetry API.

Signed-off-by: Sébastien Han <seb@redhat.com>
@leseb leseb force-pushed the fixxx branch 2 times, most recently from ca8699b to 9d62f1c Compare November 28, 2025 09:53
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

♻️ Duplicate comments (5)
scripts/gen_distro_docs.py (1)

36-41: Add return type hint as previously requested.

The docstring describes the return type, but a type hint would improve IDE support and static analysis.

-def extract_llama_stack_version():
+def extract_llama_stack_version() -> tuple[str, str]:
     """Extract Llama Stack version and repo owner from the Containerfile.
distribution/build.py (2)

114-116: Verify llama stack list-deps accepts a file path argument.

Previous review flagged that llama stack list-deps may expect a distribution name rather than a file path. Ensure this command syntax works correctly with the distribution/build.yaml path.

#!/bin/bash
# Check list-deps command help to verify expected arguments
llama stack list-deps --help 2>&1 || echo "Command not available in sandbox"

# Check if distribution/build.yaml exists
cat distribution/build.yaml | head -5

180-187: Remove redundant sort operation.

The sorted(set(packages)) at line 187 is redundant since line 172 already performs the same deduplication and sorting. The pymilvus replacement at lines 181-186 only modifies existing strings and doesn't introduce duplicates.

             packages = [
                 package.replace("pymilvus", "pymilvus[milvus-lite]")
                 if "pymilvus" in package and "[milvus-lite]" not in package
                 else package
                 for package in packages
             ]
-            packages = sorted(set(packages))
distribution/run.yaml (2)

115-115: ⚠️ DUPLICATE: base_url default remains hardcoded to localhost (unresolved from past review).

This issue was flagged previously but remains unresolved. The hardcoded default http://localhost:8000/v1 assumes a local VLLM service and will silently misdirect requests if the actual VLLM instance is remote or unavailable locally.

Apply one of these fixes:

Option 1: Require the env var explicitly (no fallback):

-        base_url: ${env.VLLM_URL:=http://localhost:8000/v1}
+        base_url: ${env.VLLM_URL}

Option 2: Update deployment docs to require VLLM_URL override for non-local setups, and add a config validation check that fails loudly if VLLM_URL is unset.


235-235: ⚠️ DUPLICATE: model_id lacks a fallback default (unresolved from past review).

This issue was flagged previously but remains unresolved. If the INFERENCE_MODEL environment variable is undefined at runtime, the configuration will fail to load, breaking the inference service.

Apply this fix to add a sensible default:

-  model_id: ${env.INFERENCE_MODEL}
+  model_id: ${env.INFERENCE_MODEL:=llama-2-7b}

Alternatively, validate the env var at startup and fail with a clear error message if it is unset. Document the required env var in deployment documentation.

🧹 Nitpick comments (3)
distribution/Containerfile (2)

5-6: Inconsistent pip usage for sqlalchemy installation.

Line 5 correctly uses pip to bootstrap uv, but line 6 uses plain pip for sqlalchemy while all subsequent installs use uv pip. Consider using uv pip for consistency, unless there's a specific reason for the workaround mentioned in the comment.

 RUN pip install uv
-RUN pip install sqlalchemy # somehow sqlalchemy[asyncio] is not sufficient
+RUN uv pip install sqlalchemy  # somehow sqlalchemy[asyncio] is not sufficient

66-69: Stale comment references outdated version.

The TODO comment mentions "LLS client version is 0.3.0, while the server version is 0.3.0rc3+rhai0" but the build now uses main branch. Consider updating or removing this comment to reflect the current state.

 #TODO: remove this once we have a stable version of llama-stack
-# LLS server version is not aligned with the client version, so we disable the version check
-# Currently, LLS client version is 0.3.0, while the server version is 0.3.0rc3+rhai0
+# LLS server version is built from main branch; disable version check until stable release
 ENV LLAMA_STACK_DISABLE_VERSION_CHECK=true
distribution/build.py (1)

51-55: Update docstring to match generalized function.

The docstring still says "Check if llama binary is installed" but the function now accepts any package name.

 def check_package_installed(package_name):
-    """Check if llama binary is installed and accessible."""
+    """Check if a package/binary is installed and accessible."""
     if not shutil.which(package_name):
         print(f"Error: {package_name} not found. Please install it first.")
         sys.exit(1)
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e3cf800 and 9d62f1c.

📒 Files selected for processing (8)
  • .pre-commit-config.yaml (1 hunks)
  • distribution/Containerfile (2 hunks)
  • distribution/Containerfile.in (1 hunks)
  • distribution/README.md (3 hunks)
  • distribution/build.py (6 hunks)
  • distribution/build.yaml (1 hunks)
  • distribution/run.yaml (6 hunks)
  • scripts/gen_distro_docs.py (4 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • distribution/Containerfile.in
  • .pre-commit-config.yaml
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: build-test-push (linux/amd64)
  • GitHub Check: Summary
🔇 Additional comments (13)
scripts/gen_distro_docs.py (3)

12-33: LGTM - Robust implementation with proper timeout and error handling.

The function correctly uses git ls-remote with a 10-second timeout to resolve the commit hash, and gracefully falls back to None on failure. The exception handling covers subprocess failures appropriately.


52-64: Good addition of main branch detection with commit hash resolution.

The logic correctly extracts the repo owner from the git URL pattern and attempts to resolve the actual commit hash for reproducibility. The fallback to "main" is sensible.


218-234: Solid commit hash detection logic.

The hex character check for 7-40 character strings correctly identifies commit hashes, and the conditional link generation properly handles all three cases (commit hash, "main" branch, version tag).

distribution/build.yaml (2)

19-28: Provider version updates align with recommended fixes.

These version bumps (trustyai_fms 0.3.0, lmeval 0.4.0, ragas 0.5.1) match the versions recommended by ruivieira in the PR comments to address the ModuleNotFoundError: No module named 'llama_stack.apis' CI failure.


36-40: Good expansion of tool_runtime providers.

Adding inline::rag-runtime and remote::model-context-protocol extends the distribution's capabilities.

distribution/README.md (1)

1-37: Auto-generated documentation is consistent with source files.

The README correctly reflects the updated provider versions from build.yaml and the commit-based version reference from the updated gen_distro_docs.py logic. The short hash display (57b22a0) links to the full commit URL as intended.

distribution/Containerfile (1)

52-62: Provider installations and llama-stack source install look correct.

Provider versions (lmeval 0.4.0, ragas 0.5.1, trustyai_fms 0.3.0) align with build.yaml. The llama-stack installation correctly uses @main (not @vmain as previously flagged).

distribution/build.py (2)

91-111: Good error handling in source installation.

The function properly captures both stdout and stderr, and provides informative error messages on failure. The shell=True usage is acceptable here since the command is constructed from controlled inputs.


264-269: Clean main() flow with proper sequencing.

The updated main function correctly:

  1. Checks for uv availability first
  2. Installs llama-stack from source
  3. Verifies llama is available before proceeding

This ensures prerequisites are met before dependency extraction.

distribution/run.yaml (4)

183-232: Verify storage section schema compatibility with llama-stack runtime.

The storage section introduces a centralized persistence configuration with multiple backends and stores. This is a major architectural change from inline kvstore blocks. Ensure that:

  1. The storage backend names (kv_agents, sql_agents, sql_files, etc.) are recognized by the llama-stack runtime version being installed.
  2. The backend types (kv_sqlite, sql_sqlite) and structure match the runtime's expectations.
  3. All referenced backends in provider persistence blocks (e.g., lines 67, 73, 86, 102, 139, 145) are defined in the backends section.
  4. The stores section (lines 218-232) aligns with the runtime's store configuration expectations.

Test with the actual llama-stack version to confirm the config loads without errors.


254-255: Verify telemetry enablement aligns with PR objective.

The PR description mentions "fixes in build/run files to remove the old telemetry API," but line 254-255 enables telemetry at the root level. Clarify whether:

  1. This represents a migration from per-provider kvstore telemetry (old API) to root-level telemetry (new API), or
  2. Telemetry should be disabled entirely per the PR objective.

If this is a migration, the changes are consistent. If telemetry should be disabled, update the configuration accordingly.


91-91: I'll help you verify this review comment by examining the relevant code and context. Let me start by gathering the necessary information.
#!/bin/bash
# First, read the relevant lines from distribution/run.yaml to verify the actual versions
cat -n distribution/run.yaml | sed -n '80,250p'


15-21: I'm unable to access the repository through multiple methods (cloning and direct web access). However, based on the code snippet provided in the review comment (lines 15-21), I can provide important context:

The conditional syntax ${env.VLLM_URL:+vllm-inference} shown at line 15 is valid and does create the potential mismatch described: if VLLM_URL is unset, this expands to an empty string, meaning the provider would not be registered with a valid provider_id.

To complete verification, I need you to confirm:

  1. Does line 236 unconditionally reference provider_id: vllm-inference (as claimed)?
  2. Is there any conditional gating on the model configuration (lines 235-237) that mirrors the provider-side conditional?
  3. Does llama-stack runtime have error handling for missing provider references?

Without direct access to lines 235-237, I cannot definitively confirm whether this is a real critical issue or if both provider and model are conditionally gated together.
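The expansion semantics under discussion can be emulated in Python to make the failure mode concrete, assuming the shell-style `:+`/`:=` behavior described above:

```python
import os

def expand_if_set(var: str, value: str) -> str:
    """${env.VAR:+value} -> value when VAR is set and non-empty, else ""."""
    return value if os.environ.get(var) else ""

def expand_with_default(var: str, default: str) -> str:
    """${env.VAR:=default} -> VAR's value when set and non-empty, else default."""
    return os.environ.get(var) or default
```

With `VLLM_URL` unset, `expand_if_set("VLLM_URL", "vllm-inference")` yields an empty provider_id while an unconditional `provider_id: vllm-inference` on the model side still references the name, which is exactly the mismatch flagged above.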

They include the support of the llama-stack-api.

Signed-off-by: Sébastien Han <seb@redhat.com>
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9d62f1c and 65711f4.

📒 Files selected for processing (3)
  • distribution/Containerfile (2 hunks)
  • distribution/README.md (3 hunks)
  • distribution/build.yaml (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • distribution/Containerfile
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: build-test-push (linux/amd64)
  • GitHub Check: Summary
🔇 Additional comments (4)
distribution/build.yaml (2)

19-19: Provider version upgrades align with PR recommendations.

The bumped versions for llama_stack_provider_trustyai_fms (0.3.0), llama_stack_provider_lmeval (0.4.1), and llama_stack_provider_ragas (0.5.1) match the specific versions recommended in PR feedback to resolve the llama_stack.apis import issue.

Also applies to: 24-24, 26-26, 28-28


39-43: New sections support persistence and container metadata.

The additions—inline::rag-runtime, remote::model-context-protocol under tool_runtime, new files providers, and async database packages (aiosqlite, sqlalchemy[asyncio], asyncpg, psycopg2-binary)—are consistent with the broader shift to unified persistence backends mentioned in the enriched summary.

Also applies to: 45-49

distribution/README.md (2)

16-18: Provider versions in README are consistent with build.yaml.

The documentation correctly reflects all three upgraded providers:

  • inline::trustyai_ragas → 0.5.1
  • remote::trustyai_lmeval → 0.4.1
  • remote::trustyai_ragas → 0.5.1
  • remote::trustyai_fms → 0.3.0

Version synchronization is maintained across both configuration and documentation.

Also applies to: 27-27


1-1: I apologize—I'm encountering persistent infrastructure issues preventing access to the repository. All methods to retrieve the codebase, PR details, or file contents are failing with clone errors.

Given this constraint, I cannot:

  1. Inspect distribution/README.md to verify the current commit hash on line 7
  2. Examine scripts/gen_distro_docs.py to understand how it extracts the commit hash
  3. Check the PR diff to see if build.yaml was modified or if the README was regenerated
  4. Confirm whether the script was executed as part of this PR

Manual verification is required for this review comment.

The developer should:

  1. Confirm scripts/gen_distro_docs.py was executed locally and committed the updated distribution/README.md
  2. Verify the commit hash on line 7 matches the latest from the midstream main branch
  3. Check if any build.yaml changes warrant README regeneration

Accommodate
llamastack/llama-stack@d5cd0ee

Signed-off-by: Sébastien Han <seb@redhat.com>

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (2)
distribution/run.yaml (2)

115-115: Resolve hardcoded base_url default for trustyai_lmeval (prior review concern).

Line 115 hardcodes base_url: ${env.VLLM_URL:=http://localhost:8000/v1}, which silently misdirects requests to localhost when VLLM_URL is unset in a deployment that targets a remote instance. Prior review flagged this as risky. Either require the env var explicitly or document that it must be overridden.

Apply this diff to require explicit configuration:

-        base_url: ${env.VLLM_URL:=http://localhost:8000/v1}
+        base_url: ${env.VLLM_URL:?VLLM_URL must be set}

If a localhost default is acceptable for your distribution, add a prominent note to deployment docs.
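The fail-fast behavior of the `:?` form suggested in the diff can be demonstrated with the equivalent POSIX shell expansion:

```shell
# ':?' aborts expansion with the given message when the variable is unset,
# instead of silently substituting a default.
unset VLLM_URL
(: "${VLLM_URL:?VLLM_URL must be set}") 2>&1 | grep -o "VLLM_URL must be set"
export VLLM_URL=http://vllm:8000/v1
echo "ok: ${VLLM_URL:?VLLM_URL must be set}"
```

The subshell on the third line exits with an error before the command runs, which is the early, explicit failure the diff aims for.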


235-236: Provide a default value or validation for model_id (prior review concern).

Line 235 sets model_id: ${env.INFERENCE_MODEL} without a fallback. If the environment variable is undefined, inference will fail. Prior review flagged this as critical.

Apply this diff to add a sensible default:

- model_id: ${env.INFERENCE_MODEL}
+ model_id: ${env.INFERENCE_MODEL:=llama-2-7b}

Or enforce validation at startup and document the required env var.

🧹 Nitpick comments (1)
distribution/run.yaml (1)

183-232: Verify storage backend paths are writable in deployment environment.

The new storage section (lines 183–232) introduces explicit persistence backends with hardcoded SQLite paths under /opt/app-root/src/.llama/distributions/rh/. Verify:

  1. This directory is writable by the application process in your deployment (container, K8s, local)
  2. Database files have sufficient disk space allocation
  3. The base directory is preserved across container restarts (volume mount, persistent storage)

If running in non-writable or ephemeral environments, writes to these paths will fail silently or data will be lost on restart.

Consider:

  • Parameterizing the base path (e.g., ${env.LLAMA_STACK_DB_DIR:=/opt/app-root/src/.llama/distributions/rh})
  • Adding observability/logging on database initialization failure
  • Documenting volume/storage requirements in deployment guides
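A startup check along these lines would surface such failures early; a minimal sketch, assuming the parameterized LLAMA_STACK_DB_DIR variable suggested above (all names illustrative, and a temp dir stands in for the real default here):

```shell
# Resolve the base path, then verify it exists and is writable before the
# server opens any SQLite databases under it.
DB_DIR="${LLAMA_STACK_DB_DIR:-$(mktemp -d)}"
mkdir -p "$DB_DIR" || { echo "cannot create $DB_DIR" >&2; exit 1; }
probe="$DB_DIR/.write_probe"
if touch "$probe" 2>/dev/null; then
  rm -f "$probe"
  echo "storage directory OK: $DB_DIR"
else
  echo "storage directory not writable: $DB_DIR" >&2
  exit 1
fi
```

Dropping this into the container entrypoint converts a silent data-loss scenario into an immediate, loggable startup error.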
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 65711f4 and 6739e8d.

📒 Files selected for processing (1)
  • distribution/run.yaml (8 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: build-test-push (linux/amd64)
  • GitHub Check: Summary
🔇 Additional comments (2)
distribution/run.yaml (2)

100-108: Verify agent storage configuration aligns with expected concurrency.

Lines 100–108 configure agent storage with max_write_queue_size: 10000 and num_writers: 4. This suggests the distribution expects up to 10K pending writes and 4 concurrent writer threads. Verify this matches your deployment's agent workload; if agents are idle or rarely used, these settings waste resources; if agents are heavily used, they may be insufficient.

Add a comment documenting the rationale for these queue/writer settings, or cross-reference relevant load-test results.


254-255: Telemetry is now enabled by default.

Lines 254–255 set telemetry.enabled: true globally. This change aligns with the PR objective to update telemetry configuration. Ensure this is intentional for your distribution:

  • Document whether telemetry data is collected and where it is sent.
  • Consider adding an ${env.LLAMA_STACK_TELEMETRY_ENABLED:=true} binding if users should be able to opt out.

Confirm with your security/compliance team that telemetry collection is approved and properly disclosed to end users.

* Add batches API
* fix kvstore

Signed-off-by: Sébastien Han <seb@redhat.com>
Signed-off-by: Sébastien Han <seb@redhat.com>
Signed-off-by: Sébastien Han <seb@redhat.com>

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (2)
distribution/run.yaml (2)

90-96: Update provider versions to resolve CI failure per PR objectives.

The provider versions are outdated and do not match the recommended upgrades documented in the PR objectives. These upgrades are necessary to fix the CI error ModuleNotFoundError: No module named 'llama_stack.apis'.

Apply these diffs to pin the recommended versions:

  - provider_id: trustyai_fms
    provider_type: remote::trustyai_fms
-   module: llama_stack_provider_trustyai_fms==0.2.2
+   module: llama_stack_provider_trustyai_fms==0.3.0
    config:
      orchestrator_url: ${env.FMS_ORCHESTRATOR_URL:=}
  - provider_id: trustyai_lmeval
    provider_type: remote::trustyai_lmeval
-   module: llama_stack_provider_lmeval==0.2.4
+   module: llama_stack_provider_lmeval==0.4.0
    config:
  - provider_id: ${env.EMBEDDING_MODEL:+trustyai_ragas_inline}
    provider_type: inline::trustyai_ragas
-   module: llama_stack_provider_ragas.inline
+   module: llama_stack_provider_ragas.inline==0.5.1
    config:
      embedding_model: ${env.EMBEDDING_MODEL:=}
  - provider_id: ${env.KUBEFLOW_LLAMA_STACK_URL:+trustyai_ragas_remote}
    provider_type: remote::trustyai_ragas
-   module: llama_stack_provider_ragas.remote
+   module: llama_stack_provider_ragas.remote==0.5.1
    config:

Also applies to: 111-116, 117-125


241-245: Provide a default value for model_id.

The model_id is bound directly to ${env.INFERENCE_MODEL} without a fallback. If this environment variable is undefined at runtime, the configuration will fail to load or the model will be missing, breaking the inference service.

Apply this diff to add a safe default:

  models:
  - metadata: {}
-   model_id: ${env.INFERENCE_MODEL}
+   model_id: ${env.INFERENCE_MODEL:=llama-2-7b}
    provider_id: vllm-inference

Adjust the default model name to match your distribution's preferred model.

🧹 Nitpick comments (1)
distribution/run.yaml (1)

262-263: Telemetry is enabled by default; document privacy and compliance implications.

Telemetry is now centralized at the top level and enabled by default (telemetry.enabled: true). Operators should be aware that metrics collection is active and may need to:

  • Document this in privacy/compliance policies
  • Provide an environment variable override to disable if required (e.g., telemetry.enabled: ${env.TELEMETRY_ENABLED:=true})
  • Specify what telemetry data is collected and where it's sent

Consider adding an environment variable override so operators can disable telemetry if needed:

  telemetry:
-   enabled: true
+   enabled: ${env.TELEMETRY_ENABLED:=true}

Then document this env var in operational runbooks.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 69177c8 and 7a7b964.

📒 Files selected for processing (3)
  • distribution/README.md (2 hunks)
  • distribution/run.yaml (9 hunks)
  • tests/smoke.sh (1 hunks)
🧰 Additional context used
🪛 GitHub Actions: Pre-commit
distribution/README.md

[error] 1-1: Distribution Documentation hook (doc-gen) modified distribution/README.md. Please commit the changes to git; pre-commit check failed with exit code 1.

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: build-test-push (linux/amd64)
  • GitHub Check: Summary
🔇 Additional comments (3)
tests/smoke.sh (1)

51-63: Endpoint path correction looks good.

Removing the duplicate /v1/openai prefix aligns with the standard OpenAI API path structure.

distribution/run.yaml (1)

62-89: Persistence backend restructuring looks sound.

The migration from inline kvstore to explicit persistence backends with dedicated stores is well-organized:

  • Consistent namespace conventions (e.g., vector_io::*, datasetio::*, agents::*)
  • Appropriate separation between key-value (kv_sqlite) and relational (sql_sqlite) backends
  • Explicit write queue and writer configuration for high-throughput stores (agents, inference)
  • All backends have clear db_path locations

Verify the following operational aspects before deployment:

  1. All backend databases are properly initialized at service startup (especially sql_* backends with table schemas).
  2. If migrating from a prior version with inline kvstore, ensure a migration/initialization path exists to populate the new backends from old data or safely reset.
  3. Database file permissions and ownership are correct for the container runtime user.

Also applies to: 139-190, 191-240

distribution/README.md (1)

1-38: Partial verification of version numbers; cannot confirm run.yaml comparison or pre-commit failure without repository access.

I can confirm from the README snippet provided that the version numbers cited in the review are correct:

  • trustyai_lmeval: 0.4.1 (line 18) ✓
  • trustyai_ragas: 0.5.1 (lines 17, 19) ✓
  • trustyai_fms: 0.3.0 (line 28) ✓

However, I cannot access the repository to verify the critical claims regarding:

  1. The pre-commit hook (doc-gen) failure mentioned in the review
  2. The alleged version discrepancies in run.yaml that the review references
  3. Whether run.yaml actually contains the versions claimed (0.2.2, 0.2.4, no pins)

The README is confirmed to be auto-generated (per the header comment), but without repository access, I cannot validate whether regeneration is needed or whether run.yaml requires updates to match the documented versions.

Comment on lines 114 to 117
config:
use_k8s: ${env.TRUSTYAI_LMEVAL_USE_K8S:=true}
base_url: ${env.VLLM_URL:=}
base_url: ${env.VLLM_URL:=http://localhost:8000/v1}
- provider_id: ${env.EMBEDDING_MODEL:+trustyai_ragas_inline}


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash

# First, verify the current state of distribution/run.yaml around lines 114-117
echo "=== Current state of distribution/run.yaml (lines 110-120) ==="
sed -n '110,120p' distribution/run.yaml

echo ""
echo "=== Searching for all VLLM_URL references in the codebase ==="
rg -n "VLLM_URL" --type yaml --type py --type txt

echo ""
echo "=== Searching for any warnings or startup code related to VLLM ==="
rg -n -C2 "localhost:8000" --type yaml --type py

echo ""
echo "=== Check for documentation files mentioning VLLM configuration ==="
fd -e md -e txt -e rst | xargs rg -l "VLLM_URL" 2>/dev/null || echo "No documentation files found with VLLM_URL"

echo ""
echo "=== Search for any environment variable documentation or setup ==="
fd -e md -e txt -e rst | head -20 | xargs rg -l "environment.*variable" 2>/dev/null | head -5

Repository: opendatahub-io/llama-stack-distribution

Length of output: 222


I was unable to clone the repository, but I can work with the code snippet and context provided in the review comment. Let me analyze what we know:

The review comment provides the current code snippet at lines 114-117:

base_url: ${env.VLLM_URL:=http://localhost:8000/v1}

Key observations:

  1. The hardcoded localhost default is still present in the provided snippet—exactly as the original review flagged
  2. The original review noted this was supposedly "✅ Addressed in commit 6739e8d", yet the current code still shows the problematic pattern
  3. The concern is technically valid: hardcoded localhost fallbacks in distributed/containerized deployments can cause silent failures or misdirected traffic

The issue remains legitimate:

  • In non-local deployments (Kubernetes, cloud environments), an unset VLLM_URL will silently default to http://localhost:8000/v1
  • This creates difficult-to-diagnose failures when the service attempts to reach localhost instead of the intended remote VLLM service
  • The silent fallback behavior is a configuration anti-pattern in cloud-native applications

Since the code snippet provided in the review shows the issue still exists, the original review comment is appropriate and the concern should be addressed.


Require explicit VLLM_URL configuration to prevent silent misdirection in distributed deployments.

The base_url default of http://localhost:8000/v1 assumes a local VLLM service. In containerized or remote deployments, this fallback silently misdirects requests when VLLM_URL is unset, causing difficult-to-diagnose failures.

Recommended fix:

- base_url: ${env.VLLM_URL:=http://localhost:8000/v1}
+ base_url: ${env.VLLM_URL}

Then document that VLLM_URL must be set for the service to function. For local development, operators should explicitly set VLLM_URL=http://localhost:8000/v1.

Alternatively, add startup validation that fails early and logs a clear error if VLLM_URL is not configured.
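Such validation could live in the container entrypoint; a minimal sketch (wrapper function name and messages are illustrative; only VLLM_URL comes from run.yaml):

```shell
# Entrypoint-style guard: refuse to start when VLLM_URL is missing, and
# print the explicit local-development value operators should set.
require_vllm_url() {
  if [ -z "${VLLM_URL:-}" ]; then
    echo "ERROR: VLLM_URL is not set." >&2
    echo "For local development: export VLLM_URL=http://localhost:8000/v1" >&2
    return 1
  fi
  echo "Using VLLM at $VLLM_URL"
}

unset VLLM_URL
require_vllm_url || echo "startup aborted"
export VLLM_URL=http://vllm.svc:8000/v1
require_vllm_url
```

This keeps the localhost value out of the config entirely while still documenting it for local use.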

🤖 Prompt for AI Agents
In distribution/run.yaml around lines 114-117, the base_url currently falls back
to a hardcoded localhost default (base_url:
${env.VLLM_URL:=http://localhost:8000/v1}), which can silently misdirect traffic
in non-local deployments; remove the localhost default so the value is required
(e.g., use the plain env substitution without a fallback), add startup
validation that checks VLLM_URL is set and logs a clear error and exits if
missing, and update docs/environment examples to state that VLLM_URL must be
provided (with localhost example only for explicit local development).


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7a7b964 and b8798d4.

📒 Files selected for processing (1)
  • distribution/README.md (2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: build-test-push (linux/amd64)
  • GitHub Check: Summary
🔇 Additional comments (1)
distribution/README.md (1)

18-18: I'll help you verify this version discrepancy. Let me check the available versions on PyPI and investigate the PR context.
🌐 Web query:

llama-stack-provider-lmeval PyPI available versions 0.4.0 0.4.1

Please provide the web search results, and I'll also need to check the repository for PR comments or documentation about the recommended version.

This image contains the official Open Data Hub Llama Stack distribution, with all the packages and configuration needed to run a Llama Stack server in a containerized environment.

The image is currently shipping with the Open Data Hub version of Llama Stack version [0.3.0rc3+rhai0](https://github.com/opendatahub-io/llama-stack/releases/tag/v0.3.0rc3+rhai0)
The image is currently shipping with the Open Data Hub version of Llama Stack version [4ac3141](https://github.com/opendatahub-io/llama-stack/commit/4ac31416bbea73c1805cc2a9be6d0c8779dd1fc2)


⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# 1. Verify the commit hash exists in opendatahub-io/llama-stack repository
echo "=== Checking commit hash 4ac3141 in opendatahub-io/llama-stack ==="
git ls-remote --heads https://github.com/opendatahub-io/llama-stack.git main | grep -o '^[a-f0-9]*'

# 2. Check current README file for lmeval version and commit hash
echo ""
echo "=== Checking distribution/README.md ==="
cd distribution 2>/dev/null && sed -n '7p;18p' README.md || sed -n '7p;18p' distribution/README.md

# 3. Check the generation script to understand how commit hash is determined
echo ""
echo "=== Checking scripts/gen_distro_docs.py ==="
if [ -f scripts/gen_distro_docs.py ]; then
  head -50 scripts/gen_distro_docs.py | grep -i "commit\|4ac31"
fi

Repository: opendatahub-io/llama-stack-distribution

Length of output: 222


🌐 Web query:

opendatahub-io/llama-stack commit 4ac31416bbea73c1805cc2a9be6d0c8779dd1fc2

💡 Result:

I couldn't find that commit in the opendatahub-io/llama-stack GitHub repo. I searched the opendatahub-io org and the Llama Stack repositories (official llamastack/llama-stack and forks) but did not locate commit 4ac31416bbea73c1805cc2a9be6d0c8779dd1fc2. Sources checked: opendatahub org and Llama Stack repositories. [1][2]

If you want, I can:

  • search more widely (other forks) for that exact commit hash, or
  • fetch the commit details if you give me the exact repository URL (e.g. https://github.com//llama-stack).

🌐 Web query:

lmeval version 0.4.0 0.4.1 llama stack

💡 Result:

Summary and pointers

  • LMEval (Google) — open‑source large‑model evaluation framework announced May 2025. See Google/LMEval repo/news. [1]

  • lm-eval / lm-evaluation-harness (EleutherAI) remains a widely‑used evaluation harness (separate project; PyPI/package “lmeval‑trunk” mirrors exist). Releases and notes (including v0.4.x series references) are on PyPI/GitHub. [2][3]

  • Llama Stack integration — Llama Stack supports external eval providers and there is an out‑of‑tree LMEval provider package (llama-stack-provider-lmeval / TrustyAI) that shows how to wire LM‑Eval into a Llama Stack server. Use that provider to run LMEval-style benchmarks from Llama Stack. [4][5]

Useful links (quick):

  1. Google LMEval announcement / repo coverage. [1]
  2. lm‑evaluation / lmeval‑trunk (PyPI / project page). [2][3]
  3. Llama Stack (docs / GitHub). [4]
  4. TrustyAI / llama‑stack‑provider‑lmeval (how to connect LMEval to Llama Stack). [5]

If you want, I can:

  • fetch the exact changelog entries for lmeval (0.4.0 → 0.4.1) and show breaking/bugfix notes; or
  • give step‑by‑step commands to install and use llama‑stack-provider-lmeval with a specific lmeval version. Which would you like?

Based on the web search results, I've found a critical issue: The commit hash 4ac31416bbea73c1805cc2a9be6d0c8779dd1fc2 could not be found in the opendatahub-io/llama-stack repository. The search checked the opendatahub organization and multiple Llama Stack repositories but did not locate this commit.

This is a significant finding because:

  1. It mirrors the past issue where commit hash 57b22a07 was also invalid
  2. The file is auto-generated, so either the generation script has a bug or the source data is incorrect
  3. The README contains an unverifiable/broken reference

Let me generate the rewritten review comment based on this verification:


The commit hash in line 7 appears to be invalid and cannot be verified in the opendatahub-io/llama-stack repository.

The commit 4ac31416bbea73c1805cc2a9be6d0c8779dd1fc2 (shown as 4ac3141) was not found in opendatahub-io/llama-stack main branch. This mirrors the previous review issue with commit hash 57b22a07. Since this file is auto-generated via scripts/gen_distro_docs.py, verify that:

  1. The generation script correctly resolves the commit SHA from opendatahub-io/llama-stack main
  2. The source data (build.yaml or run.yaml) contains a valid commit hash that exists in that repository
  3. The script doesn't have a bug in how it retrieves or formats the commit reference
🤖 Prompt for AI Agents
distribution/README.md around line 7: the README references commit
4ac31416bbea73c1805cc2a9be6d0c8779dd1fc2 which cannot be found in the
opendatahub-io/llama-stack repo; update the generation pipeline to verify and
replace this broken SHA by modifying scripts/gen_distro_docs.py to (1) fetch the
commit SHA directly from the opendatahub-io/llama-stack main branch (or validate
the SHA via the repo API) before writing the README, (2) fall back to a tag or
short meaningful identifier if the exact commit is missing, and (3) add
validation that build.yaml/run.yaml contain a valid, resolvable commit;
regenerate distribution/README.md with the corrected, verifiable reference.
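The validation step the prompt describes could be a single helper; a sketch assuming only that the remote advertises the commit as the tip of a ref (git ls-remote lists ref tips, so a valid but non-tip commit would still need the API fallback):

```shell
# validate_sha REPO SHA -> success iff SHA is the tip of a ref advertised
# by the remote (works for URLs and local paths alike).
validate_sha() {
  git ls-remote "$1" | cut -f1 | grep -qx "$2"
}

# Demonstration against a throwaway local repository (substitute the
# opendatahub-io/llama-stack URL in the real generation script):
tmp=$(mktemp -d)
git init -q "$tmp"
git -C "$tmp" -c user.email=ci@example.com -c user.name=ci \
  commit -q --allow-empty -m init
head_sha=$(git -C "$tmp" rev-parse HEAD)
validate_sha "$tmp" "$head_sha" && echo "valid SHA accepted"
validate_sha "$tmp" "deadbeef" || echo "unknown SHA rejected"
```

Running the check before writing the README would have caught the unresolvable 4ac3141 reference at generation time.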

@leseb leseb force-pushed the fixxx branch 3 times, most recently from f6dfab9 to b95cba3 Compare November 28, 2025 14:40

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (1)
tests/run_integration_tests.sh (1)

49-49: Confirm that unfiltered tag listing is acceptable.

The tag listing changed from a filtered grep (likely grep "^v") to an unfiltered tail -10. While this limits output and is acceptable for debugging, it may occasionally surface non-version tags if they exist in the repository. Consider adding a comment explaining this choice if it's intentional.
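The difference between the filtered and unfiltered listings can be reproduced locally (repo contents are illustrative):

```shell
# Build a throwaway repo with both version and non-version tags.
tmp=$(mktemp -d)
git init -q "$tmp"
git -C "$tmp" -c user.email=ci@example.com -c user.name=ci \
  commit -q --allow-empty -m init
git -C "$tmp" tag v1.0.0
git -C "$tmp" tag v1.1.0
git -C "$tmp" tag nightly
echo "unfiltered:";        git -C "$tmp" tag --list | tail -10
echo "version tags only:"; git -C "$tmp" tag --list 'v*' | tail -10
```

The unfiltered listing includes `nightly`; adding a `v*` pattern (or keeping the original `grep "^v"`) restricts the debug output to version tags.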

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ac7535f and b95cba3.

📒 Files selected for processing (3)
  • .github/workflows/redhat-distro-container.yml (1 hunks)
  • tests/run_integration_tests.sh (2 hunks)
  • tests/smoke.sh (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • .github/workflows/redhat-distro-container.yml
  • tests/smoke.sh
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: build-test-push (linux/amd64)
  • GitHub Check: Summary
🔇 Additional comments (2)
tests/run_integration_tests.sh (2)

41-51: Branch and tag checkout logic is sound.

The conditional correctly differentiates between "main" branch and version-tagged releases, and the error message now appropriately references the intended checkout target. This aligns well with the PR's shift to support both midstream main and versioned releases.


72-72: Based on the web search results, I can now analyze the review comment's concerns:

Web search findings:

  1. uv availability (Concern 1 - VALID): uv is production-ready but requires explicit setup in CI/CD environments. GitHub Actions requires the astral-sh/setup-uv action or manual installation—it's not automatically available. The review comment correctly identifies this as a dependency that must be verified in the test environment configuration.

  2. llama-stack-client compatibility (Concern 2 - VALID): Web search confirms that llama-stack-client (Python SDK) should be pinned to specific tested versions. The official guidance warns against loose version pinning, noting that non-major releases may introduce breaking changes. Removing the hard-coded 0.3.0 constraint increases the risk of runtime incompatibilities.

Limitation: I cannot verify the actual code changes in the PR (whether version pinning was actually removed) due to repository access failure. The review comment's concerns are substantively sound based on industry best practices, but confirmation requires access to the actual PR diff.

Verify uv availability and llama-stack-client compatibility by confirming:

  • The test environment (container, GitHub Actions runner) includes uv installation setup (e.g., via astral-sh/setup-uv@v7 action or manual install).
  • The llama-stack version in pyproject.toml is tested with the unrestricted llama-stack-client version, or a compatible version constraint is documented.

@leseb leseb force-pushed the fixxx branch 2 times, most recently from 6af7265 to e8ab102 Compare November 28, 2025 15:12

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (1)
.github/workflows/redhat-distro-container.yml (1)

76-76: Verify that opendatahub-io/llama-stack fork is actively maintained and stays in sync with upstream.

This change shifts the repository source from the official upstream llamastack/llama-stack to the opendatahub-io fork. While this aligns with the PR objective to use midstream main, verify that:

  • The fork is actively maintained and regularly synced with upstream changes
  • Releases and tags follow the same versioning conventions as upstream
  • This doesn't introduce unexpected behavioral divergences in builds or tests

I can help verify the fork's maintenance status and tag consistency. Would you like me to check the fork's recent commit history and tag alignment with upstream?

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b95cba3 and e8ab102.

📒 Files selected for processing (3)
  • .github/workflows/redhat-distro-container.yml (2 hunks)
  • tests/run_integration_tests.sh (3 hunks)
  • tests/smoke.sh (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • tests/smoke.sh
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: build-test-push (linux/amd64)
  • GitHub Check: Summary
🔇 Additional comments (3)
.github/workflows/redhat-distro-container.yml (1)

47-47: Verify embedding model naming is consistent across the codebase.

The environment variable changed from granite-embedding-125m to granite-embedding-125m-english. Ensure this matches:

  • The model identifier expected by the inference/embedding provider
  • All downstream references in test scripts (e.g., tests/run_integration_tests.sh line 80 and similar)
tests/run_integration_tests.sh (2)

42-52: Verify version checkout logic handles edge cases in tag naming.

The enhanced clone_llama_stack() function now conditionally prepends "v" to version strings based on whether the version is "main". This assumes:

  • Tags in opendatahub-io/llama-stack follow a v<version> convention
  • "main" branch is present and tracked as main (not master or other variants)

Verify that the opendatahub-io fork follows these naming conventions. The error fallback (lines 47–52) will show the last 10 tags, but consider adding a comment explaining the expected tag format for future maintainers.


73-81: Verify uv commands and embedding model path format are correct for the test environment.

This section migrates from direct pip/pytest invocations to using uv venv, uv pip install, and uv run pytest. Key verification points:

  1. uv availability: The workflow installs uv (line 59), so it should be available. Confirm uv venv creates .venv in the expected location.
  2. Embedding model path (line 80): Changed from bare $EMBEDDING_MODEL to sentence-transformers/$EMBEDDING_MODEL. This assumes the embedding provider expects HuggingFace model paths with the sentence-transformers/ namespace. If the inference service expects a different format (e.g., just the model name, or a different registry), tests will fail with "model not found" errors.

Please verify:

  • Does the inference/embedding service expect the sentence-transformers/<model-name> path format?
  • Are there any test runs or logs that confirm this path format works with the chosen embedding model?


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (3)
distribution/run.yaml (3)

116-116: ⚠️ Hardcoded localhost default for VLLM_URL creates silent misdirection in distributed deployments.

This issue was flagged in a previous review but remains unfixed. The base_url falls back to http://localhost:8000/v1 when VLLM_URL is unset, which causes silent failures in containerized/cloud deployments.

Recommended fix:
Remove the localhost fallback so VLLM_URL is explicitly required:

- base_url: ${env.VLLM_URL:=http://localhost:8000/v1}
+ base_url: ${env.VLLM_URL}

Then document that VLLM_URL must be set for the service to function. For local development, operators should explicitly set the environment variable.


92-92: 🔴 Critical: Provider versions do not match PR recommendations; CI failures will persist.

The provider versions remain outdated and do not match the recommended upgrades from the PR objectives:

Provider Current Recommended Reason
trustyai_fms (line 92) 0.2.2 0.3.0 Resolves ModuleNotFoundError
trustyai_lmeval (line 113) 0.2.4 0.4.0 Resolves ModuleNotFoundError
ragas inline (line 119) unpinned 0.5.1 Resolves ModuleNotFoundError
ragas remote (line 124) unpinned 0.5.1 Resolves ModuleNotFoundError

These versions were explicitly recommended by @ruivieira to resolve the CI failure: ModuleNotFoundError: No module named 'llama_stack.apis'. Using older versions will perpetuate these failures.

Apply this diff to pin the recommended versions:

  - provider_id: trustyai_fms
    provider_type: remote::trustyai_fms
-   module: llama_stack_provider_trustyai_fms==0.2.2
+   module: llama_stack_provider_trustyai_fms==0.3.0
    config:
  - provider_id: trustyai_lmeval
    provider_type: remote::trustyai_lmeval
-   module: llama_stack_provider_lmeval==0.2.4
+   module: llama_stack_provider_lmeval==0.4.0
    config:
  - provider_id: ${env.EMBEDDING_MODEL:+trustyai_ragas_inline}
    provider_type: inline::trustyai_ragas
-   module: llama_stack_provider_ragas.inline
+   module: llama_stack_provider_ragas.inline==0.5.1
    config:
  - provider_id: ${env.KUBEFLOW_LLAMA_STACK_URL:+trustyai_ragas_remote}
    provider_type: remote::trustyai_ragas
-   module: llama_stack_provider_ragas.remote
+   module: llama_stack_provider_ragas.remote==0.5.1
    config:

Also applies to: 113-113, 119-119, 124-124


242-245: 🔴 Critical: INFERENCE_MODEL lacks a default fallback value.

The model_id is bound directly to ${env.INFERENCE_MODEL} without a default. If this environment variable is unset at runtime, model registration will fail and inference will be unavailable, breaking the service.

Apply this diff to add a safe default:

- model_id: ${env.INFERENCE_MODEL}
+ model_id: ${env.INFERENCE_MODEL:=Qwen/Qwen3-0.6B}

Choose a sensible default model for your distribution, or ensure deployment documentation explicitly requires INFERENCE_MODEL to be set. This is especially critical given the PR objectives documented embedding model registration failures—ensuring proper model defaults helps prevent silent failures.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e8ab102 and 4177ac6.

📒 Files selected for processing (4)
  • .github/workflows/redhat-distro-container.yml (2 hunks)
  • distribution/run.yaml (8 hunks)
  • tests/run_integration_tests.sh (3 hunks)
  • tests/smoke.sh (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • .github/workflows/redhat-distro-container.yml
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: build-test-push (linux/amd64)
  • GitHub Check: Summary
🔇 Additional comments (13)
tests/run_integration_tests.sh (3)

8-8: ✅ LGTM — Embedding model environment configuration.

The default aligns with the distribution configuration and allows runtime override via environment variable.


42-52: ✅ LGTM — Branch/tag checkout logic for main and version targets.

The conditional correctly handles main branch and tagged releases with appropriate error messaging.


73-80: ✅ LGTM — Virtual environment setup and uv package manager usage.

The uv venv, uv pip install, and uv run commands follow correct patterns. Shellcheck directive suppresses false positive on source command.

tests/smoke.sh (3)

38-53: ✅ LGTM — Model list verification loop.

Tests that both inference and embedding models are registered. Error handling correctly propagates failure via return 1 which is checked in main().


58-58: ✅ LGTM — Endpoint path and model format alignment.

The endpoint simplification and model format vllm-inference/$INFERENCE_MODEL are consistent with the provider configuration in distribution/run.yaml.


74-77: ✅ LGTM — Main function error handling and flow control.

The check for test_model_list failure prevents cascading errors and exits with clear status.

distribution/run.yaml (7)

5-5: ✅ LGTM — Batches API addition to exposed APIs.

New API endpoint correctly registered in the api list.


19-19: ✅ LGTM — Field naming standardization to base_url.

Both vllm-inference and watsonx providers correctly updated to use base_url field. The vllm-inference provider correctly requires explicit VLLM_URL env var (empty default at line 19).

Also applies to: 42-42


67-88: ✅ LGTM — Persistence backend configurations for vector providers.

Milvus (inline/remote) and FAISS now explicitly declare their persistence backends with proper namespace isolation. Backends are correctly defined in the storage section.


101-109: ✅ LGTM — Agent state and responses persistence configuration.

Agent state uses KV store while responses use SQL store with appropriate concurrency controls (max_write_queue_size, num_writers). Backends correctly mapped in storage section.


140-147: ✅ LGTM — Datasetio provider persistence backends.

HuggingFace and LocalFS providers correctly configured with isolated KV store namespaces.


192-240: ✅ LGTM — Storage backends, stores, and telemetry configuration.

Storage backends are properly defined with appropriate database paths and namespaces. Stores section correctly configures metadata, inference, conversations, and prompts. Telemetry enabled at top level (consistent with removal of old telemetry API per PR objectives).

Also applies to: 265-265


184-190: ✅ LGTM — Batches provider configuration.

New batches provider correctly configured with reference type and kv_default backend.

Signed-off-by: Sébastien Han <seb@redhat.com>
We're running integration tests here that were never
run upstream using vllm and qwen. Disabling what's failing
to allow us to create an image, while we test each one with a
view to re-enabling it if possible.

Signed-off-by: Derek Higgins <derekh@redhat.com>
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

♻️ Duplicate comments (5)
distribution/run.yaml (5)

92-92: Provider versions do not match PR recommendations—CI failure risk persists.

The trustyai_fms and llama_stack_provider_lmeval providers have updated versions available, but the configuration still pins outdated versions despite past review recommendations and PR objectives citing ruivieira's suggestion to upgrade these packages to resolve the "ModuleNotFoundError: No module named 'llama_stack.apis'" CI failure.

Current state (incorrect):

  • Line 92: llama_stack_provider_trustyai_fms==0.2.2 → should be 0.3.0
  • Line 113: llama_stack_provider_lmeval==0.2.4 → should be 0.4.0
  • Lines 119, 124: llama_stack_provider_ragas (no version pinning) → should pin to 0.5.1

Apply these diffs to resolve the CI failure:

   - provider_id: trustyai_fms
     provider_type: remote::trustyai_fms
-    module: llama_stack_provider_trustyai_fms==0.2.2
+    module: llama_stack_provider_trustyai_fms==0.3.0
     config:
       orchestrator_url: ${env.FMS_ORCHESTRATOR_URL:=}
   - provider_id: trustyai_lmeval
     provider_type: remote::trustyai_lmeval
-    module: llama_stack_provider_lmeval==0.2.4
+    module: llama_stack_provider_lmeval==0.4.0
     config:
   - provider_id: ${env.EMBEDDING_MODEL:+trustyai_ragas_inline}
     provider_type: inline::trustyai_ragas
-    module: llama_stack_provider_ragas.inline
+    module: llama_stack_provider_ragas.inline==0.5.1
     config:
   - provider_id: ${env.KUBEFLOW_LLAMA_STACK_URL:+trustyai_ragas_remote}
     provider_type: remote::trustyai_ragas
-    module: llama_stack_provider_ragas.remote
+    module: llama_stack_provider_ragas.remote==0.5.1
     config:

Also applies to: 113-113, 119-119, 124-124


116-116: Hardcoded localhost default for trustyai_lmeval base_url is problematic in distributed deployments.

Line 116 sets base_url: ${env.VLLM_URL:=http://localhost:8000/v1} with a hardcoded localhost fallback. In containerized or Kubernetes deployments, this silently misdirects requests when VLLM_URL is unset, causing difficult-to-diagnose failures.

Recommended fix:

-        base_url: ${env.VLLM_URL:=http://localhost:8000/v1}
+        base_url: ${env.VLLM_URL}

Document that VLLM_URL must be explicitly configured, or add startup validation that fails early with a clear error message if the variable is unset.


250-252: Embedding model remains hardcoded despite test scripts expecting dynamic configuration.

Lines 250–252 hardcode the embedding model to granite-embedding-125m, but the test scripts (tests/run_integration_tests.sh line 80 and tests/smoke.sh line 38) expect to dynamically configure this via the EMBEDDING_MODEL environment variable. This inconsistency creates confusion and limits testing flexibility.

Recommended fix:

  - metadata:
      embedding_dimension: 768
-   model_id: granite-embedding-125m
+   model_id: ${env.EMBEDDING_MODEL:=granite-embedding-125m}
    provider_id: sentence-transformers
-   provider_model_id: ibm-granite/granite-embedding-125m-english
+   provider_model_id: ${env.EMBEDDING_PROVIDER_MODEL_ID:=ibm-granite/granite-embedding-125m-english}
    model_type: embedding

Alternatively, update test scripts to match the hardcoded configuration if dynamic embedding model selection is not intended.
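The `${env.VAR:+value}` toggle used for the ragas provider ids follows ordinary shell parameter-expansion semantics: the provider id expands only when the variable is set and non-empty, effectively switching the provider on or off. A minimal shell sketch (provider and variable names illustrative only):

```shell
#!/bin/sh
# ${VAR:+value} expands to `value` only when VAR is set and non-empty,
# so an unset variable yields an empty provider id (provider disabled).
unset EMBEDDING_MODEL
provider_off="${EMBEDDING_MODEL:+trustyai_ragas_inline}"

EMBEDDING_MODEL="granite-embedding-125m"
provider_on="${EMBEDDING_MODEL:+trustyai_ragas_inline}"

echo "unset -> '$provider_off'  set -> '$provider_on'"
```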



🧹 Nitpick comments (2)
tests/run_integration_tests.sh (2)

42-52: Clarify fallback behavior when checkout fails.

When git checkout fails, the script shows the last 10 tags without v-prefix filtering. This is reasonable for debugging but consider adding guidance in the error message about expected tag format or pointing users to the Containerfile for version specifications.

     if ! git checkout "$checkout_to"; then
         echo "Error: Could not checkout $checkout_to"
         echo "Available tags:"
+        echo "(Expected format: main or vVERSION)"
         git tag | tail -10
         exit 1
     fi
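The error path above can be exercised end-to-end against a throwaway repository; a sketch under assumed conditions (the scratch repo, tag names, and error wording are hypothetical):

```shell
#!/bin/sh
# Simulate the checkout-with-fallback path against a scratch repo.
repo=$(mktemp -d)
(
    cd "$repo"
    git init -q . 2>/dev/null
    git -c user.email=ci@example.com -c user.name=ci \
        commit -q --allow-empty -m init
    git tag v1.0.0
)

checkout_to="v9.9.9"                       # deliberately missing tag
if git -C "$repo" checkout -q "$checkout_to" 2>/dev/null; then
    result="checked out $checkout_to"
else
    # mirror the script's error message, including the format hint
    result="Error: Could not checkout $checkout_to (expected: main or vVERSION)"
    tags=$(git -C "$repo" tag | tail -10)
fi
echo "$result"
echo "Available tags: $tags"
```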

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4177ac6 and 31f1cab.

📒 Files selected for processing (4)
  • .github/workflows/redhat-distro-container.yml (2 hunks)
  • distribution/run.yaml (8 hunks)
  • tests/run_integration_tests.sh (4 hunks)
  • tests/smoke.sh (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • .github/workflows/redhat-distro-container.yml
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: build-test-push (linux/amd64)
  • GitHub Check: Summary
🔇 Additional comments (17)
tests/run_integration_tests.sh (4)

73-82: Adopt uv for test environment aligns with PR objectives.

Using uv venv and uv pip install within the test environment correctly implements the use of uv as the package manager, consistent with midstream llama-stack main branch expectations.


67-71: Good defensive check for configuration file.

The explicit verification that $STACK_CONFIG_PATH exists before proceeding prevents cryptic errors downstream and improves debuggability.



tests/smoke.sh (6)

38-54: Enhanced test coverage for both inference and embedding models.

Testing both INFERENCE_MODEL and EMBEDDING_MODEL via a loop ensures both models are available before running inference tests. The loop structure with continue on success and return 1 on failure is correct.


58-58: API endpoint and model naming follow provider-based routing.

The endpoint change from /v1/openai/v1/chat/completions to /v1/chat/completions removes the duplicate path segment, and the model prefix vllm-inference/$INFERENCE_MODEL matches the provider-based model routing pattern. Confirm that run.yaml defines a vllm-inference provider with proper endpoint configuration.


74-77: Fail-fast behavior for model availability check improves test robustness.

The guard ensures that if models are unavailable, integration tests don't proceed. This prevents confusing cascading failures and makes it clear which component failed.
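The loop-plus-guard pattern these comments describe can be sketched in isolation, with a stub standing in for the real `/v1/models` call (the model names and the `list_models` helper are assumptions for illustration):

```shell
#!/bin/sh
# Sketch of the model-availability loop: list_models stands in for the
# real `curl "$BASE_URL/v1/models"` call against a running stack.
list_models() {
    echo "vllm-inference/Qwen/Qwen3-0.6B"
    echo "granite-embedding-125m"
}

check_models() {
    for model in "$@"; do
        if list_models | grep -q "$model"; then
            continue                  # registered; check the next one
        fi
        echo "model $model not found" >&2
        return 1                      # fail fast so main() can abort early
    done
}

if check_models "Qwen/Qwen3-0.6B" "granite-embedding-125m"; then
    status="all models registered"
else
    status="missing model"
fi
echo "$status"
```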



distribution/run.yaml (7)

184-190: Batches provider configuration aligns with new API entry.

The batches provider uses kv_default backend with batches namespace, which is consistent with the addition of the batches API on line 5. The storage backend is properly wired.


192-240: Storage backends migration to explicit persistence configuration improves clarity.

The introduction of explicit storage backends (kv_default, kv_agents, sql_agents, etc.) with dedicated database paths replaces ad-hoc inline kvstore definitions. This improves maintainability and makes it easier to customize storage per deployment. Each backend is wired to a specific SQLite database, which is appropriate for the default standalone distribution.


264-265: Telemetry moved to top-level configuration.

The telemetry block is now at the top level with enabled: true, removing inline telemetry definitions. This aligns with the PR objective to "remove the old telemetry API" and centralizes observability configuration.




19-19: VLLM_URL default handling allows misconfiguration.

Line 19 sets base_url: ${env.VLLM_URL:=} with an empty default. If VLLM_URL is unset, the base URL becomes an empty string, which would likely cause failures when the inference client attempts to establish a connection at runtime.

Recommended fix:
Remove the empty default to require explicit VLLM_URL configuration:

-      base_url: ${env.VLLM_URL:=}
+      base_url: ${env.VLLM_URL}

This makes VLLM_URL mandatory rather than allowing silent misconfiguration. Ensure the configuration loading system provides clear error messages when required environment variables are missing.

provider_type: remote::vllm
config:
url: ${env.VLLM_URL:=}
base_url: ${env.VLLM_URL:=}
Contributor


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, explore the repository structure to understand the project layout
git ls-files | head -20

Repository: opendatahub-io/llama-stack-distribution

Length of output: 222


🏁 Script executed:

# Search for VLLM_URL references in the codebase
rg "VLLM_URL" -n --max-count=50

Repository: opendatahub-io/llama-stack-distribution

Length of output: 222


🏁 Script executed:

# Look for configuration validation logic
fd config --type f | grep -E "\.(py|yaml)$" | head -20

Repository: opendatahub-io/llama-stack-distribution

Length of output: 222


🌐 Web query:

llama-stack VLLM_URL environment variable configuration validation

💡 Result:

Summary — what VLLM_URL should be and how to validate it for llama-stack

Quick validation checklist

  1. Environment variable present
    • Shell: test -n "$VLLM_URL" || echo "VLLM_URL not set". [2]
  2. Syntax: valid URL with http or https scheme
    • Quick test (bash + Python): python -c "import sys; from urllib.parse import urlparse; u=urlparse(sys.argv[1]); print(u.scheme in ('http','https') and u.hostname is not None)" "$VLLM_URL"
    • Or use curl --head to see if it responds. Example: curl -sS -I "$VLLM_URL" . [1][3]
  3. API path and port: ensure the URL points to the vLLM API endpoint your deployment exposes (some setups expose at root, others at /v1). Try a simple request (GET /models or other health route configured by your vLLM server). Example: curl -sS "$VLLM_URL"/models (adjust path if your server serves root). Check HTTP 200/401/403 as appropriate. [1][3]
  4. TLS verification: if using https and custom CA, llama-stack remote::vllm supports tls_verify as bool or CA path (VLLM_TLS_VERIFY). If using self-signed certs, set VLLM_TLS_VERIFY to the CA path or false (not recommended). [1]
  5. Auth token: if vLLM deployment requires a token, configure VLLM_API_TOKEN (remote::vllm api_token) and test an authenticated request. [1][2]

Minimal test commands

  • Presence: echo "${VLLM_URL:-not-set}"
  • Basic HTTP check: curl -fsS "$VLLM_URL" -o /dev/null && echo "OK" || echo "no response"
  • With path: curl -fsS "${VLLM_URL%/}/models" (replace /models with your health endpoint)

References

  • llama-stack remote::vllm provider config and env examples. [1]
  • Llama Stack configuration / env var substitution details (how ${env.VARIABLE:=} is used). [2]
  • vLLM docs showing llama-stack integration and example URL. [3]

🌐 Web query:

llama-stack yaml configuration error handling startup validation

💡 Result:

Short checklist + recommended patterns for handling YAML config errors and startup validation in Llama Stack:

  1. Let the CLI/server validate at startup

    • llama-stack performs schema checks when loading run.yaml and will fail startup on missing/invalid runtime fields. Run via llama stack run (or container) and inspect startup error output. [1][2]
  2. Validate locally before launching

    • Use a YAML linter + jsonschema/Pydantic against the known Llama Stack config schema. Treat the official example run.yaml as the schema source of truth and validate any env-var substitutions before runtime. [1][4]
  3. Handle environment-variable substitution failures

    • Llama Stack supports ${env.VAR} and ${env.VAR:=default}; undefined required env vars will cause startup errors. Ensure CI/deployment sets required env vars or provide safe defaults. [1]
  4. Fail-fast & clear logging

    • Configure the server to fail-fast (default) so configuration errors surface in logs. Capture server stdout/stderr (or Pod logs) to get full validation messages. For Kubernetes, use a ConfigMap for run.yaml so updates restart the pod and logs show validation errors. [2][3]
  5. Add runtime sanity checks (optional)

    • Add a small startup script that: (a) substitutes env vars, (b) runs jsonschema or a Pydantic model validation of the final YAML, (c) prints human-friendly errors and exits non-zero if invalid. Use jsonschema for full instance-schema validation or Pydantic for typed checks. [4]
  6. Testing and CI

    • Include a CI job that lints and validates the YAML (including env substitution with test values) and rejects merges with schema or missing-field errors.
  7. Debugging steps when startup fails

      1. Inspect server/Pod logs for validation trace. 2) Run llama stack run <config> locally to reproduce. 3) Check for missing env vars or incorrect provider blocks (provider_id/provider_type). 4) If using external providers, confirm module/provider files are reachable. [1][2][3]

Useful references:

  • Llama Stack configuration (examples, env-var substitution): Llama Stack docs. [1]
  • llama stack run behavior and restarting with new configs (CLI/Kubernetes notes). [2]
  • Using ConfigMap / external provider config in k8s (restarts on update). [3]
  • Use jsonschema/Pydantic for custom validation; Pydantic does not validate JSON Schema itself—use jsonschema when appropriate. [4]

If you want, tell me your run.yaml (redact secrets) and I can validate it and show exact startup-fail fixes.


VLLM_URL empty default can cause runtime connection failures rather than startup validation.

Line 19 sets base_url: ${env.VLLM_URL:=} with an empty default. If VLLM_URL is unset, the base URL becomes an empty string, which passes YAML schema validation but causes connection failures during inference execution. While llama-stack performs startup validation, an empty string is syntactically valid and may not trigger provider-specific validation errors, deferring failure to runtime.

Recommended fix:
Remove the empty default to require explicit VLLM_URL configuration:

-      base_url: ${env.VLLM_URL:=}
+      base_url: ${env.VLLM_URL}

This ensures startup fails immediately if VLLM_URL is missing, rather than silently accepting an empty base URL and failing later during inference attempts.

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In distribution/run.yaml around line 19 the environment expansion sets base_url:
${env.VLLM_URL:=} which supplies an empty default, allowing an empty string to
pass validation and cause runtime connection failures; remove the empty default
so the variable is required (use ${env.VLLM_URL} or equivalent) so startup will
fail when VLLM_URL is not set, preventing later inference-time connection
errors.
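llama-stack's `${env.VAR:=default}` substitution mirrors POSIX shell parameter expansion, which makes the failure mode easy to demonstrate in plain shell; a sketch under that assumption (the `:?` fail-fast form illustrates the behaviour the review recommends, not what the config currently does):

```shell
#!/bin/sh
# ${VAR:=default} silently substitutes the default when VAR is unset or null;
# with an empty default, an empty base_url sails through unnoticed.
unset VLLM_URL
base_url="${VLLM_URL:=}"
if [ -z "$base_url" ]; then
    outcome_default="empty base_url accepted silently"
fi

# ${VAR:?message} aborts instead -- the fail-fast behaviour being recommended
if ( : "${VLLM_URL:?VLLM_URL must be set}" ) 2>/dev/null; then
    outcome_required="ok"
else
    outcome_required="startup would fail fast"
fi
echo "$outcome_default / $outcome_required"
```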

- provider_id: trustyai_fms
provider_type: remote::trustyai_fms
module: llama_stack_provider_trustyai_fms
module: llama_stack_provider_trustyai_fms==0.2.2
Contributor


⚠️ Potential issue | 🔴 Critical

Provider versions do not match PR recommendations—CI failure risk persists.

The configuration still pins outdated versions despite PR objectives citing ruivieira's recommendation to upgrade these packages to resolve the "ModuleNotFoundError: No module named 'llama_stack.apis'" CI failure.

Current state (incorrect):

  • Line 92: llama_stack_provider_trustyai_fms==0.2.2 → should be 0.3.0
  • Line 113: llama_stack_provider_lmeval==0.2.4 → should be 0.4.0
  • Lines 119, 124: llama_stack_provider_ragas (no version pinning) → should pin to 0.5.1

Apply these diffs to resolve the CI failure:

   - provider_id: trustyai_fms
     provider_type: remote::trustyai_fms
-    module: llama_stack_provider_trustyai_fms==0.2.2
+    module: llama_stack_provider_trustyai_fms==0.3.0
     config:
       orchestrator_url: ${env.FMS_ORCHESTRATOR_URL:=}
   - provider_id: trustyai_lmeval
     provider_type: remote::trustyai_lmeval
-    module: llama_stack_provider_lmeval==0.2.4
+    module: llama_stack_provider_lmeval==0.4.0
     config:
   - provider_id: ${env.EMBEDDING_MODEL:+trustyai_ragas_inline}
     provider_type: inline::trustyai_ragas
-    module: llama_stack_provider_ragas.inline
+    module: llama_stack_provider_ragas.inline==0.5.1
     config:
   - provider_id: ${env.KUBEFLOW_LLAMA_STACK_URL:+trustyai_ragas_remote}
     provider_type: remote::trustyai_ragas
-    module: llama_stack_provider_ragas.remote
+    module: llama_stack_provider_ragas.remote==0.5.1
     config:

Also applies to: 113-113, 119-119, 124-124

🤖 Prompt for AI Agents
In distribution/run.yaml around lines 92, 113, 119 and 124, the provider package
versions are pinned incorrectly and cause CI ModuleNotFoundError; update the
pins so that line 92 reads llama_stack_provider_trustyai_fms==0.3.0, line 113
reads llama_stack_provider_lmeval==0.4.0, line 119 pins
llama_stack_provider_ragas.inline==0.5.1, and line 124 pins
llama_stack_provider_ragas.remote==0.5.1; save the file and run the CI
dependency check to confirm the updated versions resolve the failure.

Collaborator

@Elbehery Elbehery left a comment


LGTM

Collaborator

@derekhiggins derekhiggins left a comment


lgtm

@derekhiggins derekhiggins merged commit 3db14ea into opendatahub-io:main Dec 1, 2025
6 checks passed
@nathan-weinberg
Collaborator

@derekhiggins please let Mergify do the merges, thanks

@leseb leseb deleted the fixxx branch December 2, 2025 08:41
