feat: make inference provider optional by leseb · Pull Request #40 · opendatahub-io/llama-stack-distribution

leseb · 2025-09-18T09:33:20Z

Inference provider implementations behave inconsistently—some can load normally without an API key (even though it’s required for proper functionality), while others simply fail. To ensure consistency, we should handle inference providers the same way we do vector I/O providers: treat most of them as optional, with vLLM being the exception.

Relates to: RHAIENG-1178

What does this PR do?

Test Plan

Summary by CodeRabbit

New Features
- Automatic activation of Bedrock integration when AWS credentials are present (no manual provider selection needed).
- Automatic activation of watsonx integration when a watsonx API key is present.
- Safer defaults: providers remain inactive unless relevant environment variables are set, avoiding misconfiguration.
Chores
- Streamlined provider configuration to respect environment-based availability without altering existing settings.

coderabbitai · 2025-09-18T09:33:28Z

Walkthrough

The run configuration now conditionally sets provider_id for bedrock and watsonx based on environment variables (AWS_ACCESS_KEY_ID and WATSONX_API_KEY). Provider types and configurations remain unchanged. No other parts of the file were modified.

Changes

Cohort / File(s)	Summary
Env-gated provider IDs `distribution/run.yaml`	Updated `provider_id` values to `${env.AWS_ACCESS_KEY_ID:+bedrock}` and `${env.WATSONX_API_KEY:+watsonx}` to conditionally enable providers based on presence of respective environment variables. No other configuration changes.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor User
  participant Runtime as Runtime Loader
  participant Env as Environment
  participant Providers as Providers (bedrock, watsonx)

  User->>Runtime: Start with run.yaml
  Runtime->>Env: Check AWS_ACCESS_KEY_ID
  alt AWS key present
    Runtime->>Providers: Set bedrock provider_id = "bedrock"
  else No AWS key
    Runtime->>Providers: Omit bedrock provider_id (disabled)
  end

  Runtime->>Env: Check WATSONX_API_KEY
  alt Watsonx key present
    Runtime->>Providers: Set watsonx provider_id = "watsonx"
  else No Watsonx key
    Runtime->>Providers: Omit watsonx provider_id (disabled)
  end

  Runtime-->>User: Providers initialized per available env keys

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Poem

I flipped two switches, soft and slight,
Env keys bloom—providers light.
If clouds aren’t here, we pause the flight;
When keys appear, we launch on sight.
A rabbit nods at configs tight—
Hop, hop, hooray for runtime right! 🐇✨

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check	✅ Passed	The PR title "feat: make inference provider optional" is concise, uses conventional commit style, and directly summarizes the primary change (making inference providers optional). It aligns with the PR description and the file diffs (gating provider identities on environment variables) and is specific enough for someone scanning the commit history.
Docstring Coverage	✅ Passed	No functions found in the changes. Docstring coverage check skipped.

✨ Finishing touches

🧪 Generate unit tests

Create PR with unit tests
Post copyable unit tests in a comment

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8e2c99b and 160d1fc.

📒 Files selected for processing (1)

distribution/run.yaml (2 hunks)

🚧 Files skipped from review as they are similar to previous changes (1)

distribution/run.yaml

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)

GitHub Check: build-test-push (linux/amd64)

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

leseb · 2025-09-18T09:35:17Z

waiting for DS merge

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (2)

distribution/run.yaml (2)
23-36: Bedrock enablement key may be too narrow; confirm empty-id filtering.

Gating on AWS_ACCESS_KEY_ID works but excludes valid auth paths (AWS_PROFILE, SSO, IMDS). Also, if ACCESS_KEY_ID is set without SECRET, the provider enables then fails later. Please confirm the loader drops providers whose provider_id resolves to empty, otherwise this entry could register with an empty id.

Optional tweak (clearer opt-in):
-  - provider_id: ${env.AWS_ACCESS_KEY_ID:+bedrock}
+  - provider_id: ${env.ENABLE_BEDROCK:+bedrock}
If you prefer auto-detect, consider gating on region instead:
-  - provider_id: ${env.AWS_ACCESS_KEY_ID:+bedrock}
+  - provider_id: ${env.AWS_DEFAULT_REGION:+bedrock}
Follow-up checks:

Start the distro with no AWS env set and ensure startup succeeds and no bedrock provider is registered.

Start with only AWS_PROFILE (no ACCESS_KEY_ID) if you want profile-based auth supported by default. If so, swap to one of the alternatives above.

39-45: Watsonx gating OK; consider also requiring project_id or explicit flag.

Using WATSONX_API_KEY is reasonable, but missing WATSONX_PROJECT_ID often yields runtime errors. If the loader doesn’t prevalidate, consider gating on an explicit flag or both vars.

Options:
-  - provider_id: ${env.WATSONX_API_KEY:+watsonx}
+  - provider_id: ${env.ENABLE_WATSONX:+watsonx}
or
-  - provider_id: ${env.WATSONX_API_KEY:+watsonx}
+  - provider_id: ${env.WATSONX_API_KEY:+${env.WATSONX_PROJECT_ID:+watsonx}}
Please verify:

With only API key set and empty project_id, do we get a clear validation error at boot?

If both are unset, provider is not registered and startup remains clean.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b535695 and a4f0eba.

📒 Files selected for processing (1)

distribution/run.yaml (2 hunks)

coderabbitai

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a4f0eba and 3b41d0c.

📒 Files selected for processing (1)

.github/workflows/redhat-distro-container.yml (1 hunks)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)

GitHub Check: build-test-push (linux/amd64)

coderabbitai · 2025-09-18T11:39:48Z

.github/workflows/redhat-distro-container.yml

+
+      - name: Setup tmate session
+        if: ${{ always() }}
+        uses: mxschmitt/action-tmate@v3


⚠️ Potential issue

🧩 Analysis chain

Don’t run tmate unconditionally; gate, restrict access, and reposition before cleanup.

Unconditional tmate on every run (including pushes) will hang the job until manual teardown and poses a security/ops risk. Also, placing it after the cleanup step (Lines 116–121) makes the session less useful for post‑failure triage.

Gate it to PR failures only, same‑repo PRs, and a dedicated “ci-debug” label.

Restrict access to the triggering actor.

Move this step before “cleanup” so containers/logs are still present.

Pin the action to a commit SHA like other steps.

Consider a job‑level timeout-minutes guard (e.g., 60–90).

Apply this minimal change here; then relocate the block above the cleanup step:

- - name: Setup tmate session - if: ${{ always() }} - uses: mxschmitt/action-tmate@v3 + - name: Setup tmate session + if: ${{ failure() && github.event_name == 'pull_request' && !github.event.pull_request.head.repo.fork && contains(github.event.pull_request.labels.*.name, 'ci-debug') }} + uses: mxschmitt/action-tmate@v3 + with: + limit-access-to-actor: true

Follow-ups:

Pin to a specific commit SHA for mxschmitt/action-tmate (as done for other actions).

Add job-level timeout-minutes: 90 and move this step above Lines 116–121.

🏁 Script executed:

#!/bin/bash set -euo pipefail file=".github/workflows/redhat-distro-container.yml" if [ -f "$file" ]; then echo "== File: $file ==" nl -ba "$file" | sed -n '1,240p' else echo "File not found: $file" fi echo echo "== Search for tmate occurrences ==" rg -n --hidden -S "tmate|mxschmitt/action-tmate|action-tmate" -g '!.git/**' || true echo echo "== Show uses: lines in file ==" rg -n --hidden -S "uses:" "$file" -n -C2 || true echo echo "== Search for cleanup step names (cleanup, teardown) ==" rg -n --hidden -S "\bcleanup\b|\bteardown\b|remove.*container|stop.*container" "$file" -n -C3 || true echo echo "== Search for timeout-minutes in workflows ==" rg -n --hidden -S "timeout-minutes" .github/workflows -n -C2 || true

Length of output: 98

🏁 Script executed:

#!/bin/bash set -euo pipefail file=".github/workflows/redhat-distro-container.yml" if [ -f "$file" ]; then echo "== File: $file ==" awk '{printf "%4d: %s\n", NR, $0}' "$file" || true else echo "File not found: $file" exit 0 fi echo echo "== Search for tmate occurrences ==" if command -v rg >/dev/null 2>&1; then rg -n --hidden -S "tmate|mxschmitt/action-tmate|action-tmate" -g '!.git/**' || true else grep -nE "tmate|mxschmitt/action-tmate|action-tmate" -R .github || true fi echo echo "== Show uses: lines in file ==" if command -v rg >/dev/null 2>&1; then rg -n --hidden -S "uses:" "$file" -n -C2 || true else grep -n "uses:" "$file" || true fi echo echo "== Search for cleanup step names (cleanup, teardown) ==" if command -v rg >/dev/null 2>&1; then rg -n --hidden -S "\bcleanup\b|\bteardown\b|remove.*container|stop.*container" "$file" -n -C3 || true else grep -nE "\bcleanup\b|\bteardown\b|remove.*container|stop.*container" "$file" || true fi echo echo "== Search for timeout-minutes in workflows ==" if command -v rg >/dev/null 2>&1; then rg -n --hidden -S "timeout-minutes" .github/workflows -n -C2 || true else grep -nR "timeout-minutes" .github/workflows || true fi

Length of output: 7790

Don't run tmate unconditionally — gate it, restrict access, pin it, and move it before cleanup.

Unconditional tmate (lines 144–146) runs on every event and is placed after the cleanup step (116–121), which both risks hanging jobs and destroys containers/logs needed for debugging.

Gate to PR failures, same‑repo PRs, and a dedicated label (e.g., ci-debug).

Restrict access to the triggering actor and pin the action to a commit SHA (like other actions).

Move this step above the cleanup step and add job-level timeout-minutes (60–90).

Apply this minimal change (then relocate above the cleanup step):

- - name: Setup tmate session - if: ${{ always() }} - uses: mxschmitt/action-tmate@v3 + - name: Setup tmate session + if: ${{ failure() && github.event_name == 'pull_request' && !github.event.pull_request.head.repo.fork && contains(github.event.pull_request.labels.*.name, 'ci-debug') }} + uses: mxschmitt/action-tmate@v3 + with: + limit-access-to-actor: true

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

- name: Setup tmate session

if: ${{ always() }}

uses: mxschmitt/action-tmate@v3

- name: Setup tmate session

if: ${{ failure() && github.event_name == 'pull_request' && !github.event.pull_request.head.repo.fork && contains(github.event.pull_request.labels.*.name, 'ci-debug') }}

uses: mxschmitt/action-tmate@v3

with:

limit-access-to-actor: true

🤖 Prompt for AI Agents

In .github/workflows/redhat-distro-container.yml around lines 143–146, the tmate step is run unconditionally and placed after the cleanup step; gate it to only run on failing PRs from the same repository and when a dedicated label (e.g., ci-debug) is present, restrict access to the triggering actor, pin the action to a specific commit SHA instead of a floating tag, move the step to execute before the cleanup step (which is around lines 116–121), and ensure the job has a timeout-minutes value (set 60–90) so tmate cannot hang indefinitely.

coderabbitai

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

tests/run_integration_tests.sh (1)
53-57: Don't hardcode a single pytest node — make the test target configurable

Hardcoding a single test in CI hides regressions; use an env var (default to the full integration dir) and keep single-test overrides only for PR troubleshooting.

Location: tests/run_integration_tests.sh:53-57
-    uv run pytest -s -vvvv  tests/integration/inference/test_text_inference.py::test_text_chat_completion_structured_output \
+    NODEID="${NODEID:-tests/integration/inference}"  # set NODEID to a specific test to narrow scope when needed
+    uv run pytest -s -vvvv "$NODEID" \
         --stack-config=server:"$STACK_CONFIG_PATH" \
         --text-model=vllm-inference/"$INFERENCE_MODEL" \
         --embedding-model=granite-embedding-125m \
         -k "not ($SKIP_TESTS)"
Follow-up: run the full suite on pushes to main and add a simple fallback to python -m pytest if uv is unavailable.

♻️ Duplicate comments (1)

.github/workflows/redhat-distro-container.yml (1)
144-146: Don’t run tmate unconditionally; gate, restrict access, and pin.

Running tmate on every run is a security/ops risk and can hang jobs. Limit to failing PRs from same-repo with a debug label, restrict access to the triggering actor, and pin to a commit SHA. Also move before any cleanup step when re-enabled.
-      - name: Setup tmate session
-        if: ${{ always() }}
-        uses: mxschmitt/action-tmate@v3
+      - name: Setup tmate session
+        if: ${{ failure() && github.event_name == 'pull_request' && !github.event.pull_request.head.repo.fork && contains(github.event.pull_request.labels.*.name, 'ci-debug') }}
+        uses: mxschmitt/action-tmate@<PINNED_COMMIT_SHA>
+        with:
+          limit-access-to-actor: true
Optional hardening: add timeout-minutes: 90 at the jobs.build-test-push level to prevent indefinite sessions.
What is the latest stable commit SHA for mxschmitt/action-tmate v3 to pin in GitHub Actions?

🧹 Nitpick comments (1)

.github/workflows/redhat-distro-container.yml (1)

77-115: Keep basic logs on failure to aid triage (even with gated tmate).

Commenting out log/artifact steps removes vital diagnostics. Re-enable a minimal set on failures.

-      # - name: Gather logs and debugging information
-      #   if: always()
-      #   shell: bash
-      #   run: |
-      #     # Create logs directory
-      #     mkdir -p logs
-      #
-      #     docker logs llama-stack > logs/llama-stack.log 2>&1 || echo "Failed to get llama-stack logs" > logs/llama-stack.log
-      #     docker logs vllm > logs/vllm.log 2>&1 || echo "Failed to get vllm logs" > logs/vllm.log
-      #
-      #     # Gather system information
-      #     echo "=== System information ==="
-      #     {
-      #       echo "Disk usage:"
-      #       df -h
-      #       echo "Memory usage:"
-      #       free -h
-      #       echo "Docker images:"
-      #       docker images
-      #       echo "Docker containers:"
-      #       docker ps -a
-      #     } > logs/system-info.log 2>&1
+      - name: Gather logs and debugging information
+        if: failure()
+        shell: bash
+        run: |
+          mkdir -p logs
+          docker logs llama-stack > logs/llama-stack.log 2>&1 || echo "no llama-stack logs" > logs/llama-stack.log
+          docker logs vllm > logs/vllm.log 2>&1 || echo "no vllm logs" > logs/vllm.log
+          {
+            echo "Disk usage:"; df -h
+            echo "Memory usage:"; free -h
+            echo "Docker containers:"; docker ps -a
+          } > logs/system-info.log 2>&1
-
-      # - name: Upload logs as artifacts
-      #   if: always()
-      #   uses: actions/upload-artifact@b4b15b8c7c6ac21ea08fcf65892d2ee8f75cf882 # v4.7.0
-      #   with:
-      #     name: ci-logs-${{ github.sha }}
-      #     path: logs/
-      #     retention-days: 7
+      - name: Upload logs as artifacts
+        if: failure()
+        uses: actions/upload-artifact@b4b15b8c7c6ac21ea08fcf65892d2ee8f75cf882 # v4.7.0
+        with:
+          name: ci-logs-${{ github.sha }}
+          path: logs/
+          retention-days: 7

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3b41d0c and e83d31d.

📒 Files selected for processing (2)

.github/workflows/redhat-distro-container.yml (1 hunks)
tests/run_integration_tests.sh (1 hunks)

nathan-weinberg

Holding until the cleanup is restored

leseb · 2025-09-18T15:06:37Z

Holding until the cleanup is restored

Which cleanup?

nathan-weinberg · 2025-09-18T15:10:55Z

Holding until the cleanup is restored

Which cleanup?

The workflow cleanup is commented out right now

Inference provider implementations behave inconsistently—some can load normally without an API key (even though it’s required for proper functionality), while others simply fail. To ensure consistency, we should handle inference providers the same way we do vector I/O providers: treat most of them as optional, with vLLM being the exception. Relates to: RHAIENG-1178 Signed-off-by: Sébastien Han <seb@redhat.com>

chore: bump wheel release to use 0.4.2

leseb marked this pull request as ready for review September 18, 2025 11:10

leseb requested review from Elbehery, VaishnaviHire, cdoern, derekhiggins, nathan-weinberg, rhdedgar, rhuss and skamenan7 as code owners September 18, 2025 11:11

coderabbitai bot reviewed Sep 18, 2025

View reviewed changes

leseb force-pushed the prov-opti branch 2 times, most recently from 30c8852 to e83d31d Compare September 18, 2025 12:00

coderabbitai bot reviewed Sep 18, 2025

View reviewed changes

nathan-weinberg requested changes Sep 18, 2025

View reviewed changes

nathan-weinberg force-pushed the prov-opti branch from e83d31d to 8e2c99b Compare September 18, 2025 17:39

nathan-weinberg added the do-not-merge Apply to PRs that should not be merged (yet) label Sep 18, 2025

nathan-weinberg force-pushed the prov-opti branch from 8e2c99b to aa9d751 Compare September 18, 2025 18:17

leseb force-pushed the prov-opti branch 2 times, most recently from a1a4e89 to 67f7d89 Compare September 18, 2025 18:20

leseb requested a review from nathan-weinberg September 18, 2025 18:20

nathan-weinberg approved these changes Sep 18, 2025

View reviewed changes

nathan-weinberg removed the do-not-merge Apply to PRs that should not be merged (yet) label Sep 18, 2025

nathan-weinberg force-pushed the prov-opti branch from 67f7d89 to 160d1fc Compare September 18, 2025 18:22

cdoern approved these changes Sep 18, 2025

View reviewed changes

nathan-weinberg merged commit d9dc908 into opendatahub-io:main Sep 18, 2025
5 checks passed

leseb deleted the prov-opti branch September 19, 2025 07:28

derekhiggins pushed a commit to derekhiggins/llama-stack-distribution that referenced this pull request Jan 29, 2026

Merge pull request opendatahub-io#40 from skamenan7/bump-v0.4.2

c9d9b6d

chore: bump wheel release to use 0.4.2

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: make inference provider optional#40

feat: make inference provider optional#40
nathan-weinberg merged 1 commit intoopendatahub-io:mainfrom
leseb:prov-opti

leseb commented Sep 18, 2025 •

edited by coderabbitai bot

Loading

Uh oh!

coderabbitai bot commented Sep 18, 2025 •

edited

Loading

Uh oh!

leseb commented Sep 18, 2025

Uh oh!

coderabbitai bot left a comment

Uh oh!

coderabbitai bot left a comment

Uh oh!

coderabbitai bot Sep 18, 2025

Uh oh!

coderabbitai bot left a comment

Uh oh!

nathan-weinberg left a comment

Uh oh!

leseb commented Sep 18, 2025

Uh oh!

nathan-weinberg commented Sep 18, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

leseb commented Sep 18, 2025 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What does this PR do?

Test Plan

Summary by CodeRabbit

Uh oh!

coderabbitai bot commented Sep 18, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Poem

Pre-merge checks and finishing touches

Uh oh!

leseb commented Sep 18, 2025

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Sep 18, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

nathan-weinberg left a comment

Choose a reason for hiding this comment

Uh oh!

leseb commented Sep 18, 2025

Uh oh!

nathan-weinberg commented Sep 18, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

leseb commented Sep 18, 2025 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Sep 18, 2025 •

edited

Loading