Skip to content

feat: make inference provider optional#40

Merged
nathan-weinberg merged 1 commit intoopendatahub-io:mainfrom
leseb:prov-opti
Sep 18, 2025
Merged

feat: make inference provider optional#40
nathan-weinberg merged 1 commit intoopendatahub-io:mainfrom
leseb:prov-opti

Conversation

@leseb
Copy link
Copy Markdown
Collaborator

@leseb leseb commented Sep 18, 2025

Inference provider implementations behave inconsistently—some can load normally without an API key (even though it’s required for proper functionality), while others simply fail. To ensure consistency, we should handle inference providers the same way we do vector I/O providers: treat most of them as optional, with vLLM being the exception.

Relates to: RHAIENG-1178

What does this PR do?

Test Plan

Summary by CodeRabbit

  • New Features
    • Automatic activation of Bedrock integration when AWS credentials are present (no manual provider selection needed).
    • Automatic activation of watsonx integration when a watsonx API key is present.
    • Safer defaults: providers remain inactive unless relevant environment variables are set, avoiding misconfiguration.
  • Chores
    • Streamlined provider configuration to respect environment-based availability without altering existing settings.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Sep 18, 2025

Walkthrough

The run configuration now conditionally sets provider_id for bedrock and watsonx based on environment variables (AWS_ACCESS_KEY_ID and WATSONX_API_KEY). Provider types and configurations remain unchanged. No other parts of the file were modified.

Changes

Cohort / File(s) Summary
Env-gated provider IDs
distribution/run.yaml
Updated provider_id values to ${env.AWS_ACCESS_KEY_ID:+bedrock} and ${env.WATSONX_API_KEY:+watsonx} to conditionally enable providers based on presence of respective environment variables. No other configuration changes.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor User
  participant Runtime as Runtime Loader
  participant Env as Environment
  participant Providers as Providers (bedrock, watsonx)

  User->>Runtime: Start with run.yaml
  Runtime->>Env: Check AWS_ACCESS_KEY_ID
  alt AWS key present
    Runtime->>Providers: Set bedrock provider_id = "bedrock"
  else No AWS key
    Runtime->>Providers: Omit bedrock provider_id (disabled)
  end

  Runtime->>Env: Check WATSONX_API_KEY
  alt Watsonx key present
    Runtime->>Providers: Set watsonx provider_id = "watsonx"
  else No Watsonx key
    Runtime->>Providers: Omit watsonx provider_id (disabled)
  end

  Runtime-->>User: Providers initialized per available env keys
Loading

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Poem

I flipped two switches, soft and slight,
Env keys bloom—providers light.
If clouds aren’t here, we pause the flight;
When keys appear, we launch on sight.
A rabbit nods at configs tight—
Hop, hop, hooray for runtime right! 🐇✨

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check ✅ Passed The PR title "feat: make inference provider optional" is concise, uses conventional commit style, and directly summarizes the primary change (making inference providers optional). It aligns with the PR description and the file diffs (gating provider identities on environment variables) and is specific enough for someone scanning the commit history.
Docstring Coverage ✅ Passed No functions found in the changes. Docstring coverage check skipped.
✨ Finishing touches
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8e2c99b and 160d1fc.

📒 Files selected for processing (1)
  • distribution/run.yaml (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • distribution/run.yaml
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: build-test-push (linux/amd64)

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@leseb
Copy link
Copy Markdown
Collaborator Author

leseb commented Sep 18, 2025

waiting for DS merge

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (2)
distribution/run.yaml (2)

23-36: Bedrock enablement key may be too narrow; confirm empty-id filtering.

Gating on AWS_ACCESS_KEY_ID works but excludes valid auth paths (AWS_PROFILE, SSO, IMDS). Also, if ACCESS_KEY_ID is set without SECRET, the provider enables then fails later. Please confirm the loader drops providers whose provider_id resolves to empty, otherwise this entry could register with an empty id.

Optional tweak (clearer opt-in):

-  - provider_id: ${env.AWS_ACCESS_KEY_ID:+bedrock}
+  - provider_id: ${env.ENABLE_BEDROCK:+bedrock}

If you prefer auto-detect, consider gating on region instead:

-  - provider_id: ${env.AWS_ACCESS_KEY_ID:+bedrock}
+  - provider_id: ${env.AWS_DEFAULT_REGION:+bedrock}

Follow-up checks:

  • Start the distro with no AWS env set and ensure startup succeeds and no bedrock provider is registered.
  • Start with only AWS_PROFILE (no ACCESS_KEY_ID) if you want profile-based auth supported by default. If so, swap to one of the alternatives above.

39-45: Watsonx gating OK; consider also requiring project_id or explicit flag.

Using WATSONX_API_KEY is reasonable, but missing WATSONX_PROJECT_ID often yields runtime errors. If the loader doesn’t prevalidate, consider gating on an explicit flag or both vars.

Options:

-  - provider_id: ${env.WATSONX_API_KEY:+watsonx}
+  - provider_id: ${env.ENABLE_WATSONX:+watsonx}

or

-  - provider_id: ${env.WATSONX_API_KEY:+watsonx}
+  - provider_id: ${env.WATSONX_API_KEY:+${env.WATSONX_PROJECT_ID:+watsonx}}

Please verify:

  • With only API key set and empty project_id, do we get a clear validation error at boot?
  • If both are unset, provider is not registered and startup remains clean.
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b535695 and a4f0eba.

📒 Files selected for processing (1)
  • distribution/run.yaml (2 hunks)

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a4f0eba and 3b41d0c.

📒 Files selected for processing (1)
  • .github/workflows/redhat-distro-container.yml (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: build-test-push (linux/amd64)

Comment on lines +143 to +146

- name: Setup tmate session
if: ${{ always() }}
uses: mxschmitt/action-tmate@v3
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

🧩 Analysis chain

Don’t run tmate unconditionally; gate, restrict access, and reposition before cleanup.

Unconditional tmate on every run (including pushes) will hang the job until manual teardown and poses a security/ops risk. Also, placing it after the cleanup step (Lines 116–121) makes the session less useful for post‑failure triage.

  • Gate it to PR failures only, same‑repo PRs, and a dedicated “ci-debug” label.
  • Restrict access to the triggering actor.
  • Move this step before “cleanup” so containers/logs are still present.
  • Pin the action to a commit SHA like other steps.
  • Consider a job‑level timeout-minutes guard (e.g., 60–90).

Apply this minimal change here; then relocate the block above the cleanup step:

-      - name: Setup tmate session
-        if: ${{ always() }}
-        uses: mxschmitt/action-tmate@v3
+      - name: Setup tmate session
+        if: ${{ failure() && github.event_name == 'pull_request' && !github.event.pull_request.head.repo.fork && contains(github.event.pull_request.labels.*.name, 'ci-debug') }}
+        uses: mxschmitt/action-tmate@v3
+        with:
+          limit-access-to-actor: true

Follow-ups:

  • Pin to a specific commit SHA for mxschmitt/action-tmate (as done for other actions).
  • Add job-level timeout-minutes: 90 and move this step above Lines 116–121.

🏁 Script executed:

#!/bin/bash
set -euo pipefail

file=".github/workflows/redhat-distro-container.yml"

if [ -f "$file" ]; then
  echo "== File: $file =="
  nl -ba "$file" | sed -n '1,240p'
else
  echo "File not found: $file"
fi

echo
echo "== Search for tmate occurrences =="
rg -n --hidden -S "tmate|mxschmitt/action-tmate|action-tmate" -g '!.git/**' || true

echo
echo "== Show uses: lines in file =="
rg -n --hidden -S "uses:" "$file" -n -C2 || true

echo
echo "== Search for cleanup step names (cleanup, teardown) =="
rg -n --hidden -S "\bcleanup\b|\bteardown\b|remove.*container|stop.*container" "$file" -n -C3 || true

echo
echo "== Search for timeout-minutes in workflows =="
rg -n --hidden -S "timeout-minutes" .github/workflows -n -C2 || true

Length of output: 98


🏁 Script executed:

#!/bin/bash
set -euo pipefail

file=".github/workflows/redhat-distro-container.yml"

if [ -f "$file" ]; then
  echo "== File: $file =="
  awk '{printf "%4d: %s\n", NR, $0}' "$file" || true
else
  echo "File not found: $file"
  exit 0
fi

echo
echo "== Search for tmate occurrences =="
if command -v rg >/dev/null 2>&1; then
  rg -n --hidden -S "tmate|mxschmitt/action-tmate|action-tmate" -g '!.git/**' || true
else
  grep -nE "tmate|mxschmitt/action-tmate|action-tmate" -R .github || true
fi

echo
echo "== Show uses: lines in file =="
if command -v rg >/dev/null 2>&1; then
  rg -n --hidden -S "uses:" "$file" -n -C2 || true
else
  grep -n "uses:" "$file" || true
fi

echo
echo "== Search for cleanup step names (cleanup, teardown) =="
if command -v rg >/dev/null 2>&1; then
  rg -n --hidden -S "\bcleanup\b|\bteardown\b|remove.*container|stop.*container" "$file" -n -C3 || true
else
  grep -nE "\bcleanup\b|\bteardown\b|remove.*container|stop.*container" "$file" || true
fi

echo
echo "== Search for timeout-minutes in workflows =="
if command -v rg >/dev/null 2>&1; then
  rg -n --hidden -S "timeout-minutes" .github/workflows -n -C2 || true
else
  grep -nR "timeout-minutes" .github/workflows || true
fi

Length of output: 7790


Don't run tmate unconditionally — gate it, restrict access, pin it, and move it before cleanup.

Unconditional tmate (lines 144–146) runs on every event and is placed after the cleanup step (116–121), which both risks hanging jobs and destroys containers/logs needed for debugging.

  • Gate to PR failures, same‑repo PRs, and a dedicated label (e.g., ci-debug).
  • Restrict access to the triggering actor and pin the action to a commit SHA (like other actions).
  • Move this step above the cleanup step and add job-level timeout-minutes (60–90).

Apply this minimal change (then relocate above the cleanup step):

-      - name: Setup tmate session
-        if: ${{ always() }}
-        uses: mxschmitt/action-tmate@v3
+      - name: Setup tmate session
+        if: ${{ failure() && github.event_name == 'pull_request' && !github.event.pull_request.head.repo.fork && contains(github.event.pull_request.labels.*.name, 'ci-debug') }}
+        uses: mxschmitt/action-tmate@v3
+        with:
+          limit-access-to-actor: true
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
- name: Setup tmate session
if: ${{ always() }}
uses: mxschmitt/action-tmate@v3
- name: Setup tmate session
if: ${{ failure() && github.event_name == 'pull_request' && !github.event.pull_request.head.repo.fork && contains(github.event.pull_request.labels.*.name, 'ci-debug') }}
uses: mxschmitt/action-tmate@v3
with:
limit-access-to-actor: true
🤖 Prompt for AI Agents
In .github/workflows/redhat-distro-container.yml around lines 143–146, the tmate
step is run unconditionally and placed after the cleanup step; gate it to only
run on failing PRs from the same repository and when a dedicated label (e.g.,
ci-debug) is present, restrict access to the triggering actor, pin the action to
a specific commit SHA instead of a floating tag, move the step to execute before
the cleanup step (which is around lines 116–121), and ensure the job has a
timeout-minutes value (set 60–90) so tmate cannot hang indefinitely.

@leseb leseb force-pushed the prov-opti branch 2 times, most recently from 30c8852 to e83d31d Compare September 18, 2025 12:00
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
tests/run_integration_tests.sh (1)

53-57: Don't hardcode a single pytest node — make the test target configurable

Hardcoding a single test in CI hides regressions; use an env var (default to the full integration dir) and keep single-test overrides only for PR troubleshooting.

Location: tests/run_integration_tests.sh:53-57

-    uv run pytest -s -vvvv  tests/integration/inference/test_text_inference.py::test_text_chat_completion_structured_output \
+    NODEID="${NODEID:-tests/integration/inference}"  # set NODEID to a specific test to narrow scope when needed
+    uv run pytest -s -vvvv "$NODEID" \
         --stack-config=server:"$STACK_CONFIG_PATH" \
         --text-model=vllm-inference/"$INFERENCE_MODEL" \
         --embedding-model=granite-embedding-125m \
         -k "not ($SKIP_TESTS)"

Follow-up: run the full suite on pushes to main and add a simple fallback to python -m pytest if uv is unavailable.

♻️ Duplicate comments (1)
.github/workflows/redhat-distro-container.yml (1)

144-146: Don’t run tmate unconditionally; gate, restrict access, and pin.

Running tmate on every run is a security/ops risk and can hang jobs. Limit to failing PRs from same-repo with a debug label, restrict access to the triggering actor, and pin to a commit SHA. Also move before any cleanup step when re-enabled.

-      - name: Setup tmate session
-        if: ${{ always() }}
-        uses: mxschmitt/action-tmate@v3
+      - name: Setup tmate session
+        if: ${{ failure() && github.event_name == 'pull_request' && !github.event.pull_request.head.repo.fork && contains(github.event.pull_request.labels.*.name, 'ci-debug') }}
+        uses: mxschmitt/action-tmate@<PINNED_COMMIT_SHA>
+        with:
+          limit-access-to-actor: true

Optional hardening: add timeout-minutes: 90 at the jobs.build-test-push level to prevent indefinite sessions.

What is the latest stable commit SHA for mxschmitt/action-tmate v3 to pin in GitHub Actions?
🧹 Nitpick comments (1)
.github/workflows/redhat-distro-container.yml (1)

77-115: Keep basic logs on failure to aid triage (even with gated tmate).

Commenting out log/artifact steps removes vital diagnostics. Re-enable a minimal set on failures.

-      # - name: Gather logs and debugging information
-      #   if: always()
-      #   shell: bash
-      #   run: |
-      #     # Create logs directory
-      #     mkdir -p logs
-      #
-      #     docker logs llama-stack > logs/llama-stack.log 2>&1 || echo "Failed to get llama-stack logs" > logs/llama-stack.log
-      #     docker logs vllm > logs/vllm.log 2>&1 || echo "Failed to get vllm logs" > logs/vllm.log
-      #
-      #     # Gather system information
-      #     echo "=== System information ==="
-      #     {
-      #       echo "Disk usage:"
-      #       df -h
-      #       echo "Memory usage:"
-      #       free -h
-      #       echo "Docker images:"
-      #       docker images
-      #       echo "Docker containers:"
-      #       docker ps -a
-      #     } > logs/system-info.log 2>&1
+      - name: Gather logs and debugging information
+        if: failure()
+        shell: bash
+        run: |
+          mkdir -p logs
+          docker logs llama-stack > logs/llama-stack.log 2>&1 || echo "no llama-stack logs" > logs/llama-stack.log
+          docker logs vllm > logs/vllm.log 2>&1 || echo "no vllm logs" > logs/vllm.log
+          {
+            echo "Disk usage:"; df -h
+            echo "Memory usage:"; free -h
+            echo "Docker containers:"; docker ps -a
+          } > logs/system-info.log 2>&1
-
-      # - name: Upload logs as artifacts
-      #   if: always()
-      #   uses: actions/upload-artifact@b4b15b8c7c6ac21ea08fcf65892d2ee8f75cf882 # v4.7.0
-      #   with:
-      #     name: ci-logs-${{ github.sha }}
-      #     path: logs/
-      #     retention-days: 7
+      - name: Upload logs as artifacts
+        if: failure()
+        uses: actions/upload-artifact@b4b15b8c7c6ac21ea08fcf65892d2ee8f75cf882 # v4.7.0
+        with:
+          name: ci-logs-${{ github.sha }}
+          path: logs/
+          retention-days: 7
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3b41d0c and e83d31d.

📒 Files selected for processing (2)
  • .github/workflows/redhat-distro-container.yml (1 hunks)
  • tests/run_integration_tests.sh (1 hunks)

Copy link
Copy Markdown
Collaborator

@nathan-weinberg nathan-weinberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Holding until the cleanup is restored

@leseb
Copy link
Copy Markdown
Collaborator Author

leseb commented Sep 18, 2025

Holding until the cleanup is restored

Which cleanup?

@nathan-weinberg
Copy link
Copy Markdown
Collaborator

Holding until the cleanup is restored

Which cleanup?

The workflow cleanup is commented out right now

@nathan-weinberg nathan-weinberg added the do-not-merge Apply to PRs that should not be merged (yet) label Sep 18, 2025
@leseb leseb force-pushed the prov-opti branch 2 times, most recently from a1a4e89 to 67f7d89 Compare September 18, 2025 18:20
@nathan-weinberg nathan-weinberg removed the do-not-merge Apply to PRs that should not be merged (yet) label Sep 18, 2025
Inference provider implementations behave inconsistently—some can load
normally without an API key (even though it’s required for proper
functionality), while others simply fail. To ensure consistency, we
should handle inference providers the same way we do vector I/O
providers: treat most of them as optional, with vLLM being the
exception.

Relates to: RHAIENG-1178
Signed-off-by: Sébastien Han <seb@redhat.com>
@nathan-weinberg nathan-weinberg merged commit d9dc908 into opendatahub-io:main Sep 18, 2025
5 checks passed
@leseb leseb deleted the prov-opti branch September 19, 2025 07:28
derekhiggins pushed a commit to derekhiggins/llama-stack-distribution that referenced this pull request Jan 29, 2026
chore: bump wheel release to use 0.4.2
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants