feat: make inference provider optional #40
nathan-weinberg merged 1 commit into opendatahub-io:main from
Conversation
Walkthrough
The run configuration now conditionally sets provider_id for bedrock and watsonx based on environment variables (AWS_ACCESS_KEY_ID and WATSONX_API_KEY). Provider types and configurations remain unchanged. No other parts of the file were modified.
Sequence Diagram(s)

sequenceDiagram
autonumber
actor User
participant Runtime as Runtime Loader
participant Env as Environment
participant Providers as Providers (bedrock, watsonx)
User->>Runtime: Start with run.yaml
Runtime->>Env: Check AWS_ACCESS_KEY_ID
alt AWS key present
Runtime->>Providers: Set bedrock provider_id = "bedrock"
else No AWS key
Runtime->>Providers: Omit bedrock provider_id (disabled)
end
Runtime->>Env: Check WATSONX_API_KEY
alt Watsonx key present
Runtime->>Providers: Set watsonx provider_id = "watsonx"
else No Watsonx key
Runtime->>Providers: Omit watsonx provider_id (disabled)
end
Runtime-->>User: Providers initialized per available env keys
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes
Pre-merge checks and finishing touches: ✅ 3 checks passed
waiting for DS merge
Actionable comments posted: 0
🧹 Nitpick comments (2)
distribution/run.yaml (2)
23-36: Bedrock enablement key may be too narrow; confirm empty-id filtering.
Gating on AWS_ACCESS_KEY_ID works but excludes valid auth paths (AWS_PROFILE, SSO, IMDS). Also, if AWS_ACCESS_KEY_ID is set without AWS_SECRET_ACCESS_KEY, the provider enables and then fails later. Please confirm the loader drops providers whose provider_id resolves to empty; otherwise this entry could register with an empty id.
Optional tweak (clearer opt-in):
- - provider_id: ${env.AWS_ACCESS_KEY_ID:+bedrock}
+ - provider_id: ${env.ENABLE_BEDROCK:+bedrock}

If you prefer auto-detect, consider gating on region instead:

- - provider_id: ${env.AWS_ACCESS_KEY_ID:+bedrock}
+ - provider_id: ${env.AWS_DEFAULT_REGION:+bedrock}

Follow-up checks:
- Start the distro with no AWS env set and ensure startup succeeds and no bedrock provider is registered.
- Start with only AWS_PROFILE (no ACCESS_KEY_ID) if you want profile-based auth supported by default. If so, swap to one of the alternatives above.
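The gating relies on ":+" ("alternate value") substitution. Assuming the stack loader's `${env.VAR:+value}` mirrors POSIX-shell alternate-value semantics (a guess based on the syntax, not confirmed from the loader source), a minimal sketch of the behavior:

```shell
#!/bin/bash
# ":+" expands to the word only when the variable is set and non-empty;
# otherwise it expands to nothing, which is what disables the provider.
export AWS_ACCESS_KEY_ID="AKIAEXAMPLE"   # placeholder value, not a real key
echo "set:   [${AWS_ACCESS_KEY_ID:+bedrock}]"   # prints set:   [bedrock]
unset AWS_ACCESS_KEY_ID
echo "unset: [${AWS_ACCESS_KEY_ID:+bedrock}]"   # prints unset: []
```

This is why the empty-id-filtering question above matters: with the variable unset, the provider_id resolves to an empty string, and the loader must treat that as "provider disabled" rather than registering an empty id.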
39-45: Watsonx gating OK; consider also requiring project_id or an explicit flag.
Using WATSONX_API_KEY is reasonable, but a missing WATSONX_PROJECT_ID often yields runtime errors. If the loader doesn't prevalidate, consider gating on an explicit flag or on both variables.
Options:
- - provider_id: ${env.WATSONX_API_KEY:+watsonx}
+ - provider_id: ${env.ENABLE_WATSONX:+watsonx}

or

- - provider_id: ${env.WATSONX_API_KEY:+watsonx}
+ - provider_id: ${env.WATSONX_API_KEY:+${env.WATSONX_PROJECT_ID:+watsonx}}

Please verify:
- With only API key set and empty project_id, do we get a clear validation error at boot?
- If both are unset, provider is not registered and startup remains clean.
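The nested-expansion variant suggested above works because ":+" composes: the inner expansion only produces a non-empty word when the inner variable is set, so the outer expansion yields "watsonx" only when both variables are present. A sketch in plain shell (assuming the loader follows the same semantics):

```shell
#!/bin/bash
# Nested ":+": the result is non-empty only when BOTH variables are set
# and non-empty; clearing either one disables the provider id.
export WATSONX_API_KEY="dummy-key" WATSONX_PROJECT_ID="dummy-project"
echo "both:     [${WATSONX_API_KEY:+${WATSONX_PROJECT_ID:+watsonx}}]"  # prints both:     [watsonx]
unset WATSONX_PROJECT_ID
echo "key only: [${WATSONX_API_KEY:+${WATSONX_PROJECT_ID:+watsonx}}]"  # prints key only: []
```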
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
.github/workflows/redhat-distro-container.yml (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: build-test-push (linux/amd64)
- name: Setup tmate session
  if: ${{ always() }}
  uses: mxschmitt/action-tmate@v3
🧩 Analysis chain
Don’t run tmate unconditionally; gate, restrict access, and reposition before cleanup.
Unconditional tmate on every run (including pushes) will hang the job until manual teardown and poses a security/ops risk. Also, placing it after the cleanup step (Lines 116–121) makes the session less useful for post‑failure triage.
- Gate it to PR failures only, same‑repo PRs, and a dedicated “ci-debug” label.
- Restrict access to the triggering actor.
- Move this step before “cleanup” so containers/logs are still present.
- Pin the action to a commit SHA like other steps.
- Consider a job‑level timeout-minutes guard (e.g., 60–90).
Apply this minimal change here; then relocate the block above the cleanup step:
- - name: Setup tmate session
-   if: ${{ always() }}
-   uses: mxschmitt/action-tmate@v3
+ - name: Setup tmate session
+   if: ${{ failure() && github.event_name == 'pull_request' && !github.event.pull_request.head.repo.fork && contains(github.event.pull_request.labels.*.name, 'ci-debug') }}
+   uses: mxschmitt/action-tmate@v3
+   with:
+     limit-access-to-actor: true

Follow-ups:
- Pin to a specific commit SHA for mxschmitt/action-tmate (as done for other actions).
- Add job-level timeout-minutes: 90 and move this step above Lines 116–121.
🏁 Script executed:
#!/bin/bash
set -euo pipefail
file=".github/workflows/redhat-distro-container.yml"
if [ -f "$file" ]; then
echo "== File: $file =="
nl -ba "$file" | sed -n '1,240p'
else
echo "File not found: $file"
fi
echo
echo "== Search for tmate occurrences =="
rg -n --hidden -S "tmate|mxschmitt/action-tmate|action-tmate" -g '!.git/**' || true
echo
echo "== Show uses: lines in file =="
rg -n --hidden -S "uses:" "$file" -n -C2 || true
echo
echo "== Search for cleanup step names (cleanup, teardown) =="
rg -n --hidden -S "\bcleanup\b|\bteardown\b|remove.*container|stop.*container" "$file" -n -C3 || true
echo
echo "== Search for timeout-minutes in workflows =="
rg -n --hidden -S "timeout-minutes" .github/workflows -n -C2 || true

Length of output: 98
🏁 Script executed:
#!/bin/bash
set -euo pipefail
file=".github/workflows/redhat-distro-container.yml"
if [ -f "$file" ]; then
echo "== File: $file =="
awk '{printf "%4d: %s\n", NR, $0}' "$file" || true
else
echo "File not found: $file"
exit 0
fi
echo
echo "== Search for tmate occurrences =="
if command -v rg >/dev/null 2>&1; then
rg -n --hidden -S "tmate|mxschmitt/action-tmate|action-tmate" -g '!.git/**' || true
else
grep -nE "tmate|mxschmitt/action-tmate|action-tmate" -R .github || true
fi
echo
echo "== Show uses: lines in file =="
if command -v rg >/dev/null 2>&1; then
rg -n --hidden -S "uses:" "$file" -n -C2 || true
else
grep -n "uses:" "$file" || true
fi
echo
echo "== Search for cleanup step names (cleanup, teardown) =="
if command -v rg >/dev/null 2>&1; then
rg -n --hidden -S "\bcleanup\b|\bteardown\b|remove.*container|stop.*container" "$file" -n -C3 || true
else
grep -nE "\bcleanup\b|\bteardown\b|remove.*container|stop.*container" "$file" || true
fi
echo
echo "== Search for timeout-minutes in workflows =="
if command -v rg >/dev/null 2>&1; then
rg -n --hidden -S "timeout-minutes" .github/workflows -n -C2 || true
else
grep -nR "timeout-minutes" .github/workflows || true
fi

Length of output: 7790
Don't run tmate unconditionally — gate it, restrict access, pin it, and move it before cleanup.
Unconditional tmate (lines 144–146) runs on every event and is placed after the cleanup step (116–121), which both risks hanging jobs and destroys containers/logs needed for debugging.
- Gate to PR failures, same‑repo PRs, and a dedicated label (e.g., ci-debug).
- Restrict access to the triggering actor and pin the action to a commit SHA (like other actions).
- Move this step above the cleanup step and add job-level timeout-minutes (60–90).
Apply this minimal change (then relocate above the cleanup step):
- - name: Setup tmate session
-   if: ${{ always() }}
-   uses: mxschmitt/action-tmate@v3
+ - name: Setup tmate session
+   if: ${{ failure() && github.event_name == 'pull_request' && !github.event.pull_request.head.repo.fork && contains(github.event.pull_request.labels.*.name, 'ci-debug') }}
+   uses: mxschmitt/action-tmate@v3
+   with:
+     limit-access-to-actor: true

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
- name: Setup tmate session
  if: ${{ failure() && github.event_name == 'pull_request' && !github.event.pull_request.head.repo.fork && contains(github.event.pull_request.labels.*.name, 'ci-debug') }}
  uses: mxschmitt/action-tmate@v3
  with:
    limit-access-to-actor: true
🤖 Prompt for AI Agents
In .github/workflows/redhat-distro-container.yml around lines 143–146, the tmate
step is run unconditionally and placed after the cleanup step; gate it to only
run on failing PRs from the same repository and when a dedicated label (e.g.,
ci-debug) is present, restrict access to the triggering actor, pin the action to
a specific commit SHA instead of a floating tag, move the step to execute before
the cleanup step (which is around lines 116–121), and ensure the job has a
timeout-minutes value (set 60–90) so tmate cannot hang indefinitely.
force-pushed from 30c8852 to e83d31d
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
tests/run_integration_tests.sh (1)
53-57: Don't hardcode a single pytest node — make the test target configurable
Hardcoding a single test in CI hides regressions; use an env var (default to the full integration dir) and keep single-test overrides only for PR troubleshooting.
Location: tests/run_integration_tests.sh:53-57
- uv run pytest -s -vvvv tests/integration/inference/test_text_inference.py::test_text_chat_completion_structured_output \
+ NODEID="${NODEID:-tests/integration/inference}"  # set NODEID to a specific test to narrow scope when needed
+ uv run pytest -s -vvvv "$NODEID" \
      --stack-config=server:"$STACK_CONFIG_PATH" \
      --text-model=vllm-inference/"$INFERENCE_MODEL" \
      --embedding-model=granite-embedding-125m \
      -k "not ($SKIP_TESTS)"

Follow-up: run the full suite on pushes to main and add a simple fallback to python -m pytest if uv is unavailable.
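The suggested NODEID override uses ":-" ("default value") expansion: an unset or empty NODEID falls back to the whole integration directory, while exporting it narrows the run. A minimal sketch of that fallback (the variable name NODEID is the reviewer's proposal, not an existing script variable):

```shell
#!/bin/bash
# ":-" expands to the default when the variable is unset or empty,
# and to the variable's value otherwise.
unset NODEID
echo "default:  [${NODEID:-tests/integration/inference}]"
NODEID="tests/integration/inference/test_text_inference.py"
echo "override: [${NODEID:-tests/integration/inference}]"
```

In practice this would let CI run the full directory by default while a PR author runs `NODEID=path::test ./tests/run_integration_tests.sh` to reproduce one failure.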
♻️ Duplicate comments (1)
.github/workflows/redhat-distro-container.yml (1)
144-146: Don’t run tmate unconditionally; gate, restrict access, and pin.Running tmate on every run is a security/ops risk and can hang jobs. Limit to failing PRs from same-repo with a debug label, restrict access to the triggering actor, and pin to a commit SHA. Also move before any cleanup step when re-enabled.
- - name: Setup tmate session
-   if: ${{ always() }}
-   uses: mxschmitt/action-tmate@v3
+ - name: Setup tmate session
+   if: ${{ failure() && github.event_name == 'pull_request' && !github.event.pull_request.head.repo.fork && contains(github.event.pull_request.labels.*.name, 'ci-debug') }}
+   uses: mxschmitt/action-tmate@<PINNED_COMMIT_SHA>
+   with:
+     limit-access-to-actor: true

Optional hardening: add timeout-minutes: 90 at the jobs.build-test-push level to prevent indefinite sessions.
🧹 Nitpick comments (1)
.github/workflows/redhat-distro-container.yml (1)
77-115: Keep basic logs on failure to aid triage (even with gated tmate).
Commenting out log/artifact steps removes vital diagnostics. Re-enable a minimal set on failures.

- # - name: Gather logs and debugging information
- #   if: always()
- #   shell: bash
- #   run: |
- #     # Create logs directory
- #     mkdir -p logs
- #
- #     docker logs llama-stack > logs/llama-stack.log 2>&1 || echo "Failed to get llama-stack logs" > logs/llama-stack.log
- #     docker logs vllm > logs/vllm.log 2>&1 || echo "Failed to get vllm logs" > logs/vllm.log
- #
- #     # Gather system information
- #     echo "=== System information ==="
- #     {
- #       echo "Disk usage:"
- #       df -h
- #       echo "Memory usage:"
- #       free -h
- #       echo "Docker images:"
- #       docker images
- #       echo "Docker containers:"
- #       docker ps -a
- #     } > logs/system-info.log 2>&1
+ - name: Gather logs and debugging information
+   if: failure()
+   shell: bash
+   run: |
+     mkdir -p logs
+     docker logs llama-stack > logs/llama-stack.log 2>&1 || echo "no llama-stack logs" > logs/llama-stack.log
+     docker logs vllm > logs/vllm.log 2>&1 || echo "no vllm logs" > logs/vllm.log
+     {
+       echo "Disk usage:"; df -h
+       echo "Memory usage:"; free -h
+       echo "Docker containers:"; docker ps -a
+     } > logs/system-info.log 2>&1

- # - name: Upload logs as artifacts
- #   if: always()
- #   uses: actions/upload-artifact@b4b15b8c7c6ac21ea08fcf65892d2ee8f75cf882 # v4.7.0
- #   with:
- #     name: ci-logs-${{ github.sha }}
- #     path: logs/
- #     retention-days: 7
+ - name: Upload logs as artifacts
+   if: failure()
+   uses: actions/upload-artifact@b4b15b8c7c6ac21ea08fcf65892d2ee8f75cf882 # v4.7.0
+   with:
+     name: ci-logs-${{ github.sha }}
+     path: logs/
+     retention-days: 7
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
.github/workflows/redhat-distro-container.yml (1 hunks)
tests/run_integration_tests.sh (1 hunks)
nathan-weinberg
left a comment
Holding until the cleanup is restored
Which cleanup?
The workflow cleanup is commented out right now
force-pushed from e83d31d to 8e2c99b
force-pushed from 8e2c99b to aa9d751
force-pushed from a1a4e89 to 67f7d89
Inference provider implementations behave inconsistently—some can load normally without an API key (even though it’s required for proper functionality), while others simply fail. To ensure consistency, we should handle inference providers the same way we do vector I/O providers: treat most of them as optional, with vLLM being the exception.

Relates to: RHAIENG-1178

Signed-off-by: Sébastien Han <seb@redhat.com>
force-pushed from 67f7d89 to 160d1fc
chore: bump wheel release to use 0.4.2
What does this PR do?
Inference provider implementations behave inconsistently—some can load normally without an API key (even though it’s required for proper functionality), while others simply fail. To ensure consistency, we should handle inference providers the same way we do vector I/O providers: treat most of them as optional, with vLLM being the exception.
Relates to: RHAIENG-1178
Test Plan
Summary by CodeRabbit