RHAIENG-1134: Configuring LLS to use OAuth #1032
base: main
Conversation
Walkthrough

Adds a new documentation module (`modules/auth-on-llama-stack.adoc`) and an include for it in `working-with-llama-stack.adoc`.

Changes

Sequence Diagram(s)

sequenceDiagram
autonumber
participant Client
participant Proxy as LlamaStack Proxy
participant Auth as OIDC Provider (Issuer / JWKS)
participant vLLM as vllm-inference
participant OpenAI as openai-inference
Note over Client,Proxy: Request flow with OAuth token
Client->>Proxy: HTTP request + Bearer token
Proxy->>Auth: Validate token (fetch JWKS, verify issuer/audience/claims)
alt Token valid & allowed for vLLM
Proxy->>vLLM: Forward inference request
vLLM-->>Proxy: Model response (200)
Proxy-->>Client: 200 + response
else Token valid & allowed for OpenAI
Proxy->>OpenAI: Forward inference request
OpenAI-->>Proxy: Model response (200)
Proxy-->>Client: 200 + response
else Token invalid or lacks required role/claim
Proxy-->>Client: 401 or 403 error
end
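The flow in the diagram maps onto a single curl call. Below is a minimal sketch using the endpoint, variables, and model name that appear later in this review; the values are illustrative only:

[source,terminal]
----
# A valid Bearer token is verified against the OIDC provider's JWKS;
# if the mapped claims permit it, the request is forwarded to vLLM.
$ curl -X POST ${LLAMASTACK_URL}/v1/openai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ${USER1_TOKEN}" \
  -d '{"model": "vllm-inference/llama-3.2", "messages": [{"role": "user", "content": "Hello!"}], "max_tokens": 50}'
----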
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
Pre-merge checks and finishing touches: ✅ Passed checks (2 passed)
5773034 to e42fee2 (Compare)
Actionable comments posted: 3
🧹 Nitpick comments (1)
modules/auth-on-llama-stack.adoc (1)
333-333: Standardize source block syntax formatting.

Line 333 uses `[source, terminal]` with a space before "terminal", while other source blocks use `[source,terminal]` without a space. Maintain consistency across the document. Apply this diff:

-[source, terminal]
+[source,terminal]
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- modules/auth-on-llama-stack.adoc (1 hunks)
- working-with-llama-stack.adoc (1 hunks)
🔇 Additional comments (5)
modules/auth-on-llama-stack.adoc (4)
25-51: Service account creation section is clear and well-structured.

The YAML formatting, callout annotations, and step-by-step instructions for creating and applying service accounts are appropriate.

53-81: OIDC configuration retrieval steps are accurate.

Commands and instructions for extracting OIDC issuer and JWKS URI are correct and well-documented with example output.

238-266: Token generation and storage procedures are well-organized.

Clear separation between vLLM and OpenAI token generation with appropriate 24-hour duration for testing scenarios (a sketch of this step follows these comments).

268-343: Verification section covers critical access scenarios comprehensively.

Tests for successful vLLM access, OpenAI access, unauthorized access (401), and incorrect authorization (404) provide good coverage of expected behaviors.
working-with-llama-stack.adoc (1)
22-23: Include directive for new OAuth documentation module is correct.

The new include on line 23 uses proper AsciiDoc syntax, consistent leveloffset value, and is logically placed after the deployment section. The context variable will correctly resolve to "rag" for the module's ID anchor.
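As a rough sketch of the token generation step called out above: the service account names below are assumptions based on names mentioned elsewhere in this review and should be verified against the module.

[source,terminal]
----
# Assumed service account names; 24-hour tokens are used for testing, as the module describes.
$ VLLM_TOKEN=$(oc create token llamastack-vllm-inference --duration=24h)
$ OPENAI_TOKEN=$(oc create token llamastack-openai-inference --duration=24h)
----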
e42fee2 to 6034b35 (Compare)
Actionable comments posted: 1
♻️ Duplicate comments (2)
modules/auth-on-llama-stack.adoc (2)
6-6: Fix grammatical error: "can be access" should be "can be accessed".

This issue was flagged in a previous review but remains unaddressed.

Apply this diff:

-You can configure Llama Stack to enable Role-Based Access Control (RBAC) for model access using OAuth authentication on{productname-short}. The following example shows how to configure Llama Stack so that a vLLM model can be access by all authenticated users, while an OpenAI model is restricted to specific users.
+You can configure Llama Stack to enable Role-Based Access Control (RBAC) for model access using OAuth authentication on {productname-short}. The following example shows how to configure Llama Stack so that a vLLM model can be accessed by all authenticated users, while an OpenAI model is restricted to specific users.

Note: Also added missing space after "authentication on".
144-164: Document security implications of TLS verification setting.

This issue was flagged in a previous review. Line 151 disables TLS verification (`verify_tls: false`) without explanation. For production deployments, users must understand the security risk and set `verify_tls: true`.

Add a security warning admonition before or immediately after the run.yaml code block:

 claims_mapping:
   sub: "roles"
 access_policy:
+
+[WARNING]
+====
+The `verify_tls: false` setting disables TLS certificate validation. This is suitable only for development and testing environments. For production deployments, set `verify_tls: true` and ensure your OIDC provider uses a valid, trusted certificate.
+====
+
 - permit: <2>
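For orientation, the auth block these fragments come from could look roughly like the sketch below. Only `verify_tls`, `claims_mapping`, and `access_policy` are taken from this review; the surrounding keys and placeholder values are assumptions to check against the module.

[source,yaml]
----
server:
  auth:
    provider_config:
      type: oauth2_token                                             # assumed provider type
      jwks:
        uri: https://<oidc-provider>/protocol/openid-connect/certs  # placeholder JWKS URI
      issuer: https://<oidc-provider>                                # placeholder issuer
      audience: llamastack                                           # placeholder audience
      verify_tls: true                                               # use false only for development/testing
      claims_mapping:
        sub: "roles"
    access_policy: []                                                # permit rules, discussed further below
----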
🧹 Nitpick comments (2)
modules/auth-on-llama-stack.adoc (2)
289-291: Add missing period for consistency.

Line 291 ends without terminal punctuation:

-* Testing successful OpenAI access.
+* Testing successful OpenAI access.
+
-This will allow you to access OpenAI and vLLM models
+This will allow you to access OpenAI and vLLM models.
333-333: Standardize code block formatting: remove space before "terminal".

Line 333 uses inconsistent formatting. Match the convention used throughout the document:

-[source, terminal]
+[source,terminal]

All other code blocks in this document use `[source,terminal]` without a space (lines 48, 57, 173, 226, etc.).
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- modules/auth-on-llama-stack.adoc (1 hunks)
- working-with-llama-stack.adoc (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- working-with-llama-stack.adoc
🔇 Additional comments (1)
modules/auth-on-llama-stack.adoc (1)
186-186: Fix grammatical error: "files" should be "file" (singular).

Though the filename typo was corrected in a previous commit, this grammatical error remains.

Apply this diff:

-.. Create a `llamastack-distribution.yaml` files with the following parameters:
+.. Create a `llamastack-distribution.yaml` file with the following parameters:
b437632 to 41bacfa (Compare)
Actionable comments posted: 1
♻️ Duplicate comments (3)
modules/auth-on-llama-stack.adoc (3)
141-164: Add security admonition for disabled TLS verification.

Line 151 sets `verify_tls: false` without warning users about the security implications. This configuration is insecure for production and should include a prominent warning explaining when and why this setting is acceptable.

Add a WARNING admonition immediately after the YAML code block (after line 164) and before the callout annotations:

 ----
+
+[WARNING]
+====
+The `verify_tls: false` setting disables TLS certificate validation and is suitable only for development and testing environments. For production deployments, set `verify_tls: true` and ensure your OIDC provider uses a valid, trusted certificate.
+====
186-186: Fix plural noun in distribution file creation step.

Line 186 reads "Create a `llamastack-distribution.yaml` files" — "files" should be singular "file" since only one file is being created.

Apply this diff:

-.. Create a `llamastack-distribution.yaml` files with the following parameters:
+.. Create a `llamastack-distribution.yaml` file with the following parameters:
6-6: Fix grammatical error and spacing in introduction.

Line 6 contains two issues: (1) incorrect verb form "can be access" should be "can be accessed"; (2) missing space after "on" before the variable `{productname-short}`.

Apply this diff:

-You can configure Llama Stack to enable Role-Based Access Control (RBAC) for model access using OAuth authentication on{productname-short}. The following example shows how to configure Llama Stack so that a vLLM model can be access by all authenticated users, while an OpenAI model is restricted to specific users.
+You can configure Llama Stack to enable Role-Based Access Control (RBAC) for model access using OAuth authentication on {productname-short}. The following example shows how to configure Llama Stack so that a vLLM model can be accessed by all authenticated users, while an OpenAI model is restricted to specific users.
🧹 Nitpick comments (1)
modules/auth-on-llama-stack.adoc (1)
300-311: Clarify service account access levels in verification section.

In the OpenAI access section (lines 289-311), the comment on line 291 states "This will allow you to access OpenAI and vLLM models" but the access policy in the run.yaml (line 162) restricts OpenAI access only to the `llamastack-openai-inference` service account. The comment is accurate but could be clearer: the OpenAI token has access to both models due to its higher privilege level, while the vLLM token only accesses vLLM models. This distinction is important for understanding RBAC behavior.

Consider revising the comment to emphasize the privilege hierarchy:

-This will allow you to access OpenAI and vLLM models
+This service account has elevated privileges and can access both OpenAI and vLLM models
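A rough sketch of an access policy expressing that hierarchy: the rule syntax below is an assumption modeled on the `- permit:` fragment quoted earlier, not the module's verified configuration, and the model names are the ones used in this review.

[source,yaml]
----
access_policy:
  # Assumed shape: the vLLM model is reachable by any authenticated principal ...
  - permit:
      actions: [read]
      resource: model::vllm-inference/llama-3.2
  # ... while the OpenAI model is limited to the privileged service account,
  # whose subject lands in "roles" via the claims_mapping shown earlier.
  - permit:
      actions: [read]
      resource: model::openai/gpt-4o-mini
    when: user with llamastack-openai-inference in roles
----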
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- modules/auth-on-llama-stack.adoc (1 hunks)
- working-with-llama-stack.adoc (1 hunks)
🔇 Additional comments (1)
working-with-llama-stack.adoc (1)
22-23: LGTM!

The include directives are syntactically correct and appropriately positioned. The new OAuth module integrates naturally into the documentation flow.
41bacfa to 6b5cc37 (Compare)
derekhiggins left a comment
Thanks, @kelbrown20, I haven't tried out the commands, but once the inline changes are sorted I'll run through the instructions end to end.
[source,terminal]
----
$ LLAMASTACK_URL="http://localhost:8321"
$ LLAMASTACK_URL=https://llamastack-distribution-redhat-ods-operator.apps.rosa.derekh-cluster.qrj7.p3.openshiftapps.com
This isn't doing anything; maybe we could run the curl command from above without the Authorization header, to show it doesn't work?
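For example, rerunning the earlier curl with the Authorization header dropped (same endpoint and payload as in the doc) should be rejected with a 401:

[source,terminal]
----
$ curl -X POST ${LLAMASTACK_URL}/v1/openai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "vllm-inference/llama-3.2", "messages": [{"role": "user", "content": "Hello!"}], "max_tokens": 50}'
----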
Right, gotcha, that makes sense!
Just added that, is that right @derekhiggins
| [id="auth-on-llama-stack_{context}"] | ||
| = Configuring Llama Stack with OAuth Authentication | ||
|
|
||
| You can configure Llama Stack to enable Role-Based Access Control (RBAC) for model access using OAuth authentication on {productname-short} using KeyCloak The following example shows how to configure Llama Stack so that a vLLM model can be access by all authenticated users, while an OpenAI model is restricted to specific users. |
We should mention somewhere that setting up Keycloak is out of scope, but before starting, Keycloak should be set up with:
- realm: "llamastack-demo"
- client: "llamastack" with direct access grants enabled
- role: "inference_max", which grants access to restricted models, and a protocol mapper that adds realm roles to the access token under the claim name llamastack_roles
- two test users:
  - user1 as a basic user with no assigned roles
  - user2 as an advanced user assigned the inference_max role (who will have access to both unrestricted and restricted models)

The client secret generated by Keycloak must be saved, as you will need it for token requests.
Gotcha!
For the Keycloak configs, what would that look like as a code snippet, @derekhiggins?
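One possible shape for that snippet is a partial Keycloak realm import built from the requirements above. This is only a sketch (the mapper name and passwords are placeholders), and it could be loaded with something like `kcadm.sh create realms -f llamastack-demo-realm.json`. The client secret is still generated by Keycloak after the client is created and has to be retrieved and saved separately, as noted above.

[source,json]
----
{
  "realm": "llamastack-demo",
  "enabled": true,
  "roles": { "realm": [ { "name": "inference_max" } ] },
  "clients": [
    {
      "clientId": "llamastack",
      "enabled": true,
      "publicClient": false,
      "directAccessGrantsEnabled": true,
      "protocolMappers": [
        {
          "name": "llamastack-realm-roles",
          "protocol": "openid-connect",
          "protocolMapper": "oidc-usermodel-realm-role-mapper",
          "config": {
            "claim.name": "llamastack_roles",
            "jsonType.label": "String",
            "multivalued": "true",
            "access.token.claim": "true"
          }
        }
      ]
    }
  ],
  "users": [
    {
      "username": "user1",
      "enabled": true,
      "credentials": [ { "type": "password", "value": "changeme" } ]
    },
    {
      "username": "user2",
      "enabled": true,
      "credentials": [ { "type": "password", "value": "changeme" } ],
      "realmRoles": [ "inference_max" ]
    }
  ]
}
----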
+
[source,terminal]
----
curl -X POST ${LLAMASTACK_URL}/v1/openai/v1/chat/completions -H "Content-Type: application/json" -H "Authorization: Bearer ${USER1_TOKEN}" -d '{"model": "vllm-inference/llama-3.2", "messages": [{"role": "user", "content": "Hello!"}], "max_tokens": 50}'
We need to set $LLAMASTACK_URL somewhere, and to do that we need to create a route.

apiVersion: route.openshift.io/v1
kind: Route
metadata:
  name: llamastack-distribution
  namespace: redhat-ods-operator
  labels:
    app: llamastack-distribution
spec:
  to:
    kind: Service
    name: llamastack-distribution-service
    weight: 100
  port:
    targetPort: 8321
  tls:
    termination: edge
    insecureEdgeTerminationPolicy: Redirect
  wildcardPolicy: None

Apply this YAML (oc apply -f route.yaml) and then run the command:

LLAMASTACK_URL=https://$(oc get route -n redhat-ods-operator llamastack-distribution -o jsonpath='{.spec.host}' 2>/dev/null)
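A quick sanity check after applying the route might look like the following; the models-list path mirrors the chat-completions path used elsewhere in this PR and is an assumption to verify:

[source,terminal]
----
$ echo ${LLAMASTACK_URL}
$ curl -s ${LLAMASTACK_URL}/v1/openai/v1/models \
  -H "Authorization: Bearer ${USER1_TOKEN}"
----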
I think I added that to the right spot, @derekhiggins, correct me if I'm wrong.
0edff6d to 92c1718 (Compare)
Actionable comments posted: 3
♻️ Duplicate comments (2)
modules/auth-on-llama-stack.adoc (2)
6-6: Fix grammatical errors in introduction.

The sentence contains multiple errors: "access" should be "accessed" and "authenticate" should be "authenticated".

Apply this diff:

-You can configure Llama Stack to enable Role-Based Access Control (RBAC) for model access using OAuth authentication on {productname-short} using KeyCloak The following example shows how to configure Llama Stack so that a vLLM model can be access by all authenticate users, while an OpenAI model is restricted to specific users.
+You can configure Llama Stack to enable Role-Based Access Control (RBAC) for model access using OAuth authentication on {productname-short} using Keycloak. The following example shows how to configure Llama Stack so that a vLLM model can be accessed by all authenticated users, while an OpenAI model is restricted to specific users.

(Note: Also added missing period after "Keycloak" and standardized "KeyCloak" → "Keycloak".)
355-365: Fix test scenario logic: "without authorization" section still uses a Bearer token.

The "Testing without any authorization" section (lines 349–365) should demonstrate API calls without the Authorization header to show proper rejection. Currently, line 359 includes "Authorization: Bearer ${USER2_TOKEN}", which contradicts the intent to test unauthenticated access.

Remove the Bearer token header and optionally run the curl command twice: once without the header to show rejection, then explain the expected 401 response. Lines 355–356 also appear to be debugging artifacts (local URL and hardcoded cluster hostname) that should be removed.

Apply this diff:

 * Testing without any authorization:
+
+. Attempt to access models without providing an Authorization header:
+
 [source,terminal]
 ----
-$ LLAMASTACK_URL="http://localhost:8321"
-$ LLAMASTACK_URL=https://llamastack-distribution-redhat-ods-operator.apps.rosa.derekh-cluster.qrj7.p3.openshiftapps.com
 $ curl -X POST ${LLAMASTACK_URL}/v1/openai/v1/chat/completions \
   -H "Content-Type: application/json" \
-  -H "Authorization: Bearer ${USER2_TOKEN}" \
   -d '{
     "model": "openai/gpt-4o-mini",
     "messages": [{"role": "user", "content": "Hello!"}],
     "max_tokens": 50
   }'
 ----
🧹 Nitpick comments (1)
modules/auth-on-llama-stack.adoc (1)
27-27: Clarify awkward phrasing in first procedure step.

Line 27 reads "to access models" which is redundant given the previous context. Simplify to "for model access" for consistency with the introduction.

Apply this diff:

-. To configure Llama Stack to use Role-Based Access Control (RBAC) to access models, you need to view and verify the OAuth provider token structure.
+. To configure Llama Stack to use Role-Based Access Control (RBAC) for model access, you need to view and verify the OAuth provider token structure.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- modules/auth-on-llama-stack.adoc (1 hunks)
- working-with-llama-stack.adoc (1 hunks)
🔇 Additional comments (3)
working-with-llama-stack.adoc (1)
22-23: Verify the include path for the auth module.

Line 23 includes `assemblies/auth-on-llama-stack.adoc`, but the file is located at `modules/auth-on-llama-stack.adoc`. Confirm whether this is an intentional assembly wrapper or if the path should point directly to the module.

modules/auth-on-llama-stack.adoc (2)
126-126: Confirm TLS verification setting is intentional and documented.

Line 126 sets `verify_tls: true`, which is the secure default for production. Earlier feedback flagged the risk of `verify_tls: false` in similar configurations. Verify this is the correct and intended setting for the example, and consider adding a brief note after the config block clarifying that `verify_tls: false` should only be used for development environments.
301-301: Verify the API endpoint path structure.

Lines 301, 325, and 339 use the path `${LLAMASTACK_URL}/v1/openai/v1/chat/completions`, which contains `/v1` twice. Verify this is the correct endpoint structure for Llama Stack's OpenAI-compatible API or if it should be simplified to `${LLAMASTACK_URL}/v1/chat/completions`.

Also applies to: 325-325, 339-339
92c1718 to f481c59 (Compare)
Description:
Adds the full procedure for enabling OAuth on RHOAI using Llama Stack configurations.
Module preview
Summary by CodeRabbit