RHAIENG-1134: Configuring LLS to use OAuth #1032
base: main
Conversation
Walkthrough

Adds a new documentation module (`modules/auth-on-llama-stack.adoc`) and an include for it in `working-with-llama-stack.adoc`.

Changes

Sequence Diagram(s)

sequenceDiagram
autonumber
participant Client
participant Proxy as LlamaStack Proxy
participant Auth as OIDC Provider (Issuer / JWKS)
participant vLLM as vllm-inference
participant OpenAI as openai-inference
Note over Client,Proxy: Request flow with OAuth token
Client->>Proxy: HTTP request + Bearer token
Proxy->>Auth: Validate token (fetch JWKS, verify issuer/audience/claims)
alt Token valid & allowed for vLLM
Proxy->>vLLM: Forward inference request
vLLM-->>Proxy: Model response (200)
Proxy-->>Client: 200 + response
else Token valid & allowed for OpenAI
Proxy->>OpenAI: Forward inference request
OpenAI-->>Proxy: Model response (200)
Proxy-->>Client: 200 + response
else Token invalid or lacks required role/claim
Proxy-->>Client: 401 or 403 error
end
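The flow in the diagram maps onto a single curl call. Below is a minimal sketch using the endpoint, variables, and model name that appear later in this review; the values are illustrative only:

[source,terminal]
----
# A valid Bearer token is verified against the OIDC provider's JWKS;
# if the mapped claims permit it, the request is forwarded to vLLM.
$ curl -X POST ${LLAMASTACK_URL}/v1/openai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ${USER1_TOKEN}" \
  -d '{"model": "vllm-inference/llama-3.2", "messages": [{"role": "user", "content": "Hello!"}], "max_tokens": 50}'
----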
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
Pre-merge checks and finishing touches: ✅ Passed checks (2 passed)
5773034 to e42fee2 (Compare)
Actionable comments posted: 3
🧹 Nitpick comments (1)
modules/auth-on-llama-stack.adoc (1)
333-333: Standardize source block syntax formatting.

Line 333 uses `[source, terminal]` with a space before "terminal", while other source blocks use `[source,terminal]` without a space. Maintain consistency across the document. Apply this diff:

-[source, terminal]
+[source,terminal]
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- modules/auth-on-llama-stack.adoc (1 hunks)
- working-with-llama-stack.adoc (1 hunks)
🔇 Additional comments (5)
modules/auth-on-llama-stack.adoc (4)
25-51: Service account creation section is clear and well-structured.

The YAML formatting, callout annotations, and step-by-step instructions for creating and applying service accounts are appropriate.

53-81: OIDC configuration retrieval steps are accurate.

Commands and instructions for extracting OIDC issuer and JWKS URI are correct and well-documented with example output.

238-266: Token generation and storage procedures are well-organized.

Clear separation between vLLM and OpenAI token generation with appropriate 24-hour duration for testing scenarios (a sketch of this step follows these comments).

268-343: Verification section covers critical access scenarios comprehensively.

Tests for successful vLLM access, OpenAI access, unauthorized access (401), and incorrect authorization (404) provide good coverage of expected behaviors.
working-with-llama-stack.adoc (1)
22-23: Include directive for new OAuth documentation module is correct.

The new include on line 23 uses proper AsciiDoc syntax, consistent leveloffset value, and is logically placed after the deployment section. The context variable will correctly resolve to "rag" for the module's ID anchor.
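As a rough sketch of the token generation step called out above: the service account names below are assumptions based on names mentioned elsewhere in this review and should be verified against the module.

[source,terminal]
----
# Assumed service account names; 24-hour tokens are used for testing, as the module describes.
$ VLLM_TOKEN=$(oc create token llamastack-vllm-inference --duration=24h)
$ OPENAI_TOKEN=$(oc create token llamastack-openai-inference --duration=24h)
----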
e42fee2 to 6034b35 (Compare)
Actionable comments posted: 1
♻️ Duplicate comments (2)
modules/auth-on-llama-stack.adoc (2)
6-6: Fix grammatical error: "can be access" should be "can be accessed".

This issue was flagged in a previous review but remains unaddressed.

Apply this diff:

-You can configure Llama Stack to enable Role-Based Access Control (RBAC) for model access using OAuth authentication on{productname-short}. The following example shows how to configure Llama Stack so that a vLLM model can be access by all authenticated users, while an OpenAI model is restricted to specific users.
+You can configure Llama Stack to enable Role-Based Access Control (RBAC) for model access using OAuth authentication on {productname-short}. The following example shows how to configure Llama Stack so that a vLLM model can be accessed by all authenticated users, while an OpenAI model is restricted to specific users.

Note: Also added missing space after "authentication on".
144-164: Document security implications of TLS verification setting.

This issue was flagged in a previous review. Line 151 disables TLS verification (`verify_tls: false`) without explanation. For production deployments, users must understand the security risk and set `verify_tls: true`.

Add a security warning admonition before or immediately after the run.yaml code block:

 claims_mapping:
   sub: "roles"
 access_policy:
+
+[WARNING]
+====
+The `verify_tls: false` setting disables TLS certificate validation. This is suitable only for development and testing environments. For production deployments, set `verify_tls: true` and ensure your OIDC provider uses a valid, trusted certificate.
+====
+
 - permit: <2>
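For orientation, the auth block these fragments come from could look roughly like the sketch below. Only `verify_tls`, `claims_mapping`, and `access_policy` are taken from this review; the surrounding keys and placeholder values are assumptions to check against the module.

[source,yaml]
----
server:
  auth:
    provider_config:
      type: oauth2_token                                             # assumed provider type
      jwks:
        uri: https://<oidc-provider>/protocol/openid-connect/certs  # placeholder JWKS URI
      issuer: https://<oidc-provider>                                # placeholder issuer
      audience: llamastack                                           # placeholder audience
      verify_tls: true                                               # use false only for development/testing
      claims_mapping:
        sub: "roles"
    access_policy: []                                                # permit rules, discussed further below
----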
🧹 Nitpick comments (2)
modules/auth-on-llama-stack.adoc (2)
289-291: Add missing period for consistency.

Line 291 ends without terminal punctuation:

-* Testing successful OpenAI access.
+* Testing successful OpenAI access.
+
-This will allow you to access OpenAI and vLLM models
+This will allow you to access OpenAI and vLLM models.
333-333: Standardize code block formatting: remove space before "terminal".

Line 333 uses inconsistent formatting. Match the convention used throughout the document:

-[source, terminal]
+[source,terminal]

All other code blocks in this document use `[source,terminal]` without a space (lines 48, 57, 173, 226, etc.).
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- modules/auth-on-llama-stack.adoc (1 hunks)
- working-with-llama-stack.adoc (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- working-with-llama-stack.adoc
🔇 Additional comments (1)
modules/auth-on-llama-stack.adoc (1)
186-186: Fix grammatical error: "files" should be "file" (singular).

Though the filename typo was corrected in a previous commit, this grammatical error remains.

Apply this diff:

-.. Create a `llamastack-distribution.yaml` files with the following parameters:
+.. Create a `llamastack-distribution.yaml` file with the following parameters:
b437632 to 41bacfa (Compare)
Actionable comments posted: 1
♻️ Duplicate comments (3)
modules/auth-on-llama-stack.adoc (3)
141-164: Add security admonition for disabled TLS verification.

Line 151 sets `verify_tls: false` without warning users about the security implications. This configuration is insecure for production and should include a prominent warning explaining when and why this setting is acceptable.

Add a WARNING admonition immediately after the YAML code block (after line 164) and before the callout annotations:

 ----
+
+[WARNING]
+====
+The `verify_tls: false` setting disables TLS certificate validation and is suitable only for development and testing environments. For production deployments, set `verify_tls: true` and ensure your OIDC provider uses a valid, trusted certificate.
+====
186-186: Fix plural noun in distribution file creation step.

Line 186 reads "Create a `llamastack-distribution.yaml` files" — "files" should be singular "file" since only one file is being created.

Apply this diff:

-.. Create a `llamastack-distribution.yaml` files with the following parameters:
+.. Create a `llamastack-distribution.yaml` file with the following parameters:
6-6: Fix grammatical error and spacing in introduction.

Line 6 contains two issues: (1) incorrect verb form "can be access" should be "can be accessed"; (2) missing space after "on" before the variable `{productname-short}`.

Apply this diff:

-You can configure Llama Stack to enable Role-Based Access Control (RBAC) for model access using OAuth authentication on{productname-short}. The following example shows how to configure Llama Stack so that a vLLM model can be access by all authenticated users, while an OpenAI model is restricted to specific users.
+You can configure Llama Stack to enable Role-Based Access Control (RBAC) for model access using OAuth authentication on {productname-short}. The following example shows how to configure Llama Stack so that a vLLM model can be accessed by all authenticated users, while an OpenAI model is restricted to specific users.
🧹 Nitpick comments (1)
modules/auth-on-llama-stack.adoc (1)
300-311: Clarify service account access levels in verification section.

In the OpenAI access section (lines 289-311), the comment on line 291 states "This will allow you to access OpenAI and vLLM models" but the access policy in the run.yaml (line 162) restricts OpenAI access only to the `llamastack-openai-inference` service account. The comment is accurate but could be clearer: the OpenAI token has access to both models due to its higher privilege level, while the vLLM token only accesses vLLM models. This distinction is important for understanding RBAC behavior.

Consider revising the comment to emphasize the privilege hierarchy:

-This will allow you to access OpenAI and vLLM models
+This service account has elevated privileges and can access both OpenAI and vLLM models
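A rough sketch of an access policy expressing that hierarchy: the rule syntax below is an assumption modeled on the `- permit:` fragment quoted earlier, not the module's verified configuration, and the model names are the ones used in this review.

[source,yaml]
----
access_policy:
  # Assumed shape: the vLLM model is reachable by any authenticated principal ...
  - permit:
      actions: [read]
      resource: model::vllm-inference/llama-3.2
  # ... while the OpenAI model is limited to the privileged service account,
  # whose subject lands in "roles" via the claims_mapping shown earlier.
  - permit:
      actions: [read]
      resource: model::openai/gpt-4o-mini
    when: user with llamastack-openai-inference in roles
----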
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- modules/auth-on-llama-stack.adoc (1 hunks)
- working-with-llama-stack.adoc (1 hunks)
🔇 Additional comments (1)
working-with-llama-stack.adoc (1)
22-23: LGTM!

The include directives are syntactically correct and appropriately positioned. The new OAuth module integrates naturally into the documentation flow.
41bacfa to 6b5cc37 (Compare)
derekhiggins left a comment
Thanks, @kelbrown20, I haven't tried out the commands, but once the inline changes are sorted I'll run through the instructions end to end.
[source,terminal]
----
$ LLAMASTACK_URL="http://localhost:8321"
$ LLAMASTACK_URL=https://llamastack-distribution-redhat-ods-operator.apps.rosa.derekh-cluster.qrj7.p3.openshiftapps.com
This isn't doing anything; maybe we could run the curl command from above without the Authorization header, to show it doesn't work?
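For example, rerunning the earlier curl with the Authorization header dropped (same endpoint and payload as in the doc) should be rejected with a 401:

[source,terminal]
----
$ curl -X POST ${LLAMASTACK_URL}/v1/openai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "vllm-inference/llama-3.2", "messages": [{"role": "user", "content": "Hello!"}], "max_tokens": 50}'
----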
Right, gotcha, that makes sense!
Just added that, is that right @derekhiggins
| [id="auth-on-llama-stack_{context}"] | ||
| = Configuring Llama Stack with OAuth Authentication | ||
|
|
||
| You can configure Llama Stack to enable Role-Based Access Control (RBAC) for model access using OAuth authentication on {productname-short} using KeyCloak The following example shows how to configure Llama Stack so that a vLLM model can be access by all authenticated users, while an OpenAI model is restricted to specific users. |
We should mention somewhere that setting up Keycloak is out of scope, but before starting, Keycloak should be set up with:
- realm: "llamastack-demo"
- client: "llamastack" with direct access grants enabled
- role: "inference_max", which grants access to restricted models, and a protocol mapper that adds realm roles to the access token under the claim name llamastack_roles
- two test users:
  - user1 as a basic user with no assigned roles
  - user2 as an advanced user assigned the inference_max role (who will have access to both unrestricted and restricted models)

The client secret generated by Keycloak must be saved, as you will need it for token requests.
Gotcha!
For the Keycloak configs, what would that look like as a code snippet, @derekhiggins?
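One possible shape for that snippet is a partial Keycloak realm import built from the requirements above. This is only a sketch (the mapper name and passwords are placeholders), and it could be loaded with something like `kcadm.sh create realms -f llamastack-demo-realm.json`. The client secret is still generated by Keycloak after the client is created and has to be retrieved and saved separately, as noted above.

[source,json]
----
{
  "realm": "llamastack-demo",
  "enabled": true,
  "roles": { "realm": [ { "name": "inference_max" } ] },
  "clients": [
    {
      "clientId": "llamastack",
      "enabled": true,
      "publicClient": false,
      "directAccessGrantsEnabled": true,
      "protocolMappers": [
        {
          "name": "llamastack-realm-roles",
          "protocol": "openid-connect",
          "protocolMapper": "oidc-usermodel-realm-role-mapper",
          "config": {
            "claim.name": "llamastack_roles",
            "jsonType.label": "String",
            "multivalued": "true",
            "access.token.claim": "true"
          }
        }
      ]
    }
  ],
  "users": [
    {
      "username": "user1",
      "enabled": true,
      "credentials": [ { "type": "password", "value": "changeme" } ]
    },
    {
      "username": "user2",
      "enabled": true,
      "credentials": [ { "type": "password", "value": "changeme" } ],
      "realmRoles": [ "inference_max" ]
    }
  ]
}
----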
+
[source,terminal]
----
curl -X POST ${LLAMASTACK_URL}/v1/openai/v1/chat/completions -H "Content-Type: application/json" -H "Authorization: Bearer ${USER1_TOKEN}" -d '{"model": "vllm-inference/llama-3.2", "messages": [{"role": "user", "content": "Hello!"}], "max_tokens": 50}'
We need to set $LLAMASTACK_URL somewhere, and to do that we need to create a route.

apiVersion: route.openshift.io/v1
kind: Route
metadata:
  name: llamastack-distribution
  namespace: redhat-ods-operator
  labels:
    app: llamastack-distribution
spec:
  to:
    kind: Service
    name: llamastack-distribution-service
    weight: 100
  port:
    targetPort: 8321
  tls:
    termination: edge
    insecureEdgeTerminationPolicy: Redirect
  wildcardPolicy: None

Apply this YAML (oc apply -f route.yaml) and then run the command:

LLAMASTACK_URL=https://$(oc get route -n redhat-ods-operator llamastack-distribution -o jsonpath='{.spec.host}' 2>/dev/null)
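A quick sanity check after applying the route might look like the following; the models-list path mirrors the chat-completions path used elsewhere in this PR and is an assumption to verify:

[source,terminal]
----
$ echo ${LLAMASTACK_URL}
$ curl -s ${LLAMASTACK_URL}/v1/openai/v1/models \
  -H "Authorization: Bearer ${USER1_TOKEN}"
----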
I think I added that to the right spot, @derekhiggins, correct me if I'm wrong.
0edff6d to 92c1718 (Compare)
Actionable comments posted: 3
♻️ Duplicate comments (2)
modules/auth-on-llama-stack.adoc (2)
6-6: Fix grammatical errors in introduction.

The sentence contains multiple errors: "access" should be "accessed" and "authenticate" should be "authenticated".

Apply this diff:

-You can configure Llama Stack to enable Role-Based Access Control (RBAC) for model access using OAuth authentication on {productname-short} using KeyCloak The following example shows how to configure Llama Stack so that a vLLM model can be access by all authenticate users, while an OpenAI model is restricted to specific users.
+You can configure Llama Stack to enable Role-Based Access Control (RBAC) for model access using OAuth authentication on {productname-short} using Keycloak. The following example shows how to configure Llama Stack so that a vLLM model can be accessed by all authenticated users, while an OpenAI model is restricted to specific users.

(Note: Also added missing period after "Keycloak" and standardized "KeyCloak" → "Keycloak".)
355-365: Fix test scenario logic: "without authorization" section still uses a Bearer token.

The "Testing without any authorization" section (lines 349–365) should demonstrate API calls without the Authorization header to show proper rejection. Currently, line 359 includes "Authorization: Bearer ${USER2_TOKEN}", which contradicts the intent to test unauthenticated access.

Remove the Bearer token header and optionally run the curl command twice: once without the header to show rejection, then explain the expected 401 response. Lines 355–356 also appear to be debugging artifacts (local URL and hardcoded cluster hostname) that should be removed.

Apply this diff:

 * Testing without any authorization:
+
+. Attempt to access models without providing an Authorization header:
+
 [source,terminal]
 ----
-$ LLAMASTACK_URL="http://localhost:8321"
-$ LLAMASTACK_URL=https://llamastack-distribution-redhat-ods-operator.apps.rosa.derekh-cluster.qrj7.p3.openshiftapps.com
 $ curl -X POST ${LLAMASTACK_URL}/v1/openai/v1/chat/completions \
   -H "Content-Type: application/json" \
-  -H "Authorization: Bearer ${USER2_TOKEN}" \
   -d '{
     "model": "openai/gpt-4o-mini",
     "messages": [{"role": "user", "content": "Hello!"}],
     "max_tokens": 50
   }'
 ----
🧹 Nitpick comments (1)
modules/auth-on-llama-stack.adoc (1)
27-27: Clarify awkward phrasing in first procedure step.

Line 27 reads "to access models" which is redundant given the previous context. Simplify to "for model access" for consistency with the introduction.

Apply this diff:

-. To configure Llama Stack to use Role-Based Access Control (RBAC) to access models, you need to view and verify the OAuth provider token structure.
+. To configure Llama Stack to use Role-Based Access Control (RBAC) for model access, you need to view and verify the OAuth provider token structure.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- modules/auth-on-llama-stack.adoc (1 hunks)
- working-with-llama-stack.adoc (1 hunks)
🔇 Additional comments (3)
working-with-llama-stack.adoc (1)
22-23: Verify the include path for the auth module.

Line 23 includes `assemblies/auth-on-llama-stack.adoc`, but the file is located at `modules/auth-on-llama-stack.adoc`. Confirm whether this is an intentional assembly wrapper or if the path should point directly to the module.

modules/auth-on-llama-stack.adoc (2)
126-126: Confirm TLS verification setting is intentional and documented.

Line 126 sets `verify_tls: true`, which is the secure default for production. Earlier feedback flagged the risk of `verify_tls: false` in similar configurations. Verify this is the correct and intended setting for the example, and consider adding a brief note after the config block clarifying that `verify_tls: false` should only be used for development environments.
301-301: Verify the API endpoint path structure.

Lines 301, 325, and 339 use the path `${LLAMASTACK_URL}/v1/openai/v1/chat/completions`, which contains `/v1` twice. Verify this is the correct endpoint structure for Llama Stack's OpenAI-compatible API or if it should be simplified to `${LLAMASTACK_URL}/v1/chat/completions`.

Also applies to: 325-325, 339-339
92c1718 to f481c59 (Compare)
Description:
Adds the full procedure for enabling OAuth on RHOAI using Llama Stack configurations.
Module preview
Summary by CodeRabbit