Commit 0edff6d

RHAIENG-1134: Configuring LLS to use OAuth

1 parent 3c55a02 commit 0edff6d
2 files changed: +328 -1 lines changed

modules/auth-on-llama-stack.adoc

Lines changed: 326 additions & 0 deletions
@@ -0,0 +1,326 @@
:_module-type: PROCEDURE


[id="auth-on-llama-stack_{context}"]
= Configuring Llama Stack with OAuth Authentication

You can configure Llama Stack to enable Role-Based Access Control (RBAC) for model access by using OAuth authentication with Keycloak on {productname-short}. The following example shows how to configure Llama Stack so that a vLLM model can be accessed by all authenticated users, while an OpenAI model is restricted to specific users.

Before you begin, you must already have Keycloak set up, in addition to meeting the following prerequisites:
.Prerequisites

* You have installed {openshift-platform} 4.19 or newer.
* You have logged in to {productname-long}.
* You have cluster administrator privileges for your OpenShift cluster.
* You have installed the {openshift-cli} as described in the appropriate documentation for your cluster:
ifdef::upstream,self-managed[]
** link:https://docs.redhat.com/en/documentation/openshift_container_platform/{ocp-latest-version}/html/cli_tools/openshift-cli-oc#installing-openshift-cli[Installing the OpenShift CLI^] for OpenShift Container Platform
** link:https://docs.redhat.com/en/documentation/red_hat_openshift_service_on_aws/{rosa-latest-version}/html/cli_tools/openshift-cli-oc#installing-openshift-cli[Installing the OpenShift CLI^] for {rosa-productname}
endif::[]
ifdef::cloud-service[]
** link:https://docs.redhat.com/en/documentation/openshift_dedicated/{osd-latest-version}/html/cli_tools/openshift-cli-oc#installing-openshift-cli[Installing the OpenShift CLI^] for OpenShift Dedicated
** link:https://docs.redhat.com/en/documentation/red_hat_openshift_service_on_aws_classic_architecture/{rosa-classic-latest-version}/html/cli_tools/openshift-cli-oc#installing-openshift-cli[Installing the OpenShift CLI^] for {rosa-classic-productname}
endif::[]
.Procedure

. To configure Llama Stack to use Role-Based Access Control (RBAC) for model access, first view and verify the OAuth provider token structure.

.. Generate a Keycloak test token to view the structure by running the following command:
+
[source,terminal]
----
$ curl -d client_id=llamastack -d client_secret=YOUR_CLIENT_SECRET -d username=user1 -d password=user-password -d grant_type=password ${TOKEN_ENDPOINT} | jq -r .access_token > test.token
----
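+
The `${TOKEN_ENDPOINT}` variable must point to your realm's token endpoint. For Keycloak, this follows the standard pattern used in the token requests later in this procedure, for example:
+
[source,terminal]
----
# YOUR_KEYCLOAK_HOST and YOUR_REALM are placeholders, as in the later steps
$ TOKEN_ENDPOINT=https://YOUR_KEYCLOAK_HOST/realms/YOUR_REALM/protocol/openid-connect/token
----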
.. View the token claims by running the following command:
+
[source,terminal]
----
$ cat test.token | cut -d . -f 2 | base64 -d 2>/dev/null | jq .
----
+
.Example token structure from Keycloak
[source,terminal]
----
{
  "iss": "http://keycloak-host/realms/testrealm",
  "aud": "account",
  "sub": "761cdc99-80e5-4506-9b9e-26a67a8566f7",
  "preferred_username": "user1",
  "llamastack_roles": [
    "inference_max"
  ]
}
----
. Next, create a `run.yaml` file that defines the necessary OAuth configuration.

.. Define a configuration with two inference providers and OAuth authentication, as shown in the following `run.yaml` example:
+
[source,yaml]
----
version: 2
image_name: rh
apis:
- inference
- agents
- safety
- telemetry
- tool_runtime
- vector_io
providers:
  inference:
  - provider_id: vllm-inference
    provider_type: remote::vllm
    config:
      url: ${env.VLLM_URL:=http://localhost:8000/v1}
      max_tokens: ${env.VLLM_MAX_TOKENS:=4096}
      api_token: ${env.VLLM_API_TOKEN:=fake}
      tls_verify: ${env.VLLM_TLS_VERIFY:=true}
  - provider_id: openai
    provider_type: remote::openai
    config:
      api_key: ${env.OPENAI_API_KEY:=}
      base_url: ${env.OPENAI_BASE_URL:=https://api.openai.com/v1}
  telemetry:
  - provider_id: meta-reference
    provider_type: inline::meta-reference
    config:
      service_name: "${env.OTEL_SERVICE_NAME:=​}"
      sinks: ${env.TELEMETRY_SINKS:=console,sqlite}
      sqlite_db_path: /opt/app-root/src/.llama/distributions/rh/trace_store.db
      otel_exporter_otlp_endpoint: ${env.OTEL_EXPORTER_OTLP_ENDPOINT:=}
  agents:
  - provider_id: meta-reference
    provider_type: inline::meta-reference
    config:
      persistence_store:
        type: sqlite
        namespace: null
        db_path: /opt/app-root/src/.llama/distributions/rh/agents_store.db
      responses_store:
        type: sqlite
        db_path: /opt/app-root/src/.llama/distributions/rh/responses_store.db
models:
- model_id: llama-3.2
  provider_id: vllm-inference
  model_type: llm
  metadata: {}
- model_id: gpt-4o-mini
  provider_id: openai
  model_type: llm
  metadata: {}
server:
  port: 8321
  auth:
    provider_config:
      type: "oauth2_token"
      jwks:
        uri: "https://<keycloak-host>/realms/<your-keycloak-realm>/protocol/openid-connect/certs" <1>
        key_recheck_period: 3600
      issuer: "https://<keycloak-host>/realms/<your-keycloak-realm>" <1>
      audience: "account"
      verify_tls: true
      claims_mapping:
        llamastack_roles: "roles" <2>
    access_policy:
    - permit: <3>
        actions: [read]
        resource: model::vllm-inference/llama-3.2
      description: Allow all authenticated users to access the Llama 3.2 model
    - permit: <4>
        actions: [read]
        resource: model::openai/gpt-4o-mini
      when: user with inference_max in roles
      description: Allow only users with the inference_max role to access OpenAI models
----
+
<1> Specify your Keycloak host and realm in the URL.
<2> Maps the `llamastack_roles` claim from the token to the `roles` field that the access policy `when` clauses evaluate.
<3> Policy 1: Allow all authenticated users to access vLLM models.
<4> Policy 2: Restrict OpenAI models to users with the `inference_max` role.
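+
Access policy rules are evaluated against the mapped `roles` field. As a hedged sketch of an additional rule, the following example reuses the `when` syntax from the configuration above to restrict the vLLM model as well; the `inference_basic` role name is a hypothetical value for illustration:
+
[source,yaml]
----
- permit:
    actions: [read]
    resource: model::vllm-inference/llama-3.2
  when: user with inference_basic in roles  # hypothetical role name
  description: Allow only users with the inference_basic role to access the Llama 3.2 model
----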
. Create a ConfigMap that uses the `run.yaml` configuration by running the following command:
+
[source,terminal]
----
$ oc create configmap llamastack-custom-config --from-file=run.yaml=run.yaml -n redhat-ods-operator
----
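+
Optionally, you can confirm that the ConfigMap was created and contains the `run.yaml` key before continuing, for example:
+
[source,terminal]
----
$ oc get configmap llamastack-custom-config -n redhat-ods-operator -o yaml
----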
. Create a `llamastack-distribution.yaml` file with the following parameters:
+
[source,yaml]
----
apiVersion: llamastack.io/v1alpha1
kind: LlamaStackDistribution
metadata:
  name: llamastack-distribution
  namespace: redhat-ods-operator
spec:
  replicas: 1
  server:
    distribution:
      name: rh-dev
    containerSpec:
      name: llama-stack
      port: 8321
      env:
      # vLLM Provider Configuration
      - name: VLLM_URL
        value: "http://your-vllm-service:8000/v1"
      - name: VLLM_API_TOKEN
        value: "your-vllm-token"
      - name: VLLM_TLS_VERIFY
        value: "false"
      # OpenAI Provider Configuration
      - name: OPENAI_API_KEY
        value: "your-openai-api-key"
      - name: OPENAI_BASE_URL
        value: "https://api.openai.com/v1"
    userConfig:
      configMapName: llamastack-custom-config
      configMapNamespace: redhat-ods-operator
----
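+
Embedding credentials as literal values is convenient for testing, but for anything longer-lived consider referencing a Secret instead. The following is a hedged sketch using the standard Kubernetes `valueFrom` syntax; it assumes that the operator passes `env` entries through as regular container environment variables, and the Secret name and key are hypothetical:
+
[source,yaml]
----
- name: OPENAI_API_KEY
  valueFrom:
    secretKeyRef:
      name: openai-credentials  # hypothetical Secret name
      key: api-key              # hypothetical key
----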
. To apply the distribution, run the following command:
+
[source,terminal]
----
$ oc apply -f llamastack-distribution.yaml
----
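+
You can check that the resource was created, for example:
+
[source,terminal]
----
$ oc get llamastackdistribution -n redhat-ods-operator
----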
. Wait for the distribution to be ready by running the following command:
+
[source,terminal]
----
$ oc wait --for=jsonpath='{.status.phase}'=Ready llamastackdistribution/llamastack-distribution -n redhat-ods-operator --timeout=300s
----
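+
If the distribution does not become ready within the timeout, list the pods in the namespace and inspect the server pod logs; the pod name placeholder below is an assumption and depends on your deployment:
+
[source,terminal]
----
$ oc get pods -n redhat-ods-operator
$ oc logs <llama-stack-pod-name> -n redhat-ods-operator
----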
. Generate the OAuth tokens for each user account to authenticate API requests.

.. To request a basic access token and add it to a `user1.token` file, run the following command:
+
[source,terminal]
----
$ curl -d client_id=llamastack \
  -d client_secret=YOUR_CLIENT_SECRET \
  -d username=user1 \
  -d password=user1-password \
  -d grant_type=password \
  https://YOUR_KEYCLOAK_HOST/realms/YOUR_REALM/protocol/openid-connect/token \
  | jq -r .access_token > user1.token
----

.. To request a full-access token and add it to a `user2.token` file, run the following command:
+
[source,terminal]
----
$ curl -d client_id=llamastack \
  -d client_secret=YOUR_CLIENT_SECRET \
  -d username=user2 \
  -d password=user2-password \
  -d grant_type=password \
  https://YOUR_KEYCLOAK_HOST/realms/YOUR_REALM/protocol/openid-connect/token \
  | jq -r .access_token > user2.token
----

.. You can verify the credentials by running the following command:
+
[source,terminal]
----
$ cat user2.token | cut -d . -f 2 | base64 -d 2>/dev/null | jq .
----
+
.Example output
[source,terminal]
----
{
  "iss": "https://keycloak-host/realms/testrealm",
  "aud": "account",
  "exp": 1760553504,
  "preferred_username": "user2",
  "llamastack_roles": ["inference_max"]
}
----
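Keycloak access tokens are short-lived; the `exp` claim in the example output is a Unix timestamp, so if a later request unexpectedly returns a `401` status, regenerate the token. On Linux, you can convert the timestamp to a readable date, for example:

[source,terminal]
----
$ date -d @1760553504
----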
.Verification

* Testing basic access to models

. Load the token with the following command (these steps assume that the `LLAMASTACK_URL` environment variable is set to your Llama Stack endpoint, as shown in the final verification step):
+
[source,terminal]
----
$ USER1_TOKEN=$(cat user1.token)
----

. Access the vLLM model by running the following command:
+
[source,terminal]
----
$ curl -X POST ${LLAMASTACK_URL}/v1/openai/v1/chat/completions -H "Content-Type: application/json" -H "Authorization: Bearer ${USER1_TOKEN}" -d '{"model": "vllm-inference/llama-3.2", "messages": [{"role": "user", "content": "Hello!"}], "max_tokens": 50}'
----
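+
Because the endpoint is OpenAI-compatible, you can show only the generated message by piping the response through jq, for example:
+
[source,terminal]
----
$ curl -s -X POST ${LLAMASTACK_URL}/v1/openai/v1/chat/completions -H "Content-Type: application/json" -H "Authorization: Bearer ${USER1_TOKEN}" -d '{"model": "vllm-inference/llama-3.2", "messages": [{"role": "user", "content": "Hello!"}], "max_tokens": 50}' | jq -r '.choices[0].message.content'
----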
. If you attempt to access the OpenAI models with these permissions, you see an error due to access restrictions:
+
[source,terminal]
----
$ curl -X POST ${LLAMASTACK_URL}/v1/openai/v1/chat/completions -H "Content-Type: application/json" -H "Authorization: Bearer ${USER1_TOKEN}" -d '{"model": "openai/gpt-4o-mini", "messages": [{"role": "user", "content": "Hello!"}], "max_tokens": 50}'
----
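+
To see the HTTP status code of the denied request explicitly, you can add curl's `-s -o /dev/null -w` options; the exact error body depends on your Llama Stack version:
+
[source,terminal]
----
$ curl -s -o /dev/null -w "HTTP Status: %{http_code}\n" -X POST ${LLAMASTACK_URL}/v1/openai/v1/chat/completions -H "Content-Type: application/json" -H "Authorization: Bearer ${USER1_TOKEN}" -d '{"model": "openai/gpt-4o-mini", "messages": [{"role": "user", "content": "Hello!"}], "max_tokens": 50}'
----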
* Testing full authorization to models

. Load the token with the following command:
+
[source,terminal]
----
$ USER2_TOKEN=$(cat user2.token)
----

. Access the vLLM model by running the following command:
+
[source,terminal]
----
$ curl -X POST ${LLAMASTACK_URL}/v1/openai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ${USER2_TOKEN}" \
  -d '{
    "model": "vllm-inference/llama-3.2",
    "messages": [{"role": "user", "content": "Hello!"}],
    "max_tokens": 50
  }'
----

. Access the OpenAI models by running the following command:
+
[source,terminal]
----
$ curl -X POST ${LLAMASTACK_URL}/v1/openai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ${USER2_TOKEN}" \
  -d '{
    "model": "openai/gpt-4o-mini",
    "messages": [{"role": "user", "content": "Hello!"}],
    "max_tokens": 50
  }'
----
* Testing without any authorization:

. Set the `LLAMASTACK_URL` environment variable to your Llama Stack endpoint, using either a local address or the route exposed on your cluster:
+
[source,terminal]
----
$ LLAMASTACK_URL="http://localhost:8321"
$ LLAMASTACK_URL=https://llamastack-distribution-redhat-ods-operator.apps.rosa.derekh-cluster.qrj7.p3.openshiftapps.com
----

. Attempt to access the OpenAI or vLLM models without an `Authorization` header, as shown in the sketch that follows.
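+
A minimal unauthenticated request, reusing the request body from the earlier examples and curl's `-w` option to print the status code:
+
[source,terminal]
----
$ curl -s -o /dev/null -w "HTTP Status: %{http_code}\n" -X POST ${LLAMASTACK_URL}/v1/openai/v1/chat/completions -H "Content-Type: application/json" -d '{"model": "vllm-inference/llama-3.2", "messages": [{"role": "user", "content": "Hello!"}], "max_tokens": 50}'
----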
+
.Example output
[source,terminal]
----
HTTP Status: 401
----

working-with-llama-stack.adoc

Lines changed: 2 additions & 1 deletion
@@ -19,4 +19,5 @@ include::modules/overview-of-llama-stack.adoc[leveloffset=+1]
 include::modules/openai-compatibility-for-rag-apis-in-llama-stack.adoc[leveloffset=+2]
 include::modules/openai-compatible-apis-in-llama-stack.adoc[leveloffset=+2]
 include::modules/activating-the-llama-stack-operator.adoc[leveloffset=+1]
-include::assemblies/deploying-a-rag-stack-in-a-data-science-project.adoc[leveloffset=+1]
+include::assemblies/deploying-a-rag-stack-in-a-data-science-project.adoc[leveloffset=+1]
+include::modules/auth-on-llama-stack.adoc[leveloffset=+1]
