
Commit b437632

RHAIENG-1134: Configuring LLS to use OAuth
1 parent 3c55a02 commit b437632

File tree

2 files changed (+345 -1 lines changed)


modules/auth-on-llama-stack.adoc

Lines changed: 343 additions & 0 deletions
@@ -0,0 +1,343 @@
:_module-type: PROCEDURE

[id="auth-on-llama-stack_{context}"]
= Configuring Llama Stack with OAuth Authentication

You can configure Llama Stack to enable role-based access control (RBAC) for model access by using OAuth authentication on {productname-short}. The following example shows how to configure Llama Stack so that a vLLM model can be accessed by all authenticated users, while an OpenAI model is restricted to specific users.

.Prerequisites

* You have installed {openshift-platform} 4.17 or newer.
* You have logged in to {productname-long}.
* You have cluster administrator privileges for your OpenShift cluster.
* You have installed the {openshift-cli} as described in the appropriate documentation for your cluster:
ifdef::upstream,self-managed[]
** link:https://docs.redhat.com/en/documentation/openshift_container_platform/{ocp-latest-version}/html/cli_tools/openshift-cli-oc#installing-openshift-cli[Installing the OpenShift CLI^] for OpenShift Container Platform
** link:https://docs.redhat.com/en/documentation/red_hat_openshift_service_on_aws/{rosa-latest-version}/html/cli_tools/openshift-cli-oc#installing-openshift-cli[Installing the OpenShift CLI^] for {rosa-productname}
endif::[]
ifdef::cloud-service[]
** link:https://docs.redhat.com/en/documentation/openshift_dedicated/{osd-latest-version}/html/cli_tools/openshift-cli-oc#installing-openshift-cli[Installing the OpenShift CLI^] for OpenShift Dedicated
** link:https://docs.redhat.com/en/documentation/red_hat_openshift_service_on_aws_classic_architecture/{rosa-classic-latest-version}/html/cli_tools/openshift-cli-oc#installing-openshift-cli[Installing the OpenShift CLI^] for {rosa-classic-productname}
endif::[]

.Procedure

. Create the service accounts that Llama Stack uses to enforce RBAC for model access.

.. Define the service accounts used for OAuth authentication, where each account corresponds to a specific application with its own access permissions. To configure this, create a `llamastack-auth.yaml` file:
+
[source,yaml]
----
apiVersion: v1
kind: ServiceAccount
metadata:
  name: llamastack-vllm-inference <1>
  namespace: redhat-ods-operator
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: llamastack-openai-inference <2>
  namespace: redhat-ods-operator
----
<1> Allows access to only vLLM models.
<2> Allows access to vLLM and OpenAI models.

.. Apply the service accounts by running the following command:
+
[source,terminal]
----
$ oc apply -f llamastack-auth.yaml
----
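+
Optionally, you can confirm that both service accounts exist before you continue by running the following command:
+
[source,terminal]
----
$ oc get serviceaccount llamastack-vllm-inference llamastack-openai-inference -n redhat-ods-operator
----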

. Retrieve the OpenID Connect (OIDC) configuration.

.. {openshift-platform} provides a built-in OIDC provider. Retrieve its configuration by running the following command:
+
[source,terminal]
----
$ OIDC_CONFIG=$(oc get --raw /.well-known/openid-configuration)
----

.. Extract the issuer and JWKS URI by running the following commands:
+
[source,terminal]
----
$ ISSUER=$(echo "$OIDC_CONFIG" | jq -r .issuer)
$ JWKS_URI="${ISSUER}/keys.json"

$ echo "OIDC Issuer: $ISSUER"
$ echo "JWKS URI: $JWKS_URI"
----
+
.Example output
[source,terminal]
----
OIDC Issuer: https://oidc.com/2...a
JWKS URI: https://oidc.com/2...a/keys.json
----
+
Make a note of these values because they are required for the `LlamaStackDistribution` configuration.
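+
Optionally, you can check that the JWKS endpoint responds before you reference it in the Llama Stack configuration. This is a best-effort check that assumes the endpoint is reachable from your workstation without authentication:
+
[source,terminal]
----
$ curl -s "$JWKS_URI" | jq '.keys[].kid'
----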

. Create a `run.yaml` file that defines the necessary configuration for OAuth.

.. Define a configuration with two inference providers and OAuth authentication, as in the following `run.yaml` example:
+
[source,yaml]
----
version: 2
image_name: rh
apis:
- inference
- agents
- safety
- telemetry
- tool_runtime
- vector_io
providers:
  inference:
  - provider_id: vllm-inference
    provider_type: remote::vllm
    config:
      url: ${env.VLLM_URL:=http://localhost:8000/v1}
      max_tokens: ${env.VLLM_MAX_TOKENS:=4096}
      api_token: ${env.VLLM_API_TOKEN:=fake}
      tls_verify: ${env.VLLM_TLS_VERIFY:=true}
  - provider_id: openai
    provider_type: remote::openai
    config:
      api_key: ${env.OPENAI_API_KEY:=}
      base_url: ${env.OPENAI_BASE_URL:=https://api.openai.com/v1}
  telemetry:
  - provider_id: meta-reference
    provider_type: inline::meta-reference
    config:
      service_name: "${env.OTEL_SERVICE_NAME:=\u200B}"
      sinks: ${env.TELEMETRY_SINKS:=console,sqlite}
      sqlite_db_path: /opt/app-root/src/.llama/distributions/rh/trace_store.db
      otel_exporter_otlp_endpoint: ${env.OTEL_EXPORTER_OTLP_ENDPOINT:=}
  agents:
  - provider_id: meta-reference
    provider_type: inline::meta-reference
    config:
      persistence_store:
        type: sqlite
        namespace: null
        db_path: /opt/app-root/src/.llama/distributions/rh/agents_store.db
      responses_store:
        type: sqlite
        db_path: /opt/app-root/src/.llama/distributions/rh/responses_store.db
models:
- model_id: llama-3.2
  provider_id: vllm-inference
  model_type: llm
  metadata: {}

- model_id: gpt-4o-mini
  provider_id: openai
  model_type: llm
  metadata: {}

server:
  port: 8321
  auth:
    provider_config:
      type: "oauth2_token"
      jwks:
        uri: "https://<your-cluster-oidc-url>/keys.json" <1>
        key_recheck_period: 3600
      issuer: "https://<your-cluster-oidc-url>"
      audience: "https://<your-cluster-oidc-url>"
      verify_tls: false
      claims_mapping:
        sub: "roles"
    access_policy:
    - permit: <2>
        actions: [read]
        resource: model::vllm-inference/llama-3.2
      description: Allow all authenticated users to access Llama 3.2 model
    - permit: <3>
        actions: [read]
        resource: model::openai/gpt-4o-mini
      when: user with system:serviceaccount:redhat-ods-operator:llamastack-openai-inference in roles
      description: Allow only llamastack-openai-inference to access OpenAI models
----
<1> Enter your OIDC configuration information.
<2> Policy 1: Allow all authenticated users to access vLLM models.
<3> Policy 2: Restrict OpenAI models to a specific service account.
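+
Replace the `<your-cluster-oidc-url>` placeholders with the issuer and JWKS values that you noted earlier. Optionally, you can confirm that no placeholders remain before you continue; this is a simple check run from the directory that contains `run.yaml`:
+
[source,terminal]
----
$ grep -n "your-cluster-oidc-url" run.yaml || echo "No placeholders left"
----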

. Create a `ConfigMap` with the OAuth configuration by running the following command:
+
[source,terminal]
----
$ oc create configmap llamastack-custom-config --from-file=run.yaml=run.yaml -n redhat-ods-operator
----

. Verify that the `ConfigMap` was created by running the following command:
+
[source,terminal]
----
$ oc get configmap llamastack-custom-config -n redhat-ods-operator
----
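+
Optionally, you can display the stored configuration to confirm that the `ConfigMap` contains the expected `run.yaml` content. The JSONPath expression escapes the dot in the key name:
+
[source,terminal]
----
$ oc get configmap llamastack-custom-config -n redhat-ods-operator -o jsonpath='{.data.run\.yaml}' | head
----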

. Create a `LlamaStackDistribution` custom resource that uses the OAuth configuration.

.. Create a `llamastack-distribution.yaml` file with the following parameters:
+
[source,yaml]
----
apiVersion: llamastack.io/v1alpha1
kind: LlamaStackDistribution
metadata:
  name: llamastack-distribution
  namespace: redhat-ods-operator
spec:
  replicas: 1
  server:
    distribution:
      name: rh-dev
    containerSpec:
      name: llama-stack
      port: 8321
      env:
      # vLLM Provider Configuration
      - name: VLLM_URL
        value: "http://your-vllm-service:8000/v1"
      - name: VLLM_API_TOKEN
        value: "your-vllm-token"
      - name: VLLM_TLS_VERIFY
        value: "false"

      # OpenAI Provider Configuration
      - name: OPENAI_API_KEY
        value: "your-openai-api-key"
      - name: OPENAI_BASE_URL
        value: "https://api.openai.com/v1"

    # Reference the ConfigMap with OAuth configuration
    userConfig:
      configMapName: llamastack-custom-config
      configMapNamespace: redhat-ods-operator
----

. Apply the distribution by running the following command:
+
[source,terminal]
----
$ oc apply -f llamastack-distribution.yaml
----

. Wait for the distribution to be ready by running the following command:
+
[source,terminal]
----
$ oc wait --for=jsonpath='{.status.phase}'=Ready llamastackdistribution/llamastack-distribution -n redhat-ods-operator --timeout=300s
----
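+
Optionally, you can confirm that the Llama Stack pod is running. The `app=llama-stack` label is the same label that the verification commands later in this procedure rely on:
+
[source,terminal]
----
$ oc get pods -n redhat-ods-operator -l app=llama-stack
----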

. Generate the OAuth tokens for each service account to authenticate API requests.

* Token for the vLLM service account:
.. Generate the token for the vLLM service account by running the following command:
+
[source,terminal]
----
$ oc create token llamastack-vllm-inference -n redhat-ods-operator --duration=24h > llamastack-vllm-token.txt
----
.. View the token by running the following command:
+
[source,terminal]
----
$ cat llamastack-vllm-token.txt
----

* Token for the OpenAI service account:
.. Generate the token for the OpenAI service account by running the following command:
+
[source,terminal]
----
$ oc create token llamastack-openai-inference -n redhat-ods-operator --duration=24h > llamastack-openai-token.txt
----
.. View the token by running the following command:
+
[source,terminal]
----
$ cat llamastack-openai-token.txt
----
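+
Optionally, you can confirm which identity each token authenticates as. The user name that is returned is the value that the `claims_mapping` setting in `run.yaml` maps to a role, and it is the value that the `when:` clause in the access policy matches against:
+
[source,terminal]
----
$ oc whoami --token="$(cat llamastack-openai-token.txt)"
----
+
.Example output
[source,terminal]
----
system:serviceaccount:redhat-ods-operator:llamastack-openai-inference
----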

.Verification
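The following verification commands run `curl` from inside the Llama Stack pod and reference the `$POD_NAME` variable. Set the variable to the name of the running pod before you run the tests:

[source,terminal]
----
$ POD_NAME=$(oc get pods -n redhat-ods-operator -l app=llama-stack -o jsonpath='{.items[0].metadata.name}')
----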

* Testing successful vLLM access:
+
With the vLLM token, you can access only vLLM models.

. Load the vLLM token by running the following command:
+
[source,terminal]
----
$ VLLM_TOKEN=$(cat llamastack-vllm-token.txt)
----

. Access the vLLM model by running the following command:
+
[source,terminal]
----
$ oc exec $POD_NAME -n redhat-ods-operator -- curl -s -X POST http://localhost:8321/v1/openai/v1/chat/completions -H "Content-Type: application/json" -H "Authorization: Bearer ${VLLM_TOKEN}" -d '{"model": "vllm-inference/llama-3.2", "messages": [{"role": "user", "content": "Hello!"}], "max_tokens": 50}'
----

* Testing successful OpenAI access:
+
With the OpenAI token, you can access both OpenAI and vLLM models.

. Load the OpenAI token by running the following command:
+
[source,terminal]
----
$ OPENAI_TOKEN=$(cat llamastack-openai-token.txt)
----

. Access the vLLM model by running the following command:
+
[source,terminal]
----
$ oc exec $POD_NAME -n redhat-ods-operator -- curl -s -X POST http://localhost:8321/v1/openai/v1/chat/completions -H "Content-Type: application/json" -H "Authorization: Bearer ${OPENAI_TOKEN}" -d '{"model": "vllm-inference/llama-3.2", "messages": [{"role": "user", "content": "Hello!"}], "max_tokens": 50}'
----

. Access the OpenAI model by running the following command:
+
[source,terminal]
----
$ oc exec $POD_NAME -n redhat-ods-operator -- curl -s -X POST http://localhost:8321/v1/openai/v1/chat/completions -H "Content-Type: application/json" -H "Authorization: Bearer ${OPENAI_TOKEN}" -d '{"model": "openai/gpt-4o-mini", "messages": [{"role": "user", "content": "Hello!"}], "max_tokens": 50}'
----

* Testing without any authorization:

. Attempt to access the OpenAI or vLLM models without a token by running the following commands:
+
[source,terminal]
----
$ POD_NAME=$(oc get pods -n redhat-ods-operator -l app=llama-stack -o jsonpath='{.items[0].metadata.name}')
$ oc exec $POD_NAME -n redhat-ods-operator -- curl -s -w "\nHTTP Status: %{http_code}\n" http://localhost:8321/v1/models
----
+
.Example output
[source,terminal]
----
HTTP Status: 401
----

* Testing incorrect authorization:

. Attempt to access an OpenAI model with the vLLM token by running the following command:
+
[source,terminal]
----
$ oc exec $POD_NAME -n redhat-ods-operator -- curl -s -X POST http://localhost:8321/v1/openai/v1/chat/completions -H "Content-Type: application/json" -H "Authorization: Bearer ${VLLM_TOKEN}" -d '{"model": "openai/gpt-4o-mini", "messages": [{"role": "user", "content": "Hello!"}], "max_tokens": 50}'
----
+
.Example output
[source,terminal]
----
404 - File (model) not found
----

working-with-llama-stack.adoc

Lines changed: 2 additions & 1 deletion
@@ -19,4 +19,5 @@ include::modules/overview-of-llama-stack.adoc[leveloffset=+1]
 include::modules/openai-compatibility-for-rag-apis-in-llama-stack.adoc[leveloffset=+2]
 include::modules/openai-compatible-apis-in-llama-stack.adoc[leveloffset=+2]
 include::modules/activating-the-llama-stack-operator.adoc[leveloffset=+1]
-include::assemblies/deploying-a-rag-stack-in-a-data-science-project.adoc[leveloffset=+1]
+include::assemblies/deploying-a-rag-stack-in-a-data-science-project.adoc[leveloffset=+1]
+include::modules/auth-on-llama-stack.adoc[leveloffset=+1]
