|
:_module-type: PROCEDURE

[id="auth-on-llama-stack_{context}"]
= Configuring Llama Stack with OAuth Authentication

You can configure Llama Stack to enable Role-Based Access Control (RBAC) for model access using OAuth authentication on {productname-short}. The following example shows how to configure Llama Stack so that a vLLM model can be accessed by all authenticated users, while an OpenAI model is restricted to specific users.

.Prerequisites

* You have installed {openshift-platform} 4.17 or newer.
* You have logged in to {productname-long}.
* You have cluster administrator privileges for your OpenShift cluster.
* You have installed the {openshift-cli} as described in the appropriate documentation for your cluster:
ifdef::upstream,self-managed[]
** link:https://docs.redhat.com/en/documentation/openshift_container_platform/{ocp-latest-version}/html/cli_tools/openshift-cli-oc#installing-openshift-cli[Installing the OpenShift CLI^] for OpenShift Container Platform
** link:https://docs.redhat.com/en/documentation/red_hat_openshift_service_on_aws/{rosa-latest-version}/html/cli_tools/openshift-cli-oc#installing-openshift-cli[Installing the OpenShift CLI^] for {rosa-productname}
endif::[]
ifdef::cloud-service[]
** link:https://docs.redhat.com/en/documentation/openshift_dedicated/{osd-latest-version}/html/cli_tools/openshift-cli-oc#installing-openshift-cli[Installing the OpenShift CLI^] for OpenShift Dedicated
** link:https://docs.redhat.com/en/documentation/red_hat_openshift_service_on_aws_classic_architecture/{rosa-classic-latest-version}/html/cli_tools/openshift-cli-oc#installing-openshift-cli[Installing the OpenShift CLI^] for {rosa-classic-productname}
endif::[]
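* You have installed the `jq` command-line JSON processor, which several commands in this procedure use to parse output.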

.Procedure

. To configure Llama Stack to use Role-Based Access Control (RBAC) for model access, you first need to create service accounts.

.. Define the service accounts used for OAuth authentication, where each account corresponds to a specific application with its own access permissions. To configure this, create a `llamastack-auth.yaml` file:
+
[source,yaml]
----
apiVersion: v1
kind: ServiceAccount
metadata:
  name: llamastack-vllm-inference <1>
  namespace: redhat-ods-operator
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: llamastack-openai-inference <2>
  namespace: redhat-ods-operator
----
<1> Allows access only to vLLM models.
<2> Allows access to vLLM and OpenAI models.

.. Apply the service accounts by running the following command:
+
[source,terminal]
----
$ oc apply -f llamastack-auth.yaml
----
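+
You can optionally confirm that both service accounts exist by running the following command:
+
[source,terminal]
----
$ oc get serviceaccount llamastack-vllm-inference llamastack-openai-inference -n redhat-ods-operator
----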

. You then need to retrieve the OpenID Connect (OIDC) configuration.

.. {openshift-platform} provides a built-in OIDC provider. Retrieve its configuration by running the following command:
+
[source,terminal]
----
$ OIDC_CONFIG=$(oc get --raw /.well-known/openid-configuration)
----
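+
If you want to inspect what the provider returns before extracting values, you can, for example, print the issuer and JWKS fields of the discovery document (this assumes that `jq` is installed locally):
+
[source,terminal]
----
$ echo "$OIDC_CONFIG" | jq '{issuer, jwks_uri}'
----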

.. Extract the issuer and JWKS URI by running the following commands:
+
[source,terminal]
----
$ ISSUER=$(echo "$OIDC_CONFIG" | jq -r .issuer)
JWKS_URI="${ISSUER}/keys.json"

echo "OIDC Issuer: $ISSUER"
echo "JWKS URI: $JWKS_URI"
----
+
.Example output
[source,terminal]
----
OIDC Issuer: https://oidc.com/2...a
JWKS URI: https://oidc.com/2...a/keys.json
----
+
Make a note of these values because they are required for the `LlamaStackDistribution` configuration.
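+
As an optional check, you can fetch the JWKS document and count the published keys. Depending on your cluster, this endpoint might require authentication, so a failure here does not necessarily indicate a configuration problem:
+
[source,terminal]
----
$ curl -s "$JWKS_URI" | jq '.keys | length'
----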

. You then need to create a `run.yaml` file that defines the necessary configuration for OAuth.

.. Define a configuration with two inference providers and OAuth authentication, as shown in the following `run.yaml` example:
+
[source,yaml]
----
version: 2
image_name: rh
apis:
  - inference
  - agents
  - safety
  - telemetry
  - tool_runtime
  - vector_io
providers:
  inference:
    - provider_id: vllm-inference
      provider_type: remote::vllm
      config:
        url: ${env.VLLM_URL:=http://localhost:8000/v1}
        max_tokens: ${env.VLLM_MAX_TOKENS:=4096}
        api_token: ${env.VLLM_API_TOKEN:=fake}
        tls_verify: ${env.VLLM_TLS_VERIFY:=true}
    - provider_id: openai
      provider_type: remote::openai
      config:
        api_key: ${env.OPENAI_API_KEY:=}
        base_url: ${env.OPENAI_BASE_URL:=https://api.openai.com/v1}
  telemetry:
    - provider_id: meta-reference
      provider_type: inline::meta-reference
      config:
        service_name: "${env.OTEL_SERVICE_NAME:=\u200B}"
        sinks: ${env.TELEMETRY_SINKS:=console,sqlite}
        sqlite_db_path: /opt/app-root/src/.llama/distributions/rh/trace_store.db
        otel_exporter_otlp_endpoint: ${env.OTEL_EXPORTER_OTLP_ENDPOINT:=}
  agents:
    - provider_id: meta-reference
      provider_type: inline::meta-reference
      config:
        persistence_store:
          type: sqlite
          namespace: null
          db_path: /opt/app-root/src/.llama/distributions/rh/agents_store.db
        responses_store:
          type: sqlite
          db_path: /opt/app-root/src/.llama/distributions/rh/responses_store.db
models:
  - model_id: llama-3.2
    provider_id: vllm-inference
    model_type: llm
    metadata: {}

  - model_id: gpt-4o-mini
    provider_id: openai
    model_type: llm
    metadata: {}

server:
  port: 8321
  auth:
    provider_config:
      type: "oauth2_token"
      jwks:
        uri: "https://<your-cluster-oidc-url>/keys.json" <1>
        key_recheck_period: 3600
      issuer: "https://<your-cluster-oidc-url>"
      audience: "https://<your-cluster-oidc-url>"
      verify_tls: false
      claims_mapping:
        sub: "roles"
    access_policy:
      - permit: <2>
          actions: [read]
          resource: model::vllm-inference/llama-3.2
        description: Allow all authenticated users to access Llama 3.2 model
      - permit: <3>
          actions: [read]
          resource: model::openai/gpt-4o-mini
        when: user with system:serviceaccount:redhat-ods-operator:llamastack-openai-inference in roles
        description: Allow only llamastack-openai-inference to access OpenAI models
----
+
<1> Enter the OIDC issuer and JWKS URI values that you retrieved earlier.
<2> Policy 1: Allow all authenticated users to access vLLM models.
<3> Policy 2: Restrict OpenAI models to a specific service account. The `claims_mapping` setting copies the token `sub` claim, which for a service account token has the form `system:serviceaccount:<namespace>:<name>`, into the `roles` list that the `when` clause evaluates.

. Create a `ConfigMap` with the OAuth configuration by running the following command:
+
[source,terminal]
----
$ oc create configmap llamastack-custom-config --from-file=run.yaml=run.yaml -n redhat-ods-operator
----

. Verify that the `ConfigMap` was created by running the following command:
+
[source,terminal]
----
$ oc get configmap llamastack-custom-config -n redhat-ods-operator
----
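+
If you later need to check which configuration the `ConfigMap` contains, you can print the embedded `run.yaml`, for example:
+
[source,terminal]
----
$ oc get configmap llamastack-custom-config -n redhat-ods-operator -o jsonpath='{.data.run\.yaml}'
----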

. You then need to create a `LlamaStackDistribution` custom resource that uses the OAuth configuration.

.. Create a `llamastack-distribution.yaml` file with the following parameters:
+
[source,yaml]
----
apiVersion: llamastack.io/v1alpha1
kind: LlamaStackDistribution
metadata:
  name: llamastack-distribution
  namespace: redhat-ods-operator
spec:
  replicas: 1
  server:
    distribution:
      name: rh-dev
    containerSpec:
      name: llama-stack
      port: 8321
      env:
        # vLLM Provider Configuration
        - name: VLLM_URL
          value: "http://your-vllm-service:8000/v1"
        - name: VLLM_API_TOKEN
          value: "your-vllm-token"
        - name: VLLM_TLS_VERIFY
          value: "false"

        # OpenAI Provider Configuration
        - name: OPENAI_API_KEY
          value: "your-openai-api-key"
        - name: OPENAI_BASE_URL
          value: "https://api.openai.com/v1"

    # Reference the ConfigMap with OAuth configuration
    userConfig:
      configMapName: llamastack-custom-config
      configMapNamespace: redhat-ods-operator
----

. To apply the distribution, run the following command:
+
[source,terminal]
----
$ oc apply -f llamastack-distribution.yaml
----
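+
You can confirm that the custom resource was created by running the following command:
+
[source,terminal]
----
$ oc get llamastackdistribution llamastack-distribution -n redhat-ods-operator
----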

. Wait for the distribution to be ready by running the following command:
+
[source,terminal]
----
$ oc wait --for=jsonpath='{.status.phase}'=Ready llamastackdistribution/llamastack-distribution -n redhat-ods-operator --timeout=300s
----
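+
If the `oc wait` command times out, you can inspect the distribution and the pods behind it. The label selector shown here is the same one used in the verification steps later in this procedure:
+
[source,terminal]
----
$ oc get pods -n redhat-ods-operator -l app=llama-stack
$ oc describe llamastackdistribution llamastack-distribution -n redhat-ods-operator
----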

. Generate an OAuth token for each service account to authenticate API requests.

* Token for the vLLM service account:
.. Generate the token for the vLLM service account by running the following command:
+
[source,terminal]
----
$ oc create token llamastack-vllm-inference -n redhat-ods-operator --duration=24h > llamastack-vllm-token.txt
----
.. View the token with the following command:
+
[source,terminal]
----
$ cat llamastack-vllm-token.txt
----
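+
You can also check which identity the token represents. For a service account token, this value has the form `system:serviceaccount:<namespace>:<name>`, and it is the value that the access policy in `run.yaml` evaluates through the `roles` claims mapping:
+
[source,terminal]
----
$ oc whoami --token="$(cat llamastack-vllm-token.txt)"
----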

* Token for the OpenAI service account:
.. Generate the token for the OpenAI service account by running the following command:
+
[source,terminal]
----
$ oc create token llamastack-openai-inference -n redhat-ods-operator --duration=24h > llamastack-openai-token.txt
----
.. View the token with the following command:
+
[source,terminal]
----
$ cat llamastack-openai-token.txt
----

.Verification

* Testing successful vLLM access:
+
--
The vLLM token allows you to access only vLLM models.

. Load the vLLM token with the following command:
+
[source,terminal]
----
$ VLLM_TOKEN=$(cat llamastack-vllm-token.txt)
----
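+
The commands in the following steps run inside the Llama Stack pod. If the `POD_NAME` variable is not already set, set it by running the following command:
+
[source,terminal]
----
$ POD_NAME=$(oc get pods -n redhat-ods-operator -l app=llama-stack -o jsonpath='{.items[0].metadata.name}')
----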

. You can then access the vLLM model by running the following command:
+
[source,terminal]
----
$ oc exec $POD_NAME -n redhat-ods-operator -- curl -s -X POST http://localhost:8321/v1/openai/v1/chat/completions -H "Content-Type: application/json" -H "Authorization: Bearer ${VLLM_TOKEN}" -d '{"model": "vllm-inference/llama-3.2", "messages": [{"role": "user", "content": "Hello!"}], "max_tokens": 50}'
----
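+
The request returns an OpenAI-style chat completion object. If `jq` is installed locally, you can, for example, extract only the generated text, assuming the response follows the standard `choices[0].message.content` layout:
+
[source,terminal]
----
$ oc exec $POD_NAME -n redhat-ods-operator -- curl -s -X POST http://localhost:8321/v1/openai/v1/chat/completions -H "Content-Type: application/json" -H "Authorization: Bearer ${VLLM_TOKEN}" -d '{"model": "vllm-inference/llama-3.2", "messages": [{"role": "user", "content": "Hello!"}], "max_tokens": 50}' | jq -r '.choices[0].message.content'
----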
--

* Testing successful OpenAI access:
+
--
The OpenAI token allows you to access both OpenAI and vLLM models.

. Load the OpenAI token with the following command:
+
[source,terminal]
----
$ OPENAI_TOKEN=$(cat llamastack-openai-token.txt)
----

. Access the vLLM model by running the following command:
+
[source,terminal]
----
$ oc exec $POD_NAME -n redhat-ods-operator -- curl -s -X POST http://localhost:8321/v1/openai/v1/chat/completions -H "Content-Type: application/json" -H "Authorization: Bearer ${OPENAI_TOKEN}" -d '{"model": "vllm-inference/llama-3.2", "messages": [{"role": "user", "content": "Hello!"}], "max_tokens": 50}'
----
. Access the OpenAI model by running the following command:
+
[source,terminal]
----
$ oc exec $POD_NAME -n redhat-ods-operator -- curl -s -X POST http://localhost:8321/v1/openai/v1/chat/completions -H "Content-Type: application/json" -H "Authorization: Bearer ${OPENAI_TOKEN}" -d '{"model": "openai/gpt-4o-mini", "messages": [{"role": "user", "content": "Hello!"}], "max_tokens": 50}'
----
--

* Testing without any authorization:
+
--
. Attempt to access the OpenAI or vLLM models by running the following commands:
+
[source,terminal]
----
$ POD_NAME=$(oc get pods -n redhat-ods-operator -l app=llama-stack -o jsonpath='{.items[0].metadata.name}')
$ oc exec $POD_NAME -n redhat-ods-operator -- curl -s -w "\nHTTP Status: %{http_code}\n" http://localhost:8321/v1/models
----
+
.Example output
[source,terminal]
----
HTTP Status: 401
----
--

* Testing incorrect authorization:
+
--
. Attempt to access an OpenAI model with a vLLM token by running the following command:
+
[source,terminal]
----
$ oc exec $POD_NAME -n redhat-ods-operator -- curl -s -X POST http://localhost:8321/v1/openai/v1/chat/completions -H "Content-Type: application/json" -H "Authorization: Bearer ${VLLM_TOKEN}" -d '{"model": "openai/gpt-4o-mini", "messages": [{"role": "user", "content": "Hello!"}], "max_tokens": 50}'
----
+
.Example output
[source,terminal]
----
404 - File (model) not found
----
--