:_module-type: PROCEDURE

[id="auth-on-llama-stack_{context}"]
= Configuring Llama Stack with OAuth authentication

You can configure Llama Stack to enable role-based access control (RBAC) for model access by using OAuth authentication with Keycloak on {productname-short}. The following example shows how to configure Llama Stack so that a vLLM model can be accessed by all authenticated users, while an OpenAI model is restricted to specific users.

.Prerequisites

* You have set up Keycloak with a realm, a client, and the user accounts that you want to authenticate.
* You have installed {openshift-platform} 4.19 or later.
* You have logged in to {productname-long}.
* You have cluster administrator privileges for your OpenShift cluster.
* You have installed the {openshift-cli} as described in the appropriate documentation for your cluster:
ifdef::upstream,self-managed[]
** link:https://docs.redhat.com/en/documentation/openshift_container_platform/{ocp-latest-version}/html/cli_tools/openshift-cli-oc#installing-openshift-cli[Installing the OpenShift CLI^] for OpenShift Container Platform
** link:https://docs.redhat.com/en/documentation/red_hat_openshift_service_on_aws/{rosa-latest-version}/html/cli_tools/openshift-cli-oc#installing-openshift-cli[Installing the OpenShift CLI^] for {rosa-productname}
endif::[]
ifdef::cloud-service[]
** link:https://docs.redhat.com/en/documentation/openshift_dedicated/{osd-latest-version}/html/cli_tools/openshift-cli-oc#installing-openshift-cli[Installing the OpenShift CLI^] for OpenShift Dedicated
** link:https://docs.redhat.com/en/documentation/red_hat_openshift_service_on_aws_classic_architecture/{rosa-classic-latest-version}/html/cli_tools/openshift-cli-oc#installing-openshift-cli[Installing the OpenShift CLI^] for {rosa-classic-productname}
endif::[]

.Procedure

. View and verify the OAuth provider token structure, which Llama Stack uses to enforce RBAC for model access.

.. Generate a test token from Keycloak to view its structure by running the following command:
+
[source,terminal]
----
$ curl -d client_id=llamastack \
  -d client_secret=YOUR_CLIENT_SECRET \
  -d username=user1 \
  -d password=user-password \
  -d grant_type=password \
  ${TOKEN_ENDPOINT} | jq -r .access_token > test.token
----
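+
This command assumes that `TOKEN_ENDPOINT` is already set to your realm's token URL. For a standard Keycloak installation, the endpoint follows this pattern; substitute your own host and realm:
+
[source,terminal]
----
$ TOKEN_ENDPOINT=https://<keycloak-host>/realms/<your-keycloak-realm>/protocol/openid-connect/token
----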

.. View the token claims by running the following command:
+
[source,terminal]
----
$ cat test.token | cut -d . -f 2 | base64 -d 2>/dev/null | jq .
----
+
.Example token structure from Keycloak
[source,json]
----
{
  "iss": "http://keycloak-host/realms/testrealm",
  "aud": "account",
  "sub": "761cdc99-80e5-4506-9b9e-26a67a8566f7",
  "preferred_username": "user1",
  "llamastack_roles": [
    "inference_max"
  ]
}
----
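+
This decoding works because a JWT consists of three dot-separated, base64url-encoded segments: header, payload, and signature. The `cut -d . -f 2` selects the payload, and `2>/dev/null` suppresses the padding warnings that `base64` can emit for base64url input. The same approach decodes the header (field 1), which shows the signing algorithm and the key ID (`kid`) that is matched against the keys served by the JWKS endpoint:
+
[source,terminal]
----
$ cat test.token | cut -d . -f 1 | base64 -d 2>/dev/null | jq .
----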

. Create a `run.yaml` file that defines the required OAuth configuration.

.. Define a configuration with two inference providers and OAuth authentication, as in the following `run.yaml` example:
+
[source,yaml]
----
version: 2
image_name: rh
apis:
  - inference
  - agents
  - safety
  - telemetry
  - tool_runtime
  - vector_io
providers:
  inference:
    - provider_id: vllm-inference
      provider_type: remote::vllm
      config:
        url: ${env.VLLM_URL:=http://localhost:8000/v1}
        max_tokens: ${env.VLLM_MAX_TOKENS:=4096}
        api_token: ${env.VLLM_API_TOKEN:=fake}
        tls_verify: ${env.VLLM_TLS_VERIFY:=true}
    - provider_id: openai
      provider_type: remote::openai
      config:
        api_key: ${env.OPENAI_API_KEY:=}
        base_url: ${env.OPENAI_BASE_URL:=https://api.openai.com/v1}
  telemetry:
    - provider_id: meta-reference
      provider_type: inline::meta-reference
      config:
        service_name: "${env.OTEL_SERVICE_NAME:=}"
        sinks: ${env.TELEMETRY_SINKS:=console,sqlite}
        sqlite_db_path: /opt/app-root/src/.llama/distributions/rh/trace_store.db
        otel_exporter_otlp_endpoint: ${env.OTEL_EXPORTER_OTLP_ENDPOINT:=}
  agents:
    - provider_id: meta-reference
      provider_type: inline::meta-reference
      config:
        persistence_store:
          type: sqlite
          namespace: null
          db_path: /opt/app-root/src/.llama/distributions/rh/agents_store.db
        responses_store:
          type: sqlite
          db_path: /opt/app-root/src/.llama/distributions/rh/responses_store.db
models:
  - model_id: llama-3.2
    provider_id: vllm-inference
    model_type: llm
    metadata: {}
  - model_id: gpt-4o-mini
    provider_id: openai
    model_type: llm
    metadata: {}
server:
  port: 8321
  auth:
    provider_config:
      type: "oauth2_token"
      jwks:
        uri: "https://<keycloak-host>/realms/<your-keycloak-realm>/protocol/openid-connect/certs" <1>
        key_recheck_period: 3600
      issuer: "https://<keycloak-host>/realms/<your-keycloak-realm>" <1>
      audience: "account"
      verify_tls: true
      claims_mapping:
        llamastack_roles: "roles" <2>
    access_policy:
      - permit: <3>
          actions: [read]
          resource: model::vllm-inference/llama-3.2
        description: Allow all authenticated users to access the Llama 3.2 model
      - permit: <4>
          actions: [read]
          resource: model::openai/gpt-4o-mini
        when: user with inference_max in roles
        description: Allow only users with the inference_max role to access OpenAI models
----
+
<1> Specify your Keycloak host and realm in the URL.
<2> Maps the `llamastack_roles` claim from the token to the `roles` field.
<3> Policy 1: Allow all authenticated users to access the vLLM model.
<4> Policy 2: Restrict OpenAI models to users with the `inference_max` role.
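+
Before deploying, you can optionally confirm that the JWKS URI in the configuration is reachable and serves signing keys, because Llama Stack validates incoming tokens against it. A quick check with `curl`, substituting your own host and realm:
+
[source,terminal]
----
$ curl -s https://<keycloak-host>/realms/<your-keycloak-realm>/protocol/openid-connect/certs | jq '.keys[].kid'
----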
| 145 | + |
. Create a ConfigMap that contains the `run.yaml` configuration by running the following command:
+
[source,terminal]
----
$ oc create configmap llamastack-custom-config --from-file=run.yaml=run.yaml -n redhat-ods-operator
----
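+
If you change `run.yaml` later, you can update the existing ConfigMap in place by piping a client-side dry run through `oc apply`:
+
[source,terminal]
----
$ oc create configmap llamastack-custom-config --from-file=run.yaml=run.yaml -n redhat-ods-operator --dry-run=client -o yaml | oc apply -f -
----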

. Create a `llamastack-distribution.yaml` file with the following parameters:
+
[source,yaml]
----
apiVersion: llamastack.io/v1alpha1
kind: LlamaStackDistribution
metadata:
  name: llamastack-distribution
  namespace: redhat-ods-operator
spec:
  replicas: 1
  server:
    distribution:
      name: rh-dev
    containerSpec:
      name: llama-stack
      port: 8321
      env:
        # vLLM provider configuration
        - name: VLLM_URL
          value: "http://your-vllm-service:8000/v1"
        - name: VLLM_API_TOKEN
          value: "your-vllm-token"
        - name: VLLM_TLS_VERIFY
          value: "false"
        # OpenAI provider configuration
        - name: OPENAI_API_KEY
          value: "your-openai-api-key"
        - name: OPENAI_BASE_URL
          value: "https://api.openai.com/v1"
    userConfig:
      configMapName: llamastack-custom-config
      configMapNamespace: redhat-ods-operator
----
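+
Avoid embedding real credentials directly in the custom resource. Assuming the `containerSpec.env` field follows the standard Kubernetes `EnvVar` schema, you can reference a Secret instead; the Secret name and key below are placeholders:
+
[source,yaml]
----
env:
  - name: OPENAI_API_KEY
    valueFrom:
      secretKeyRef:
        name: openai-credentials  # placeholder Secret in the same namespace
        key: api-key
----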

. To apply the distribution, run the following command:
+
[source,terminal]
----
$ oc apply -f llamastack-distribution.yaml
----

. Wait for the distribution to be ready by running the following command:
+
[source,terminal]
----
$ oc wait --for=jsonpath='{.status.phase}'=Ready llamastackdistribution/llamastack-distribution -n redhat-ods-operator --timeout=300s
----
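+
You can also inspect the status of the distribution directly, using the same status field that `oc wait` polls:
+
[source,terminal]
----
$ oc get llamastackdistribution llamastack-distribution -n redhat-ods-operator -o jsonpath='{.status.phase}{"\n"}'
----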

. Generate the OAuth tokens for each user account to authenticate API requests.

.. To request a basic access token and save it to a `user1.token` file, run the following command:
+
[source,terminal]
----
$ curl -d client_id=llamastack \
  -d client_secret=YOUR_CLIENT_SECRET \
  -d username=user1 \
  -d password=user1-password \
  -d grant_type=password \
  https://YOUR_KEYCLOAK_HOST/realms/YOUR_REALM/protocol/openid-connect/token \
  | jq -r .access_token > user1.token
----

.. To request a full access token and save it to a `user2.token` file, run the following command:
+
[source,terminal]
----
$ curl -d client_id=llamastack \
  -d client_secret=YOUR_CLIENT_SECRET \
  -d username=user2 \
  -d password=user2-password \
  -d grant_type=password \
  https://YOUR_KEYCLOAK_HOST/realms/YOUR_REALM/protocol/openid-connect/token \
  | jq -r .access_token > user2.token
----

.. Verify the token claims by running the following command:
+
[source,terminal]
----
$ cat user2.token | cut -d . -f 2 | base64 -d 2>/dev/null | jq .
----
+
.Example output
[source,json]
----
{
  "iss": "https://keycloak-host/realms/testrealm",
  "aud": "account",
  "exp": 1760553504,
  "preferred_username": "user2",
  "llamastack_roles": ["inference_max"]
}
----
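+
Keycloak access tokens are short-lived, as the `exp` (expiry) claim shows. If later requests start failing with HTTP 401, check the expiry and request a fresh token. With GNU coreutils, you can print the expiry as a readable date:
+
[source,terminal]
----
$ date -d @$(cat user2.token | cut -d . -f 2 | base64 -d 2>/dev/null | jq -r .exp)
----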

.Verification

* Testing basic access to models

. Load the token with the following command:
+
[source,terminal]
----
$ USER1_TOKEN=$(cat user1.token)
----

. Access the vLLM model by running the following command:
+
[source,terminal]
----
$ curl -X POST ${LLAMASTACK_URL}/v1/openai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ${USER1_TOKEN}" \
  -d '{"model": "vllm-inference/llama-3.2", "messages": [{"role": "user", "content": "Hello!"}], "max_tokens": 50}'
----
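+
Assuming the endpoint returns an OpenAI-compatible response body, you can extract only the assistant reply by piping the output through `jq`:
+
[source,terminal]
----
$ curl -s -X POST ${LLAMASTACK_URL}/v1/openai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ${USER1_TOKEN}" \
  -d '{"model": "vllm-inference/llama-3.2", "messages": [{"role": "user", "content": "Hello!"}], "max_tokens": 50}' \
  | jq -r '.choices[0].message.content'
----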

. If you attempt to access the OpenAI model with this token, the request fails because of the access policy restrictions:
+
[source,terminal]
----
$ curl -X POST ${LLAMASTACK_URL}/v1/openai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ${USER1_TOKEN}" \
  -d '{"model": "openai/gpt-4o-mini", "messages": [{"role": "user", "content": "Hello!"}], "max_tokens": 50}'
----
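+
To see only the status code instead of the error body, you can add the `-s -o /dev/null -w` flags to `curl`; with the access policy above, the request should be denied with an HTTP 4xx status (typically 403 Forbidden):
+
[source,terminal]
----
$ curl -s -o /dev/null -w "HTTP Status: %{http_code}\n" \
  -X POST ${LLAMASTACK_URL}/v1/openai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ${USER1_TOKEN}" \
  -d '{"model": "openai/gpt-4o-mini", "messages": [{"role": "user", "content": "Hello!"}], "max_tokens": 50}'
----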

* Testing full authorization to models

. Load the token with the following command:
+
[source,terminal]
----
$ USER2_TOKEN=$(cat user2.token)
----

. Access the vLLM model by running the following command:
+
[source,terminal]
----
$ curl -X POST ${LLAMASTACK_URL}/v1/openai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ${USER2_TOKEN}" \
  -d '{
    "model": "vllm-inference/llama-3.2",
    "messages": [{"role": "user", "content": "Hello!"}],
    "max_tokens": 50
  }'
----

. Access the OpenAI model by running the following command:
+
[source,terminal]
----
$ curl -X POST ${LLAMASTACK_URL}/v1/openai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ${USER2_TOKEN}" \
  -d '{
    "model": "openai/gpt-4o-mini",
    "messages": [{"role": "user", "content": "Hello!"}],
    "max_tokens": 50
  }'
----

* Testing without any authorization:

. Attempt to access the vLLM or OpenAI model without an `Authorization` header. Set `LLAMASTACK_URL` to your Llama Stack endpoint, and then run the following command:
+
[source,terminal]
----
$ LLAMASTACK_URL=https://<llamastack-route-host>
$ curl -s -o /dev/null -w "HTTP Status: %{http_code}\n" \
  -X POST ${LLAMASTACK_URL}/v1/openai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "vllm-inference/llama-3.2", "messages": [{"role": "user", "content": "Hello!"}], "max_tokens": 50}'
----
+
.Example output
[source,terminal]
----
HTTP Status: 401
----