:_module-type: PROCEDURE

[id="auth-on-llama-stack_{context}"]
= Configuring Llama Stack with OAuth authentication

You can configure Llama Stack to enable role-based access control (RBAC) for model access by using OAuth authentication with Keycloak on {productname-short}. The following example shows how to configure Llama Stack so that a vLLM model can be accessed by all authenticated users, while an OpenAI model is restricted to specific users.

.Prerequisites

* You have set up Keycloak with a realm, a client, and the user accounts that you want to authenticate.
* You have installed {openshift-platform} 4.19 or later.
* You have logged in to {productname-long}.
* You have cluster administrator privileges for your OpenShift cluster.
* You have installed the {openshift-cli} as described in the appropriate documentation for your cluster:
ifdef::upstream,self-managed[]
** link:https://docs.redhat.com/en/documentation/openshift_container_platform/{ocp-latest-version}/html/cli_tools/openshift-cli-oc#installing-openshift-cli[Installing the OpenShift CLI^] for OpenShift Container Platform
** link:https://docs.redhat.com/en/documentation/red_hat_openshift_service_on_aws/{rosa-latest-version}/html/cli_tools/openshift-cli-oc#installing-openshift-cli[Installing the OpenShift CLI^] for {rosa-productname}
endif::[]
ifdef::cloud-service[]
** link:https://docs.redhat.com/en/documentation/openshift_dedicated/{osd-latest-version}/html/cli_tools/openshift-cli-oc#installing-openshift-cli[Installing the OpenShift CLI^] for OpenShift Dedicated
** link:https://docs.redhat.com/en/documentation/red_hat_openshift_service_on_aws_classic_architecture/{rosa-classic-latest-version}/html/cli_tools/openshift-cli-oc#installing-openshift-cli[Installing the OpenShift CLI^] for {rosa-classic-productname}
endif::[]

.Procedure

. View and verify the OAuth provider token structure, which Llama Stack uses to enforce RBAC for model access.

.. Generate a test token from Keycloak to view its structure by running the following command:
+
[source,terminal]
----
$ curl -d client_id=llamastack \
  -d client_secret=YOUR_CLIENT_SECRET \
  -d username=user1 \
  -d password=user-password \
  -d grant_type=password \
  ${TOKEN_ENDPOINT} | jq -r .access_token > test.token
----
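+
This command assumes that `TOKEN_ENDPOINT` is already set to your realm's token URL. For a standard Keycloak installation, the endpoint follows this pattern; substitute your own host and realm:
+
[source,terminal]
----
$ TOKEN_ENDPOINT=https://<keycloak-host>/realms/<your-keycloak-realm>/protocol/openid-connect/token
----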

.. View the token claims by running the following command:
+
[source,terminal]
----
$ cat test.token | cut -d . -f 2 | base64 -d 2>/dev/null | jq .
----
+
.Example token structure from Keycloak
[source,json]
----
{
  "iss": "http://keycloak-host/realms/testrealm",
  "aud": "account",
  "sub": "761cdc99-80e5-4506-9b9e-26a67a8566f7",
  "preferred_username": "user1",
  "llamastack_roles": [
    "inference_max"
  ]
}
----
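+
This decoding works because a JWT consists of three dot-separated, base64url-encoded segments: header, payload, and signature. The `cut -d . -f 2` selects the payload, and `2>/dev/null` suppresses the padding warnings that `base64` can emit for base64url input. The same approach decodes the header (field 1), which shows the signing algorithm and the key ID (`kid`) that is matched against the keys served by the JWKS endpoint:
+
[source,terminal]
----
$ cat test.token | cut -d . -f 1 | base64 -d 2>/dev/null | jq .
----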

. Create a `run.yaml` file that defines the required OAuth configuration.

.. Define a configuration with two inference providers and OAuth authentication, as in the following `run.yaml` example:
+
[source,yaml]
----
version: 2
image_name: rh
apis:
  - inference
  - agents
  - safety
  - telemetry
  - tool_runtime
  - vector_io
providers:
  inference:
    - provider_id: vllm-inference
      provider_type: remote::vllm
      config:
        url: ${env.VLLM_URL:=http://localhost:8000/v1}
        max_tokens: ${env.VLLM_MAX_TOKENS:=4096}
        api_token: ${env.VLLM_API_TOKEN:=fake}
        tls_verify: ${env.VLLM_TLS_VERIFY:=true}
    - provider_id: openai
      provider_type: remote::openai
      config:
        api_key: ${env.OPENAI_API_KEY:=}
        base_url: ${env.OPENAI_BASE_URL:=https://api.openai.com/v1}
  telemetry:
    - provider_id: meta-reference
      provider_type: inline::meta-reference
      config:
        service_name: "${env.OTEL_SERVICE_NAME:=}"
        sinks: ${env.TELEMETRY_SINKS:=console,sqlite}
        sqlite_db_path: /opt/app-root/src/.llama/distributions/rh/trace_store.db
        otel_exporter_otlp_endpoint: ${env.OTEL_EXPORTER_OTLP_ENDPOINT:=}
  agents:
    - provider_id: meta-reference
      provider_type: inline::meta-reference
      config:
        persistence_store:
          type: sqlite
          namespace: null
          db_path: /opt/app-root/src/.llama/distributions/rh/agents_store.db
        responses_store:
          type: sqlite
          db_path: /opt/app-root/src/.llama/distributions/rh/responses_store.db
models:
  - model_id: llama-3.2
    provider_id: vllm-inference
    model_type: llm
    metadata: {}
  - model_id: gpt-4o-mini
    provider_id: openai
    model_type: llm
    metadata: {}
server:
  port: 8321
  auth:
    provider_config:
      type: "oauth2_token"
      jwks:
        uri: "https://<keycloak-host>/realms/<your-keycloak-realm>/protocol/openid-connect/certs" <1>
        key_recheck_period: 3600
      issuer: "https://<keycloak-host>/realms/<your-keycloak-realm>" <1>
      audience: "account"
      verify_tls: true
      claims_mapping:
        llamastack_roles: "roles" <2>
    access_policy:
      - permit: <3>
          actions: [read]
          resource: model::vllm-inference/llama-3.2
        description: Allow all authenticated users to access the Llama 3.2 model
      - permit: <4>
          actions: [read]
          resource: model::openai/gpt-4o-mini
        when: user with inference_max in roles
        description: Allow only users with the inference_max role to access OpenAI models
----
+
<1> Specify your Keycloak host and realm in the URL.
<2> Maps the `llamastack_roles` claim from the token to the `roles` field.
<3> Policy 1: Allow all authenticated users to access the vLLM model.
<4> Policy 2: Restrict OpenAI models to users with the `inference_max` role.
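+
Before deploying, you can optionally confirm that the JWKS URI in the configuration is reachable and serves signing keys, because Llama Stack validates incoming tokens against it. A quick check with `curl`, substituting your own host and realm:
+
[source,terminal]
----
$ curl -s https://<keycloak-host>/realms/<your-keycloak-realm>/protocol/openid-connect/certs | jq '.keys[].kid'
----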
| 145 | + |
. Create a ConfigMap that contains the `run.yaml` configuration by running the following command:
+
[source,terminal]
----
$ oc create configmap llamastack-custom-config --from-file=run.yaml=run.yaml -n redhat-ods-operator
----
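+
If you change `run.yaml` later, you can update the existing ConfigMap in place by piping a client-side dry run through `oc apply`:
+
[source,terminal]
----
$ oc create configmap llamastack-custom-config --from-file=run.yaml=run.yaml -n redhat-ods-operator --dry-run=client -o yaml | oc apply -f -
----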

. Create a `llamastack-distribution.yaml` file with the following parameters:
+
[source,yaml]
----
apiVersion: llamastack.io/v1alpha1
kind: LlamaStackDistribution
metadata:
  name: llamastack-distribution
  namespace: redhat-ods-operator
spec:
  replicas: 1
  server:
    distribution:
      name: rh-dev
    containerSpec:
      name: llama-stack
      port: 8321
      env:
        # vLLM provider configuration
        - name: VLLM_URL
          value: "http://your-vllm-service:8000/v1"
        - name: VLLM_API_TOKEN
          value: "your-vllm-token"
        - name: VLLM_TLS_VERIFY
          value: "false"
        # OpenAI provider configuration
        - name: OPENAI_API_KEY
          value: "your-openai-api-key"
        - name: OPENAI_BASE_URL
          value: "https://api.openai.com/v1"
    userConfig:
      configMapName: llamastack-custom-config
      configMapNamespace: redhat-ods-operator
----
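+
Avoid embedding real credentials directly in the custom resource. Assuming the `containerSpec.env` field follows the standard Kubernetes `EnvVar` schema, you can reference a Secret instead; the Secret name and key below are placeholders:
+
[source,yaml]
----
env:
  - name: OPENAI_API_KEY
    valueFrom:
      secretKeyRef:
        name: openai-credentials  # placeholder Secret in the same namespace
        key: api-key
----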

. To apply the distribution, run the following command:
+
[source,terminal]
----
$ oc apply -f llamastack-distribution.yaml
----

. Wait for the distribution to be ready by running the following command:
+
[source,terminal]
----
$ oc wait --for=jsonpath='{.status.phase}'=Ready llamastackdistribution/llamastack-distribution -n redhat-ods-operator --timeout=300s
----
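+
You can also inspect the status of the distribution directly, using the same status field that `oc wait` polls:
+
[source,terminal]
----
$ oc get llamastackdistribution llamastack-distribution -n redhat-ods-operator -o jsonpath='{.status.phase}{"\n"}'
----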

. Generate the OAuth tokens for each user account to authenticate API requests.

.. To request a basic access token and save it to a `user1.token` file, run the following command:
+
[source,terminal]
----
$ curl -d client_id=llamastack \
  -d client_secret=YOUR_CLIENT_SECRET \
  -d username=user1 \
  -d password=user1-password \
  -d grant_type=password \
  https://YOUR_KEYCLOAK_HOST/realms/YOUR_REALM/protocol/openid-connect/token \
  | jq -r .access_token > user1.token
----

.. To request a full access token and save it to a `user2.token` file, run the following command:
+
[source,terminal]
----
$ curl -d client_id=llamastack \
  -d client_secret=YOUR_CLIENT_SECRET \
  -d username=user2 \
  -d password=user2-password \
  -d grant_type=password \
  https://YOUR_KEYCLOAK_HOST/realms/YOUR_REALM/protocol/openid-connect/token \
  | jq -r .access_token > user2.token
----

.. Verify the token claims by running the following command:
+
[source,terminal]
----
$ cat user2.token | cut -d . -f 2 | base64 -d 2>/dev/null | jq .
----
+
.Example output
[source,json]
----
{
  "iss": "https://keycloak-host/realms/testrealm",
  "aud": "account",
  "exp": 1760553504,
  "preferred_username": "user2",
  "llamastack_roles": ["inference_max"]
}
----
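+
Keycloak access tokens are short-lived, as the `exp` (expiry) claim shows. If later requests start failing with HTTP 401, check the expiry and request a fresh token. With GNU coreutils, you can print the expiry as a readable date:
+
[source,terminal]
----
$ date -d @$(cat user2.token | cut -d . -f 2 | base64 -d 2>/dev/null | jq -r .exp)
----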

.Verification

* Testing basic access to models

. Load the token with the following command:
+
[source,terminal]
----
$ USER1_TOKEN=$(cat user1.token)
----

. Access the vLLM model by running the following command:
+
[source,terminal]
----
$ curl -X POST ${LLAMASTACK_URL}/v1/openai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ${USER1_TOKEN}" \
  -d '{"model": "vllm-inference/llama-3.2", "messages": [{"role": "user", "content": "Hello!"}], "max_tokens": 50}'
----
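+
Assuming the endpoint returns an OpenAI-compatible response body, you can extract only the assistant reply by piping the output through `jq`:
+
[source,terminal]
----
$ curl -s -X POST ${LLAMASTACK_URL}/v1/openai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ${USER1_TOKEN}" \
  -d '{"model": "vllm-inference/llama-3.2", "messages": [{"role": "user", "content": "Hello!"}], "max_tokens": 50}' \
  | jq -r '.choices[0].message.content'
----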

. If you attempt to access the OpenAI model with this token, the request fails because of the access policy restrictions:
+
[source,terminal]
----
$ curl -X POST ${LLAMASTACK_URL}/v1/openai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ${USER1_TOKEN}" \
  -d '{"model": "openai/gpt-4o-mini", "messages": [{"role": "user", "content": "Hello!"}], "max_tokens": 50}'
----
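+
To see only the status code instead of the error body, you can add the `-s -o /dev/null -w` flags to `curl`; with the access policy above, the request should be denied with an HTTP 4xx status (typically 403 Forbidden):
+
[source,terminal]
----
$ curl -s -o /dev/null -w "HTTP Status: %{http_code}\n" \
  -X POST ${LLAMASTACK_URL}/v1/openai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ${USER1_TOKEN}" \
  -d '{"model": "openai/gpt-4o-mini", "messages": [{"role": "user", "content": "Hello!"}], "max_tokens": 50}'
----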

* Testing full authorization to models

. Load the token with the following command:
+
[source,terminal]
----
$ USER2_TOKEN=$(cat user2.token)
----

. Access the vLLM model by running the following command:
+
[source,terminal]
----
$ curl -X POST ${LLAMASTACK_URL}/v1/openai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ${USER2_TOKEN}" \
  -d '{
    "model": "vllm-inference/llama-3.2",
    "messages": [{"role": "user", "content": "Hello!"}],
    "max_tokens": 50
  }'
----

. Access the OpenAI model by running the following command:
+
[source,terminal]
----
$ curl -X POST ${LLAMASTACK_URL}/v1/openai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ${USER2_TOKEN}" \
  -d '{
    "model": "openai/gpt-4o-mini",
    "messages": [{"role": "user", "content": "Hello!"}],
    "max_tokens": 50
  }'
----

* Testing without any authorization:

. Attempt to access the vLLM or OpenAI model without an `Authorization` header. Set `LLAMASTACK_URL` to your Llama Stack endpoint, and then run the following command:
+
[source,terminal]
----
$ LLAMASTACK_URL=https://<llamastack-route-host>
$ curl -s -o /dev/null -w "HTTP Status: %{http_code}\n" \
  -X POST ${LLAMASTACK_URL}/v1/openai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "vllm-inference/llama-3.2", "messages": [{"role": "user", "content": "Hello!"}], "max_tokens": 50}'
----
+
.Example output
[source,terminal]
----
HTTP Status: 401
----