|
:_module-type: PROCEDURE

[id="auth-on-llama-stack_{context}"]
= Configuring Llama Stack with OAuth Authentication

You can configure Llama Stack to enable Role-Based Access Control (RBAC) for model access using OAuth authentication on {productname-short}. The following example shows how to configure Llama Stack so that a vLLM model can be accessed by all authenticated users, while an OpenAI model is restricted to specific users.

.Prerequisites

* You have installed {openshift-platform} 4.17 or newer.
* You have logged in to {productname-long}.
* You have cluster administrator privileges for your OpenShift cluster.
* You have installed the {openshift-cli} as described in the appropriate documentation for your cluster:
ifdef::upstream,self-managed[]
** link:https://docs.redhat.com/en/documentation/openshift_container_platform/{ocp-latest-version}/html/cli_tools/openshift-cli-oc#installing-openshift-cli[Installing the OpenShift CLI^] for OpenShift Container Platform
** link:https://docs.redhat.com/en/documentation/red_hat_openshift_service_on_aws/{rosa-latest-version}/html/cli_tools/openshift-cli-oc#installing-openshift-cli[Installing the OpenShift CLI^] for {rosa-productname}
endif::[]
ifdef::cloud-service[]
** link:https://docs.redhat.com/en/documentation/openshift_dedicated/{osd-latest-version}/html/cli_tools/openshift-cli-oc#installing-openshift-cli[Installing the OpenShift CLI^] for OpenShift Dedicated
** link:https://docs.redhat.com/en/documentation/red_hat_openshift_service_on_aws_classic_architecture/{rosa-classic-latest-version}/html/cli_tools/openshift-cli-oc#installing-openshift-cli[Installing the OpenShift CLI^] for {rosa-classic-productname}
endif::[]
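* You have installed the `jq` command-line JSON processor, which several commands in this procedure use to parse output.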

.Procedure

. To configure Llama Stack to use Role-Based Access Control (RBAC) for model access, you first need to create service accounts.

.. Define the service accounts used for OAuth authentication, where each account corresponds to a specific application with its own access permissions. To configure this, create a `llamastack-auth.yaml` file:
+
[source,yaml]
----
apiVersion: v1
kind: ServiceAccount
metadata:
  name: llamastack-vllm-inference <1>
  namespace: redhat-ods-operator
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: llamastack-openai-inference <2>
  namespace: redhat-ods-operator
----
<1> Allows access only to vLLM models.
<2> Allows access to vLLM and OpenAI models.

.. Apply the service accounts by running the following command:
+
[source,terminal]
----
$ oc apply -f llamastack-auth.yaml
----
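+
You can optionally confirm that both service accounts exist by running the following command:
+
[source,terminal]
----
$ oc get serviceaccount llamastack-vllm-inference llamastack-openai-inference -n redhat-ods-operator
----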

. You then need to retrieve the OpenID Connect (OIDC) configuration.

.. {openshift-platform} provides a built-in OIDC provider. Retrieve its configuration by running the following command:
+
[source,terminal]
----
$ OIDC_CONFIG=$(oc get --raw /.well-known/openid-configuration)
----
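+
If you want to inspect what the provider returns before extracting values, you can, for example, print the issuer and JWKS fields of the discovery document (this assumes that `jq` is installed locally):
+
[source,terminal]
----
$ echo "$OIDC_CONFIG" | jq '{issuer, jwks_uri}'
----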

.. Extract the issuer and JWKS URI by running the following commands:
+
[source,terminal]
----
$ ISSUER=$(echo "$OIDC_CONFIG" | jq -r .issuer)
JWKS_URI="${ISSUER}/keys.json"

echo "OIDC Issuer: $ISSUER"
echo "JWKS URI: $JWKS_URI"
----
+
.Example output
[source,terminal]
----
OIDC Issuer: https://oidc.com/2...a
JWKS URI: https://oidc.com/2...a/keys.json
----
+
Make a note of these values because they are required for the `LlamaStackDistribution` configuration.
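+
As an optional check, you can fetch the JWKS document and count the published keys. Depending on your cluster, this endpoint might require authentication, so a failure here does not necessarily indicate a configuration problem:
+
[source,terminal]
----
$ curl -s "$JWKS_URI" | jq '.keys | length'
----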

. You then need to create a `run.yaml` file that defines the necessary configuration for OAuth.

.. Define a configuration with two inference providers and OAuth authentication, as shown in the following `run.yaml` example:
+
[source,yaml]
----
version: 2
image_name: rh
apis:
  - inference
  - agents
  - safety
  - telemetry
  - tool_runtime
  - vector_io
providers:
  inference:
    - provider_id: vllm-inference
      provider_type: remote::vllm
      config:
        url: ${env.VLLM_URL:=http://localhost:8000/v1}
        max_tokens: ${env.VLLM_MAX_TOKENS:=4096}
        api_token: ${env.VLLM_API_TOKEN:=fake}
        tls_verify: ${env.VLLM_TLS_VERIFY:=true}
    - provider_id: openai
      provider_type: remote::openai
      config:
        api_key: ${env.OPENAI_API_KEY:=}
        base_url: ${env.OPENAI_BASE_URL:=https://api.openai.com/v1}
  telemetry:
    - provider_id: meta-reference
      provider_type: inline::meta-reference
      config:
        service_name: "${env.OTEL_SERVICE_NAME:=\u200B}"
        sinks: ${env.TELEMETRY_SINKS:=console,sqlite}
        sqlite_db_path: /opt/app-root/src/.llama/distributions/rh/trace_store.db
        otel_exporter_otlp_endpoint: ${env.OTEL_EXPORTER_OTLP_ENDPOINT:=}
  agents:
    - provider_id: meta-reference
      provider_type: inline::meta-reference
      config:
        persistence_store:
          type: sqlite
          namespace: null
          db_path: /opt/app-root/src/.llama/distributions/rh/agents_store.db
        responses_store:
          type: sqlite
          db_path: /opt/app-root/src/.llama/distributions/rh/responses_store.db
models:
  - model_id: llama-3.2
    provider_id: vllm-inference
    model_type: llm
    metadata: {}

  - model_id: gpt-4o-mini
    provider_id: openai
    model_type: llm
    metadata: {}

server:
  port: 8321
  auth:
    provider_config:
      type: "oauth2_token"
      jwks:
        uri: "https://<your-cluster-oidc-url>/keys.json" <1>
        key_recheck_period: 3600
      issuer: "https://<your-cluster-oidc-url>"
      audience: "https://<your-cluster-oidc-url>"
      verify_tls: false
      claims_mapping:
        sub: "roles"
    access_policy:
      - permit: <2>
          actions: [read]
          resource: model::vllm-inference/llama-3.2
        description: Allow all authenticated users to access Llama 3.2 model
      - permit: <3>
          actions: [read]
          resource: model::openai/gpt-4o-mini
        when: user with system:serviceaccount:redhat-ods-operator:llamastack-openai-inference in roles
        description: Allow only llamastack-openai-inference to access OpenAI models
----
+
<1> Enter the OIDC issuer and JWKS URI values that you retrieved earlier.
<2> Policy 1: Allow all authenticated users to access vLLM models.
<3> Policy 2: Restrict OpenAI models to a specific service account. The `claims_mapping` setting copies the token `sub` claim, which for a service account token has the form `system:serviceaccount:<namespace>:<name>`, into the `roles` list that the `when` clause evaluates.

. Create a `ConfigMap` with the OAuth configuration by running the following command:
+
[source,terminal]
----
$ oc create configmap llamastack-custom-config --from-file=run.yaml=run.yaml -n redhat-ods-operator
----

. Verify that the `ConfigMap` was created by running the following command:
+
[source,terminal]
----
$ oc get configmap llamastack-custom-config -n redhat-ods-operator
----
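+
If you later need to check which configuration the `ConfigMap` contains, you can print the embedded `run.yaml`, for example:
+
[source,terminal]
----
$ oc get configmap llamastack-custom-config -n redhat-ods-operator -o jsonpath='{.data.run\.yaml}'
----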

. You then need to create a `LlamaStackDistribution` custom resource that uses the OAuth configuration.

.. Create a `llamastack-distribution.yaml` file with the following parameters:
+
[source,yaml]
----
apiVersion: llamastack.io/v1alpha1
kind: LlamaStackDistribution
metadata:
  name: llamastack-distribution
  namespace: redhat-ods-operator
spec:
  replicas: 1
  server:
    distribution:
      name: rh-dev
    containerSpec:
      name: llama-stack
      port: 8321
      env:
        # vLLM Provider Configuration
        - name: VLLM_URL
          value: "http://your-vllm-service:8000/v1"
        - name: VLLM_API_TOKEN
          value: "your-vllm-token"
        - name: VLLM_TLS_VERIFY
          value: "false"

        # OpenAI Provider Configuration
        - name: OPENAI_API_KEY
          value: "your-openai-api-key"
        - name: OPENAI_BASE_URL
          value: "https://api.openai.com/v1"

    # Reference the ConfigMap with OAuth configuration
    userConfig:
      configMapName: llamastack-custom-config
      configMapNamespace: redhat-ods-operator
----

. To apply the distribution, run the following command:
+
[source,terminal]
----
$ oc apply -f llamastack-distribution.yaml
----
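+
You can confirm that the custom resource was created by running the following command:
+
[source,terminal]
----
$ oc get llamastackdistribution llamastack-distribution -n redhat-ods-operator
----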

. Wait for the distribution to be ready by running the following command:
+
[source,terminal]
----
$ oc wait --for=jsonpath='{.status.phase}'=Ready llamastackdistribution/llamastack-distribution -n redhat-ods-operator --timeout=300s
----
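+
If the `oc wait` command times out, you can inspect the distribution and the pods behind it. The label selector shown here is the same one used in the verification steps later in this procedure:
+
[source,terminal]
----
$ oc get pods -n redhat-ods-operator -l app=llama-stack
$ oc describe llamastackdistribution llamastack-distribution -n redhat-ods-operator
----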

. Generate an OAuth token for each service account to authenticate API requests.

* Token for the vLLM service account:
.. Generate the token for the vLLM service account by running the following command:
+
[source,terminal]
----
$ oc create token llamastack-vllm-inference -n redhat-ods-operator --duration=24h > llamastack-vllm-token.txt
----
.. View the token with the following command:
+
[source,terminal]
----
$ cat llamastack-vllm-token.txt
----
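+
You can also check which identity the token represents. For a service account token, this value has the form `system:serviceaccount:<namespace>:<name>`, and it is the value that the access policy in `run.yaml` evaluates through the `roles` claims mapping:
+
[source,terminal]
----
$ oc whoami --token="$(cat llamastack-vllm-token.txt)"
----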

* Token for the OpenAI service account:
.. Generate the token for the OpenAI service account by running the following command:
+
[source,terminal]
----
$ oc create token llamastack-openai-inference -n redhat-ods-operator --duration=24h > llamastack-openai-token.txt
----
.. View the token with the following command:
+
[source,terminal]
----
$ cat llamastack-openai-token.txt
----

.Verification

* Testing successful vLLM access:
+
--
The vLLM token allows you to access only vLLM models.

. Load the vLLM token with the following command:
+
[source,terminal]
----
$ VLLM_TOKEN=$(cat llamastack-vllm-token.txt)
----
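+
The commands in the following steps run inside the Llama Stack pod. If the `POD_NAME` variable is not already set, set it by running the following command:
+
[source,terminal]
----
$ POD_NAME=$(oc get pods -n redhat-ods-operator -l app=llama-stack -o jsonpath='{.items[0].metadata.name}')
----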

. You can then access the vLLM model by running the following command:
+
[source,terminal]
----
$ oc exec $POD_NAME -n redhat-ods-operator -- curl -s -X POST http://localhost:8321/v1/openai/v1/chat/completions -H "Content-Type: application/json" -H "Authorization: Bearer ${VLLM_TOKEN}" -d '{"model": "vllm-inference/llama-3.2", "messages": [{"role": "user", "content": "Hello!"}], "max_tokens": 50}'
----
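+
The request returns an OpenAI-style chat completion object. If `jq` is installed locally, you can, for example, extract only the generated text, assuming the response follows the standard `choices[0].message.content` layout:
+
[source,terminal]
----
$ oc exec $POD_NAME -n redhat-ods-operator -- curl -s -X POST http://localhost:8321/v1/openai/v1/chat/completions -H "Content-Type: application/json" -H "Authorization: Bearer ${VLLM_TOKEN}" -d '{"model": "vllm-inference/llama-3.2", "messages": [{"role": "user", "content": "Hello!"}], "max_tokens": 50}' | jq -r '.choices[0].message.content'
----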
--

* Testing successful OpenAI access:
+
--
The OpenAI token allows you to access both OpenAI and vLLM models.

. Load the OpenAI token with the following command:
+
[source,terminal]
----
$ OPENAI_TOKEN=$(cat llamastack-openai-token.txt)
----

. Access the vLLM model by running the following command:
+
[source,terminal]
----
$ oc exec $POD_NAME -n redhat-ods-operator -- curl -s -X POST http://localhost:8321/v1/openai/v1/chat/completions -H "Content-Type: application/json" -H "Authorization: Bearer ${OPENAI_TOKEN}" -d '{"model": "vllm-inference/llama-3.2", "messages": [{"role": "user", "content": "Hello!"}], "max_tokens": 50}'
----
. Access the OpenAI model by running the following command:
+
[source,terminal]
----
$ oc exec $POD_NAME -n redhat-ods-operator -- curl -s -X POST http://localhost:8321/v1/openai/v1/chat/completions -H "Content-Type: application/json" -H "Authorization: Bearer ${OPENAI_TOKEN}" -d '{"model": "openai/gpt-4o-mini", "messages": [{"role": "user", "content": "Hello!"}], "max_tokens": 50}'
----
--

* Testing without any authorization:
+
--
. Attempt to access the OpenAI or vLLM models by running the following commands:
+
[source,terminal]
----
$ POD_NAME=$(oc get pods -n redhat-ods-operator -l app=llama-stack -o jsonpath='{.items[0].metadata.name}')
$ oc exec $POD_NAME -n redhat-ods-operator -- curl -s -w "\nHTTP Status: %{http_code}\n" http://localhost:8321/v1/models
----
+
.Example output
[source,terminal]
----
HTTP Status: 401
----
--

* Testing incorrect authorization:
+
--
. Attempt to access an OpenAI model with a vLLM token by running the following command:
+
[source,terminal]
----
$ oc exec $POD_NAME -n redhat-ods-operator -- curl -s -X POST http://localhost:8321/v1/openai/v1/chat/completions -H "Content-Type: application/json" -H "Authorization: Bearer ${VLLM_TOKEN}" -d '{"model": "openai/gpt-4o-mini", "messages": [{"role": "user", "content": "Hello!"}], "max_tokens": 50}'
----
+
.Example output
[source,terminal]
----
404 - File (model) not found
----
--