Commit 5c6a8e6

Doc revamp
1 parent 2a1b008 commit 5c6a8e6

6 files changed

Lines changed: 261 additions & 4 deletions

deploy/helm/tlm/Chart.yaml

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -15,10 +15,10 @@ type: application
1515
# This is the chart version. This version number should be incremented each time you make changes
1616
# to the chart and its templates, including the app version.
1717
# Versions are expected to follow Semantic Versioning (https://semver.org/)
18-
version: 0.1.28
18+
version: 0.1.29
1919

2020
# This is the version number of the application being deployed. This version number should be
2121
# incremented each time you make changes to the application. Versions are not expected to
2222
# follow Semantic Versioning. They should reflect the version the application is using.
2323
# It is recommended to use it with quotes.
24-
appVersion: "0.1.20"
24+
appVersion: "0.1.0"

deploy/helm/tlm/templates/chat/deployment.yaml

Lines changed: 4 additions & 2 deletions
@@ -34,9 +34,11 @@ spec:
3434
value: "{{ .Values.chat_backend.container.port }}"
3535
- name: NUM_PROXIES
3636
value: "{{ .Values.chat_backend.num_proxies }}"
37+
- name: ENVIRONMENT
38+
value: "{{ .Values.environment }}"
3739
envFrom:
3840
- secretRef:
39-
name: {{ .Release.Name }}-chat-backend-secret
41+
name: {{ .Values.chat_backend.secret_name}}
4042
startupProbe:
4143
httpGet:
4244
path: /api/health/
@@ -61,7 +63,7 @@ spec:
6163
port: {{ .Values.chat_backend.container.port }}
6264
resources:
6365
{{- toYaml .Values.chat_backend.resources | nindent 12 }}
64-
{{- if .Values.imagePullSecret.enabled }}
66+
{{- if .Values.imagePullSecret.enabled }}
6567
imagePullSecrets:
6668
- name: {{ .Values.imagePullSecret.name }}
6769
{{- end }}

deploy/helm/tlm/values.yaml

Lines changed: 3 additions & 0 deletions
@@ -2,13 +2,16 @@
22
# This is a YAML-formatted file.
33
# Declare variables to be passed into your templates.
44

5+
environment: "production"
56
replicaCount: 1
67

78
imagePullSecret:
89
enabled: false
910
name: ""
1011

1112
chat_backend:
13+
secret_name: ""
14+
1215
image:
1316
repository: 043170249292.dkr.ecr.us-east-1.amazonaws.com/tlm/chat-backend
1417
pullPolicy: IfNotPresent

deploy/tilt/Tiltfile.chat

Lines changed: 3 additions & 0 deletions
@@ -43,6 +43,9 @@ def deploy_chat():
4343
"deploy/helm/tlm/values.yaml",
4444
"deploy/helm/values/local.yaml",
4545
],
46+
set=[
47+
"chat_backend.secret_name=tlm-chat-backend-secret",
48+
],
4649
))
4750

4851
docker_build(

installation/aks.md

Lines changed: 249 additions & 0 deletions
@@ -0,0 +1,249 @@
1+
# TLM Installation on Azure Kubernetes Service (AKS)
2+
This guide describes how to install the TLM app on Azure Kubernetes Service (AKS).
3+
4+
## 0. Prerequisites
5+
6+
Before you begin the installation process, ensure that the following prerequisites are met:
7+
8+
- **Azure Kubernetes Service (AKS) Cluster**: You must have an existing AKS cluster. The cluster should be configured with a Virtual Network (VNet) of size `/22` or larger.
9+
10+
- **Azure Application Gateway Ingress Controller (AGIC) Add-on** *(optional)*: If you plan to use the Azure Application Gateway Ingress Controller to expose the TLM application, ensure that the AGIC add-on is enabled when creating your AKS cluster.
11+
12+
- **Configured Tools**: The tools listed below must be installed and properly configured on your machine:
13+
- [helm](https://helm.sh/docs/intro/install/)
14+
- [kubectl](https://kubernetes.io/docs/tasks/tools/#kubectl)
15+
- [az (Azure CLI)](https://docs.microsoft.com/en-us/cli/azure/install-azure-cli)
16+
- [jq](https://jqlang.github.io/jq/)
17+
18+
Ensure that these prerequisites are satisfied to successfully install and run the TLM application on AKS.
19+
20+
## 1. Cross-tenant Azure Container Registry Authentication
21+
22+
TLM container images are stored in Azure Container Registry (ACR). To pull these images, you will need to authenticate your AKS cluster to the ACR.
23+
24+
Since the ACR repository is in a different tenant, it is necessary to set up a cross-tenant Entra app registration.
25+
26+
1. Create the multi-tenant app registration in your own tenant (the tenant that hosts your AKS cluster)
27+
```bash
28+
acr_app_id=$(az ad app create \
29+
--display-name "CleanlabTLMCrossTenantApp" \
30+
--sign-in-audience AzureADMultipleOrgs \
31+
--web-redirect-uris https://microsoft.com \
32+
--query appId -o tsv)
33+
```
34+
2. Reset the app registration credentials
35+
```bash
36+
acr_app_credentials=$(az ad app credential reset --id $acr_app_id)
37+
acr_app_password=$(echo $acr_app_credentials | jq -r '.password')
38+
acr_app_tenant_id=$(echo $acr_app_credentials | jq -r '.tenant')
39+
```
40+
3. Share the application ID and tenant ID with the Cleanlab Infra team.
41+
4. Export the application ID and password as environment variables for later use:
42+
```bash
43+
export ACR_APP_ID=$acr_app_id
44+
export ACR_APP_PASSWORD=$acr_app_password
45+
export ACR_APP_TENANT_ID=$acr_app_tenant_id
46+
```
47+
48+
You will use the application ID and password to pull the TLM Helm chart and to pull the TLM container images in the AKS cluster.
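
Later steps read `ACR_APP_ID`, `ACR_APP_PASSWORD`, and `ACR_APP_TENANT_ID` from the environment, so an empty value (for example, after opening a new terminal) only surfaces later as an authentication failure. A minimal sketch of a pre-flight check follows; `require_env` is a hypothetical helper written for this guide, not part of the TLM tooling or the Azure CLI:

```bash
# Hypothetical helper: report every named variable that is unset or empty.
# Returns 0 when all are set, 1 otherwise. It never exits the shell.
require_env() {
  missing=0
  for name in "$@"; do
    # Look up the value of the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing or empty: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

if require_env ACR_APP_ID ACR_APP_PASSWORD ACR_APP_TENANT_ID; then
  echo "ACR credentials are set"
else
  echo "re-run the steps above before continuing" >&2
fi
```

The same helper can be reused for the `OPENAI_SP_*` and `AZURE_OPENAI_ENDPOINT` variables exported in the next section.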
49+
50+
## 2. Azure OpenAI Service
51+
52+
### 2a. Set up Azure OpenAI Resource
53+
54+
If you do not already have an Azure OpenAI resource that you wish to use, you can create one:
55+
```bash
56+
azure_openai_endpoint=$(az cognitiveservices account create \
57+
--name CleanlabTLMOpenAIResource \
58+
--resource-group <resource_group_name> \
59+
--location <location> \
60+
--sku S0 \
61+
--kind OpenAI \
62+
--yes \
63+
--query properties.endpoint -o tsv)
64+
export AZURE_OPENAI_ENDPOINT=$azure_openai_endpoint
65+
```
66+
67+
### 2b. Set up Azure OpenAI Deployments
68+
69+
You will need to set up a deployment for each model that you wish to use. This must include at least one `completion` deployment and at least one `embedding` deployment.
70+
71+
For example, if you wish to use `gpt-4o` and `gpt-4o-mini` completions and `text-embedding-3-small` embeddings, you will need to set up the following deployments:
72+
```bash
73+
az cognitiveservices account deployment create \
74+
--name CleanlabTLMOpenAIResource \
75+
--resource-group <resource_group_name> \
--deployment-name gpt-4o \
76+
--model-name gpt-4o \
77+
--model-version 1 \
78+
--model-format OpenAI
79+
80+
az cognitiveservices account deployment create \
81+
--name CleanlabTLMOpenAIResource \
82+
--resource-group <resource_group_name> \
--deployment-name gpt-4o-mini \
83+
--model-name gpt-4o-mini \
84+
--model-version 1 \
85+
--model-format OpenAI
86+
87+
az cognitiveservices account deployment create \
88+
--name CleanlabTLMOpenAIResource \
89+
--resource-group <resource_group_name> \
--deployment-name text-embedding-3-small \
90+
--model-name text-embedding-3-small \
91+
--model-version 1 \
92+
--model-format OpenAI
93+
```
94+
95+
### 2c. Set up Azure OpenAI Service Principal
96+
97+
To set up a Service Principal for Azure OpenAI and assign the necessary role, follow these steps:
98+
99+
1. **Create a Service Principal**
100+
101+
Create a new Service Principal and capture its credentials:
102+
```bash
103+
openai_sp_credentials=$(az ad sp create-for-rbac --name "CleanlabTLMOpenAISP" --skip-assignment)
104+
openai_sp_app_id=$(echo $openai_sp_credentials | jq -r '.appId')
105+
openai_sp_password=$(echo $openai_sp_credentials | jq -r '.password')
106+
openai_sp_tenant=$(echo $openai_sp_credentials | jq -r '.tenant')
107+
```
108+
109+
2. **Assign the Cognitive Services OpenAI User Role**
110+
111+
Assign the `Cognitive Services OpenAI User` role to the Service Principal:
112+
```bash
113+
az role assignment create \
114+
--assignee $openai_sp_app_id \
115+
--role "Cognitive Services OpenAI User" \
116+
--scope /subscriptions/<subscription_id>/resourceGroups/<resource_group_name>/providers/Microsoft.CognitiveServices/accounts/CleanlabTLMOpenAIResource
117+
```
118+
119+
Replace `<subscription_id>` and `<resource_group_name>` with your Azure subscription ID and resource group name. If your Azure OpenAI resource is not named `CleanlabTLMOpenAIResource`, substitute its name in the scope; later commands refer to it as `<openai_resource_name>`.
120+
121+
3. **Export Service Principal Credentials**
122+
123+
Store the Service Principal credentials as environment variables for later use:
124+
```bash
125+
export OPENAI_SP_APP_ID=$openai_sp_app_id
126+
export OPENAI_SP_PASSWORD=$openai_sp_password
127+
export OPENAI_SP_TENANT=$openai_sp_tenant
128+
```
129+
130+
Ensure that these environment variables are securely stored and accessible to the components of the TLM application that require them.
131+
132+
4. **Verify the Role Assignment**
133+
134+
Confirm that the role has been successfully assigned:
135+
```bash
136+
az role assignment list --assignee $openai_sp_app_id --role "Cognitive Services OpenAI User" --scope /subscriptions/<subscription_id>/resourceGroups/<resource_group_name>/providers/Microsoft.CognitiveServices/accounts/<openai_resource_name>
137+
```
138+
139+
You should see an output confirming the role assignment for the Service Principal.
140+
141+
By completing these steps, you establish a Service Principal with the appropriate permissions to interact with the Azure OpenAI resources required by the TLM application.
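
The role assignment and the verification command above both take the same long `--scope` resource ID, and a typo in either one makes the check misleading. One way to avoid retyping it is to build the scope once in a shell variable. The sketch below uses placeholder values, and the variable names (`subscription_id`, `resource_group_name`, `openai_resource_name`, `openai_scope`) are illustrative rather than anything the TLM chart requires:

```bash
# Placeholder values for illustration only; substitute your own.
subscription_id="00000000-0000-0000-0000-000000000000"
resource_group_name="my-resource-group"
openai_resource_name="CleanlabTLMOpenAIResource"

# Resource ID of the Azure OpenAI account, in the same shape as the
# --scope argument used in steps 2 and 4 above.
openai_scope="/subscriptions/${subscription_id}/resourceGroups/${resource_group_name}/providers/Microsoft.CognitiveServices/accounts/${openai_resource_name}"
echo "$openai_scope"

# The variable can then be passed to both commands, for example:
#   az role assignment create --assignee "$openai_sp_app_id" \
#     --role "Cognitive Services OpenAI User" --scope "$openai_scope"
#   az role assignment list --assignee "$openai_sp_app_id" \
#     --role "Cognitive Services OpenAI User" --scope "$openai_scope"
```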
142+
143+
## 3. Install TLM
144+
145+
### 3a. Create Kubernetes Namespace
146+
147+
```bash
148+
kubectl create namespace cleanlabtlm
149+
```
150+
151+
### 3b. Set up secrets (image pull secret, OpenAI service principal)
152+
153+
1. Using the app registration credentials created in the [cross-tenant authentication section](#1-cross-tenant-azure-container-registry-authentication), create a `docker-registry` secret. This will be used to pull container images from the `cleanlabtlm.azurecr.io` registry.
154+
155+
```bash
156+
kubectl create secret docker-registry cleanlabtlm-acr-secret \
157+
--namespace cleanlabtlm \
158+
--docker-server cleanlabtlm.azurecr.io \
159+
--docker-username $ACR_APP_ID \
160+
--docker-password $ACR_APP_PASSWORD
161+
```
162+
163+
2. Using the service principal credentials created in the [Azure OpenAI Service Principal section](#2c-set-up-azure-openai-service-principal), create a secret containing the credentials needed to authenticate with Azure OpenAI:
164+
165+
```bash
166+
kubectl create secret generic chat-backend-secret \
167+
--namespace cleanlabtlm \
168+
--from-literal=AZURE_TENANT_ID=$OPENAI_SP_TENANT \
169+
--from-literal=AZURE_CLIENT_ID=$OPENAI_SP_APP_ID \
170+
--from-literal=AZURE_CLIENT_SECRET=$OPENAI_SP_PASSWORD \
171+
--from-literal=AZURE_API_BASE=$AZURE_OPENAI_ENDPOINT
172+
```
173+
174+
### 3c. Log in to the `cleanlabtlm` Helm registry
175+
176+
Using the Azure Container Registry credentials created in the [cross-tenant authentication section](#1-cross-tenant-azure-container-registry-authentication), log in to the `cleanlabtlm` Helm registry:
177+
178+
```bash
179+
helm registry login cleanlabtlm.azurecr.io \
180+
--username $ACR_APP_ID \
181+
--password $ACR_APP_PASSWORD
182+
```
183+
184+
### 3d. Construct `values.yaml`
185+
186+
The following contains default values for the TLM helm chart. You can modify these values to suit your needs.
187+
188+
```bash
189+
cat <<EOF > values.yaml
190+
chat_backend:
  secret_name: chat-backend-secret
191+
image:
192+
repository: cleanlabtlm.azurecr.io/tlm/chat-backend
193+
194+
imagePullSecret:
195+
enabled: true
196+
name: cleanlabtlm-acr-secret
197+
EOF
198+
```
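
The chat deployment template changed in this commit takes the name of its `envFrom` secret from `chat_backend.secret_name`, and the chart's own `values.yaml` defaults that key to an empty string. A quick check that your values file sets it may save a failed rollout. This is a rough sketch that assumes a line-based `grep` is sufficient for a small values file; `check_secret_name` is a hypothetical helper, not part of the chart:

```bash
# Hypothetical helper: succeed only if the file at $1 has a secret_name key
# with a non-empty value (so `secret_name: ""` is treated as missing).
check_secret_name() {
  grep -Eq '^[[:space:]]*secret_name:[[:space:]]*"?[^"[:space:]]+' "$1"
}

if [ -f values.yaml ] && check_secret_name values.yaml; then
  echo "secret_name is set"
else
  echo "values.yaml does not set chat_backend.secret_name" >&2
fi
```

The value should match the generic secret created in step 3b (`chat-backend-secret` in this guide). Because the check does not parse YAML, it would also accept a `secret_name` key under a different parent.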
199+
200+
### 3e. Install the Helm Chart
201+
202+
You can now install the helm chart using the `values.yaml` file you created in the previous step:
203+
204+
```bash
205+
helm upgrade --install tlm oci://cleanlabtlm.azurecr.io/tlm/tlm \
206+
--namespace cleanlabtlm \
207+
-f values.yaml
208+
```
209+
210+
@ryan TODO!!!
211+
If you wish to [set up Azure AGIC](#4-set-up-azure-application-gateway-ingress-controller-agic-optional), you will need to note the following values printed out after performing the TLM Helm install:
212+
- `uri_prefix`
213+
- `service.name`
214+
- `service.port`
215+
216+
## 4. Set up Azure Application Gateway Ingress Controller (AGIC) (optional)
217+
218+
This is one means of exposing the TLM application. You will need to have the AGIC add-on installed on your AKS cluster.
219+
220+
Then, you can create the ingress resource by running the following:
221+
```bash
222+
tlm_uri_prefix=<uri_prefix>
223+
tlm_service_name=<service.name>
224+
tlm_service_port=<service.port>
225+
kubectl apply --namespace cleanlabtlm -f - <<EOF
226+
apiVersion: networking.k8s.io/v1
227+
kind: Ingress
228+
metadata:
229+
name: tlm-ingress
230+
spec:
231+
ingressClassName: azure-application-gateway
232+
rules:
233+
- http:
234+
paths:
235+
- path: $tlm_uri_prefix
236+
backend:
237+
service:
238+
name: $tlm_service_name
239+
port:
240+
number: $tlm_service_port
241+
pathType: Prefix
242+
EOF
243+
```
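
The ingress manifest depends on the shell substituting `$tlm_uri_prefix`, `$tlm_service_name`, and `$tlm_service_port` inside the here-document. That substitution happens in the current shell, so the variables need to be set as ordinary shell variables before the command runs. The sketch below shows the behavior with `cat` standing in for `kubectl`; the values are hypothetical and should be replaced with those printed by your own install:

```bash
# Hypothetical values for illustration only.
tlm_uri_prefix="/tlm"
tlm_service_name="tlm-chat-backend"
tlm_service_port=8080

# Unquoted EOF delimiter: $variables are expanded by the current shell.
rendered=$(cat <<EOF
path: $tlm_uri_prefix
name: $tlm_service_name
number: $tlm_service_port
EOF
)

# Quoted 'EOF' delimiter: the text is passed through literally.
literal=$(cat <<'EOF'
path: $tlm_uri_prefix
EOF
)

printf '%s\n---\n%s\n' "$rendered" "$literal"
```

Rendering the manifest through `cat` first is a cheap way to confirm the substituted values before sending anything to the cluster.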
244+
245+
## References
246+
247+
- [Cross tenant Azure Container Registry Authentication](https://learn.microsoft.com/en-us/azure/container-registry/authenticate-aks-cross-tenant)
248+
- [Azure Application Gateway Ingress Controller](https://learn.microsoft.com/en-us/azure/application-gateway/ingress-controller-overview)
249+
- [Azure OpenAI Service -- Create Resource](https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/create-resource?pivots=cli)
