Commit bacc65d

build: add support for development on Kubernetes cluster (#190)

Signed-off-by: Kfir Toledo <kfir.toledo@ibm.com>

1 parent 437f887 commit bacc65d
File tree: 11 files changed (+580 −29)

DEVELOPMENT.md

Lines changed: 239 additions & 22 deletions
## Kind Development Environment

*(This commit removes the earlier warning that the vLLM simulator had to be built manually with tag `v0.1.0`; the simulator image is now loaded for you.)*

The following deployment creates a [Kubernetes in Docker (KIND)] cluster with an inference scheduler using a Gateway API implementation, connected to the vLLM simulator.

To run the deployment, use the following command:

```bash
make env-dev-kind
```

This will create a `kind` cluster (or re-use an existing one) using the system's
local container runtime and deploy the development stack into the `default`
namespace.

> [!NOTE]
> You can download the image locally using `docker pull ghcr.io/llm-d/llm-d-inference-sim:latest`, and the script will load it from your local Docker registry.

There are several ways to access the gateway:

**Port forward**:

```bash
$ kubectl --context llm-d-inference-scheduler-dev port-forward service/inference-gateway 8080:80
```

**NodePort**:

```bash
# Determine the k8s node address
$ kubectl --context llm-d-inference-scheduler-dev get node -o yaml | grep address
# The service is accessible over port 80 of the worker IP address.
```

**LoadBalancer**:

```bash
# Install and run cloud-provider-kind:
$ go install sigs.k8s.io/cloud-provider-kind@latest && cloud-provider-kind &
$ kubectl --context llm-d-inference-scheduler-dev get service inference-gateway
```

You can now make requests matching the IP:port of one of the access modes above:

```bash
$ curl -s -w '\n' http://<IP:port>/v1/completions -H 'Content-Type: application/json' -d '{"model":"food-review","prompt":"hi","max_tokens":10,"temperature":0}' | jq
```

By default, the created inference gateway can be accessed on port 30080. This can
be overridden to any free port in the range 30000 to 32767 by running the above
command as follows:

```bash
KIND_GATEWAY_HOST_PORT=<selected-port> make env-dev-kind
```

**Where:** `<selected-port>` is the port on your local machine you want to use to
access the inference gateway.
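As a sanity check before re-running `make`, a small helper (hypothetical, not part of the repo) can verify the selected port actually falls in the Kubernetes NodePort range:

```shell
#!/bin/sh
# Check that a chosen KIND gateway host port is numeric and inside the
# Kubernetes NodePort range (30000-32767) before deploying.
port_in_nodeport_range() {
  port=$1
  case "$port" in
    ''|*[!0-9]*) return 1 ;;   # empty or non-numeric
  esac
  [ "$port" -ge 30000 ] && [ "$port" -le 32767 ]
}

if port_in_nodeport_range "${KIND_GATEWAY_HOST_PORT:-30080}"; then
  echo "port ok"
else
  echo "KIND_GATEWAY_HOST_PORT must be in 30000-32767" >&2
fi
```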
> [!NOTE]
> If you require significant customization of this environment beyond
> what the standard deployment provides, you can use the `deploy/components`
> with `kustomize` to build your own highly customized environment. You can use
> the `deploy/environments/kind` deployment as a reference for your own.

To test your changes to `llm-d-inference-scheduler` in this environment, make your changes locally
and then re-run the deployment:

```bash
make env-dev-kind
```

This will build images with your recent changes and load the new images to the
cluster. By default the image tag will be `dev`. It will also load the `llm-d-inference-sim` image.

> [!NOTE]
> The built image tag can be specified via the `EPP_TAG` environment variable so it is used in the deployment. For example:

```bash
EPP_TAG=0.0.4 make env-dev-kind
```

> [!NOTE]
> If you want to load a different tag of `llm-d-inference-sim`, you can use the environment variable `VLLM_SIMULATOR_TAG` to specify it.

> [!NOTE]
> If you are working on macOS with Apple Silicon, you must also set the environment variable `GOOS=linux`.

Then do a rollout of the EPP `Deployment` so that your recent changes are
reflected:

```bash
kubectl rollout restart deployment food-review-endpoint-picker
```

## Kubernetes Development Environment

A Kubernetes cluster can be used for development and testing.
The setup is split into two parts:

- cluster-level infrastructure deployment (e.g., CRDs), and
- deployment of development environments on a per-namespace basis

This enables cluster sharing by multiple developers. For private/personal
clusters, the `default` namespace can be used directly.

### Setup - Infrastructure

> [!CAUTION]
> In shared cluster situations you should probably not be
> running this unless you're the cluster admin and you're _certain_
> that you should be running this, as it can be disruptive to other developers
> in the cluster.

The following deploys all the infrastructure-level requirements (e.g. CRDs,
Operators, etc.) to support the namespace-level development environments.

Install the GIE CRDs:

```bash
kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/releases/latest/download/manifests.yaml
```

Install kgateway:

```bash
KGTW_VERSION=v2.0.2
helm upgrade -i --create-namespace --namespace kgateway-system --version $KGTW_VERSION kgateway-crds oci://cr.kgateway.dev/kgateway-dev/charts/kgateway-crds
helm upgrade -i --namespace kgateway-system --version $KGTW_VERSION kgateway oci://cr.kgateway.dev/kgateway-dev/charts/kgateway --set inferenceExtension.enabled=true
```

For more details, see the Gateway API Inference Extension [getting started guide](https://gateway-api-inference-extension.sigs.k8s.io/guides/).

### Setup - Developer Environment

> [!NOTE]
> This setup is currently very manual in regards to container
> images for the vLLM simulator and the EPP. It is expected that you build and
> push images for both to your own private registry. In future iterations, we
> will be providing automation around this to make it simpler.

To deploy a development environment to the cluster, you'll need to explicitly
provide a namespace. This can be `default` if this is your personal cluster,
but on a shared cluster you should pick something unique. For example:

```bash
export NAMESPACE=annas-dev-environment
```
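On a shared cluster it is easy to pick a name Kubernetes will reject: namespaces must be valid RFC 1123 DNS labels (lowercase alphanumerics and `-`, starting and ending with an alphanumeric, at most 63 characters). A quick pre-flight check (hypothetical helper, not part of the repo):

```shell
#!/bin/sh
# Validate a candidate namespace name against the RFC 1123 label rules
# that Kubernetes enforces.
valid_namespace() {
  ns=$1
  [ ${#ns} -gt 0 ] && [ ${#ns} -le 63 ] || return 1
  case "$ns" in
    *[!a-z0-9-]*) return 1 ;;  # only lowercase alphanumerics and '-'
    -*|*-)        return 1 ;;  # must start/end with an alphanumeric
  esac
}

valid_namespace "annas-dev-environment" && echo "namespace name ok"
```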
Create the namespace:

```bash
kubectl create namespace ${NAMESPACE}
```

Set the default namespace for kubectl commands:

```bash
kubectl config set-context --current --namespace="${NAMESPACE}"
```

> [!NOTE]
> If you are using OpenShift (`oc` CLI), you can use the following instead: `oc project "${NAMESPACE}"`

Set the Hugging Face token variable:

```bash
export HF_TOKEN="<HF_TOKEN>"
```

Download the `llm-d-kv-cache-manager` repository (the installation script and Helm chart to install the vLLM environment):

```bash
cd .. && git clone git@github.com:llm-d/llm-d-kv-cache-manager.git
```

If you prefer to clone it into the `/tmp` directory, make sure to update the `VLLM_CHART_DIR` environment variable:
`export VLLM_CHART_DIR=<tmp_dir>/llm-d-kv-cache-manager/vllm-setup-helm`
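Before deploying, you can confirm the chart directory actually resolves. This is a hypothetical pre-flight check; the default path is an assumption based on the sibling `git clone` step above:

```shell
#!/bin/sh
# Fail fast if the vLLM Helm chart checkout is missing.
: "${VLLM_CHART_DIR:=../llm-d-kv-cache-manager/vllm-setup-helm}"

if [ -d "$VLLM_CHART_DIR" ]; then
  echo "using vLLM chart at: $VLLM_CHART_DIR"
else
  echo "chart not found at $VLLM_CHART_DIR; clone llm-d-kv-cache-manager or set VLLM_CHART_DIR" >&2
fi
```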
Once all this is set up, you can deploy the environment:

```bash
make env-dev-kubernetes
```

This will deploy the entire stack to whatever namespace you chose.

> [!NOTE]
> The model and images of each component can be replaced. See [Environment Configuration](#environment-configuration) for model settings.

You can test by exposing the `inference-gateway` service via port-forward:

```bash
kubectl port-forward service/inference-gateway 8080:80 -n "${NAMESPACE}"
```

And making requests with `curl`:

```bash
curl -s -w '\n' http://localhost:8080/v1/completions -H 'Content-Type: application/json' \
  -d '{"model":"meta-llama/Llama-3.1-8B-Instruct","prompt":"hi","max_tokens":10,"temperature":0}' | jq
```

> [!NOTE]
> If the response is empty or contains an error, `jq` may output a cryptic error. You can run the command without `jq` to debug raw responses.

#### Environment Configuration

**1. Setting the EPP image and tag:**

You can optionally set a custom EPP image (otherwise, the default will be used):

```bash
export EPP_TAG="<YOUR_TAG>"
export EPP_IMAGE="<YOUR_REGISTRY>/<YOUR_IMAGE>"
```

**2. Setting the vLLM replicas:**

You can optionally set the number of vLLM replicas:

```bash
export VLLM_REPLICA_COUNT=2
```

**3. Setting the model name:**

You can replace the model name that will be used in the system:

```bash
export MODEL_NAME=mistralai/Mistral-7B-Instruct-v0.2
```

If you need to deploy a larger model, update the vLLM-related parameters according to the model's requirements. For example:

```bash
export MODEL_NAME=meta-llama/Llama-3.1-70B-Instruct
export PVC_SIZE=200Gi
export VLLM_MEMORY_RESOURCES=100Gi
export VLLM_GPU_MEMORY_UTILIZATION=0.95
export VLLM_TENSOR_PARALLEL_SIZE=2
export VLLM_GPU_COUNT_PER_INSTANCE=2
```
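One easy mistake with the settings above is requesting more tensor parallelism than GPUs allocated per instance. A quick consistency check (hypothetical helper, assuming the variable names above):

```shell
#!/bin/sh
# Sanity-check the vLLM GPU settings: tensor parallelism cannot use
# more GPUs than each instance is given.
: "${VLLM_TENSOR_PARALLEL_SIZE:=2}"
: "${VLLM_GPU_COUNT_PER_INSTANCE:=2}"

if [ "$VLLM_TENSOR_PARALLEL_SIZE" -gt "$VLLM_GPU_COUNT_PER_INSTANCE" ]; then
  echo "error: VLLM_TENSOR_PARALLEL_SIZE exceeds VLLM_GPU_COUNT_PER_INSTANCE" >&2
else
  echo "GPU settings look consistent"
fi
```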
**4. Additional environment settings:**

More environment variable settings can be found in `scripts/kubernetes-dev-env.sh`.

#### Development Cycle

> [!WARNING]
> This is a very manual process at the moment. We expect to make
> this more automated in future iterations.

Make your changes locally and commit them. Then select an image tag based on
the `git` SHA and set your private registry:

```bash
export EPP_TAG=$(git rev-parse HEAD)
export IMAGE_REGISTRY="quay.io/<my-id>"
```
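Together these two variables determine the image reference the build and push targets will use. The sketch below composes it by hand; the image name `llm-d-inference-scheduler` is an assumption for illustration (the authoritative value lives in the Makefile), and a fixed tag stands in for the `git rev-parse HEAD` output:

```shell
#!/bin/sh
# Illustrative only: compose the image reference the push presumably uses.
EPP_TAG="0123abc"                # stand-in for "$(git rev-parse HEAD)"
IMAGE_REGISTRY="quay.io/my-id"   # stand-in for your private registry
EPP_IMAGE_REF="${IMAGE_REGISTRY}/llm-d-inference-scheduler:${EPP_TAG}"
echo "$EPP_IMAGE_REF"
# -> quay.io/my-id/llm-d-inference-scheduler:0123abc
```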
Build the image and tag it for your private registry:

```bash
make image-build
```

and push it:

```bash
make image-push
```

You can now re-deploy the environment with your changes (don't forget all of
the required environment variables):

```bash
make env-dev-kubernetes
```

And test the changes.

### Cleanup Environment

To clean up the development environment and remove all deployed resources in your namespace, run:

```bash
make clean-env-dev-kubernetes
```

If you also want to remove the namespace entirely, run:

```bash
kubectl delete namespace ${NAMESPACE}
```

To uninstall the infrastructure-level deployment:

Uninstall the GIE CRDs:

```bash
kubectl delete -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/releases/latest/download/manifests.yaml --ignore-not-found
```

Uninstall kgateway:

```bash
helm uninstall kgateway -n kgateway-system
helm uninstall kgateway-crds -n kgateway-system
```

For more details, see the Gateway API Inference Extension [getting started guide](https://gateway-api-inference-extension.sigs.k8s.io/guides/).

Makefile

Lines changed: 13 additions & 0 deletions

```makefile
clean-env-dev-kind: ## Cleanup kind setup (delete cluster $(KIND_CLUSTER_NAME))
	@echo "INFO: cleaning up kind cluster $(KIND_CLUSTER_NAME)"
	kind delete cluster --name $(KIND_CLUSTER_NAME)

# Kubernetes Development Environment - Deploy
# This target deploys the inference scheduler stack in a specific namespace for development and testing.
.PHONY: env-dev-kubernetes
env-dev-kubernetes: check-kubectl check-kustomize check-envsubst
	IMAGE_REGISTRY=$(IMAGE_REGISTRY) ./scripts/kubernetes-dev-env.sh 2>&1

# Kubernetes Development Environment - Teardown
.PHONY: clean-env-dev-kubernetes
clean-env-dev-kubernetes: check-kubectl check-kustomize check-envsubst
	@CLEAN=true ./scripts/kubernetes-dev-env.sh 2>&1
	@echo "INFO: Finished cleanup of development environment for namespace $(NAMESPACE)"
```
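The `check-envsubst` prerequisite suggests the deploy script renders manifest templates by substituting environment variables into `${VAR}` placeholders. The sketch below uses `sed` as a self-contained stand-in for `envsubst` (an assumption about how `scripts/kubernetes-dev-env.sh` works, shown for illustration only):

```shell
#!/bin/sh
# Render a manifest fragment by substituting ${MODEL_NAME_SAFE} and
# ${MODEL_NAME} placeholders, as envsubst would.
export MODEL_NAME="food-review"
export MODEL_NAME_SAFE="food-review"

rendered=$(printf 'metadata:\n  name: ${MODEL_NAME_SAFE}\nspec:\n  modelName: ${MODEL_NAME}\n' \
  | sed -e "s|\${MODEL_NAME_SAFE}|${MODEL_NAME_SAFE}|g" \
        -e "s|\${MODEL_NAME}|${MODEL_NAME}|g")
echo "$rendered"
```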

deploy/components/inference-gateway/inference-models.yaml

Lines changed: 1 addition & 1 deletion

The `InferenceModel` metadata name now uses the sanitized `${MODEL_NAME_SAFE}` variable instead of `${MODEL_NAME}`:

```yaml
apiVersion: inference.networking.x-k8s.io/v1alpha2
kind: InferenceModel
metadata:
  name: ${MODEL_NAME_SAFE}   # was: ${MODEL_NAME}
spec:
  modelName: ${MODEL_NAME}
  criticality: Critical
```
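`MODEL_NAME_SAFE` presumably exists because Kubernetes object names must be DNS-1123 compliant (lowercase alphanumerics and `-`), while model IDs contain `/`, `.`, and uppercase letters. A sketch of the kind of sanitization involved; the exact rule lives in `scripts/kubernetes-dev-env.sh`, so this helper is an assumption:

```shell
#!/bin/sh
# Map a model ID to a DNS-1123-safe resource name: lowercase everything
# and replace any character outside [a-z0-9] with '-'.
model_name_safe() {
  printf '%s' "$1" | tr '[:upper:]' '[:lower:]' | sed 's/[^a-z0-9]/-/g'
}

model_name_safe "mistralai/Mistral-7B-Instruct-v0.2"; echo
# -> mistralai-mistral-7b-instruct-v0-2
```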

deploy/components/vllm-sim/deployments.yaml

Lines changed: 1 addition & 1 deletion

The simulator `Deployment` name likewise switches to the sanitized variable:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ${MODEL_NAME_SAFE}-vllm-sim   # was: ${MODEL_NAME}-vllm-sim
  labels:
    app: ${POOL_NAME}
spec:
```

Lines changed: 22 additions & 0 deletions

```yaml
apiVersion: gateway.kgateway.dev/v1alpha1
kind: GatewayParameters
metadata:
  name: custom-gw-params
spec:
  kube:
    envoyContainer:
      securityContext:
        allowPrivilegeEscalation: false
        readOnlyRootFilesystem: true
        runAsNonRoot: true
        runAsUser: "${PROXY_UID}"
    service:
      type: ${GATEWAY_SERVICE_TYPE}
      extraLabels:
        gateway: custom
    podTemplate:
      extraLabels:
        gateway: custom
      securityContext:
        seccompProfile:
          type: RuntimeDefault
```