
Commit 3bc2058

build: Use latest tag for vllm-sim for deployment

Signed-off-by: Kfir Toledo <[email protected]>
1 parent 28c8145 commit 3bc2058

File tree

2 files changed: +32 −29 lines changed

DEVELOPMENT.md

Lines changed: 27 additions & 27 deletions
@@ -19,42 +19,39 @@ Documentation for developing the inference scheduler.
 
 ## Kind Development Environment
 
-> [!Warning]
-> This currently requires you to have manually built the vllm
-> simulator separately on your local system. In a future iteration this will
-> be handled automatically and will not be required. The tag for the simulator
-> currently needs to be `v0.1.0`.
-
-You can deploy the current scheduler with a Gateway API implementation into a
-[Kubernetes in Docker (KIND)] cluster locally with the following:
+The following deployment creates a [Kubernetes in Docker (KIND)] cluster with an inference scheduler using a Gateway API implementation, connected to the vLLM simulator.
+To run the deployment, use the following command:
 
-```console
+```bash
 make env-dev-kind
 ```
 
 This will create a `kind` cluster (or re-use an existing one) using the system's
 local container runtime and deploy the development stack into the `default`
 namespace.
 
+> [!NOTE]
+> You can download the image locally using `docker pull ghcr.io/llm-d/llm-d-inference-sim:latest`, and the script will load it from your local Docker registry.
+
 There are several ways to access the gateway:
 
 **Port forward**:
 
-```console
+```bash
 $ kubectl --context llm-d-inference-scheduler-dev port-forward service/inference-gateway 8080:80
 ```
 
 **NodePort**
 
-```console
+```bash
 # Determine the k8s node address
 $ kubectl --context llm-d-inference-scheduler-dev get node -o yaml | grep address
 # The service is accessible over port 80 of the worker IP address.
 ```
 
 **LoadBalancer**
 
-```console
+```bash
 # Install and run cloud-provider-kind:
 $ go install sigs.k8s.io/cloud-provider-kind@latest && cloud-provider-kind &
 $ kubectl --context llm-d-inference-scheduler-dev get service inference-gateway
@@ -63,22 +60,23 @@ $ kubectl --context llm-d-inference-scheduler-dev get service inference-gateway
 
 You can now make requests matching the IP:port of one of the access modes above:
 
-```console
+```bash
 $ curl -s -w '\n' http://<IP:port>/v1/completions -H 'Content-Type: application/json' -d '{"model":"food-review","prompt":"hi","max_tokens":10,"temperature":0}' | jq
 ```
 
 By default, the created inference gateway can be accessed on port 30080. This can
 be overridden to any free port in the range of 30000 to 32767 by running the above
 command as follows:
 
-```console
+```bash
 KIND_GATEWAY_HOST_PORT=<selected-port> make env-dev-kind
 ```
 
 **Where:** <selected-port> is the port on your local machine you want to use to
 access the inference gateway.
 
-> **NOTE**: If you require significant customization of this environment beyond
+> [!NOTE]
+> If you require significant customization of this environment beyond
 > what the standard deployment provides, you can use the `deploy/components`
 > with `kustomize` to build your own highly customized environment. You can use
 > the `deploy/environments/kind` deployment as a reference for your own.
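The 30000-32767 constraint on `KIND_GATEWAY_HOST_PORT` can be validated before invoking `make`. The helper below is a hypothetical pre-flight sketch, not part of the repository's scripts:

```shell
# Hypothetical pre-flight check for the KIND_GATEWAY_HOST_PORT override
# described above: the port must fall in Kubernetes' NodePort range.
valid_gateway_port() {
  case "$1" in
    ''|*[!0-9]*) return 1 ;;  # reject empty or non-numeric values
  esac
  [ "$1" -ge 30000 ] && [ "$1" -le 32767 ]
}

if valid_gateway_port "${KIND_GATEWAY_HOST_PORT:-30080}"; then
  echo "port ok"
else
  echo "KIND_GATEWAY_HOST_PORT must be in 30000-32767" >&2
fi
```

With the check passing, `KIND_GATEWAY_HOST_PORT=<selected-port> make env-dev-kind` can be run as shown above.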
@@ -90,28 +88,30 @@ access the inference gateway.
 
 To test your changes to `llm-d-inference-scheduler` in this environment, make your changes locally
 and then re-run the deployment:
 
-```console
+```bash
 make env-dev-kind
 ```
 
 This will build images with your recent changes and load the new images to the
 cluster. By default the image tag will be `dev`. It will also load the `llm-d-inference-sim` image.
 
-**NOTE:** The built image tag can be specified via the `EPP_TAG` environment variable so it is used in the deployment. For example:
+> [!NOTE]
+> The built image tag can be specified via the `EPP_TAG` environment variable so it is used in the deployment. For example:
 
-```console
+```bash
 EPP_TAG=0.0.4 make env-dev-kind
 ```
 
-**NOTE:** If you want to load a different tag of llm-d-inference-sim, you can use the environment variable `VLLM_SIMULATOR_TAG` to specify it.
+> [!NOTE]
+> If you want to load a different tag of llm-d-inference-sim, you can use the `VLLM_SIMULATOR_TAG` environment variable to specify it.
 
-**NOTE**: If you are working on a MacOS with Apple Silicon, it is required to add
-the environment variable `GOOS=linux`.
+> [!NOTE]
+> If you are working on macOS with Apple Silicon, you must add the environment variable `GOOS=linux`.
 
 Then do a rollout of the EPP `Deployment` so that your recent changes are
 reflected:
 
-```console
+```bash
 kubectl rollout restart deployment food-review-endpoint-picker
 ```
 
@@ -292,7 +292,7 @@ and push it:
 make image-push
 ```
 
-You can now re-deploy the environment with your changes (don't forget all
+You can now re-deploy the environment with your changes (don't forget all of
 the required environment variables):
 
 ```bash
@@ -305,26 +305,26 @@ And test the changes.
 
 To clean up the development environment and remove all deployed resources in your namespace, run:
 
-```sh
+```bash
 make clean-env-dev-kubernetes
 ```
 
 If you also want to remove the namespace entirely, run:
 
-```sh
+```bash
 kubectl delete namespace ${NAMESPACE}
 ```
 
 To uninstall the infrastructure development stack:
 Uninstall the GIE CRDs:
 
-```sh
+```bash
 kubectl delete -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/releases/latest/download/manifests.yaml --ignore-not-found
 ```
 
 Uninstall kgateway:
 
-```sh
+```bash
 helm uninstall kgateway -n kgateway-system
 helm uninstall kgateway-crds -n kgateway-system
 ```

scripts/kind-dev-env.sh

Lines changed: 5 additions & 2 deletions
@@ -26,7 +26,7 @@ SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
 : "${VLLM_SIMULATOR_IMAGE:=llm-d-inference-sim}"
 
 # Set a default VLLM_SIMULATOR_TAG if not provided
-export VLLM_SIMULATOR_TAG="${VLLM_SIMULATOR_TAG:-v0.1.0}"
+export VLLM_SIMULATOR_TAG="${VLLM_SIMULATOR_TAG:-latest}"
 
 # Set a default EPP_IMAGE if not provided
 : "${EPP_IMAGE:=llm-d-inference-scheduler}"
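The changed default relies on the shell's `${VAR:-default}` expansion: the pinned `v0.1.0` becomes `latest`, but an exported `VLLM_SIMULATOR_TAG` still takes precedence. A minimal sketch of that behavior (`v0.2.0` is an arbitrary example tag):

```shell
# ${VAR:-default} substitutes the default only when VAR is unset or
# empty, which is how the script lets callers override the simulator tag.
unset VLLM_SIMULATOR_TAG
echo "${VLLM_SIMULATOR_TAG:-latest}"   # unset -> prints the default "latest"

VLLM_SIMULATOR_TAG=v0.2.0
echo "${VLLM_SIMULATOR_TAG:-latest}"   # set -> prints the caller's "v0.2.0"
```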
@@ -133,7 +133,10 @@ kubectl --context ${KUBE_CONTEXT} -n local-path-storage wait --for=condition=Ready
 if [ "${CONTAINER_RUNTIME}" == "podman" ]; then
   podman save ${IMAGE_REGISTRY}/${VLLM_SIMULATOR_IMAGE}:${VLLM_SIMULATOR_TAG} -o /dev/stdout | kind --name ${CLUSTER_NAME} load image-archive /dev/stdin
 else
-  kind --name ${CLUSTER_NAME} load docker-image ${IMAGE_REGISTRY}/${VLLM_SIMULATOR_IMAGE}:${VLLM_SIMULATOR_TAG}
+  if docker image inspect "${IMAGE_REGISTRY}/${VLLM_SIMULATOR_IMAGE}:${VLLM_SIMULATOR_TAG}" > /dev/null 2>&1; then
+    echo "INFO: Loading image into KIND cluster..."
+    kind --name ${CLUSTER_NAME} load docker-image ${IMAGE_REGISTRY}/${VLLM_SIMULATOR_IMAGE}:${VLLM_SIMULATOR_TAG}
+  fi
 fi
 
 # Load the ext_proc endpoint-picker image into the cluster
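The guard added in this hunk (load the simulator image into KIND only if it exists locally) can be factored into a standalone function. This is a sketch of the pattern, not the script's actual structure; the real script simply skips the load, while this version also prints a pull hint:

```shell
# Load an image into a KIND cluster only if it is present in the local
# Docker image store; otherwise report what to pull. Sketch of the guard
# added above, using the same `docker image inspect` / `kind load` calls.
load_image_if_present() {
  image="$1"
  cluster="$2"
  if docker image inspect "$image" > /dev/null 2>&1; then
    echo "INFO: Loading $image into KIND cluster $cluster..."
    kind --name "$cluster" load docker-image "$image"
  else
    echo "WARN: $image not found locally; try: docker pull $image" >&2
    return 1
  fi
}
```

Because `docker image inspect` exits non-zero for a missing image, the function doubles as a presence test you can branch on.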

0 commit comments
