@@ -19,7 +19,8 @@ Documentation for developing the inference scheduler.
 
 ## Kind Development Environment
 
-> **WARNING**: This currently requires you to have manually built the vllm
+> [!WARNING]
+> This currently requires you to have manually built the vllm
 > simulator separately on your local system. In a future iteration this will
 > be handled automatically and will not be required. The tag for the simulator
 > currently needs to be `v0.1.0`.
@@ -116,46 +117,50 @@ kubectl rollout restart deployment food-review-endpoint-picker
 
 ## Kubernetes Development Environment
 
-A Kubernetes (or OpenShift) cluster can be used for development and testing.
-There is a cluster-level infrastructure deployment that needs to be managed,
-and then development environments can be created on a per-namespace basis to
-enable sharing the cluster with multiple developers (or feel free to just use
-the `default` namespace if the cluster is private/personal).
+A Kubernetes cluster can be used for development and testing.
+The setup is split into two parts:
+
+- a cluster-level infrastructure deployment (e.g., CRDs), and
+- development environments deployed on a per-namespace basis.
+
+This lets multiple developers share the same cluster. On private/personal
+clusters, the `default` namespace can be used directly.
 
 ### Setup - Infrastructure
 
-> **WARNING**: In shared cluster situations you should probably not be
-> running this unless you're the cluster admin and you're _certain_ it's you
+> [!CAUTION]
+> In shared cluster situations, you should probably not be
+> running this unless you're the cluster admin and you're _certain_ it's you
 > that should be running this, as this can be disruptive to other developers
 > in the cluster.
 
 The following will deploy all the infrastructure-level requirements (e.g. CRDs,
-Operators, etc) to support the namespace-level development environments:
+Operators, etc.) to support the namespace-level development environments:
 
 Install GIE CRDs:
 
 ```bash
-VERSION=v0.3.0
-kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/releases/download/$VERSION/manifests.yaml
+kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/releases/latest/download/manifests.yaml
 ```
 
-Install Kgateway:
+Install kgateway:
 ```bash
 KGTW_VERSION=v2.0.2
 helm upgrade -i --create-namespace --namespace kgateway-system --version $KGTW_VERSION kgateway-crds oci://cr.kgateway.dev/kgateway-dev/charts/kgateway-crds
 helm upgrade -i --namespace kgateway-system --version $KGTW_VERSION kgateway oci://cr.kgateway.dev/kgateway-dev/charts/kgateway --set inferenceExtension.enabled=true
 ```
 
-For more details you can find in Gateway API inference [getting started guide](https://gateway-api-inference-extension.sigs.k8s.io/guides/)
+For more details, see the Gateway API Inference Extension [getting started guide](https://gateway-api-inference-extension.sigs.k8s.io/guides/).
 
 ### Setup - Developer Environment
 
-> **WARNING**: This setup is currently very manual in regards to container
+> [!NOTE]
+> This setup is currently very manual with regard to container
 > images for the VLLM simulator and the EPP. It is expected that you build and
 > push images for both to your own private registry. In future iterations, we
 > will be providing automation around this to make it simpler.
 
-To deploy a development environment to the cluster you'll need to explicitly
+To deploy a development environment to the cluster, you'll need to explicitly
 provide a namespace. This can be `default` if this is your personal cluster,
 but on a shared cluster you should pick something unique. For example:
@@ -175,7 +180,8 @@ Set the default namespace for kubectl commands
 kubectl config set-context --current --namespace="${NAMESPACE}"
 ```
 
-> NOTE: If you are using OpenShift (oc CLI), use the following instead: `oc project "${NAMESPACE}"`
+> [!NOTE]
+> If you are using OpenShift (`oc` CLI), you can use the following instead: `oc project "${NAMESPACE}"`
 
 - Set Hugging Face token variable:
 
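Kubernetes namespace names must be valid RFC 1123 DNS labels, so a "unique" name picked for a shared cluster can still be rejected by the API server. A minimal sketch of a local check to run before `kubectl config set-context`; the `is_valid_namespace` helper and the `jdoe-dev` default are hypothetical and not part of this repository's scripts:

```bash
#!/usr/bin/env bash
# Sketch (hypothetical helper): check a candidate namespace against the
# Kubernetes naming rules for namespaces (RFC 1123 DNS label): 1-63 characters,
# lowercase alphanumerics or '-', starting and ending with an alphanumeric.
is_valid_namespace() {
  local ns="$1"
  local re='^[a-z0-9]([-a-z0-9]*[a-z0-9])?$'
  # Length check: DNS labels are limited to 63 characters.
  if [ "${#ns}" -lt 1 ] || [ "${#ns}" -gt 63 ]; then
    return 1
  fi
  # Character-set and start/end check. The regex is stored in a variable and
  # expanded unquoted, the usual way to avoid quoting pitfalls with =~.
  [[ "$ns" =~ $re ]]
}

# Hypothetical usage: "jdoe-dev" is only an example default.
NAMESPACE="${NAMESPACE:-jdoe-dev}"
if is_valid_namespace "$NAMESPACE"; then
  echo "using namespace: ${NAMESPACE}"
else
  echo "invalid namespace name: ${NAMESPACE}" >&2
fi
```

A name that passes this check can then be used with `kubectl config set-context --current --namespace="${NAMESPACE}"` as shown in the guide.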
@@ -186,26 +192,26 @@ export HF_TOKEN="<HF_TOKEN>"
-Download the `llm-d-kv-cache-manager` repository (the instllation script and Helm chart to install the vLLM environment):
+Download the `llm-d-kv-cache-manager` repository (the installation script and Helm chart to install the vLLM environment):
 
 ```bash
-cd .. & git clone git@github.com:llm-d/llm-d-kv-cache-manager.git
+cd .. && git clone git@github.com:llm-d/llm-d-kv-cache-manager.git
 ```
+
 If you prefer to clone it into the `/tmp` directory, make sure to update the `VLLM_CHART_DIR` environment variable:
 `export VLLM_CHART_DIR=<tmp_dir>/llm-d-kv-cache-manager/vllm-setup-helm`
 
-
-
 Once all this is set up, you can deploy the environment:
 
 ```bash
 make env-dev-kubernetes
 ```
 
 This will deploy the entire stack to whatever namespace you chose.
-**Note:** The model and images of each componet can be replaced. See [Environment Configuration](#environment-configuration) for model settings.
+> [!NOTE]
+> The model and the images of each component can be replaced. See [Environment Configuration](#environment-configuration) for model settings.
 
-You can test by exposing the inference `Gateway` via port-forward:
+You can test by exposing the `inference-gateway` service via port-forward:
 
 ```bash
-kubectl port-forward service/inference-gateway 8080:80
+kubectl port-forward service/inference-gateway 8080:80 -n "${NAMESPACE}"
 ```
 
 And making requests with `curl`:
@@ -215,6 +221,9 @@ curl -s -w '\n' http://localhost:8080/v1/completions -H 'Content-Type: applicati
   -d '{"model":"meta-llama/Llama-3.1-8B-Instruct","prompt":"hi","max_tokens":10,"temperature":0}' | jq
 ```
 
+> [!NOTE]
+> If the response is empty or is not valid JSON, `jq` may fail with a cryptic parse error. Run the command without `| jq` to inspect the raw response.
+
-#### Environment Configurateion
+#### Environment Configuration
 
 **1. Setting the EPP image and tag:**
@@ -234,7 +243,7 @@ You can optionally set the vllm replicas:
 export VLLM_REPLICA_COUNT=2
 ```
 
-**3. Setting the model name and label:**
+**3. Setting the model name:**
 
 You can replace the model name that will be used in the system.
 
@@ -244,41 +253,36 @@ export MODEL_NAME="${MODEL_NAME:-mistralai/Mistral-7B-Instruct-v0.2}"
 
 **4. Additional environment settings:**
 
-More Setting of environment variables can be found in the `scripts/kubernetes-dev-env.sh`.
-
-
+More environment variable settings can be found in `scripts/kubernetes-dev-env.sh`.
 
 #### Development Cycle
 
-> **WARNING**: This is a very manual process at the moment. We expect to make
+> [!WARNING]
+> This is a very manual process at the moment. We expect to make
 > this more automated in future iterations.
 
 Make your changes locally and commit them. Then select an image tag based on
-the `git` SHA:
+the `git` SHA and set your private registry:
 
 ```bash
 export EPP_TAG=$(git rev-parse HEAD)
+export IMAGE_REGISTRY="quay.io/my-id"
 ```
 
-Build the image:
+Build the image and tag it for your private registry:
 
 ```bash
-DEV_VERSION=$EPP_TAG make image-build
+make image-build
 ```
 
-Tag the image for your private registry and push it:
+Then push it:
 
 ```bash
-$CONTAINER_RUNTIME tag quay.io/llm-d/llm-d-gateway-api-inference-extension/epp:$TAG \
-    <MY_REGISTRY>/<MY_IMAGE>:$EPP_TAG
-$CONTAINER_RUNTIME push <MY_REGISTRY>/<MY_IMAGE>:$EPP_TAG
+make image-push
 ```
 
-> **NOTE**: `$CONTAINER_RUNTIME` can be configured or replaced with whatever your
-> environment's standard container runtime is (e.g. `podman`, `docker`).
-
-Then you can re-deploy the environment with the new changes (don't forget all
-the required env vars):
+You can now re-deploy the environment with your changes (don't forget all
+the required environment variables):
 
 ```bash
 make env-dev-kubernetes
@@ -299,3 +303,18 @@ If you also want to remove the namespace entirely, run:
 ```sh
 kubectl delete namespace ${NAMESPACE}
 ```
+
+To uninstall the infrastructure-level requirements, first remove the GIE CRDs:
+
+```sh
+kubectl delete -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/releases/latest/download/manifests.yaml --ignore-not-found
+```
+
+Then uninstall kgateway:
+
+```sh
+helm uninstall kgateway -n kgateway-system
+helm uninstall kgateway-crds -n kgateway-system
+```
+
+For more details, see the Gateway API Inference Extension [getting started guide](https://gateway-api-inference-extension.sigs.k8s.io/guides/).
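The guide asks for several exported variables before `make env-dev-kubernetes` is run, and a missing one tends to surface only later as a failed deployment. A minimal preflight sketch, assuming the variable list below (collected from this guide only; `scripts/kubernetes-dev-env.sh` remains the authoritative source). The `check_required_env` helper and `REQUIRED_VARS` array are hypothetical:

```bash
#!/usr/bin/env bash
# Sketch (hypothetical helper): fail fast if any variable this guide asks you
# to export is unset or empty.
# ASSUMPTION: this list is taken from the guide, not from the Makefile or
# scripts/kubernetes-dev-env.sh, which may require more (or fewer) variables.
REQUIRED_VARS=(NAMESPACE HF_TOKEN EPP_TAG IMAGE_REGISTRY)

# Prints "missing: <name>" to stderr for every unset or empty variable and
# returns non-zero if at least one is missing.
check_required_env() {
  local missing=0 var
  for var in "$@"; do
    # ${!var} is bash indirect expansion: the value of the variable named by $var.
    if [ -z "${!var:-}" ]; then
      echo "missing: ${var}" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Hypothetical usage: only deploy when everything is set.
# check_required_env "${REQUIRED_VARS[@]}" && make env-dev-kubernetes
```

Checking for empty as well as unset values catches the common case of an `export HF_TOKEN=` line left unfilled.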