## Kind Development Environment

The following deployment creates a [Kubernetes in Docker (KIND)] cluster with an inference scheduler using a Gateway API implementation, connected to the vLLM simulator.
To run the deployment, use the following command:

```bash
make env-dev-kind
```

This will create a `kind` cluster (or re-use an existing one) using the system's
local container runtime and deploy the development stack into the `default`
namespace.

> [!NOTE]
> You can download the image locally using `docker pull ghcr.io/llm-d/llm-d-inference-sim:latest`, and the script will load it from your local Docker registry.

There are several ways to access the gateway:

**Port forward**:

```bash
$ kubectl --context llm-d-inference-scheduler-dev port-forward service/inference-gateway 8080:80
```

**NodePort**:

```bash
# Determine the k8s node address
$ kubectl --context llm-d-inference-scheduler-dev get node -o yaml | grep address
# The service is accessible over port 80 of the worker IP address.
```

**LoadBalancer**:

```bash
# Install and run cloud-provider-kind:
$ go install sigs.k8s.io/cloud-provider-kind@latest && cloud-provider-kind &
$ kubectl --context llm-d-inference-scheduler-dev get service inference-gateway
```

You can now make requests matching the IP:port of one of the access modes above:

```bash
$ curl -s -w '\n' http://<IP:port>/v1/completions -H 'Content-Type: application/json' -d '{"model":"food-review","prompt":"hi","max_tokens":10,"temperature":0}' | jq
```

By default, the created inference gateway can be accessed on port 30080. This can
be overridden to any free port in the range 30000 to 32767 by running the above
command as follows:

```bash
KIND_GATEWAY_HOST_PORT=<selected-port> make env-dev-kind
```

**Where:** `<selected-port>` is the port on your local machine you want to use to
access the inference gateway.
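Since the override must fall within the Kubernetes NodePort range, a quick pre-flight check can catch an invalid value before the deployment fails. This is a minimal sketch; the helper name is ours and not part of the Makefile:

```shell
# check_gateway_port: succeed only if the given value is an integer in the
# NodePort range (30000-32767) accepted for KIND_GATEWAY_HOST_PORT.
check_gateway_port() {
  local port="$1"
  [[ "$port" =~ ^[0-9]+$ ]] && (( port >= 30000 && port <= 32767 ))
}

# Validate the chosen port (default 30080) before running the deployment.
check_gateway_port "${KIND_GATEWAY_HOST_PORT:-30080}" ||
  echo "KIND_GATEWAY_HOST_PORT must be an integer in 30000-32767" >&2
```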
> [!NOTE]
> If you require significant customization of this environment beyond
> what the standard deployment provides, you can use the `deploy/components`
> with `kustomize` to build your own highly customized environment. You can use
> the `deploy/environments/kind` deployment as a reference for your own.

To test your changes to `llm-d-inference-scheduler` in this environment, make your changes locally
and then re-run the deployment:

```bash
make env-dev-kind
```

This will build images with your recent changes and load the new images to the
cluster. By default the image tag will be `dev`. It will also load the `llm-d-inference-sim` image.

> [!NOTE]
> The built image tag can be specified via the `EPP_TAG` environment variable so it is used in the deployment. For example:

```bash
EPP_TAG=0.0.4 make env-dev-kind
```

> [!NOTE]
> If you want to load a different tag of `llm-d-inference-sim`, you can use the environment variable `VLLM_SIMULATOR_TAG` to specify it.

> [!NOTE]
> If you are working on macOS with Apple Silicon, you must also set the environment variable `GOOS=linux`.

Then do a rollout of the EPP `Deployment` so that your recent changes are
reflected:

```bash
kubectl rollout restart deployment food-review-endpoint-picker
```

## Kubernetes Development Environment

A Kubernetes cluster can be used for development and testing.
The setup is split in two parts:

- cluster-level infrastructure deployment (e.g., CRDs), and
- deployment of development environments on a per-namespace basis

This enables cluster sharing by multiple developers. In the case of private/personal
clusters, the `default` namespace can be used directly.

### Setup - Infrastructure

> [!CAUTION]
> In shared cluster situations you should probably not be
> running this unless you're the cluster admin and you're _certain_
> that you should be running this, as this can be disruptive to other developers
> in the cluster.

The following will deploy all the infrastructure-level requirements (e.g. CRDs,
Operators, etc.) to support the namespace-level development environments.

Install the GIE CRDs:

```bash
kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/releases/latest/download/manifests.yaml
```

Install kgateway:

```bash
KGTW_VERSION=v2.0.2
helm upgrade -i --create-namespace --namespace kgateway-system --version $KGTW_VERSION kgateway-crds oci://cr.kgateway.dev/kgateway-dev/charts/kgateway-crds
helm upgrade -i --namespace kgateway-system --version $KGTW_VERSION kgateway oci://cr.kgateway.dev/kgateway-dev/charts/kgateway --set inferenceExtension.enabled=true
```

For more details, see the Gateway API Inference Extension [getting started guide](https://gateway-api-inference-extension.sigs.k8s.io/guides/).

### Setup - Developer Environment

> [!NOTE]
> This setup is currently very manual in regards to container
> images for the vLLM simulator and the EPP. It is expected that you build and
> push images for both to your own private registry. In future iterations, we
> will be providing automation around this to make it simpler.

To deploy a development environment to the cluster, you'll need to explicitly
provide a namespace. This can be `default` if this is your personal cluster,
but on a shared cluster you should pick something unique. For example:

```bash
export NAMESPACE=annas-dev-environment
```
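One simple way to keep the namespace unique on a shared cluster is to derive it from your username. This is an illustrative sketch; the `dev-` prefix is just a convention, not something the tooling requires:

```shell
# Derive a per-developer namespace from the local username.
# Namespace names must be lowercase RFC 1123 labels, so normalize the input.
NAMESPACE="dev-$(id -un | tr '[:upper:]' '[:lower:]' | tr -cd 'a-z0-9-')"
export NAMESPACE
echo "$NAMESPACE"
```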

Create the namespace:

```bash
kubectl create namespace ${NAMESPACE}
```

Set the default namespace for kubectl commands:

```bash
kubectl config set-context --current --namespace="${NAMESPACE}"
```

> [!NOTE]
> If you are using OpenShift (the `oc` CLI), you can use the following instead: `oc project "${NAMESPACE}"`

Set the Hugging Face token variable:

```bash
export HF_TOKEN="<HF_TOKEN>"
```

Download the `llm-d-kv-cache-manager` repository (it contains the installation script and Helm chart to install the vLLM environment):

```bash
cd .. && git clone git@github.com:llm-d/llm-d-kv-cache-manager.git
```

If you prefer to clone it into the `/tmp` directory, make sure to update the `VLLM_CHART_DIR` environment variable:
`export VLLM_CHART_DIR=<tmp_dir>/llm-d-kv-cache-manager/vllm-setup-helm`

Once all this is set up, you can deploy the environment:

```bash
make env-dev-kubernetes
```

This will deploy the entire stack to whatever namespace you chose.

> [!NOTE]
> The model and images of each component can be replaced. See [Environment Configuration](#environment-configuration) for model settings.

You can test by exposing the inference gateway via port-forward:

```bash
kubectl port-forward service/inference-gateway 8080:80 -n "${NAMESPACE}"
```

And making requests with `curl`:

```bash
curl -s -w '\n' http://localhost:8080/v1/completions -H 'Content-Type: application/json' \
  -d '{"model":"meta-llama/Llama-3.1-8B-Instruct","prompt":"hi","max_tokens":10,"temperature":0}' | jq
```

> [!NOTE]
> If the response is empty or contains an error, `jq` may output a cryptic error. You can run the command without `jq` to debug raw responses.

#### Environment Configuration

**1. Setting the EPP image and tag:**

You can optionally set a custom EPP image (otherwise, the default will be used):

```bash
export EPP_TAG="<YOUR_TAG>"
export EPP_IMAGE="<YOUR_REGISTRY>/<YOUR_IMAGE>"
```

**2. Setting the vLLM replicas:**

You can optionally set the number of vLLM replicas:

```bash
export VLLM_REPLICA_COUNT=2
```

**3. Setting the model name:**

You can replace the model name that will be used in the system:

```bash
export MODEL_NAME=mistralai/Mistral-7B-Instruct-v0.2
```

If you need to deploy a larger model, update the vLLM-related parameters according to the model's requirements. For example:

```bash
export MODEL_NAME=meta-llama/Llama-3.1-70B-Instruct
export PVC_SIZE=200Gi
export VLLM_MEMORY_RESOURCES=100Gi
export VLLM_GPU_MEMORY_UTILIZATION=0.95
export VLLM_TENSOR_PARALLEL_SIZE=2
export VLLM_GPU_COUNT_PER_INSTANCE=2
```

**4. Additional environment settings:**

More environment variable settings can be found in `scripts/kubernetes-dev-env.sh`.

#### Development Cycle

> [!WARNING]
> This is a very manual process at the moment. We expect to make
> this more automated in future iterations.

Make your changes locally and commit them. Then select an image tag based on
the `git` SHA and set your private registry:

```bash
export EPP_TAG=$(git rev-parse HEAD)
export IMAGE_REGISTRY="quay.io/<my-id>"
```
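These two variables are combined into a fully qualified image reference of the form `<registry>/<image>:<tag>`. A sketch of the composition (the `llm-d-inference-scheduler` image name here is an assumption for illustration; the Makefile defines the actual name it builds and pushes):

```shell
# Compose the fully qualified image reference from the registry and tag.
# NOTE: the image name is illustrative; check the Makefile for the real one.
EPP_TAG="${EPP_TAG:-dev}"
IMAGE_REGISTRY="${IMAGE_REGISTRY:-quay.io/example}"
IMAGE_REF="${IMAGE_REGISTRY}/llm-d-inference-scheduler:${EPP_TAG}"
echo "$IMAGE_REF"
```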

Build and tag the image for your private registry:

```bash
make image-build
```

and push it:

```bash
make image-push
```

You can now re-deploy the environment with your changes (don't forget all of
the required environment variables):

```bash
make env-dev-kubernetes
```

And test the changes.

### Cleanup Environment

To clean up the development environment and remove all deployed resources in your namespace, run:

```bash
make clean-env-dev-kubernetes
```

If you also want to remove the namespace entirely, run:

```bash
kubectl delete namespace ${NAMESPACE}
```

To uninstall the infrastructure-level components:

Uninstall the GIE CRDs:

```bash
kubectl delete -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/releases/latest/download/manifests.yaml --ignore-not-found
```

Uninstall kgateway:

```bash
helm uninstall kgateway -n kgateway-system
helm uninstall kgateway-crds -n kgateway-system
```

For more details, see the Gateway API Inference Extension [getting started guide](https://gateway-api-inference-extension.sigs.k8s.io/guides/).