# build: add support for development on kubernetes cluster #190

## Kind Development Environment

The following deployment creates a [Kubernetes in Docker (KIND)] cluster with an
inference scheduler using a Gateway API implementation, connected to the vLLM
simulator. To run the deployment, use the following command:

```bash
make env-dev-kind
```

This will create a `kind` cluster (or re-use an existing one) using the system's
local container runtime and deploy the development stack into the `default`
namespace.

> [!NOTE]
> You can download the image locally using `docker pull ghcr.io/llm-d/llm-d-inference-sim:latest`, and the script will load it from your local Docker registry.

There are several ways to access the gateway:

**Port forward**:

```bash
kubectl --context llm-d-inference-scheduler-dev port-forward service/inference-gateway 8080:80
```

**NodePort**:

```bash
# Determine the k8s node address
kubectl --context llm-d-inference-scheduler-dev get node -o yaml | grep address
# The service is accessible over port 80 of the worker IP address.
```
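
If you'd rather capture the node IP in a variable than read it out of the `grep` output, you can filter for the `InternalIP` entry. A minimal sketch, run here against a canned fragment of `kubectl get node -o yaml` output (the IP shown is hypothetical):

```shell
# Canned fragment of `kubectl get node -o yaml` output; in a real run you
# would pipe the kubectl command itself into awk instead.
sample='  addresses:
  - address: 172.18.0.2
    type: InternalIP
  - address: llm-d-inference-scheduler-dev-control-plane
    type: Hostname'

# Remember each address, print the one whose type is InternalIP.
node_ip=$(printf '%s\n' "$sample" | awk '/- address:/ {addr=$3} /type: InternalIP/ {print addr; exit}')
echo "gateway reachable at http://${node_ip}:80"
```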

**LoadBalancer**:

```bash
# Install and run cloud-provider-kind:
go install sigs.k8s.io/cloud-provider-kind@latest && cloud-provider-kind &
kubectl --context llm-d-inference-scheduler-dev get service inference-gateway
```

You can now make requests matching the IP:port of one of the access modes above:

```bash
curl -s -w '\n' http://<IP:port>/v1/completions -H 'Content-Type: application/json' -d '{"model":"food-review","prompt":"hi","max_tokens":10,"temperature":0}' | jq
```
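
If you find the inline JSON hard to edit, you can build the request body from shell variables first. A sketch (note that `printf` does not JSON-escape, so keep prompts free of quotes and backslashes):

```shell
MODEL=food-review
PROMPT=hi

# Assemble the request body; the field names match the curl example above.
BODY=$(printf '{"model":"%s","prompt":"%s","max_tokens":10,"temperature":0}' "$MODEL" "$PROMPT")
echo "$BODY"

# The request then becomes:
#   curl -s -w '\n' http://<IP:port>/v1/completions -H 'Content-Type: application/json' -d "$BODY" | jq
```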

By default, the created inference gateway can be accessed on port 30080. This
can be overridden to any free port in the range 30000 to 32767 by running the
above command as follows:

```bash
KIND_GATEWAY_HOST_PORT=<selected-port> make env-dev-kind
```

**Where:** `<selected-port>` is the port on your local machine you want to use
to access the inference gateway.
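
Since the override only works for ports in the NodePort range, a quick sanity check before re-running `make` can save a failed deployment. A small sketch (the helper name is ours, not part of the project tooling):

```shell
# Valid NodePort host ports are 30000-32767, per the text above.
port_ok() {
  [ "$1" -ge 30000 ] && [ "$1" -le 32767 ]
}

if port_ok 30080; then echo "30080 is usable"; fi
if ! port_ok 8080; then echo "8080 is outside the NodePort range"; fi
```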

> [!NOTE]
> If you require significant customization of this environment beyond
> what the standard deployment provides, you can use the `deploy/components`
> with `kustomize` to build your own highly customized environment. You can use
> the `deploy/environments/kind` deployment as a reference for your own.

To test your changes to `llm-d-inference-scheduler` in this environment, make
your changes locally and then re-run the deployment:

```bash
make env-dev-kind
```

This will build images with your recent changes and load the new images to the
cluster. By default the image tag will be `dev`. It will also load the
`llm-d-inference-sim` image.

> [!NOTE]
> The built image tag can be specified via the `EPP_TAG` environment variable so it is used in the deployment. For example:

```bash
EPP_TAG=0.0.4 make env-dev-kind
```

> [!NOTE]
> If you want to load a different tag of `llm-d-inference-sim`, you can use the environment variable `VLLM_SIMULATOR_TAG` to specify it.

> [!NOTE]
> If you are working on macOS with Apple Silicon, you must also set the environment variable `GOOS=linux`.

Then do a rollout of the EPP `Deployment` so that your recent changes are
reflected:

```bash
kubectl rollout restart deployment food-review-endpoint-picker
```

## Kubernetes Development Environment

A Kubernetes cluster can be used for development and testing. The setup is
split into two parts:

- cluster-level infrastructure deployment (e.g., CRDs), and
- deployment of development environments on a per-namespace basis

This enables cluster sharing by multiple developers. In the case of a
private/personal cluster, the `default` namespace can be used directly.

### Setup - Infrastructure

> [!CAUTION]
> In shared cluster situations you should probably not be
> running this unless you're the cluster admin and you're _certain_
> that you should be running this, as it can be disruptive to other developers
> in the cluster.

The following will deploy all the infrastructure-level requirements (e.g. CRDs,
Operators, etc.) to support the namespace-level development environments.

Install the GIE CRDs:

```bash
kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/releases/latest/download/manifests.yaml
```

Install kgateway:

```bash
KGTW_VERSION=v2.0.2
helm upgrade -i --create-namespace --namespace kgateway-system --version $KGTW_VERSION kgateway-crds oci://cr.kgateway.dev/kgateway-dev/charts/kgateway-crds
helm upgrade -i --namespace kgateway-system --version $KGTW_VERSION kgateway oci://cr.kgateway.dev/kgateway-dev/charts/kgateway --set inferenceExtension.enabled=true
```

For more details, see the Gateway API Inference Extension [getting started guide](https://gateway-api-inference-extension.sigs.k8s.io/guides/).

### Setup - Developer Environment

> [!NOTE]
> This setup is currently very manual in regards to container
> images for the vLLM simulator and the EPP. It is expected that you build and
> push images for both to your own private registry. In future iterations, we
> will be providing automation around this to make it simpler.

To deploy a development environment to the cluster, you'll need to explicitly
provide a namespace. This can be `default` if this is your personal cluster,
but on a shared cluster you should pick something unique. For example:

```bash
export NAMESPACE=annas-dev-environment
```
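
On a shared cluster, one easy way to pick a unique name is to derive it from your username and normalize it to a valid DNS-1123 label (lowercase alphanumerics and dashes). This convention is ours, not a project requirement:

```shell
# Build a candidate namespace from the current user, then lowercase it and
# replace any character outside [a-z0-9-] with a dash.
raw="$(whoami)-dev-environment"
NAMESPACE=$(printf '%s' "$raw" | tr '[:upper:]' '[:lower:]' | tr -c 'a-z0-9-' '-')
export NAMESPACE
echo "$NAMESPACE"
```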

Create the namespace:

```bash
kubectl create namespace ${NAMESPACE}
```

Set the default namespace for kubectl commands:

```bash
kubectl config set-context --current --namespace="${NAMESPACE}"
```

> [!NOTE]
> If you are using OpenShift (the `oc` CLI), you can use the following instead: `oc project "${NAMESPACE}"`

Set the Hugging Face token variable:

```bash
export HF_TOKEN="<HF_TOKEN>"
```

Download the `llm-d-kv-cache-manager` repository, which contains the
installation script and Helm chart used to install the vLLM environment:

```bash
cd .. && git clone git@github.com:llm-d/llm-d-kv-cache-manager.git
```

If you prefer to clone it into the `/tmp` directory, make sure to update the `VLLM_CHART_DIR` environment variable:
`export VLLM_CHART_DIR=<tmp_dir>/llm-d-kv-cache-manager/vllm-setup-helm`
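
Because the deployment depends on that checkout, it can help to verify the chart directory exists before deploying. A self-contained sketch (it fabricates the directory under a temp path purely for illustration):

```shell
# Stand-in for the real checkout location so the snippet runs anywhere.
demo_root=$(mktemp -d)
mkdir -p "${demo_root}/llm-d-kv-cache-manager/vllm-setup-helm"
export VLLM_CHART_DIR="${demo_root}/llm-d-kv-cache-manager/vllm-setup-helm"

if [ -d "$VLLM_CHART_DIR" ]; then
  echo "chart directory found: $VLLM_CHART_DIR"
else
  echo "clone llm-d-kv-cache-manager before deploying" >&2
fi
```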

Once all this is set up, you can deploy the environment:

```bash
make env-dev-kubernetes
```

This will deploy the entire stack to whatever namespace you chose.

> [!NOTE]
> The model and images of each component can be replaced. See [Environment Configuration](#environment-configuration) for model settings.

You can test by exposing the inference gateway via port-forward:

```bash
kubectl port-forward service/inference-gateway 8080:80 -n "${NAMESPACE}"
```

And making requests with `curl`:

```bash
curl -s -w '\n' http://localhost:8080/v1/completions -H 'Content-Type: application/json' \
  -d '{"model":"meta-llama/Llama-3.1-8B-Instruct","prompt":"hi","max_tokens":10,"temperature":0}' | jq
```

> [!NOTE]
> If the response is empty or contains an error, `jq` may output a cryptic error. You can run the command without `jq` to debug raw responses.
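
One dependency-free way to decide whether a response is even worth piping to `jq` is to look at its first non-space character; real validation is what `jq -e .` does, this is only a rough sketch:

```shell
# Returns success if the payload starts with '{' or '[', i.e. plausibly JSON.
looks_like_json() {
  case "$(printf '%s\n' "$1" | sed 's/^[[:space:]]*//' | cut -c1)" in
    '{'|'[') return 0 ;;
    *)       return 1 ;;
  esac
}

# A gateway error page would fail the check and be printed raw instead.
if looks_like_json 'upstream connect error'; then
  echo "pipe to jq"
else
  echo "raw: upstream connect error"
fi
```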

#### Environment Configuration

**1. Setting the EPP image and tag:**

You can optionally set a custom EPP image (otherwise, the default will be used):

```bash
export EPP_TAG="<YOUR_TAG>"
export EPP_IMAGE="<YOUR_REGISTRY>/<YOUR_IMAGE>"
```
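
The two variables combine into a single image reference at deploy time. A sketch of the composition with placeholder defaults (these defaults are illustrative, not the project's actual ones):

```shell
# Start from a clean slate so the defaults below are what gets printed.
unset EPP_IMAGE EPP_TAG

EPP_IMAGE="${EPP_IMAGE:-ghcr.io/example/epp}"   # placeholder registry/image
EPP_TAG="${EPP_TAG:-dev}"                       # placeholder tag
echo "deploying image: ${EPP_IMAGE}:${EPP_TAG}"
```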

**2. Setting the vLLM replicas:**

You can optionally set the number of vLLM replicas:

```bash
export VLLM_REPLICA_COUNT=2
```

**3. Setting the model name:**

You can replace the model name that will be used in the system:

```bash
export MODEL_NAME=mistralai/Mistral-7B-Instruct-v0.2
```

If you need to deploy a larger model, update the vLLM-related parameters according to the model's requirements. For example:

```bash
export MODEL_NAME=meta-llama/Llama-3.1-70B-Instruct
export PVC_SIZE=200Gi
export VLLM_MEMORY_RESOURCES=100Gi
export VLLM_GPU_MEMORY_UTILIZATION=0.95
export VLLM_TENSOR_PARALLEL_SIZE=2
export VLLM_GPU_COUNT_PER_INSTANCE=2
```
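
As a rough rule of thumb (our approximation, not project guidance), bf16 weights take about 2 bytes per parameter, and tensor parallelism divides them across GPUs; KV cache and activations come on top of that. For the 70B example above:

```shell
PARAMS_B=70        # model size in billions of parameters
BYTES_PER_PARAM=2  # bf16/fp16 weights
TP=2               # matches VLLM_TENSOR_PARALLEL_SIZE above

# Weights-only footprint per GPU, in GB (1B params * 2 bytes ~= 2 GB).
PER_GPU_GB=$(( PARAMS_B * BYTES_PER_PARAM / TP ))
echo "approx. weight memory per GPU: ${PER_GPU_GB} GB"
```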

**4. Additional environment settings:**

More environment variable settings can be found in `scripts/kubernetes-dev-env.sh`.

#### Development Cycle

> [!WARNING]
> This is a very manual process at the moment. We expect to make
> this more automated in future iterations.

Make your changes locally and commit them. Then select an image tag based on
the `git` SHA and set your private registry:

```bash
export EPP_TAG=$(git rev-parse HEAD)
export IMAGE_REGISTRY="quay.io/<my-id>"
```
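
A full 40-character SHA works as a tag, but `git rev-parse --short HEAD` gives a shorter one that is still unique within the repository. Sketched in a throwaway repository so it runs anywhere:

```shell
# Create a disposable repo with one empty commit to demonstrate the tag.
repo=$(mktemp -d)
cd "$repo"
git init -q
git -c user.name=dev -c user.email=dev@example.com commit -q --allow-empty -m init

EPP_TAG=$(git rev-parse --short HEAD)
echo "EPP_TAG=${EPP_TAG}"
```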

Build the image and tag it for your private registry:

```bash
make image-build
```

And push it:

```bash
make image-push
```

You can now re-deploy the environment with your changes (don't forget all of
the required environment variables):

```bash
make env-dev-kubernetes
```

And test the changes.

### Cleanup Environment

To clean up the development environment and remove all deployed resources in
your namespace, run:

```bash
make clean-env-dev-kubernetes
```

If you also want to remove the namespace entirely, run:

```bash
kubectl delete namespace ${NAMESPACE}
```

To uninstall the infrastructure-level components:

Uninstall the GIE CRDs:

```bash
kubectl delete -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/releases/latest/download/manifests.yaml --ignore-not-found
```

Uninstall kgateway:

```bash
helm uninstall kgateway -n kgateway-system
helm uninstall kgateway-crds -n kgateway-system
```

For more details, see the Gateway API Inference Extension [getting started guide](https://gateway-api-inference-extension.sigs.k8s.io/guides/).

The PR also adds a new `GatewayParameters` manifest:

```yaml
apiVersion: gateway.kgateway.dev/v1alpha1
kind: GatewayParameters
metadata:
  name: custom-gw-params
spec:
  kube:
    envoyContainer:
      securityContext:
        allowPrivilegeEscalation: false
        readOnlyRootFilesystem: true
        runAsNonRoot: true
        runAsUser: "${PROXY_UID}"
    service:
      type: ${GATEWAY_SERVICE_TYPE}
      extraLabels:
        gateway: custom
    podTemplate:
      extraLabels:
        gateway: custom
      securityContext:
        seccompProfile:
          type: RuntimeDefault
```