Utilities and workflow helpers for managing LLM-D deployments in Kubernetes. Clone with submodules:

```bash
git clone --recursive https://github.com/LucasWilkinson/llm-d-utils
```

- Prerequisites
- Initial Setup
- Everyday Commands
- Benchmark Configuration
- Building Custom vLLM Images
- Troubleshooting
## Prerequisites

Make sure the following tools are installed and available in your PATH:

- `just` for running the recipes in this repo
- `kubectl` configured for the target cluster
- `helm`
- `stern` for streaming pod logs
- `watch`
- Optional: `fzf` for the nicer interactive pod pickers used by several recipes
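A quick way to confirm the required tools are on your PATH (plain shell, not part of the repo's recipes):

```bash
# Report any prerequisite tool that is missing from PATH
for tool in just kubectl helm stern watch; do
  command -v "$tool" >/dev/null || echo "missing: $tool"
done
```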
## Initial Setup

- **Create a `.env` file**

  The Justfile loads environment variables via `set dotenv-load`. Create a `.env` file in the project root with your configuration and secrets:

  ```
  USER_NAME=your-username
  HF_TOKEN=your-huggingface-token
  GH_TOKEN=your-github-token
  QUAY_REPO=your-quay-username
  QUAY_ROBOT=buildbot
  QUAY_PASSWORD=your-robot-account-token
  ```
  `USER_NAME` is used to generate your namespace: `USER_NAME + "-llm-d-wide-ep"` (defaults to your system username if not set).

  To get quay.io credentials:

  - Log into quay.io (via SSO)
  - Go to Account Settings → Robot Accounts
  - Create a new robot account (e.g., `buildbot`)
  - Copy the token and use it as `QUAY_PASSWORD`
  - `QUAY_REPO` should be your quay.io username (not the robot account name)
  - The full robot account name will be constructed as `QUAY_REPO` + `QUAY_ROBOT`

  **IMPORTANT**: Before building, you must also:

  - Create the repository `llm-d-cuda-dev` in quay.io (can be public or private)
  - Go to the repository → Settings → User and Robot Permissions
  - Add your robot account (`QUAY_REPO` + `QUAY_ROBOT`) with Write permission

  These values are required for the secret creation step below.
- **Point kubectl at your token file**

  Export the kubeconfig path you received from the platform (example path shown below):

  ```bash
  export KUBECONFIG=~/kubectl-token.txt
  ```
- **Create Kubernetes secrets**

  Run:

  ```bash
  just create-secrets
  ```

  This will create (or update) the `llm-d-hf-token`, `gh-token-secret`, and `registry-auth` secrets in your namespace using the values from `.env`.

- **(Optional) Set your kubectl namespace**

  To avoid specifying `-n {{NAMESPACE}}` manually, update your context with:

  ```bash
  just set-namespace
  ```
- **Deploy the workload**

  Launch the deployment using Kustomize and Helm:

  ```bash
  just start
  ```

  This will:

  - Deploy model servers using `kubectl apply -k` (CoreWeave variant)
  - Install the InferencePool via Helm (with Istio gateway)
  - Deploy the Istio gateway and HTTPRoute

  To tear it back down, run `just stop`. This removes the Helm release, model server manifests, and gateway resources. The deployment uses manifests from `llm-d/guides/wide-ep-lws/manifests/` and values from `llm-d/guides/wide-ep-lws/inferencepool.values.yaml`.

  The benchmarking helpers (e.g. `just run-bench`) default to the deployment's model (`deepseek-ai/DeepSeek-R1-0528`). If you change the model, update the `MODEL` variable near the top of the `Justfile` so the generated remote Justfile targets the right endpoint. A quick way to verify the deployment is shown after this list.
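Once `just start` finishes, a minimal sanity check might look like this (standard kubectl queries; substitute your own namespace, and note the exact resource names may differ in your cluster):

```bash
# Watch the model server pods come up
just status

# Confirm the Istio gateway and HTTPRoute were created
kubectl get gateway,httproute -n <your-namespace>
```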
## Everyday Commands

- `just start`: Deploy the full stack (model servers, InferencePool, gateway) using Kustomize and Helm.
- `just stop`: Tear down the deployment (removes Helm release, model server manifests, and gateway).
- `just restart`: Stop and start the deployment (`just stop && just start`).
- `just update-image TAG`: Update the `decode.yaml` and `prefill.yaml` manifests to use a custom image with the specified tag. Example: `just update-image test-latest-main`
- `just get-pods`: List all pods in the configured namespace.
- `just status`: Watch pod status in real time using `watch -n 2 kubectl get pods`.
- `just describe [name=pod-name]`: Describe a pod. If `name` is omitted, you'll get an interactive picker. Requires `fzf` for fuzzy selection, otherwise falls back to shell `select`.
- `just stern [name=pod-name] [-- <stern flags>]`: Stream logs from pods using stern. With no `name`, you get the interactive picker. Flags after `--` are forwarded to stern (e.g., `just stern -- -c vllm-worker`).
- `just print-gpus`: Show GPU allocation across all cluster nodes, grouped by node and namespace.
- `just cks-nodes`: Display CoreWeave node information (type, link speed, IB speed, reliability, etc.).
- `just start-bench`: Create the benchmark-interactive pod for running benchmarks.
- `just stop-bench`: Delete the benchmark-interactive pod.
- `just restart-bench`: Stop and start the benchmark pod (`just stop-bench && just start-bench`).
- `just interact-bench`: Open an interactive shell in the benchmark pod with the Justfile and scripts copied in.
- `just run-bench NAME [IN_TOKENS] [OUT_TOKENS] [NUM_PROMPTS] [CONCURRENCY_LEVELS]`: Run a benchmark with the specified name and parameters. Parameters are positional. Example: `just run-bench run1 256 1024 8192`. See "Benchmark Configuration" below for details.
- `just cp-results`: Copy the most recent benchmark results from the benchmark pod to `results/<timestamp>` locally.
- `just start-build-pod`: Create the buildah build pod for building custom vLLM images.
- `just stop-build-pod`: Delete the buildah build pod.
- `just build-image VLLM_COMMIT TAG [use_sccache]`: Build a custom vLLM image with the specified commit SHA and tag. `use_sccache` defaults to `true`. Example: `just build-image abc123def my-custom-tag false`
- `just set-namespace`: Update your kubectl context to default to the configured namespace.
- `just create-secrets`: Create or update Kubernetes secrets (HF token, GH token, registry auth) from the `.env` file.
- `just create-registry-auth`: Create or update only the registry authentication secret.
- `just print-results DIR STR`: Grep for a string in benchmark result logs and print sorted results.
- `just print-throughput DIR`: Print output token throughput from benchmark results in a directory.
- `just print-tpot DIR`: Print median time-per-output-token (TPOT) from benchmark results in a directory.
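As a sketch of how these recipes fit together, a typical benchmark session using only the commands above might look like this (the run name, token counts, and results directory are placeholders):

```bash
just start-bench                    # create the benchmark-interactive pod
just run-bench run1 256 1024 8192   # run a sweep under the name "run1"
just cp-results                     # copy results to results/<timestamp> locally
just print-throughput results/<timestamp>
just stop-bench                     # clean up the benchmark pod
```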
## Benchmark Configuration

`just run-bench` accepts parameters to tune the benchmark payload. Parameters can be passed either positionally or as named arguments:

Positional (recommended):

```bash
just run-bench run1 256 1024 8192
```

Named arguments:

```bash
just run-bench name=run1 in_tokens=256 out_tokens=1024 num_prompts=8192
```

- `name` (required): Benchmark run name for organizing results
- `in_tokens` (default `128`): Prompt length fed to `vllm bench`
- `out_tokens` (default `2048`): Target completion length
- `num_prompts` (default `16384`): Total requests per concurrency level
- `concurrency_levels` (default `'8192 16384 32768'`): Space-separated list of concurrency levels to sweep
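For example, to sweep a custom set of concurrency levels you could combine the named arguments listed above (the specific values here are illustrative):

```bash
# Sweep only two concurrency levels instead of the default three
just run-bench name=sweep1 in_tokens=256 out_tokens=1024 num_prompts=8192 concurrency_levels='4096 8192'
```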
These values are forwarded to the benchmark pod as environment variables. You can also invoke the benchmark manually:

```bash
kubectl exec -n NAMESPACE benchmark-interactive -- \
  env INPUT_TOKENS=256 OUTPUT_TOKENS=1024 NUM_PROMPTS=8192 \
  bash /app/run.sh
```

## Building Custom vLLM Images

To build a custom vLLM image with a specific commit:
- **Start the build pod:**

  ```bash
  just start-build-pod
  ```

- **Build and push the image:**

  ```bash
  just build-image VLLM_COMMIT_SHA TAG
  # Example: just build-image 8ce5d3198d00631a76e1aa02a57947b46bc7218c mtp-enabled
  ```

  This will:

  - Clone the llm-d repository
  - Update the Dockerfile with your specified vLLM commit
  - Build the image using buildah
  - Push to `quay.io/QUAY_REPO/llm-d-cuda-dev:TAG`
- **Update the manifests:**

  Edit `llm-d/guides/wide-ep-lws/manifests/modelserver/base/decode.yaml` and `prefill.yaml` to use your custom image (or see the `just update-image` tip after this list):

  ```yaml
  image: quay.io/your-repo/llm-d-cuda-dev:your-tag
  ```

- **Clean up the build pod:**

  ```bash
  just stop-build-pod
  ```
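Tip: instead of editing the YAML by hand in the "Update the manifests" step, the `just update-image` recipe described under "Everyday Commands" should accomplish the same edit:

```bash
# Point decode.yaml and prefill.yaml at the freshly pushed tag
just update-image my-custom-tag
```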
**Note**: The build takes 30-60+ minutes. Monitor progress with:

```bash
kubectl logs -f buildah-build -n your-namespace
```

## Troubleshooting

- If `just` reports missing environment variables, double-check your `.env` file and ensure you're running commands from the repository root.
- Kubernetes errors such as `CreateContainerConfigError` usually indicate a missing or misnamed secret; re-run `just create-secrets` after updating `.env`, or inspect the pod events via `just describe name=...`.
- For log streaming issues, ensure `stern` is installed and your kubeconfig points to the correct cluster.
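For secret-related failures in particular, a couple of plain kubectl checks (with your own namespace and pod name substituted) can confirm whether the expected secrets exist and what the pod events report:

```bash
# The three secrets created by `just create-secrets` should be listed here
kubectl get secrets -n <your-namespace>

# Recent events usually name the missing or misnamed secret explicitly
kubectl describe pod <pod-name> -n <your-namespace> | tail -n 20
```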
With the setup above you should be able to deploy, inspect, and debug the LLM-D workloads quickly using the provided Just recipes.