
Commit 633fec6

fix: Add support for vLLM resources in deployment scripts
Signed-off-by: Kfir Toledo <kfir.toledo@ibm.com>
Parent: eb9455e

2 files changed: +30 −2 lines

DEVELOPMENT.md

Lines changed: 12 additions & 1 deletion
````diff
@@ -248,7 +248,18 @@ export VLLM_REPLICA_COUNT=2
 You can replace the model name that will be used in the system.
 
 ```bash
-export MODEL_NAME="${MODEL_NAME:-mistralai/Mistral-7B-Instruct-v0.2}"
+export MODEL_NAME=mistralai/Mistral-7B-Instruct-v0.2
+```
+
+If you need to deploy a larger model, update the vLLM-related parameters according to the model's requirements. For example:
+
+```bash
+export MODEL_NAME=meta-llama/Llama-3.1-70B-Instruct
+export PVC_SIZE=200Gi
+export VLLM_MEMORY_RESOURCES=100Gi
+export VLLM_GPU_MEMORY_UTILIZATION=0.95
+export VLLM_TENSOR_PARALLEL_SIZE=2
+export VLLM_GPU_COUNT_PER_INSTANCE=2
 ```
 
 **4. Additional environment settings:**
````
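These exports are consumed by `scripts/kubernetes-dev-env.sh`, diffed below. A minimal usage sketch, assuming the script is run from the repository root with cluster access already configured (all values are illustrative; size them to your cluster's GPUs):

```bash
# Deploy the dev environment with the larger-model settings above.
export MODEL_NAME=meta-llama/Llama-3.1-70B-Instruct
export PVC_SIZE=200Gi
export VLLM_MEMORY_RESOURCES=100Gi
export VLLM_GPU_MEMORY_UTILIZATION=0.95
export VLLM_TENSOR_PARALLEL_SIZE=2
export VLLM_GPU_COUNT_PER_INSTANCE=2
./scripts/kubernetes-dev-env.sh

# Tear it down again: the script's CLEAN branch (see the diff below)
# deletes the gateway resources and uninstalls the vllm release.
CLEAN=true ./scripts/kubernetes-dev-env.sh
```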

scripts/kubernetes-dev-env.sh

Lines changed: 18 additions & 1 deletion
```diff
@@ -111,9 +111,21 @@ export PVC_SIZE="${PVC_SIZE:-40Gi}"
 # CPU request per vLLM replica
 export VLLM_CPU_RESOURCES="${VLLM_CPU_RESOURCES:-10}"
 
+# Memory request per vLLM replica
+export VLLM_MEMORY_RESOURCES="${VLLM_MEMORY_RESOURCES:-40Gi}"
+
+# GPU memory utilization (optional, default is null)
+export VLLM_GPU_MEMORY_UTILIZATION="${VLLM_GPU_MEMORY_UTILIZATION:-null}"
+
 # Number of vLLM replicas
 export VLLM_REPLICA_COUNT="${VLLM_REPLICA_COUNT:-3}"
 
+# Tensor parallel size (optional, default is null)
+export VLLM_TENSOR_PARALLEL_SIZE="${VLLM_TENSOR_PARALLEL_SIZE:-null}"
+
+# Number of GPUs per vLLM replica
+export VLLM_GPU_COUNT_PER_INSTANCE="${VLLM_GPU_COUNT_PER_INSTANCE:-1}"
+
 # vLLM deployment name (derived from release + model)
 export VLLM_DEPLOYMENT_NAME="${VLLM_HELM_RELEASE_NAME}-${MODEL_NAME_SAFE}"
```
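All of the new variables follow the same shell idiom as the existing ones: `${VAR:-default}` keeps any value the caller has already exported and falls back to the default otherwise. A quick self-contained illustration:

```bash
#!/usr/bin/env bash
# ${VAR:-default} substitutes the default only when VAR is unset or empty.
unset VLLM_MEMORY_RESOURCES
export VLLM_MEMORY_RESOURCES="${VLLM_MEMORY_RESOURCES:-40Gi}"
echo "$VLLM_MEMORY_RESOURCES"   # -> 40Gi (default applied)

export VLLM_MEMORY_RESOURCES=100Gi
export VLLM_MEMORY_RESOURCES="${VLLM_MEMORY_RESOURCES:-40Gi}"
echo "$VLLM_MEMORY_RESOURCES"   # -> 100Gi (caller's value wins)
```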

```diff
@@ -139,7 +151,7 @@ if [[ "$CLEAN" == "true" ]]; then
 # Delete inference scheduler and gateway resources.
 kustomize build deploy/environments/dev/kubernetes-kgateway | envsubst | kubectl -n "${NAMESPACE}" delete --ignore-not-found=true -f -
 # Delete vllm resources.
-helm uninstall vllm --namespace ${NAMESPACE}
+helm uninstall vllm --namespace ${NAMESPACE} --ignore-not-found
 exit 0
 fi
```
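The added `--ignore-not-found` flag makes the cleanup path idempotent: without it, `helm uninstall` exits non-zero when the release does not exist, aborting repeat cleanup runs. With the flag, the command succeeds whether or not the release is present:

```bash
# Safe to run repeatedly: if the vllm release is already gone, helm
# no longer fails with a "release: not found" error.
helm uninstall vllm --namespace "$NAMESPACE" --ignore-not-found
```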

```diff
@@ -163,6 +175,11 @@ helm upgrade --install "$VLLM_HELM_RELEASE_NAME" "$VLLM_CHART_DIR" \
 --set vllm.model.label="$MODEL_NAME_SAFE" \
 --set vllm.replicaCount="$VLLM_REPLICA_COUNT" \
 --set vllm.resources.requests.cpu="$VLLM_CPU_RESOURCES" \
+--set vllm.resources.requests.memory="$VLLM_MEMORY_RESOURCES" \
+--set vllm.resources.requests."nvidia\.com/gpu"="$VLLM_GPU_COUNT_PER_INSTANCE" \
+--set vllm.resources.limits."nvidia\.com/gpu"="$VLLM_GPU_COUNT_PER_INSTANCE" \
+--set vllm.gpuMemoryUtilization="${VLLM_GPU_MEMORY_UTILIZATION}" \
+--set vllm.tensorParallelSize="${VLLM_TENSOR_PARALLEL_SIZE}" \
 --set persistence.enabled=true \
 --set persistence.size="$PVC_SIZE"\
 --set redis.nameSuffix="$REDIS_DEPLOYMENT_NAME" \
```
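One subtlety in the new flags: Helm treats dots in `--set` paths as key separators, so the literal dot in the extended resource name `nvidia.com/gpu` must be escaped as `nvidia\.com/gpu`. A minimal sketch (the release name and chart path here are hypothetical):

```bash
# Without the backslash, Helm would split the path into nested keys,
# i.e. {nvidia: {com/gpu: 2}}; the escape keeps one literal map key.
helm template demo ./charts/vllm \
  --set vllm.resources.limits."nvidia\.com/gpu"=2
# Equivalent values.yaml fragment seen by the chart:
# vllm:
#   resources:
#     limits:
#       "nvidia.com/gpu": 2
```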
