Description
I'm experiencing an issue with model registration in our LLM deployment. Despite configuring the model name as "Qwen3-0.6B" in the values.yaml file, the model is showing up as "random/model" when querying the /v1/models endpoint.
Configuration
In my values.yaml, I have:
- modelArtifacts.name: "Qwen3-0.6B"
- routing.modelName: Qwen3-0.6B
- vLLM startup command with --served-model-name Qwen3-0.6B
- inferencePool.modelName: Qwen3-0.6B
- Both decode and prefill containers configured with the same model name
What I've Tried
- Verified all model name configurations are consistent
- Redeployed the service multiple times
- Restarted both vLLM and routing service pods
- Checked vLLM logs for model loading errors (no errors found)
- Verified model files exist at the expected path /models/Qwen3-0___6B
- Enabled inferenceModel.create: true as suggested
Expected Behavior
The model should be registered and accessible as "Qwen3-0.6B" when querying the /v1/models endpoint.
Actual Behavior
The /v1/models endpoint returns:
```json
{
  "data": [
    {
      "created": 1758531410,
      "id": "random/model",
      "object": "model",
      "owned_by": "vllm",
      "parent": null,
      "root": "random/model"
    }
  ],
  "object": "list"
}
```
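To rule out a reading error on my side, here is a minimal sketch (assuming the response shape above) that extracts the served model ids from the payload and compares them against the configured name:

```python
import json

# The /v1/models payload returned by the deployment (copied from above).
response = json.loads("""
{
  "data": [
    {
      "created": 1758531410,
      "id": "random/model",
      "object": "model",
      "owned_by": "vllm",
      "parent": null,
      "root": "random/model"
    }
  ],
  "object": "list"
}
""")

# Collect every model id the endpoint advertises.
served_ids = [m["id"] for m in response["data"]]
expected = "Qwen3-0.6B"

print(served_ids)              # ['random/model']
print(expected in served_ids)  # False -- the configured name is not registered
```

So the configured name is genuinely absent from the registry, not merely mislabeled in one field.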
The Helm release is installed in namespace "works", and the gateway lives in namespace "llm-d-infra".
My values.yaml:

```yaml
modelArtifacts:
  uri: "pvc://pvc-2d4821f257a24dcdaadde41a2433d94d/Qwen3-0___6B"
  name: "Qwen3-0.6B"
  mountPath: "/models"

routing:
  modelName: Qwen3-0.6B
  servicePort: 8000
  parentRefs:
    - group: gateway.networking.k8s.io
      kind: Gateway
      name: llm-d-infra-inference-gateway
      namespace: llm-d-infra
  proxy:
    image: ghcr.io/llm-d/llm-d-routing-sidecar:v0.2.0
    targetPort: 8200
    connector: nixlv2
    secure: false
  inferenceModel:
    create: true
  inferencePool:
    create: true
    name: test-llmd-llm-d-modelservice
    targetPortNumber: 8200
    modelServerType: vllm
    modelName: Qwen3-0.6B
    modelServers:
      matchLabels:
        llm-d.ai/inferenceServing: "true"
  httpRoute:
    create: false
  epp:
    create: true
    service:
      type: ClusterIP
      port: 9002
      targetPort: 9002
      appProtocol: http2
    image: ghcr.io/llm-d/llm-d-inference-scheduler:v0.2.1
    replicas: 1
    debugLevel: 4
    disableReadinessProbe: false
    disableLivenessProbe: false
    pluginsConfigFile: "prefix-cache-tracking-config.yaml"
    env: []
    resources:
      limits:
        cpu: 1000m
        memory: 1Gi
      requests:
        cpu: 1000m
        memory: 1Gi

decode:
  create: true
  replicas: 1
  monitoring:
    podmonitor:
      enabled: true
      portName: "metrics"
      path: "/metrics"
      interval: "30s"
  containers:
    - name: "vllm"
      image: "ghcr.io/llm-d/llm-d-dev:pr-170"
      modelCommand: custom
      command:
        - "/bin/sh"
        - "-c"
      args:
        - "vllm serve /models/Qwen3-0___6B --host 0.0.0.0 --port 8200 --served-model-name Qwen3-0.6B --max-model-len 1024"
      env:
        - name: UCX_TLS
          value: "cuda_ipc,cuda_copy,tcp"
        - name: VLLM_NIXL_SIDE_CHANNEL_HOST
          valueFrom:
            fieldRef:
              fieldPath: status.podIP
        - name: VLLM_NIXL_SIDE_CHANNEL_PORT
          value: "5557"
        - name: VLLM_LOGGING_LEVEL
          value: DEBUG
      ports:
        - containerPort: 5557
          protocol: TCP
        - containerPort: 8200
          name: metrics
          protocol: TCP
      resources:
        limits:
          nvidia.com/gpu: "1"
        requests:
          nvidia.com/gpu: "1"
      mountModelVolume: true
      volumeMounts:
        - name: metrics-volume
          mountPath: /.config
        - name: torch-compile-cache
          mountPath: /.cache
  volumes:
    - name: metrics-volume
      emptyDir: {}
    - name: torch-compile-cache
      emptyDir: {}

prefill:
  create: true
  replicas: 1
  monitoring:
    podmonitor:
      enabled: true
      portName: "metrics"
      path: "/metrics"
      interval: "30s"
  containers:
    - name: "vllm-prefill"
      image: "ghcr.io/llm-d/llm-d-dev:pr-170"
      modelCommand: custom
      command:
        - "/bin/sh"
        - "-c"
      args:
        - "vllm serve /models/Qwen3-0___6B --host 0.0.0.0 --port 8200 --served-model-name Qwen3-0.6B --max-model-len 1024"
      env:
        - name: CUDA_VISIBLE_DEVICES
          value: "0"
        - name: UCX_TLS
          value: "cuda_ipc,cuda_copy,tcp"
        - name: VLLM_NIXL_SIDE_CHANNEL_HOST
          valueFrom:
            fieldRef:
              fieldPath: status.podIP
        - name: VLLM_NIXL_SIDE_CHANNEL_PORT
          value: "5558"
        - name: VLLM_LOGGING_LEVEL
          value: DEBUG
      ports:
        - containerPort: 5558
          protocol: TCP
        - containerPort: 8300
          name: metrics
          protocol: TCP
      resources:
        limits:
          nvidia.com/gpu: "1"
        requests:
          nvidia.com/gpu: "1"
      mountModelVolume: true
      volumeMounts:
        - name: metrics-volume
          mountPath: /.config
        - name: torch-compile-cache
          mountPath: /.cache
  volumes:
    - name: metrics-volume
      emptyDir: {}
    - name: torch-compile-cache
      emptyDir: {}

accelerator:
  type: "nvidia"
  resources:
    nvidia: "nvidia.com/gpu"
  env: {}
```
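Since the model name appears in five separate places, I also scripted a sanity check that every occurrence agrees. This is a rough sketch, not part of the chart: the hand-parsed `values` dict below mirrors only the relevant fields from my values.yaml, and `served_model_name` is a hypothetical helper I wrote for this check:

```python
import re

# Hand-parsed subset of the values.yaml above; paths follow the chart layout.
vllm_args = [
    "vllm serve /models/Qwen3-0___6B --host 0.0.0.0 --port 8200 "
    "--served-model-name Qwen3-0.6B --max-model-len 1024"
]
values = {
    "modelArtifacts": {"name": "Qwen3-0.6B"},
    "routing": {
        "modelName": "Qwen3-0.6B",
        "inferencePool": {"modelName": "Qwen3-0.6B"},
    },
    "decode": {"containers": [{"args": vllm_args}]},
    "prefill": {"containers": [{"args": vllm_args}]},
}

def served_model_name(args):
    """Pull the --served-model-name value out of a vllm serve arg list."""
    m = re.search(r"--served-model-name\s+(\S+)", " ".join(args))
    return m.group(1) if m else None

names = {
    "modelArtifacts.name": values["modelArtifacts"]["name"],
    "routing.modelName": values["routing"]["modelName"],
    "routing.inferencePool.modelName": values["routing"]["inferencePool"]["modelName"],
    "decode --served-model-name": served_model_name(values["decode"]["containers"][0]["args"]),
    "prefill --served-model-name": served_model_name(values["prefill"]["containers"][0]["args"]),
}

print(names)
print(len(set(names.values())) == 1)  # True -- all five occurrences agree
```

All five occurrences resolve to "Qwen3-0.6B", which is why I believe the "random/model" id is coming from somewhere outside this values file.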