Model Not Properly Registered to Gateway: Shows as "random/model" Instead of "Qwen3-0.6B" #233

@GGGsk

Description

I'm experiencing an issue with model registration in our LLM deployment. Despite configuring the model name as "Qwen3-0.6B" in the values.yaml file, the model is showing up as "random/model" when querying the /v1/models endpoint.

Configuration

In my values.yaml, I have:

  • modelArtifacts.name: "Qwen3-0.6B"
  • routing.modelName: Qwen3-0.6B
  • vLLM startup command with --served-model-name Qwen3-0.6B
  • inferencePool.modelName: Qwen3-0.6B
  • Both decode and prefill containers configured with the same model name

What I've Tried

  1. Verified all model name configurations are consistent
  2. Redeployed the service multiple times
  3. Restarted both vLLM and routing service pods
  4. Checked vLLM logs for model loading errors (no errors found)
  5. Verified model files exist at the expected path /models/Qwen3-0___6B
  6. Enabled inferenceModel.create: true as suggested

Expected Behavior

The model should be registered and accessible as "Qwen3-0.6B" when querying the /v1/models endpoint.

Actual Behavior

The /v1/models endpoint returns:

{
  "data": [
    {
      "created": 1758531410,
      "id": "random/model",
      "object": "model",
      "owned_by": "vllm",
      "parent": null,
      "root": "random/model"
    }
  ],
  "object": "list"
}
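For reference, the mismatch is easy to surface programmatically by comparing the `id` fields in the /v1/models payload against the expected served name. A minimal sketch, using the response above as sample data:

```python
import json

EXPECTED = "Qwen3-0.6B"

# The actual /v1/models response returned by the gateway, pasted from above.
response = json.loads("""
{
  "data": [
    {
      "created": 1758531410,
      "id": "random/model",
      "object": "model",
      "owned_by": "vllm",
      "parent": null,
      "root": "random/model"
    }
  ],
  "object": "list"
}
""")

served_ids = [m["id"] for m in response["data"]]
print("served ids:", served_ids)
print("expected name registered:", EXPECTED in served_ids)  # False for this response
```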

The Helm release is installed in the "works" namespace, while the gateway runs in "llm-d-infra". My values.yaml:


modelArtifacts:
  uri: "pvc://pvc-2d4821f257a24dcdaadde41a2433d94d/Qwen3-0___6B"
  name: "Qwen3-0.6B"
  mountPath: "/models"

routing:
  modelName: Qwen3-0.6B
  servicePort: 8000
  parentRefs:
    - group: gateway.networking.k8s.io
      kind: Gateway
      name: llm-d-infra-inference-gateway
      namespace: llm-d-infra

  proxy:
    image: ghcr.io/llm-d/llm-d-routing-sidecar:v0.2.0
    targetPort: 8200
    connector: nixlv2
    secure: false

  inferenceModel:
    create: true

  inferencePool:
    create: true
    name: test-llmd-llm-d-modelservice
    targetPortNumber: 8200
    modelServerType: vllm
    modelName: Qwen3-0.6B
    modelServers:
      matchLabels:
        llm-d.ai/inferenceServing: "true"

  httpRoute:
    create: false

  epp:
    create: true
    service:
      type: ClusterIP
      port: 9002
      targetPort: 9002
      appProtocol: http2
    image: ghcr.io/llm-d/llm-d-inference-scheduler:v0.2.1
    replicas: 1
    debugLevel: 4
    disableReadinessProbe: false
    disableLivenessProbe: false
    pluginsConfigFile: "prefix-cache-tracking-config.yaml"
    env: []
    resources:
      limits:
        cpu: 1000m
        memory: 1Gi
      requests:
        cpu: 1000m
        memory: 1Gi

decode:
  create: true
  replicas: 1
  monitoring:
    podmonitor:
      enabled: true
      portName: "metrics"
      path: "/metrics"
      interval: "30s"
  containers:
    - name: "vllm"
      image: "ghcr.io/llm-d/llm-d-dev:pr-170"
      modelCommand: custom
      command:
        - "/bin/sh"
        - "-c"
      args:
        - "vllm serve /models/Qwen3-0___6B --host 0.0.0.0 --port 8200 --served-model-name Qwen3-0.6B --max-model-len 1024"
      env:
        - name: UCX_TLS
          value: "cuda_ipc,cuda_copy,tcp"
        - name: VLLM_NIXL_SIDE_CHANNEL_HOST
          valueFrom:
            fieldRef:
              fieldPath: status.podIP
        - name: VLLM_NIXL_SIDE_CHANNEL_PORT
          value: "5557"
        - name: VLLM_LOGGING_LEVEL
          value: DEBUG
      ports:
        - containerPort: 5557
          protocol: TCP
        - containerPort: 8200
          name: metrics
          protocol: TCP
      resources:
        limits:
          nvidia.com/gpu: "1"
        requests:
          nvidia.com/gpu: "1"
      mountModelVolume: true
      volumeMounts:
        - name: metrics-volume
          mountPath: /.config
        - name: torch-compile-cache
          mountPath: /.cache
  volumes:
    - name: metrics-volume
      emptyDir: {}
    - name: torch-compile-cache
      emptyDir: {}

prefill:
  create: true
  replicas: 1
  monitoring:
    podmonitor:
      enabled: true
      portName: "metrics"
      path: "/metrics"
      interval: "30s"
  containers:
    - name: "vllm-prefill"
      image: "ghcr.io/llm-d/llm-d-dev:pr-170"
      modelCommand: custom
      command:
        - "/bin/sh"
        - "-c"
      args:
        - "vllm serve /models/Qwen3-0___6B --host 0.0.0.0 --port 8200 --served-model-name Qwen3-0.6B --max-model-len 1024"
      env:
        - name: CUDA_VISIBLE_DEVICES
          value: "0"
        - name: UCX_TLS
          value: "cuda_ipc,cuda_copy,tcp"
        - name: VLLM_NIXL_SIDE_CHANNEL_HOST
          valueFrom:
            fieldRef:
              fieldPath: status.podIP
        - name: VLLM_NIXL_SIDE_CHANNEL_PORT
          value: "5558"
        - name: VLLM_LOGGING_LEVEL
          value: DEBUG
      ports:
        - containerPort: 5558
          protocol: TCP
        - containerPort: 8300
          name: metrics
          protocol: TCP
      resources:
        limits:
          nvidia.com/gpu: "1"
        requests:
          nvidia.com/gpu: "1"
      mountModelVolume: true
      volumeMounts:
        - name: metrics-volume
          mountPath: /.config
        - name: torch-compile-cache
          mountPath: /.cache
  volumes:
    - name: metrics-volume
      emptyDir: {}
    - name: torch-compile-cache
      emptyDir: {}

accelerator:
  type: "nvidia"
  resources:
    nvidia: "nvidia.com/gpu"
  env: {}
