[Doc]: Kubernetes deployment in CPU mode fails (No CUDA..)

### 📚 The doc issue

I want to run `vllm serve` in my Kubernetes cluster for testing purposes (API testing, etc.). I followed the documentation page <https://docs.vllm.ai/en/stable/deployment/k8s/#deployment-with-cpus> and created the required resources as described. Only the `Deployment` config is shown here:
~~~yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: vllm-server
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: vllm
  template:
    metadata:
      labels:
        app.kubernetes.io/name: vllm
    spec:
      containers:
      - name: vllm
        image: vllm/vllm-openai:latest
        command: ["/bin/sh", "-c"]
        args: [
          "vllm serve HuggingFaceTB/SmolLM2-135M"
        ]
        env:
        ports:
          - containerPort: 8000
        volumeMounts:
          - name: llama-storage
            mountPath: /root/.cache/huggingface
      volumes:
      - name: llama-storage
        persistentVolumeClaim:
          claimName: vllm-models
~~~

However, the pod fails with this error:

~~~
INFO 01-27 02:06:24 [importing.py:44] Triton is installed but 0 active driver(s) found (expected 1). Disabling Triton to prevent runtime errors.
INFO 01-27 02:06:24 [importing.py:68] Triton not installed or not compatible; certain GPU-related functions will not be available.
WARNING 01-27 02:06:24 [interface.py:222] Failed to import from vllm._C: ImportError('libcuda.so.1: cannot open shared object file: No such file or directory')
W0127 02:06:28.021000 7 torch/utils/cpp_extension.py:117] No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda'
Traceback (most recent call last):
  File "/usr/local/bin/vllm", line 10, in <module>
    sys.exit(main())
             ^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/cli/main.py", line 66, in main
    cmd.subparser_init(subparsers).set_defaults(dispatch_function=cmd.cmd)
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/cli/serve.py", line 76, in subparser_init
    serve_parser = make_arg_parser(serve_parser)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/cli_args.py", line 296, in make_arg_parser
    parser = AsyncEngineArgs.add_cli_args(parser)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/engine/arg_utils.py", line 2049, in add_cli_args
    parser = EngineArgs.add_cli_args(parser)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/engine/arg_utils.py", line 1148, in add_cli_args
    vllm_kwargs = get_kwargs(VllmConfig)
                  ^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/engine/arg_utils.py", line 349, in get_kwargs
    return copy.deepcopy(_compute_kwargs(cls))
                         ^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/engine/arg_utils.py", line 261, in _compute_kwargs
    default = default.default_factory()
              ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/pydantic/_internal/_dataclasses.py", line 121, in __init__
    s.__pydantic_validator__.validate_python(ArgsKwargs(args, kwargs), self_instance=s)
  File "/usr/local/lib/python3.12/dist-packages/vllm/config/device.py", line 58, in __post_init__
    raise RuntimeError(
RuntimeError: Failed to infer device type, please set the environment variable `VLLM_LOGGING_LEVEL=DEBUG` to turn on verbose logging to help debug the issue.
stream closed EOF for default/vllm-server-554f9b7686-64xsq (vllm)
~~~

Used image: `docker.io/vllm/vllm-openai@sha256:6bf34e50e2387dc46dc87a9d6a945fdd616a022bccfddd949052f54063ebcb8c`

It seems the configuration in the documentation is wrong or incomplete: perhaps some arguments or environment variables are missing, or a CPU-specific Docker image is required.
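The `ImportError` for `libcuda.so.1` near the top of the log suggests the `vllm/vllm-openai` image expects the NVIDIA CUDA driver library at runtime, which a CPU-only node does not provide. Below is a minimal diagnostic sketch, assuming `python3` is available in the container (the traceback shows Python 3.12). It asks the dynamic linker to load the driver library through `ctypes`, a similar lookup to the one that fails when `vllm._C` is imported. The helper name `cuda_driver_loadable` is hypothetical. Because the pod exits immediately, this would have to run in a container started with a different command (for example, one that just sleeps).

```python
import ctypes


def cuda_driver_loadable(name: str = "libcuda.so.1") -> bool:
    """Hypothetical helper: True if the dynamic linker can load the CUDA driver library."""
    try:
        # ctypes.CDLL calls dlopen() and raises OSError if the library is not found
        ctypes.CDLL(name)
    except OSError:
        # Same symptom as the vllm._C warning: libcuda.so.1 is not on the loader path
        return False
    return True


if __name__ == "__main__":
    print(f"libcuda.so.1 loadable: {cuda_driver_loadable()}")
```

If this prints `False`, the image is looking for a GPU driver that the node does not have, which matches the later `Failed to infer device type` error.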

Could anyone help, please?

### Suggest a potential alternative/fix

_No response_

### Before submitting a new issue...

- [x] Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the [documentation page](https://docs.vllm.ai/en/latest/), which can answer lots of frequently asked questions.
