[Doc]: Kubernetes deployment in CPU mode fails (No CUDA..) #33161

@Josca

📚 The doc issue

I want to run "vllm serve" for testing purposes (API testing etc.) in my Kubernetes cluster. I followed the doc page https://docs.vllm.ai/en/stable/deployment/k8s/#deployment-with-cpus and created the required resources as described. Only the Deployment config is shown here:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: vllm-server
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: vllm
  template:
    metadata:
      labels:
        app.kubernetes.io/name: vllm
    spec:
      containers:
      - name: vllm
        image: vllm/vllm-openai:latest
        command: ["/bin/sh", "-c"]
        args: [
          "vllm serve HuggingFaceTB/SmolLM2-135M"
        ]
        env:
        ports:
          - containerPort: 8000
        volumeMounts:
          - name: llama-storage
            mountPath: /root/.cache/huggingface
      volumes:
      - name: llama-storage
        persistentVolumeClaim:
          claimName: vllm-models

However, the pod fails with this error:

INFO 01-27 02:06:24 [importing.py:44] Triton is installed but 0 active driver(s) found (expected 1). Disabling Triton to prevent runtime errors.
INFO 01-27 02:06:24 [importing.py:68] Triton not installed or not compatible; certain GPU-related functions will not be available.
WARNING 01-27 02:06:24 [interface.py:222] Failed to import from vllm._C: ImportError('libcuda.so.1: cannot open shared object file: No such file or directory')
W0127 02:06:28.021000 7 torch/utils/cpp_extension.py:117] No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda'
Traceback (most recent call last):
  File "/usr/local/bin/vllm", line 10, in <module>
    sys.exit(main())
             ^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/cli/main.py", line 66, in main
    cmd.subparser_init(subparsers).set_defaults(dispatch_function=cmd.cmd)
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/cli/serve.py", line 76, in subparser_init
    serve_parser = make_arg_parser(serve_parser)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/cli_args.py", line 296, in make_arg_parser
    parser = AsyncEngineArgs.add_cli_args(parser)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/engine/arg_utils.py", line 2049, in add_cli_args
    parser = EngineArgs.add_cli_args(parser)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/engine/arg_utils.py", line 1148, in add_cli_args
    vllm_kwargs = get_kwargs(VllmConfig)
                  ^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/engine/arg_utils.py", line 349, in get_kwargs
    return copy.deepcopy(_compute_kwargs(cls))
                         ^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/engine/arg_utils.py", line 261, in _compute_kwargs
    default = default.default_factory()
              ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/pydantic/_internal/_dataclasses.py", line 121, in __init__
    s.__pydantic_validator__.validate_python(ArgsKwargs(args, kwargs), self_instance=s)
  File "/usr/local/lib/python3.12/dist-packages/vllm/config/device.py", line 58, in __post_init__
    raise RuntimeError(
RuntimeError: Failed to infer device type, please set the environment variable `VLLM_LOGGING_LEVEL=DEBUG` to turn on verbose logging to help debug the issue.
stream closed EOF for default/vllm-server-554f9b7686-64xsq (vllm)
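
The `ImportError` in the log says the dynamic loader could not open `libcuda.so.1`. A minimal sketch for confirming that from inside the container, assuming a Python interpreter is available there; `can_load_shared_library` and `report` are hypothetical helper names, not part of vLLM:

```python
import ctypes


def can_load_shared_library(name: str) -> bool:
    """Return True if the dynamic loader can open the named shared library."""
    try:
        ctypes.CDLL(name)
        return True
    except OSError:
        # ctypes raises OSError when the library cannot be found or opened.
        return False


def report(names=("libcuda.so.1",)) -> dict:
    """Map each library name to whether it could be loaded."""
    return {name: can_load_shared_library(name) for name in names}


if __name__ == "__main__":
    # libcuda.so.1 is the library named in the ImportError above.
    for name, ok in report().items():
        print(f"{name}: {'loadable' if ok else 'NOT loadable'}")
```

If the library is reported as not loadable, the container has no usable CUDA driver, which would be consistent with the "Failed to infer device type" error.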

Image used: docker.io/vllm/vllm-openai@sha256:6bf34e50e2387dc46dc87a9d6a945fdd616a022bccfddd949052f54063ebcb8c

It seems the configuration provided in the doc page is wrong or incomplete (missing args or environment variables, or is a CPU-specific Docker image needed?).
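
One thing that can be checked mechanically is whether the Deployment requests a GPU at all. The sketch below assumes the manifest has already been loaded into a plain Python dict (for example with a YAML parser) and that GPUs would be requested through the `nvidia.com/gpu` resource exposed by the NVIDIA device plugin; `containers_requesting_gpu` is a hypothetical helper, and the dict is a trimmed copy of the Deployment from this report:

```python
GPU_RESOURCE = "nvidia.com/gpu"  # assumption: NVIDIA device plugin resource name


def containers_requesting_gpu(deployment: dict) -> list:
    """Return the names of containers whose resources mention a GPU."""
    pod_spec = deployment.get("spec", {}).get("template", {}).get("spec", {})
    names = []
    for container in pod_spec.get("containers") or []:
        resources = container.get("resources") or {}
        for section in ("limits", "requests"):
            if GPU_RESOURCE in (resources.get(section) or {}):
                names.append(container.get("name", "<unnamed>"))
                break
    return names


# Trimmed copy of the Deployment above: no `resources` section is present.
deployment = {
    "kind": "Deployment",
    "spec": {"template": {"spec": {"containers": [
        {"name": "vllm", "image": "vllm/vllm-openai:latest"},
    ]}}},
}

if __name__ == "__main__":
    print(containers_requesting_gpu(deployment))
```

An empty result means no container asks for a GPU, so the pod can be scheduled on a node without one while still using the `vllm/vllm-openai` image from the manifest.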

Could anyone help, please?

Suggest a potential alternative/fix

No response

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.
