Closed
Labels: documentation (Improvements or additions to documentation)
Description
📚 The doc issue
I want to run "vllm serve" for testing purposes (API testing, etc.) in my Kubernetes cluster. I followed the doc page https://docs.vllm.ai/en/stable/deployment/k8s/#deployment-with-cpus and created the required resources as described. Only the Deployment config is shown here:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: vllm-server
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: vllm
  template:
    metadata:
      labels:
        app.kubernetes.io/name: vllm
    spec:
      containers:
      - name: vllm
        image: vllm/vllm-openai:latest
        command: ["/bin/sh", "-c"]
        args: [
          "vllm serve HuggingFaceTB/SmolLM2-135M"
        ]
        env:
        ports:
        - containerPort: 8000
        volumeMounts:
        - name: llama-storage
          mountPath: /root/.cache/huggingface
      volumes:
      - name: llama-storage
        persistentVolumeClaim:
          claimName: vllm-models

However, the pod fails with this error:
INFO 01-27 02:06:24 [importing.py:44] Triton is installed but 0 active driver(s) found (expected 1). Disabling Triton to prevent runtime errors.
INFO 01-27 02:06:24 [importing.py:68] Triton not installed or not compatible; certain GPU-related functions will not be available.
WARNING 01-27 02:06:24 [interface.py:222] Failed to import from vllm._C: ImportError('libcuda.so.1: cannot open shared object file: No such file or directory')
W0127 02:06:28.021000 7 torch/utils/cpp_extension.py:117] No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda'
Traceback (most recent call last):
  File "/usr/local/bin/vllm", line 10, in <module>
    sys.exit(main())
             ^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/cli/main.py", line 66, in main
    cmd.subparser_init(subparsers).set_defaults(dispatch_function=cmd.cmd)
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/cli/serve.py", line 76, in subparser_init
    serve_parser = make_arg_parser(serve_parser)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/cli_args.py", line 296, in make_arg_parser
    parser = AsyncEngineArgs.add_cli_args(parser)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/engine/arg_utils.py", line 2049, in add_cli_args
    parser = EngineArgs.add_cli_args(parser)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/engine/arg_utils.py", line 1148, in add_cli_args
    vllm_kwargs = get_kwargs(VllmConfig)
                  ^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/engine/arg_utils.py", line 349, in get_kwargs
    return copy.deepcopy(_compute_kwargs(cls))
                         ^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/engine/arg_utils.py", line 261, in _compute_kwargs
    default = default.default_factory()
              ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/pydantic/_internal/_dataclasses.py", line 121, in __init__
    s.__pydantic_validator__.validate_python(ArgsKwargs(args, kwargs), self_instance=s)
  File "/usr/local/lib/python3.12/dist-packages/vllm/config/device.py", line 58, in __post_init__
    raise RuntimeError(
RuntimeError: Failed to infer device type, please set the environment variable `VLLM_LOGGING_LEVEL=DEBUG` to turn on verbose logging to help debug the issue.
stream closed EOF for default/vllm-server-554f9b7686-64xsq (vllm)
Used image: docker.io/vllm/vllm-openai@sha256:6bf34e50e2387dc46dc87a9d6a945fdd616a022bccfddd949052f54063ebcb8c
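The error message itself suggests setting `VLLM_LOGGING_LEVEL=DEBUG`. A minimal sketch of how that could be added to the container spec, assuming only the `env` block changes and the rest of the Deployment stays the same. This is not expected to fix the failure; it should only produce more detail about why the device type cannot be inferred:

```yaml
# Fragment of spec.template.spec.containers in the Deployment.
# Assumption: only the env entry is new; all other fields are unchanged.
containers:
- name: vllm
  image: vllm/vllm-openai:latest
  command: ["/bin/sh", "-c"]
  args: [
    "vllm serve HuggingFaceTB/SmolLM2-135M"
  ]
  env:
  - name: VLLM_LOGGING_LEVEL   # variable named in the RuntimeError message
    value: "DEBUG"
  ports:
  - containerPort: 8000
```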
The configuration on the doc page seems to be wrong (perhaps some args or environment variables are missing, or a CPU-specific Docker image is needed?).
Could anyone help, please?
Suggest a potential alternative/fix
No response