RuntimeError: Unexpected error from cudaGetDeviceCount(). Did you run some cuda functions before calling NumCudaDevices() that might have already set an error? Error 802: system not yet initialized #2554

@goudakrishna

Description

What happened:
Trying to deploy an LLM using the vLLM library as a pod in an EKS 1.34 cluster, on a p5en.48xlarge managed node. The NVIDIA device plugin (v0.18.0) is deployed as a DaemonSet. The model container fails with the runtime error below.

RuntimeError: Unexpected error from cudaGetDeviceCount(). Did you run some cuda functions before calling NumCudaDevices() that might have already set an error? Error 802: system not yet initialized

The same Docker image, on the same Kubernetes 1.34 cluster setup and the same instance type, worked in one AWS account with AMI amazon-eks-node-al2023-x86_64-nvidia-1.34-v20251023. With identical infrastructure in another AWS account, but AMI version amazon-eks-node-al2023-x86_64-nvidia-1.34-v20251108, we hit this error.

Working version: amazon-eks-node-al2023-x86_64-nvidia-1.34-v20251023
Problematic version: amazon-eks-node-al2023-x86_64-nvidia-1.34-v20251108
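For context, error 802 is `cudaErrorSystemNotReady` in the CUDA Runtime API's `cudaError` enum; on NVSwitch-based machines such as p5en (H200) it generally indicates the CUDA runtime initialized before the NVIDIA Fabric Manager was ready. A small lookup sketch (the codes and names come from the CUDA Runtime API documentation; the helper function itself is hypothetical):

```python
# Subset of the CUDA runtime cudaError enum relevant to this report
# (names per the CUDA Runtime API documentation).
CUDA_ERROR_NAMES = {
    802: "cudaErrorSystemNotReady",        # "system not yet initialized"
    803: "cudaErrorSystemDriverMismatch",  # CUDA/display driver mismatch
}

def describe_cuda_error(code: int) -> str:
    """Return the enum name for a numeric CUDA error code."""
    return CUDA_ERROR_NAMES.get(code, f"unrecognized CUDA error {code}")

print(describe_cuda_error(802))
```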

Logs:

chat-model ERROR 12-04 16:10:33 [multiproc_executor.py:597] WorkerProc failed to start.
chat-model ERROR 12-04 16:10:33 [multiproc_executor.py:597] Traceback (most recent call last):
chat-model ERROR 12-04 16:10:33 [multiproc_executor.py:597] File "/usr/local/lib/python3.11/site-packages/vllm/v1/executor/multiproc_executor.py", line 571, in worker_main
chat-model ERROR 12-04 16:10:33 [multiproc_executor.py:597] worker = WorkerProc(*args, **kwargs)
chat-model ERROR 12-04 16:10:33 [multiproc_executor.py:597] ^^^^^^^^^^^^^^^^^^^^^^^^^^^
chat-model ERROR 12-04 16:10:33 [multiproc_executor.py:597] File "/usr/local/lib/python3.11/site-packages/vllm/v1/executor/multiproc_executor.py", line 405, in __init__
chat-model ERROR 12-04 16:10:33 [multiproc_executor.py:597] wrapper.init_worker(all_kwargs)
chat-model ERROR 12-04 16:10:33 [multiproc_executor.py:597] File "/usr/local/lib/python3.11/site-packages/vllm/worker/worker_base.py", line 212, in init_worker
chat-model ERROR 12-04 16:10:33 [multiproc_executor.py:597] worker_class = resolve_obj_by_qualname(
chat-model ERROR 12-04 16:10:33 [multiproc_executor.py:597] ^^^^^^^^^^^^^^^^^^^^^^^^
chat-model ERROR 12-04 16:10:33 [multiproc_executor.py:597] File "/usr/local/lib/python3.11/site-packages/vllm/utils/__init__.py", line 2680, in resolve_obj_by_qualname
chat-model ERROR 12-04 16:10:33 [multiproc_executor.py:597] module = importlib.import_module(module_name)
chat-model ERROR 12-04 16:10:33 [multiproc_executor.py:597] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
chat-model ERROR 12-04 16:10:33 [multiproc_executor.py:597] File "/usr/local/lib/python3.11/importlib/__init__.py", line 126, in import_module
chat-model ERROR 12-04 16:10:33 [multiproc_executor.py:597] return _bootstrap._gcd_import(name[level:], package, level)
chat-model ERROR 12-04 16:10:33 [multiproc_executor.py:597] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
chat-model ERROR 12-04 16:10:33 [multiproc_executor.py:597] File "<frozen importlib._bootstrap>", line 1204, in _gcd_import
chat-model ERROR 12-04 16:10:33 [multiproc_executor.py:597] File "<frozen importlib._bootstrap>", line 1176, in _find_and_load
chat-model ERROR 12-04 16:10:33 [multiproc_executor.py:597] File "<frozen importlib._bootstrap>", line 1147, in _find_and_load_unlocked
chat-model ERROR 12-04 16:10:33 [multiproc_executor.py:597] File "<frozen importlib._bootstrap>", line 690, in _load_unlocked
chat-model ERROR 12-04 16:10:33 [multiproc_executor.py:597] File "<frozen importlib._bootstrap_external>", line 940, in exec_module
chat-model ERROR 12-04 16:10:33 [multiproc_executor.py:597] File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
chat-model ERROR 12-04 16:10:33 [multiproc_executor.py:597] File "/usr/local/lib/python3.11/site-packages/vllm/v1/worker/gpu_worker.py", line 34, in <module>
chat-model ERROR 12-04 16:10:33 [multiproc_executor.py:597] from vllm.v1.worker.gpu_model_runner import GPUModelRunner
chat-model ERROR 12-04 16:10:33 [multiproc_executor.py:597] File "/usr/local/lib/python3.11/site-packages/vllm/v1/worker/gpu_model_runner.py", line 69, in <module>
chat-model ERROR 12-04 16:10:33 [multiproc_executor.py:597] from vllm.v1.attention.backends.flash_attn import AttentionMetadata
chat-model ERROR 12-04 16:10:33 [multiproc_executor.py:597] File "/usr/local/lib/python3.11/site-packages/vllm/v1/attention/backends/flash_attn.py", line 154, in <module>
chat-model ERROR 12-04 16:10:33 [multiproc_executor.py:597] class FlashAttentionMetadataBuilder(
chat-model ERROR 12-04 16:10:33 [multiproc_executor.py:597] File "/usr/local/lib/python3.11/site-packages/vllm/v1/attention/backends/flash_attn.py", line 175, in FlashAttentionMetadataBuilder
chat-model ERROR 12-04 16:10:33 [multiproc_executor.py:597] if get_flash_attn_version() == 3 else AttentionCGSupport.UNIFORM_BATCH
chat-model ERROR 12-04 16:10:33 [multiproc_executor.py:597] ^^^^^^^^^^^^^^^^^^^^^^^^
chat-model ERROR 12-04 16:10:33 [multiproc_executor.py:597] File "/usr/local/lib/python3.11/site-packages/vllm/attention/utils/fa_utils.py", line 37, in get_flash_attn_version
chat-model ERROR 12-04 16:10:33 [multiproc_executor.py:597] and is_fa_version_supported(3)) else 2
chat-model ERROR 12-04 16:10:33 [multiproc_executor.py:597] ^^^^^^^^^^^^^^^^^^^^^^^^^^
chat-model ERROR 12-04 16:10:33 [multiproc_executor.py:597] File "/usr/local/lib/python3.11/site-packages/vllm/vllm_flash_attn/flash_attn_interface.py", line 57, in is_fa_version_supported
chat-model ERROR 12-04 16:10:33 [multiproc_executor.py:597] return _is_fa3_supported(device)[0]
chat-model ERROR 12-04 16:10:33 [multiproc_executor.py:597] ^^^^^^^^^^^^^^^^^^^^^^^^^
chat-model ERROR 12-04 16:10:33 [multiproc_executor.py:597] File "/usr/local/lib/python3.11/site-packages/vllm/vllm_flash_attn/flash_attn_interface.py", line 43, in _is_fa3_supported
chat-model ERROR 12-04 16:10:33 [multiproc_executor.py:597] if torch.cuda.get_device_capability(device)[0] < 8
chat-model ERROR 12-04 16:10:33 [multiproc_executor.py:597] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
chat-model ERROR 12-04 16:10:33 [multiproc_executor.py:597] File "/usr/local/lib/python3.11/site-packages/torch/cuda/__init__.py", line 600, in get_device_capability
chat-model ERROR 12-04 16:10:33 [multiproc_executor.py:597] prop = get_device_properties(device)
chat-model ERROR 12-04 16:10:33 [multiproc_executor.py:597] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
chat-model ERROR 12-04 16:10:33 [multiproc_executor.py:597] File "/usr/local/lib/python3.11/site-packages/torch/cuda/__init__.py", line 616, in get_device_properties
chat-model ERROR 12-04 16:10:33 [multiproc_executor.py:597] _lazy_init() # will define _get_device_properties
chat-model ERROR 12-04 16:10:33 [multiproc_executor.py:597] ^^^^^^^^^^^^
chat-model ERROR 12-04 16:10:33 [multiproc_executor.py:597] File "/usr/local/lib/python3.11/site-packages/torch/cuda/__init__.py", line 412, in _lazy_init
chat-model ERROR 12-04 16:10:33 [multiproc_executor.py:597] torch._C._cuda_init()
chat-model ERROR 12-04 16:10:33 [multiproc_executor.py:597] RuntimeError: Unexpected error from cudaGetDeviceCount(). Did you run some cuda functions before calling NumCudaDevices() that might have already set an error? Error 802: system not yet initialized
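To rule out vLLM itself, the failure can be isolated with the checks below (a sketch; it assumes shell access to the pod and node, and that `torch` is installed in the image — the service and command names are standard NVIDIA/EKS ones, not taken from this report):

```shell
# Inside the failing container: initializing CUDA directly should
# reproduce Error 802 if the node is affected, independent of vLLM.
python3 -c "import torch; print(torch.cuda.device_count())"

# On the node itself: Error 802 on HGX-class hardware (H100/H200)
# commonly means NVIDIA Fabric Manager is not running. Check the
# driver and the fabric-manager service (unit name may vary by AMI).
nvidia-smi -L
systemctl status nvidia-fabricmanager
```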

What you expected to happen:
The latest version of the AMI should work the same way the previous one did.

How to reproduce it (as minimally and precisely as possible):
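A minimal sketch of the failing setup (the image name, command, and GPU count here are illustrative assumptions; any CUDA-initializing workload scheduled onto an affected p5en.48xlarge node should trigger the same error):

```yaml
# Hypothetical minimal pod for reproducing the CUDA init failure.
apiVersion: v1
kind: Pod
metadata:
  name: cuda-init-check
spec:
  restartPolicy: Never
  containers:
    - name: check
      image: vllm/vllm-openai:latest  # assumption: any image with torch + CUDA
      command: ["python3", "-c", "import torch; print(torch.cuda.device_count())"]
      resources:
        limits:
          nvidia.com/gpu: 1
```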

Environment:

  • AWS Region: US-East-1
  • Instance Type(s): p5en.48xlarge
  • Cluster Kubernetes version: 1.34
  • Node Kubernetes version: 1.34
  • AMI Version:
    Working version: amazon-eks-node-al2023-x86_64-nvidia-1.34-v20251023
    Problematic version: amazon-eks-node-al2023-x86_64-nvidia-1.34-v20251108


Labels: Stale, bug (Something isn't working)
