[Bug]: No sm_121 (Blackwell) support on aarch64 — NVIDIA DGX Spark / Acer GN100 #36821

@blogtheristo

Description

Your current environment

**System Info**

- GPU: NVIDIA GB10 Grace Blackwell Superchip (sm_121)
- Architecture: aarch64 (ARM v9.2-A)
- CUDA: 13.0
- OS: NVIDIA DGX OS (Ubuntu 24.04 base)
- RAM: 128 GB LPDDR5x unified memory (CPU + GPU shared pool)
- Device: NVIDIA DGX Spark / Acer Veriton GN100
- vLLM image tested: avarok/vllm-dgx-spark:v11

Describe the issue

vLLM fails to start on NVIDIA DGX Spark (GB10 Blackwell, sm_121) because the bundled PyTorch binary only includes compiled CUDA kernels through sm_120. The GPU is detected but no compatible kernels exist for the actual compute capability.

This is a build-time issue — the shipped .so files lack sm_121 targets. Runtime workarounds (TORCH_CUDA_ARCH_LIST, LD_LIBRARY_PATH compat libs) have no effect on prebuilt binaries.
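To make the failure mode concrete, here is an illustrative sketch (not vLLM or PyTorch source) of the check that effectively happens at startup: the binary only contains kernels for the `sm_*` targets it was compiled with, and the device's compute capability must match one of them. In a live session the arch list would come from `torch.cuda.get_arch_list()` and the capability from `torch.cuda.get_device_capability(0)`.

```python
# Illustrative only: a wheel compiled through sm_120 has no entry
# matching the GB10's compute capability 12.1 (sm_121).
def has_kernel_for(arch_list, major, minor):
    """True if any compiled sm_* target matches this compute capability."""
    want = f"sm_{major}{minor}"          # e.g. (12, 1) -> "sm_121"
    return want in arch_list

print(has_kernel_for(["sm_90", "sm_100", "sm_120"], 12, 1))  # False -> startup crash
print(has_kernel_for(["sm_90", "sm_120", "sm_121"], 12, 1))  # True  -> would work
```

This is why the runtime workarounds above cannot help: no environment variable can add a kernel target that was never compiled into the `.so`.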

Error behavior

vLLM crashes at startup during CUDA kernel initialization. PyTorch reports supported architectures up to sm_120 but the GPU requires sm_121.

Expected behavior

vLLM should support sm_121 (Blackwell) on aarch64, either via:

  1. Official PyTorch wheels built with sm_121 + aarch64 targets
  2. vLLM Docker images compiled against sm_121-capable PyTorch
  3. Documentation of the supported build path for DGX Spark users
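For anyone attempting options 1 or 2 locally in the meantime, a rough sketch of the source-build path (untested here; it assumes the CUDA 13.0 toolkit's nvcc already accepts the sm_121 target, which should be verified first, and that the PyTorch/vLLM build systems accept a `12.1` arch entry):

```shell
# Sketch only, untested: build PyTorch and vLLM from source with sm_121 targets.
# First confirm the toolchain supports the target:
nvcc --list-gpu-arch

export TORCH_CUDA_ARCH_LIST="12.1"    # effective at build time only, not runtime

# Build PyTorch from source on aarch64
git clone --recursive https://github.com/pytorch/pytorch
cd pytorch
pip install -r requirements.txt
python setup.py develop

# Build vLLM against the locally built PyTorch
cd ..
git clone https://github.com/vllm-project/vllm
cd vllm
python use_existing_torch.py          # pin the build to the local torch
pip install -e . --no-build-isolation
```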

Context

The NVIDIA DGX Spark (and its OEM variant, the Acer Veriton GN100) is shipping to customers now. It is an ARM64-only device with 128 GB of unified memory and an sm_121 Blackwell GPU, and it is becoming a common edge AI inference platform. Currently the only working local LLM serving option on this hardware is Ollama (which ships its own llama.cpp backend with Blackwell support).

Workaround

Use Ollama for inference. vLLM is not functional on this hardware with any available container image.

Additional context

  • PyTorch nightly may have sm_121 support but no official aarch64 + cu130 wheels are published yet
  • NVIDIA NIM containers may work but are model-limited and require enterprise licensing
  • Community image avarok/vllm-dgx-spark:v11 was the closest attempt but its PyTorch build stops at sm_120

🐛 Describe the bug

vLLM's prebuilt container ships PyTorch binaries compiled with CUDA kernels for GPU architectures up to sm_120. The DGX Spark's Blackwell GPU requires sm_121. Since there are no matching kernels in the binary, vLLM crashes at startup. It's a compile-time gap — the fix has to come from rebuilding PyTorch and vLLM with sm_121 targets, not from runtime configuration.

