[Bug]: No sm_121 (Blackwell) support on aarch64 — NVIDIA DGX Spark / Acer GN100 #36821

@blogtheristo

Description

Your current environment

**System Info**

- GPU: NVIDIA GB10 Grace Blackwell Superchip (sm_121)
- Architecture: aarch64 (ARM v9.2-A)
- CUDA: 13.0
- OS: NVIDIA DGX OS (Ubuntu 24.04 base)
- RAM: 128 GB LPDDR5x unified memory (CPU + GPU shared pool)
- Device: NVIDIA DGX Spark / Acer Veriton GN100
- vLLM image tested: avarok/vllm-dgx-spark:v11

Describe the issue

vLLM fails to start on NVIDIA DGX Spark (GB10 Blackwell, sm_121) because the bundled PyTorch binary only includes compiled CUDA kernels through sm_120. The GPU is detected but no compatible kernels exist for the actual compute capability.

This is a build-time issue — the shipped .so files lack sm_121 targets. Runtime workarounds (TORCH_CUDA_ARCH_LIST, LD_LIBRARY_PATH compat libs) have no effect on prebuilt binaries.
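To make the failure mode concrete, here is an illustrative sketch (not vLLM or PyTorch source) of the check that effectively happens at startup: the binary only contains kernels for the `sm_*` targets it was compiled with, and the device's compute capability must match one of them. In a live session the arch list would come from `torch.cuda.get_arch_list()` and the capability from `torch.cuda.get_device_capability(0)`.

```python
# Illustrative only: a wheel compiled through sm_120 has no entry
# matching the GB10's compute capability 12.1 (sm_121).
def has_kernel_for(arch_list, major, minor):
    """True if any compiled sm_* target matches this compute capability."""
    want = f"sm_{major}{minor}"          # e.g. (12, 1) -> "sm_121"
    return want in arch_list

print(has_kernel_for(["sm_90", "sm_100", "sm_120"], 12, 1))  # False -> startup crash
print(has_kernel_for(["sm_90", "sm_120", "sm_121"], 12, 1))  # True  -> would work
```

This is why the runtime workarounds above cannot help: no environment variable can add a kernel target that was never compiled into the `.so`.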

Error behavior

vLLM crashes at startup during CUDA kernel initialization. PyTorch reports supported architectures up to sm_120 but the GPU requires sm_121.

Expected behavior

vLLM should support sm_121 (Blackwell) on aarch64, either via:

  1. Official PyTorch wheels built with sm_121 + aarch64 targets
  2. vLLM Docker images compiled against sm_121-capable PyTorch
  3. Documentation of the supported build path for DGX Spark users
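For anyone attempting options 1 or 2 locally in the meantime, a rough sketch of the source-build path (untested here; it assumes the CUDA 13.0 toolkit's nvcc already accepts the sm_121 target, which should be verified first, and that the PyTorch/vLLM build systems accept a `12.1` arch entry):

```shell
# Sketch only, untested: build PyTorch and vLLM from source with sm_121 targets.
# First confirm the toolchain supports the target:
nvcc --list-gpu-arch

export TORCH_CUDA_ARCH_LIST="12.1"    # effective at build time only, not runtime

# Build PyTorch from source on aarch64
git clone --recursive https://github.com/pytorch/pytorch
cd pytorch
pip install -r requirements.txt
python setup.py develop

# Build vLLM against the locally built PyTorch
cd ..
git clone https://github.com/vllm-project/vllm
cd vllm
python use_existing_torch.py          # pin the build to the local torch
pip install -e . --no-build-isolation
```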

Context

The NVIDIA DGX Spark (and its OEM variant, the Acer Veriton GN100) is shipping to customers now. It is an ARM64-only device with 128 GB of unified memory and an sm_121 Blackwell GPU, and it is becoming a common edge AI inference platform. Currently the only working local LLM serving option on this hardware is Ollama (which ships its own llama.cpp backend with Blackwell support).

Workaround

Use Ollama for inference. vLLM is not functional on this hardware with any available container image.

Additional context

  • PyTorch nightly may have sm_121 support but no official aarch64 + cu130 wheels are published yet
  • NVIDIA NIM containers may work but are model-limited and require enterprise licensing
  • Community image avarok/vllm-dgx-spark:v11 was the closest attempt but its PyTorch build stops at sm_120

🐛 Describe the bug

vLLM's prebuilt container ships PyTorch binaries compiled with CUDA kernels for GPU architectures up to sm_120. The DGX Spark's Blackwell GPU requires sm_121. Since there are no matching kernels in the binary, vLLM crashes at startup. It's a compile-time gap — the fix has to come from rebuilding PyTorch and vLLM with sm_121 targets, not from runtime configuration.

