docs: overhaul installation instructions around uv + bring-your-own versions
Harmonize and correct installation docs across README, CLAUDE.md, and the
Sphinx install page, and fix stale package-metadata URLs.
- Lead with uv + cu13 as the recommended install; pip is a documented fallback.
- Emphasize bring-your-own Python (>=3.10) / PyTorch (>=2.6) / CUDA: nemo-toolkit
only pins torch>=2.6, so a pre-installed PyTorch is kept, not replaced.
- Frame the uv.lock/container combo (Python 3.13, PyTorch 2.12, CUDA 12.6/13.2)
as the actively-supported stack, not a hard requirement.
- Document the compiled / compiled-a100 extras (source-built GPU kernels for
SpeechLM2 / Automodel: Transformer Engine, FlashAttention, Mamba, grouped-GEMM,
DeepEP), including the H100+ vs A100 split and that they build via the Dockerfile.
- Fix broken commands: GPU pip install now shows the required --extra-index-url;
test/docs are PEP 735 groups (--group), not extras.
- Correct the Python floor (3.10), torch version (2.12), and clone URL
(NVIDIA-NeMo/NeMo); add an NGC container placeholder pending the image.
- Update stale repo URLs to NVIDIA-NeMo/NeMo in pyproject.toml and package_info.py.
Validated installability in Docker (py3.10/3.11/3.12; preinstalled torch
2.6/2.8/official cu124 kept; default + cu13 GPU paths resolve and import).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
CLAUDE.md
Lines changed: 2 additions & 6 deletions
@@ -8,13 +8,9 @@ NeMo Speech — toolkit for training/deploying speech models (ASR, TTS, Speech L
 
 ## Build & Install
 
-```bash
-pip install -e '.[all]'  # Full dev install
-pip install -e '.[asr]'  # ASR only
-pip install -e '.[test]'  # With test deps
-```
+See the canonical installation guide — [`docs/source/starthere/install.rst`](docs/source/starthere/install.rst) (published at https://docs.nvidia.com/nemo/speech/nightly/) — for the uv, pip (bring-your-own Python/PyTorch/CUDA), Docker, and optional `compiled` (SpeechLM2/Automodel) install paths.
 
-Requires Python 3.10+, PyTorch 2.6+.
+Dev quickstart: `uv sync --extra all --extra cu13` (Python 3.10+, PyTorch 2.6+; `test`/`docs` are `--group`s, not extras).
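The quickstart combines two kinds of flags: extras (`--extra`) and PEP 735 dependency groups (`--group`). The sketch below is illustrative and not part of this commit. It assembles a `uv sync` command line and rejects the two mistakes the commit message calls out: requesting both `cu12` and `cu13`, and passing `test` or `docs` as extras. The helper name `uv_sync_command` and the two constant sets are assumptions made for the example.

```python
import shlex

# Extra and group names come from the commit; the helper itself is hypothetical.
CUDA_EXTRAS = {"cu12", "cu13"}        # mutually exclusive on Linux
DEPENDENCY_GROUPS = {"test", "docs"}  # PEP 735 groups, not extras


def uv_sync_command(extras, groups=()):
    """Return the argv list for `uv sync` with the given extras and groups."""
    extras, groups = list(extras), list(groups)

    # cu12 and cu13 select different PyTorch builds, so allow at most one.
    if len(CUDA_EXTRAS.intersection(extras)) > 1:
        raise ValueError("pass at most one of cu12 / cu13")

    # test/docs must be requested with --group, never with --extra.
    misplaced = sorted(DEPENDENCY_GROUPS.intersection(extras))
    if misplaced:
        raise ValueError(f"{misplaced} are dependency groups; use --group")

    argv = ["uv", "sync"]
    for extra in extras:
        argv += ["--extra", extra]
    for group in groups:
        argv += ["--group", group]
    return argv


if __name__ == "__main__":
    print(shlex.join(uv_sync_command(["all", "cu13"], groups=["test"])))
```

Building the command as a list keeps each flag and value separate, which makes it safe to hand to `subprocess.run` without shell quoting.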
README.md
Lines changed: 49 additions & 6 deletions
@@ -49,9 +49,13 @@ For technical documentation, please see the
 
 ## Requirements
 
-- Python 3.12 or above
-- Pytorch 2.6 or above
-- NVIDIA GPU (if you intend to do model training)
+NeMo Speech works with the **Python, PyTorch, and CUDA versions of your choosing**:
+
+- Python 3.10 or above
+- PyTorch 2.6 or above
+- NVIDIA GPU + CUDA (required for training; recommended for inference)
+
+If you already have a Python/PyTorch/CUDA stack, NeMo Speech installs on top of it **without replacing it** — the `nemo-toolkit` package only requires `torch>=2.6`, so your existing PyTorch build is kept (see the install options below). The versions pinned in `uv.lock` and shipped in the official container — Python 3.13, PyTorch 2.12, CUDA 12.6/13.2 — are simply the combination we actively test and support. They make setup turnkey and reproducible, but they are **not** a hard requirement.
 
 As of [Pytorch 2.6](https://docs.pytorch.org/docs/stable/notes/serialization.html#torch-load-with-weights-only-true),
 `torch.load` defaults to using `weights_only=True`. Some model checkpoints may require using `weights_only=False`.
@@ -68,9 +72,48 @@ can have the risk of arbitrary code execution.
 
 ## Install NeMo Speech
 
-NeMo Speech is installable via pip: `pip install 'nemo-toolkit[all]'`
-To install with extra dependencies for CUDA 12.x or 13.x, use `pip install 'nemo-toolkit[all,cu12]'`
-or `pip install 'nemo-toolkit[all,cu13]'` respectively.
+The recommended way to install NeMo Speech is from source with [uv](https://docs.astral.sh/uv/), which reproduces our actively-tested stack from the committed `uv.lock`. If you need different Python/PyTorch/CUDA versions, NeMo also installs over your existing environment via pip — see the [pip fallback](#from-pypi-with-pip-fallback--bring-your-own-versions) below.
+
+### From source with uv (recommended)
+
+```bash
+git clone https://github.com/NVIDIA-NeMo/NeMo.git
+cd NeMo
+uv sync --extra all --extra cu13  # CUDA 13.x (recommended) — use --extra cu12 for CUDA 12.x
+```
+
+This installs our supported stack (Python 3.13, PyTorch 2.12, CUDA 13.2) into `.venv/` with NeMo editable. Add `--group test` for the test suite or `--group docs` to build the docs; run tools via `uv run <cmd>` or activate with `source .venv/bin/activate`. On Linux, `cu12` and `cu13` are mutually exclusive — pass exactly one (`cu13` is the default).
+
+> **SpeechLM2 / Automodel:** the Automodel backend runs **without** any compiled dependencies. It can *optionally* benefit from dedicated accelerated backends (Transformer Engine, FlashAttention, Mamba, grouped-GEMM/MoE, DeepEP) for better performance — these source-built kernels come from the `compiled` (Hopper/Blackwell) or `compiled-a100` (A100) extras, built by `docker/Dockerfile` (`GPU_TARGET=h100plus` / `a100`). See the [installation guide](https://docs.nvidia.com/nemo/speech/nightly/) for the full list and build details.
+
+### Docker (turnkey, our supported stack)
+
+> **NGC container:** _Coming soon — the pull command for the prebuilt NeMo Speech container image will be published here._
+
+To build the container from source (CUDA 13 / H100+ by default):
+See the header of [`docker/Dockerfile`](docker/Dockerfile) for CUDA 12 / A100 build arguments (`BASE_IMAGE`, `GPU_TARGET`).
+
+### From PyPI with pip (fallback — bring your own versions)
+
+Prefer your own Python/PyTorch/CUDA? `nemo-toolkit` only requires `torch>=2.6`, so install your PyTorch first (any version ≥ 2.6 for your CUDA — see the [PyTorch install matrix](https://pytorch.org/get-started/locally/)), then add NeMo and it **keeps your build**:
+
+```bash
+pip install nemo_toolkit[asr,tts]  # also: [asr,tts,audio], [speechlm2], etc.
+```
+
+To have pip install our pinned PyTorch build instead, add the CUDA extra and the matching wheel index (pip does not read uv's index configuration, so `--extra-index-url` is required):
+
+```bash
+pip install 'nemo-toolkit[asr,tts,cu13]' --extra-index-url https://download.pytorch.org/whl/cu132  # CUDA 13.x
+pip install 'nemo-toolkit[asr,tts,cu12]' --extra-index-url https://download.pytorch.org/whl/cu126  # CUDA 12.x
@@ -8,60 +8,51 @@ This page covers how to install NVIDIA NeMo for speech AI tasks (ASR, TTS, speak
 Prerequisites
 -------------
 
-Before installing NeMo, ensure you have:
+NeMo Speech works with the **Python, PyTorch, and CUDA versions of your choosing**:
 
-#. **Python** 3.12 or above
-#. **PyTorch** 2.7+ (install **before** NeMo so CUDA wheels match your GPU driver)
-#. **NVIDIA GPU** (required for training; CPU-only inference is possible but slow)
+#. **Python** 3.10 or above
+#. **PyTorch** 2.6 or above
+#. **NVIDIA GPU + CUDA** (required for training; CPU-only inference is possible but slow)
 
-Recommended installation order
-------------------------------
+.. admonition:: Bring your own Python / PyTorch / CUDA
+   :class: important
 
-Install dependencies in this order when setting up a **local GPU** environment:
+   The recommended install path is uv (below), which gives you our actively-tested stack. But NeMo Speech can also install *on top of* an existing environment: the ``nemo-toolkit`` package only requires ``torch>=2.6``, so if you already have a Python, PyTorch, and CUDA stack, your pre-installed PyTorch is **kept, not replaced** (see :ref:`the pip fallback <install-from-pypi>`).
 
-#. Create and activate a Python environment.
-#. Install a **CUDA toolkit** (or rely on a driver + PyTorch bundle that matches your CUDA major version).
-#. Install **PyTorch** (and torchvision if you need it) from the index that matches your CUDA build.
-#. Install **NeMo** (from PyPI or editable source) **with the extras** for the collections you need (``asr``, ``tts``, etc.).
+   The versions pinned in ``uv.lock`` and shipped in the official container — **Python 3.13, PyTorch 2.12, CUDA 12.6/13.2** — are simply the combination we actively test and support. They make setup turnkey and reproducible, but they are **not** a hard requirement.
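To make the "bring your own versions" floors concrete, here is a minimal sketch (not part of this commit) that checks an existing environment against the documented minimums, Python 3.10 and PyTorch 2.6. The helper names `release_tuple`, `meets_floor`, and `report` are hypothetical, and the version parsing is deliberately simplified: local suffixes such as `+cu124` are dropped and pre-release markers are ignored.

```python
import re
import sys
from importlib import metadata

# Floors stated in the installation docs; everything else here is illustrative.
PYTHON_FLOOR = (3, 10)
TORCH_FLOOR = (2, 6)


def release_tuple(version: str) -> tuple:
    """Turn '2.6.0+cu124' into (2, 6, 0), dropping local and dev suffixes."""
    public = version.split("+", 1)[0]
    parts = []
    for piece in public.split("."):
        match = re.match(r"\d+", piece)
        if match is None:  # e.g. 'dev20250101': stop at the first non-numeric part
            break
        parts.append(int(match.group()))
    return tuple(parts)


def meets_floor(version: str, floor: tuple) -> bool:
    """True when the numeric release is at or above the floor."""
    return release_tuple(version) >= floor


def report() -> None:
    """Print whether the current interpreter and installed torch meet the floors."""
    py_ok = sys.version_info[:2] >= PYTHON_FLOOR
    print(f"Python {sys.version.split()[0]}: {'ok' if py_ok else 'below 3.10'}")
    try:
        torch_version = metadata.version("torch")
    except metadata.PackageNotFoundError:
        print("torch: not installed")
        return
    torch_ok = meets_floor(torch_version, TORCH_FLOOR)
    print(f"torch {torch_version}: {'ok' if torch_ok else 'below 2.6'}")


if __name__ == "__main__":
    report()
```

Reading the version through `importlib.metadata` avoids importing `torch` itself, so the check stays fast and works even when no GPU driver is present.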
 
-Putting PyTorch in place first avoids mismatched CUDA runtimes and makes NeMo’s optional GPU-dependent packages resolve correctly.
+.. note::
 
-**Example (conda + pip, CUDA 13.0 PyTorch wheels):**
+   As of `PyTorch 2.6 <https://docs.pytorch.org/docs/stable/notes/serialization.html#torch-load-with-weights-only-true>`_, ``torch.load`` defaults to ``weights_only=True``. Some checkpoints require ``weights_only=False``; in that case set ``TORCH_FORCE_NO_WEIGHTS_ONLY_LOAD=1`` before loading, and only with trusted files (loading untrusted files with full pickle support risks arbitrary code execution).
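One way to keep the override narrow is to set the variable only around the load that needs it. The sketch below is an assumption-laden illustration, not part of this commit: the context manager `allow_full_pickle_load` is a hypothetical helper, and it assumes `torch.load` consults the environment variable at call time when `weights_only` is not passed explicitly, which is worth verifying against the installed PyTorch version.

```python
import os
from contextlib import contextmanager

ENV_VAR = "TORCH_FORCE_NO_WEIGHTS_ONLY_LOAD"  # name taken from the note above


@contextmanager
def allow_full_pickle_load():
    """Temporarily set the override, then restore whatever was there before."""
    previous = os.environ.get(ENV_VAR)
    os.environ[ENV_VAR] = "1"
    try:
        yield
    finally:
        if previous is None:
            os.environ.pop(ENV_VAR, None)
        else:
            os.environ[ENV_VAR] = previous


# Usage (requires PyTorch; `path` must point to a checkpoint you trust):
#     import torch
#     with allow_full_pickle_load():
#         checkpoint = torch.load(path)
```

Scoping the override this way means later loads in the same process fall back to the safer `weights_only=True` default.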
 
-.. code-block:: bash
-
-   # 1) New environment (adjust Python version if your platform requires it)
-   conda create -n nemo python=3.12 -y
-   conda activate nemo
-
-   # 2) CUDA toolkit from conda (optional if you already have a compatible toolkit via the driver)
-   conda install nvidia::cuda-toolkit
+.. _install-from-source:
 
-   # 3) PyTorch built for CUDA 13.x — change cu130 / URL if you use cu124 or CPU-only
-   # 4) NeMo: use extras for ASR/TTS/etc. For a clone of the repo, use editable install (see below)
-   pip install nemo_toolkit[asr,tts]
+The recommended way to install NeMo Speech is from source with `uv <https://docs.astral.sh/uv/>`_, which reproduces our actively-tested stack from the committed ``uv.lock``:
 
-Adjust the PyTorch ``--index-url`` (e.g. ``cu124``, ``cu121``, or CPU) to match `PyTorch’s install matrix <https://pytorch.org/get-started/locally/>`_ and your NVIDIA driver.
+.. code-block:: bash
 
-Install from PyPI
------------------
+   git clone https://github.com/NVIDIA-NeMo/NeMo.git
+   cd NeMo
 
-The quickest way to install NeMo is via pip. Install only the collections you need:
+   # CUDA 13.x (recommended). Use --extra cu12 for CUDA 12.x. uv resolves the
+   # matching PyTorch CUDA wheel automatically from the pinned indexes.
+   uv sync --extra all --extra cu13
 
-.. code-block:: bash
+   # Optional: add the test suite tooling, or the docs build dependencies
+   # uv sync --extra all --extra cu13 --group test
+   # uv sync --group docs
 
-   # Install ASR and TTS (most common)
-   pip install nemo_toolkit[asr,tts]
+``uv sync`` creates a virtual environment in ``.venv/`` with NeMo installed in editable mode, matching our supported stack (Python 3.13, PyTorch 2.12, CUDA 13.2 by default). Run commands with ``uv run <cmd>`` or activate the environment with ``source .venv/bin/activate``.
 
-   # Install everything speech-related
-   pip install nemo_toolkit[asr,tts,audio]
+On Linux, pass exactly one of ``--extra cu13`` (recommended) or ``--extra cu12`` — they are mutually exclusive. If you omit both, uv installs the generic PyPI PyTorch wheel instead of NVIDIA's CUDA-matched build.
 
-Available extras:
+Available collection extras (combine with one CUDA extra above):
 
 .. list-table::
-   :widths: 15 85
+   :widths: 18 82
    :header-rows: 1
 
    * - Extra
@@ -72,32 +63,106 @@ Available extras:
      - Text-to-Speech models, vocoders, and audio codecs
      - Speech language models (includes NeMo Automodel)
+   * - ``all``
+     - All of the collections above
+   * - ``cu12`` / ``cu13``
+     - Our pinned CUDA 12.x / 13.x PyTorch build (Linux; pick at most one)
 
-.. _install-from-source:
+.. note::
 
-Install from Source
--------------------
+   ``test`` and ``docs`` are dependency *groups* (PEP 735), not extras. Install them with ``--group`` (e.g. ``uv sync --group test``) — the bracket form ``.[test]`` does not work.
+
+.. _install-compiled-extras:
+
+Optional compiled dependencies for SpeechLM2 / Automodel (``compiled`` / ``compiled-a100``)
+The Automodel backend used for SpeechLM2 **does not require any compiled dependencies — it runs without them.** The ``compiled`` and ``compiled-a100`` extras are an *optional* performance add-on: when their source-built GPU kernels are installed, Automodel can route to dedicated accelerated backends (FP8 Transformer kernels via Transformer Engine, FlashAttention, Mamba/state-space layers, and Mixture-of-Experts ops). They contain:
+
+.. list-table::
+   :widths: 30 70
+   :header-rows: 1
+
+   * - Package
+     - Purpose
+   * - ``transformer-engine``
+     - NVIDIA Transformer Engine — FP8 and accelerated Transformer kernels
+     - Grouped GEMM kernels for Mixture-of-Experts (MoE) layers
+   * - ``deep_ep`` (DeepEP)
+     - Expert-parallel communication kernels for MoE (``compiled`` only — see below)
+   * - ``onnx-ir`` + ``onnxscript``
+     - Pinned ONNX export tooling
+
+Choose the variant that matches your GPU (the two are mutually exclusive):
+
+* ``compiled`` — Hopper/Blackwell and newer (SM90/SM100/SM120, e.g. H100/H200/B200). Includes DeepEP.
+* ``compiled-a100`` — Ampere A100 (SM80). Omits DeepEP, which requires a separately-built, patched version on A100.
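The GPU split above can be expressed as a small lookup on CUDA compute capability. This is a sketch under stated assumptions rather than anything shipped in the commit: `compiled_extra_for` is a hypothetical helper, and it returns `None` for architectures the page does not mention (anything below SM90 other than SM80), since the documentation names only A100 and Hopper/Blackwell-class GPUs.

```python
from typing import Optional, Tuple


def compiled_extra_for(capability: Tuple[int, int]) -> Optional[str]:
    """Map a (major, minor) compute capability to the matching extra name.

    SM80 (A100) selects 'compiled-a100'; SM90 and newer select 'compiled'.
    Other architectures are not covered by the docs, so return None.
    """
    major, minor = capability
    sm = major * 10 + minor  # (9, 0) becomes 90, (12, 0) becomes 120
    if sm == 80:
        return "compiled-a100"
    if sm >= 90:
        return "compiled"
    return None


# With PyTorch and a visible GPU, the capability could come from
# torch.cuda.get_device_capability(), which returns a (major, minor) tuple.
if __name__ == "__main__":
    for cap in [(8, 0), (9, 0), (10, 0), (12, 0), (8, 6)]:
        print(cap, "->", compiled_extra_for(cap))
```

Returning `None` instead of guessing keeps the default path, Automodel without compiled dependencies, as the fallback for unlisted GPUs.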
+
+.. warning::
 
-For the latest development version or if you plan to contribute, clone the repository and install in editable mode.
+   These packages **build from source** and need a full CUDA build environment — build tools, matching ``TORCH_CUDA_ARCH_LIST`` / ``NVTE_CUDA_ARCHS`` flags, ``--no-build-isolation``, and (for ``compiled``) extra manual build steps that the Dockerfile performs (e.g. flash-attn-4 and DeepEP patches). The supported, reproducible way to get them is the container build, which sets all of this up for you:
 
-The ``test`` extra pulls in **pytest and tooling for the test suite**. It does **not** install NeMo collection dependencies (ASR, TTS, audio, etc.). Add those extras explicitly or imports like ``nemo.collections.asr`` will fail.
+Prefer your own Python/PyTorch/CUDA? Install your preferred PyTorch first (any version ≥ 2.6, built for your CUDA — see `PyTorch's install matrix <https://pytorch.org/get-started/locally/>`_), then add NeMo with the collections you need. Because ``nemo-toolkit`` only requires ``torch>=2.6``, your pre-installed PyTorch is kept, not replaced:
+
+.. code-block:: bash
+
+   # 1) Your choice of PyTorch (example: CUDA 12.6 build). Skip if you already have one.
+   pip install nemo_toolkit[asr,tts]  # also: [asr,tts,audio], [speechlm2], etc.
+
+To have pip install our pinned PyTorch build instead, add the matching CUDA extra **and** the PyTorch wheel index. pip does not read uv's index configuration, so the ``--extra-index-url`` is required:
+
+.. code-block:: bash
+
+   pip install 'nemo-toolkit[asr,tts,cu13]' --extra-index-url https://download.pytorch.org/whl/cu132  # CUDA 13.x
+   pip install 'nemo-toolkit[asr,tts,cu12]' --extra-index-url https://download.pytorch.org/whl/cu126  # CUDA 12.x
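The pairing between the CUDA extra and its wheel index is easy to get wrong by hand, so a small sketch may help. It is illustrative only and not part of this commit: `pip_install_command` and `PYTORCH_WHEEL_INDEX` are hypothetical names, and the two index URLs are copied from the pip commands in this diff.

```python
import shlex

# Extra -> wheel index, as shown in the pip commands in this diff.
PYTORCH_WHEEL_INDEX = {
    "cu13": "https://download.pytorch.org/whl/cu132",
    "cu12": "https://download.pytorch.org/whl/cu126",
}


def pip_install_command(collections, cuda_extra=None):
    """Return the argv list for installing nemo-toolkit with pip.

    Without a CUDA extra, pip keeps whatever PyTorch is already installed.
    With one, the matching --extra-index-url is added, because pip does not
    read uv's index configuration.
    """
    extras = list(collections)
    argv = ["pip", "install"]
    if cuda_extra is None:
        return argv + [f"nemo-toolkit[{','.join(extras)}]"]
    if cuda_extra not in PYTORCH_WHEEL_INDEX:
        raise ValueError(f"unknown CUDA extra: {cuda_extra!r}")
    extras.append(cuda_extra)
    return argv + [
        f"nemo-toolkit[{','.join(extras)}]",
        "--extra-index-url",
        PYTORCH_WHEEL_INDEX[cuda_extra],
    ]


if __name__ == "__main__":
    print(shlex.join(pip_install_command(["asr", "tts"])))
    print(shlex.join(pip_install_command(["asr", "tts"], cuda_extra="cu13")))
```

`shlex.join` quotes the bracketed requirement, which matters in shells such as zsh that would otherwise treat the square brackets as a glob pattern.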
+
+.. tip::
 
-NVIDIA provides Docker containers with NeMo pre-installed. Check the `NeMo GitHub releases <https://github.com/NVIDIA/NeMo/releases>`_ for the latest container tags.
+   Prefer a conda environment? Create and activate one (``conda create -n nemo python=3.10 -y && conda activate nemo``), then run the same ``uv`` or ``pip`` commands above inside it. NeMo Speech does not require a separate conda CUDA toolkit or a manual ``torchvision`` install.