docs: extend install consistency sweep + clarify A100 CUDA support
Incorporate the useful parts of a parallel install-docs review and apply a
broader consistency pass:
- Distinguish uv sync --locked (exact supported baseline; add --python 3.13)
from uv pip / pip (bring-your-own), with a warning not to use uv sync --locked
for BYO. Offer uv pip alongside pip for the fallback path.
- Clarify A100: works with BOTH CUDA 12 and CUDA 13 — CUDA 13 (default base
image) recommended, CUDA 12 base offered only as a convenience.
- Broaden PyTorch targets to CPU/CUDA/ROCm/Apple Silicon; note cu12/cu13 also
add the matching CUDA Python deps (cuda-python, numba-cuda).
- Route scattered pages to the canonical install guide via :ref:`installation`
(g2p, magpietts-finetuning, nemo_forced_aligner) and modernize index.rst /
speechlm2/intro.rst snippets; add a docker run example and a lighter
import-only verify step.
- Align docs build with CI (uv sync --locked --group docs; uv run make linkcheck);
prune the now-fixed nemo_forced_aligner entry from the broken-links list.
- Normalize stale install references in the model-card template, NFA tool docs,
and runtime error messages (nemo-toolkit name; NVIDIA-NeMo/NeMo clone URL).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
README.md
Lines changed: 10 additions & 7 deletions
Original file line number
Diff line number
Diff line change
@@ -52,7 +52,7 @@ For technical documentation, please see the
52
52
NeMo Speech works with the **Python, PyTorch, and CUDA versions of your choosing**:
53
53
54
54
- Python 3.10 or above
55
-
- PyTorch 2.6 or above
55
+
- PyTorch 2.6 or above (CPU, CUDA, ROCm, or Apple Silicon build — your choice)
56
56
- NVIDIA GPU + CUDA (required for training; recommended for inference)
57
57
58
58
If you already have a Python/PyTorch/CUDA stack, NeMo Speech installs on top of it **without replacing it** — the `nemo-toolkit` package only requires `torch>=2.6`, so your existing PyTorch build is kept (see the install options below). The versions pinned in `uv.lock` and shipped in the official container — Python 3.13, PyTorch 2.12, CUDA 12.6/13.2 — are simply the combination we actively test and support. They make setup turnkey and reproducible, but they are **not** a hard requirement.
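As a rough illustration of the minimums stated above, the following sketch checks an existing environment against Python 3.10 and `torch>=2.6` before installing on top of it. The helper names (`parse_major_minor`, `check_stack`) are hypothetical and are not part of NeMo; only the two version floors come from this README.

```python
import re
import sys
from importlib import metadata

MIN_PYTHON = (3, 10)  # "Python 3.10 or above"
MIN_TORCH = (2, 6)    # nemo-toolkit only requires torch>=2.6


def parse_major_minor(version):
    """Extract (major, minor) from a version string such as '2.12.0+cu132'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def check_stack(python_version, torch_version):
    """Return a list of problems; an empty list means both minimums are met.

    python_version is a (major, minor, ...) sequence such as sys.version_info.
    torch_version is a version string, or None when PyTorch is not installed.
    """
    problems = []
    if tuple(python_version[:2]) < MIN_PYTHON:
        problems.append("Python 3.10 or above is required")
    if torch_version is None:
        problems.append("PyTorch is not installed (install 2.6 or above first)")
    elif parse_major_minor(torch_version) < MIN_TORCH:
        problems.append("PyTorch 2.6 or above is required")
    return problems


if __name__ == "__main__":
    # Read the installed torch version from package metadata without importing torch.
    try:
        installed_torch = metadata.version("torch")
    except metadata.PackageNotFoundError:
        installed_torch = None
    for problem in check_stack(sys.version_info, installed_torch):
        print(problem)
```

Comparing `(major, minor)` tuples rather than raw strings matters here: as strings, `"2.12"` would sort below `"2.6"`.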
@@ -82,7 +82,7 @@ cd NeMo
82
82
uv sync --extra all --extra cu13 # CUDA 13.x (recommended) — use --extra cu12 for CUDA 12.x
83
83
```
84
84
85
-
This installs our supported stack (Python 3.13, PyTorch 2.12, CUDA 13.2) into `.venv/` with NeMo editable. Add `--group test` for the test suite or `--group docs` to build the docs; run tools via `uv run <cmd>` or activate with `source .venv/bin/activate`. On Linux, `cu12` and `cu13` are mutually exclusive — pass exactly one (`cu13` is the default).
85
+
This installs our supported stack (Python 3.13, PyTorch 2.12, CUDA 13.2) into `.venv/` with NeMo editable. Add `--group test` for the test suite or `--group docs` to build the docs; run tools via `uv run <cmd>` or activate with `source .venv/bin/activate`. On Linux, `cu12` and `cu13` are mutually exclusive — pass exactly one (`cu13` is the default). For the **exact** container baseline, add `--locked --python 3.13` (the path the Dockerfile and CI use).
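The flag combinations described above can be sketched as a small command builder. This is only an illustration, assuming the flags behave as this README describes; `uv_sync_command` is a hypothetical helper, not something shipped with NeMo or uv.

```python
import shlex

CUDA_EXTRAS = ("cu12", "cu13")  # mutually exclusive on Linux: pass exactly one


def uv_sync_command(cuda_extra="cu13", exact_baseline=False, groups=()):
    """Build the `uv sync` argument list for the supported stack.

    exact_baseline=True adds `--locked --python 3.13`, the path the README
    says the Dockerfile and CI use.
    """
    if cuda_extra not in CUDA_EXTRAS:
        raise ValueError(f"cuda_extra must be one of {CUDA_EXTRAS}, got {cuda_extra!r}")
    command = ["uv", "sync"]
    if exact_baseline:
        command += ["--locked", "--python", "3.13"]
    command += ["--extra", "all", "--extra", cuda_extra]
    for group in groups:  # e.g. "test" or "docs"
        command += ["--group", group]
    return command


if __name__ == "__main__":
    print(shlex.join(uv_sync_command(exact_baseline=True)))
```

Taking a single `cuda_extra` argument, rather than two booleans, makes it impossible to request `cu12` and `cu13` together.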
86
86
87
87
> **SpeechLM2 / Automodel:** the Automodel backend runs **without** any compiled dependencies. It can *optionally* benefit from dedicated accelerated backends (Transformer Engine, FlashAttention, Mamba, grouped-GEMM/MoE, DeepEP) for better performance — these source-built kernels come from the `compiled` (Hopper/Blackwell) or `compiled-a100` (A100) extras, built by `docker/Dockerfile` (`GPU_TARGET=h100plus` / `a100`). See the [installation guide](https://docs.nvidia.com/nemo/speech/nightly/) for the full list and build details.
88
88
@@ -95,20 +95,23 @@ To build the container from source (CUDA 13 / H100+ by default):
docker run --rm -it --gpus all -v "$PWD:/workspace" nemo-speech bash
99
100
```
100
101
101
-
See the header of [`docker/Dockerfile`](docker/Dockerfile) for CUDA 12 / A100 build arguments (`BASE_IMAGE`, `GPU_TARGET`).
102
+
For A100, set `GPU_TARGET=a100` — A100 works with **both CUDA 12 and CUDA 13** (CUDA 13, the default base image, is recommended; the CUDA 12 base is a convenience). See the header of [`docker/Dockerfile`](docker/Dockerfile) for all build arguments (`BASE_IMAGE`, `GPU_TARGET`).
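A minimal sketch of how the two build arguments might be passed to `docker build`. The exact build invocation is not shown in this diff, so the `-f docker/Dockerfile`, `-t nemo-speech`, and `.` context below are assumptions based on the surrounding `docker run` example. `docker_build_command` is a hypothetical helper, and no CUDA 12 `BASE_IMAGE` value is assumed: take it from the Dockerfile header.

```python
GPU_TARGETS = ("h100plus", "a100")  # values named in the README


def docker_build_command(gpu_target="h100plus", base_image=None, tag="nemo-speech"):
    """Build a `docker build` argument list for docker/Dockerfile.

    base_image=None keeps the Dockerfile's default base image (CUDA 13);
    pass a CUDA 12 base image from the Dockerfile header to override it.
    """
    if gpu_target not in GPU_TARGETS:
        raise ValueError(f"gpu_target must be one of {GPU_TARGETS}, got {gpu_target!r}")
    command = [
        "docker", "build",
        "-f", "docker/Dockerfile",
        "--build-arg", f"GPU_TARGET={gpu_target}",
    ]
    if base_image is not None:
        command += ["--build-arg", f"BASE_IMAGE={base_image}"]
    command += ["-t", tag, "."]
    return command


if __name__ == "__main__":
    # A100 on the default (CUDA 13) base image, the recommended combination.
    print(" ".join(docker_build_command(gpu_target="a100")))
```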
102
103
103
104
### From PyPI with pip (fallback — bring your own versions)
104
105
105
-
Prefer your own Python/PyTorch/CUDA? `nemo-toolkit` only requires `torch>=2.6`, so install your PyTorch first (any version ≥ 2.6 for your CUDA; see the [PyTorch install matrix](https://pytorch.org/get-started/locally/)), then add NeMo and it **keeps your build**:
106
+
Prefer your own Python/PyTorch/CUDA? `nemo-toolkit` only requires `torch>=2.6`, so install your PyTorch first (any version ≥ 2.6 for your CPU/CUDA/ROCm/Apple Silicon target — see the [PyTorch install matrix](https://pytorch.org/get-started/locally/)), then add NeMo and it **keeps your build**. `uv pip` (uv's fast, pip-compatible installer) works like `pip`:
106
107
107
108
```bash
108
-
pip install nemo_toolkit[asr,tts] # also: [asr,tts,audio], [speechlm2], etc.
109
+
uv pip install 'nemo-toolkit[asr,tts]'  # or plain: pip install 'nemo-toolkit[asr,tts]'
109
110
```
110
111
111
-
To have pip install our pinned PyTorch build instead, add the CUDA extra and the matching wheel index (pip does not read uv's index configuration, so `--extra-index-url` is required):
112
+
> ⚠️ Do **not** use `uv sync --locked` for a bring-your-own stack — it applies `uv.lock` and replaces your Python/PyTorch/CUDA with the supported baseline. Use `uv pip`/`pip` here; reserve `uv sync --locked` for reproducing our stack.
113
+
114
+
To instead pull *our* pinned PyTorch build, add the CUDA extra and the matching wheel index (pip/uv pip do not read uv's project index config, so `--extra-index-url` is required):
112
115
113
116
```bash
114
117
pip install 'nemo-toolkit[asr,tts,cu13]' --extra-index-url https://download.pytorch.org/whl/cu132 # CUDA 13.x
docs/source/starthere/install.rst
Lines changed: 39 additions & 15 deletions
@@ -11,8 +11,9 @@ Prerequisites
11
11
NeMo Speech works with the **Python, PyTorch, and CUDA versions of your choosing**:
12
12
13
13
#. **Python** 3.10 or above
14
-
#. **PyTorch** 2.6 or above
14
+
#. **PyTorch** 2.6 or above, for your chosen target (CPU, CUDA, ROCm, or Apple Silicon)
15
15
#. **NVIDIA GPU + CUDA** (required for training; CPU-only inference is possible but slow)
16
+
#. **uv** for the fastest source/PyPI workflow (``pip`` also works in a prepared environment)
16
17
17
18
.. admonition:: Bring your own Python / PyTorch / CUDA
18
19
:class: important
@@ -45,7 +46,7 @@ The recommended way to install NeMo Speech is from source with `uv <https://docs
45
46
# uv sync --extra all --extra cu13 --group test
46
47
# uv sync --group docs
47
48
48
-
``uv sync`` creates a virtual environment in ``.venv/`` with NeMo installed in editable mode, matching our supported stack (Python 3.13, PyTorch 2.12, CUDA 13.2 by default). Run commands with ``uv run <cmd>`` or activate the environment with ``source .venv/bin/activate``.
49
+
``uv sync`` creates a virtual environment in ``.venv/`` with NeMo installed in editable mode, matching our supported stack (Python 3.13, PyTorch 2.12, CUDA 13.2 by default). Run commands with ``uv run <cmd>`` or activate the environment with ``source .venv/bin/activate``. For the **exact** container baseline, add ``--locked --python 3.13`` (i.e. ``uv sync --locked --python 3.13 --extra all --extra cu13``) — this is the path the Dockerfile and CI use.
49
50
50
51
On Linux, pass exactly one of ``--extra cu13`` (recommended) or ``--extra cu12`` — they are mutually exclusive. If you omit both, uv installs the generic PyPI PyTorch wheel instead of NVIDIA's CUDA-matched build.
51
52
@@ -68,7 +69,7 @@ Available collection extras (combine with one CUDA extra above):
68
69
* - ``all``
69
70
- All of the collections above
70
71
* - ``cu12`` / ``cu13``
71
-
- Our pinned CUDA 12.x / 13.x PyTorch build (Linux; pick at most one)
72
+
- Our pinned CUDA 12.x / 13.x PyTorch build **plus** the matching CUDA Python deps (``cuda-python``, ``numba-cuda``). Linux; pick at most one.
72
73
73
74
.. note::
74
75
@@ -134,26 +135,46 @@ To build the container from source, use the provided ``docker/Dockerfile`` (CUDA
docker run --rm -it --gpus all -v "$PWD:/workspace" nemo-speech bash
138
140
139
-
See the header of ``docker/Dockerfile`` for CUDA 12 / A100 build arguments (``BASE_IMAGE``, ``GPU_TARGET``).
141
+
For A100, set ``GPU_TARGET=a100``. A100 works with **both CUDA 12 and CUDA 13** — CUDA 13 (the default base image) is recommended; the CUDA 12 base is offered only as a convenience:
142
+
143
+
.. code-block:: bash
144
+
145
+
# A100 on CUDA 13 (recommended) — uses the default CUDA 13 base image
Prefer your own Python/PyTorch/CUDA? Install your preferred PyTorch first (any version ≥ 2.6, built for your CUDA; see `PyTorch's install matrix <https://pytorch.org/get-started/locally/>`_), then add NeMo with the collections you need. Because ``nemo-toolkit`` only requires ``torch>=2.6``, your pre-installed PyTorch is kept, not replaced:
160
+
Prefer your own Python/PyTorch/CUDA? Install your preferred PyTorch first (any version ≥ 2.6 for your CPU/CUDA/ROCm/Apple Silicon target; see `PyTorch's install matrix <https://pytorch.org/get-started/locally/>`_), then add NeMo. Because ``nemo-toolkit`` only requires ``torch>=2.6``, your pre-installed PyTorch is kept, not replaced. ``uv pip`` (uv's fast, pip-compatible installer) works just like ``pip``:
147
161
148
162
.. code-block:: bash
149
163
164
+
uv venv --python 3.12 # any Python >= 3.10 your PyTorch supports — or use your own env
165
+
source .venv/bin/activate
166
+
150
167
# 1) Your choice of PyTorch (example: CUDA 12.6 build). Skip if you already have one.
# 2) NeMo — your PyTorch above is kept (plain `pip install` works identically)
171
+
uv pip install 'nemo-toolkit[asr,tts]'  # also: [asr,tts,audio], [speechlm2], etc.
152
172
153
-
# 2) NeMo — your PyTorch above is kept
154
-
pip install nemo_toolkit[asr,tts] # also: [asr,tts,audio], [speechlm2], etc.
173
+
.. warning::
174
+
175
+
Do **not** use ``uv sync --locked`` for a bring-your-own stack — it intentionally applies ``uv.lock`` and replaces your Python/PyTorch/CUDA with the supported container baseline. Use ``uv pip`` (or ``pip``) here; reserve ``uv sync --locked`` for reproducing the supported stack (above).
155
176
156
-
To have pip install our pinned PyTorch build instead, add the matching CUDA extra **and** the PyTorch wheel index. pip does not read uv's index configuration, so the ``--extra-index-url`` is required:
177
+
To instead have the installer pull *our* pinned PyTorch build, add the matching CUDA extra **and** the PyTorch wheel index (``pip`` / ``uv pip`` do not read uv's project index config, so ``--extra-index-url`` is required):
157
178
158
179
.. code-block:: bash
159
180
@@ -167,16 +188,19 @@ To have pip install our pinned PyTorch build instead, add the matching CUDA extr
167
188
Verify Installation
168
189
-------------------
169
190
170
-
After installing, verify that NeMo is working:
191
+
After installing, verify that the chosen collection imports:
192
+
193
+
.. code-block:: bash
194
+
195
+
python -c "import nemo.collections.asr as nemo_asr; print('NeMo ASR installed')"
196
+
197
+
If you installed with ``uv sync`` and have not activated ``.venv``, run the check through ``uv run python``. To also exercise a model download:
171
198
172
199
.. code-block:: python
173
200
174
201
import nemo.collections.asr as nemo_asr
175
-
print("NeMo ASR installed successfully!")
176
-
177
-
# Quick test: load a pretrained model
178
202
model = nemo_asr.models.ASRModel.from_pretrained("nvidia/parakeet-tdt-0.6b-v2")
docs/source/tts/magpietts-finetuning.rst
Lines changed: 1 addition & 1 deletion
@@ -20,7 +20,7 @@ Before finetuning, you will need:
20
20
- A pretrained Magpie-TTS checkpoint (``pretrained.ckpt`` or ``pretrained.nemo``). Public checkpoints (``https://huggingface.co/nvidia/magpie_tts_multilingual_357m``) are available on Hugging Face.
21
21
- The audio codec model (``https://huggingface.co/nvidia/nemo-nano-codec-22khz-1.89kbps-21.5fps``), available on Hugging Face alongside the TTS checkpoint.
22
22
- A prepared dataset. For faster finetuning, audio codec tokens must be pre-extracted from your audio files. See the *Dataset Preparation* section below.
23
-
- NeMo installed from source or via the NeMo container. See the `NeMo GitHub page <https://github.com/NVIDIA/NeMo>`_ for installation instructions.
23
+
- NeMo installed from source or with the local Dockerfile. See :ref:`installation` for installation instructions.