Commit dee9c27

pzelasko and claude committed
docs: extend install consistency sweep + clarify A100 CUDA support
Incorporate the useful parts of a parallel install-docs review and apply a broader consistency pass:

- Distinguish uv sync --locked (exact supported baseline; add --python 3.13) from uv pip / pip (bring-your-own), with a warning not to use uv sync --locked for BYO. Offer uv pip alongside pip for the fallback path.
- Clarify A100: works with BOTH CUDA 12 and CUDA 13 — CUDA 13 (default base image) recommended, CUDA 12 base offered only as a convenience.
- Broaden PyTorch targets to CPU/CUDA/ROCm/Apple Silicon; note cu12/cu13 also add the matching CUDA Python deps (cuda-python, numba-cuda).
- Route scattered pages to the canonical install guide via :ref:`installation` (g2p, magpietts-finetuning, nemo_forced_aligner) and modernize index.rst / speechlm2/intro.rst snippets; add a docker run example and a lighter import-only verify step.
- Align docs build with CI (uv sync --locked --group docs; uv run make linkcheck); prune the now-fixed nemo_forced_aligner entry from the broken-links list.
- Normalize stale install references in the model-card template, NFA tool docs, and runtime error messages (nemo-toolkit name; NVIDIA-NeMo/NeMo clone URL).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
1 parent d69f2eb commit dee9c27

18 files changed

Lines changed: 72 additions & 53 deletions

CLAUDE.md

Lines changed: 1 addition & 1 deletion
@@ -45,7 +45,7 @@ Markers: `unit`, `integration`, `system`, `pleasefixme` (broken — skip), `skip
 Sphinx-based docs live in `docs/source/`. Build with:

 ```bash
-uv sync --group docs # one-time setup
+uv sync --locked --group docs # one-time setup (matches CI)
 uv run make -C docs clean html # full rebuild
 uv run make -C docs html # incremental rebuild
 ```

README.md

Lines changed: 10 additions & 7 deletions
@@ -52,7 +52,7 @@ For technical documentation, please see the
 NeMo Speech works with the **Python, PyTorch, and CUDA versions of your choosing**:

 - Python 3.10 or above
-- PyTorch 2.6 or above
+- PyTorch 2.6 or above (CPU, CUDA, ROCm, or Apple Silicon build — your choice)
 - NVIDIA GPU + CUDA (required for training; recommended for inference)

 If you already have a Python/PyTorch/CUDA stack, NeMo Speech installs on top of it **without replacing it** — the `nemo-toolkit` package only requires `torch>=2.6`, so your existing PyTorch build is kept (see the install options below). The versions pinned in `uv.lock` and shipped in the official container — Python 3.13, PyTorch 2.12, CUDA 12.6/13.2 — are simply the combination we actively test and support. They make setup turnkey and reproducible, but they are **not** a hard requirement.
@@ -82,7 +82,7 @@ cd NeMo
 uv sync --extra all --extra cu13 # CUDA 13.x (recommended) — use --extra cu12 for CUDA 12.x
 ```

-This installs our supported stack (Python 3.13, PyTorch 2.12, CUDA 13.2) into `.venv/` with NeMo editable. Add `--group test` for the test suite or `--group docs` to build the docs; run tools via `uv run <cmd>` or activate with `source .venv/bin/activate`. On Linux, `cu12` and `cu13` are mutually exclusive — pass exactly one (`cu13` is the default).
+This installs our supported stack (Python 3.13, PyTorch 2.12, CUDA 13.2) into `.venv/` with NeMo editable. Add `--group test` for the test suite or `--group docs` to build the docs; run tools via `uv run <cmd>` or activate with `source .venv/bin/activate`. On Linux, `cu12` and `cu13` are mutually exclusive — pass exactly one (`cu13` is the default). For the **exact** container baseline, add `--locked --python 3.13` (the path the Dockerfile and CI use).

 > **SpeechLM2 / Automodel:** the Automodel backend runs **without** any compiled dependencies. It can *optionally* benefit from dedicated accelerated backends (Transformer Engine, FlashAttention, Mamba, grouped-GEMM/MoE, DeepEP) for better performance — these source-built kernels come from the `compiled` (Hopper/Blackwell) or `compiled-a100` (A100) extras, built by `docker/Dockerfile` (`GPU_TARGET=h100plus` / `a100`). See the [installation guide](https://docs.nvidia.com/nemo/speech/nightly/) for the full list and build details.

@@ -95,20 +95,23 @@ To build the container from source (CUDA 13 / H100+ by default):
 ```bash
 git clone https://github.com/NVIDIA-NeMo/NeMo.git
 cd NeMo
-docker buildx build -f docker/Dockerfile -t nemo-speech .
+docker buildx build -f docker/Dockerfile -t nemo-speech . # CUDA 13 / H100+ (default)
+docker run --rm -it --gpus all -v "$PWD:/workspace" nemo-speech bash
 ```

-See the header of [`docker/Dockerfile`](docker/Dockerfile) for CUDA 12 / A100 build arguments (`BASE_IMAGE`, `GPU_TARGET`).
+For A100, set `GPU_TARGET=a100` — A100 works with **both CUDA 12 and CUDA 13** (CUDA 13, the default base image, is recommended; the CUDA 12 base is a convenience). See the header of [`docker/Dockerfile`](docker/Dockerfile) for all build arguments (`BASE_IMAGE`, `GPU_TARGET`).

 ### From PyPI with pip (fallback — bring your own versions)

-Prefer your own Python/PyTorch/CUDA? `nemo-toolkit` only requires `torch>=2.6`, so install your PyTorch first (any version ≥ 2.6 for your CUDA — see the [PyTorch install matrix](https://pytorch.org/get-started/locally/)), then add NeMo and it **keeps your build**:
+Prefer your own Python/PyTorch/CUDA? `nemo-toolkit` only requires `torch>=2.6`, so install your PyTorch first (any version ≥ 2.6 for your CPU/CUDA/ROCm/Apple Silicon target — see the [PyTorch install matrix](https://pytorch.org/get-started/locally/)), then add NeMo and it **keeps your build**. `uv pip` (uv's fast, pip-compatible installer) works like `pip`:

 ```bash
-pip install nemo_toolkit[asr,tts] # also: [asr,tts,audio], [speechlm2], etc.
+uv pip install 'nemo-toolkit[asr,tts]' # or plain: pip install 'nemo-toolkit[asr,tts]'
 ```

-To have pip install our pinned PyTorch build instead, add the CUDA extra and the matching wheel index (pip does not read uv's index configuration, so `--extra-index-url` is required):
+> ⚠️ Do **not** use `uv sync --locked` for a bring-your-own stack — it applies `uv.lock` and replaces your Python/PyTorch/CUDA with the supported baseline. Use `uv pip`/`pip` here; reserve `uv sync --locked` for reproducing our stack.
+
+To instead pull *our* pinned PyTorch build, add the CUDA extra and the matching wheel index (pip/uv pip do not read uv's project index config, so `--extra-index-url` is required):

 ```bash
 pip install 'nemo-toolkit[asr,tts,cu13]' --extra-index-url https://download.pytorch.org/whl/cu132 # CUDA 13.x

docs/README.md

Lines changed: 3 additions & 5 deletions
@@ -2,12 +2,10 @@

 ## Building the Documentation

-1. Create and activate a virtual environment.
-
-1. Install the documentation dependencies:
+1. Install the documentation dependencies into the locked `uv` environment:

 ```console
-$ uv sync --group docs
+$ uv sync --locked --group docs
 ```

 1. Build the documentation:
@@ -21,7 +19,7 @@
 1. Build the documentation, as described in the preceding section, but use the following command:

 ```shell
-make -C docs clean linkcheck
+uv run make -C docs clean linkcheck
 ```

 1. Run the link-checking script:

docs/source/broken_links_needing_review..json

Lines changed: 0 additions & 8 deletions
@@ -6,14 +6,6 @@
 "uri": "https://github.com/NVIDIA/Megatron-LM/blob/main/megatron/core/optimizer/optimizer.py#L793",
 "info": "Anchor 'L793' not found"
 }
-{
-"filename": "tools/nemo_forced_aligner.rst",
-"lineno": 22,
-"status": "broken",
-"code": 0,
-"uri": "https://github.com/NVIDIA/NeMo#installation",
-"info": "Anchor 'installation' not found"
-}
 {
 "filename": "checkpoints/intro.rst",
 "lineno": 28,

docs/source/index.rst

Lines changed: 2 additions & 2 deletions
@@ -57,11 +57,11 @@ What is NeMo?
 - **Scalable training** — multi-GPU/multi-node via PyTorch Lightning with mixed-precision support
 - **Simple configuration** — YAML-based experiment configs with `Hydra <https://hydra.cc/>`__

-Get started in 30 seconds:
+Get started (install the PyTorch build for your platform first):

 .. code-block:: bash

-pip install nemo_toolkit[asr,tts]
+uv pip install 'nemo-toolkit[asr,tts]'

 .. code-block:: python

docs/source/speechlm2/intro.rst

Lines changed: 3 additions & 1 deletion
@@ -5,7 +5,9 @@ SpeechLM2
 The SpeechLM2 collection is still in active development and the code is likely to keep changing.

 .. note::
-Install with ``pip install nemo-toolkit[speechlm2]`` to get all required dependencies including NeMo Automodel.
+Install your chosen compatible PyTorch stack first, then install SpeechLM2 with
+``uv pip install 'nemo-toolkit[speechlm2]'`` (or, from a source checkout, ``uv pip install -e '.[speechlm2]'``)
+to get all required dependencies including NeMo Automodel. See :ref:`installation` for details.

 SpeechLM2 refers to a collection that augments pre-trained Large Language Models (LLMs) with speech understanding and generation capabilities.

docs/source/starthere/install.rst

Lines changed: 39 additions & 15 deletions
@@ -11,8 +11,9 @@ Prerequisites
 NeMo Speech works with the **Python, PyTorch, and CUDA versions of your choosing**:

 #. **Python** 3.10 or above
-#. **PyTorch** 2.6 or above
+#. **PyTorch** 2.6 or above, for your chosen target (CPU, CUDA, ROCm, or Apple Silicon)
 #. **NVIDIA GPU + CUDA** (required for training; CPU-only inference is possible but slow)
+#. **uv** for the fastest source/PyPI workflow (``pip`` also works in a prepared environment)

 .. admonition:: Bring your own Python / PyTorch / CUDA
 :class: important
@@ -45,7 +46,7 @@ The recommended way to install NeMo Speech is from source with `uv <https://docs
 # uv sync --extra all --extra cu13 --group test
 # uv sync --group docs

-``uv sync`` creates a virtual environment in ``.venv/`` with NeMo installed in editable mode, matching our supported stack (Python 3.13, PyTorch 2.12, CUDA 13.2 by default). Run commands with ``uv run <cmd>`` or activate the environment with ``source .venv/bin/activate``.
+``uv sync`` creates a virtual environment in ``.venv/`` with NeMo installed in editable mode, matching our supported stack (Python 3.13, PyTorch 2.12, CUDA 13.2 by default). Run commands with ``uv run <cmd>`` or activate the environment with ``source .venv/bin/activate``. For the **exact** container baseline, add ``--locked --python 3.13`` (i.e. ``uv sync --locked --python 3.13 --extra all --extra cu13``) — this is the path the Dockerfile and CI use.

 On Linux, pass exactly one of ``--extra cu13`` (recommended) or ``--extra cu12`` — they are mutually exclusive. If you omit both, uv installs the generic PyPI PyTorch wheel instead of NVIDIA's CUDA-matched build.
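
Reviewer sketch (not part of the commit): the flag combinations described in this hunk can be expressed as a small command builder. This is only an illustration under stated assumptions; `build_uv_sync_args` and its parameter names are hypothetical and exist in neither NeMo nor uv. Only the flags quoted in the hunk (`--locked`, `--python`, `--extra`) are used, and the "at most one of `cu12` / `cu13`" rule is enforced by taking the CUDA extra as a single argument.

```python
from typing import List, Optional, Sequence

CUDA_EXTRAS = ("cu12", "cu13")


def build_uv_sync_args(
    extras: Sequence[str] = ("all",),
    cuda: Optional[str] = "cu13",
    locked: bool = False,
    python: Optional[str] = None,
) -> List[str]:
    """Assemble a `uv sync` argument list (hypothetical helper, for illustration)."""
    if cuda is not None and cuda not in CUDA_EXTRAS:
        raise ValueError(f"unknown CUDA extra: {cuda!r}")
    # The CUDA extras are mutually exclusive, so they may only arrive via `cuda`.
    if any(extra in CUDA_EXTRAS for extra in extras):
        raise ValueError("pass cu12/cu13 through `cuda` so that at most one is selected")

    args = ["uv", "sync"]
    if locked:
        args.append("--locked")
    if python is not None:
        args += ["--python", python]
    for extra in extras:
        args += ["--extra", extra]
    if cuda is not None:
        args += ["--extra", cuda]
    return args


if __name__ == "__main__":
    # Exact container baseline, matching the command quoted in the hunk.
    print(" ".join(build_uv_sync_args(locked=True, python="3.13")))
```

Passing `cuda=None` models the case the docs describe where both CUDA extras are omitted and uv falls back to the generic PyPI PyTorch wheel.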

@@ -68,7 +69,7 @@ Available collection extras (combine with one CUDA extra above):
 * - ``all``
 - All of the collections above
 * - ``cu12`` / ``cu13``
-- Our pinned CUDA 12.x / 13.x PyTorch build (Linux; pick at most one)
+- Our pinned CUDA 12.x / 13.x PyTorch build **plus** the matching CUDA Python deps (``cuda-python``, ``numba-cuda``). Linux; pick at most one.

 .. note::

@@ -134,26 +135,46 @@ To build the container from source, use the provided ``docker/Dockerfile`` (CUDA

 git clone https://github.com/NVIDIA-NeMo/NeMo.git
 cd NeMo
-docker buildx build -f docker/Dockerfile -t nemo-speech .
+docker buildx build -f docker/Dockerfile -t nemo-speech . # CUDA 13 / H100+ (default)
+docker run --rm -it --gpus all -v "$PWD:/workspace" nemo-speech bash

-See the header of ``docker/Dockerfile`` for CUDA 12 / A100 build arguments (``BASE_IMAGE``, ``GPU_TARGET``).
+For A100, set ``GPU_TARGET=a100``. A100 works with **both CUDA 12 and CUDA 13** — CUDA 13 (the default base image) is recommended; the CUDA 12 base is offered only as a convenience:
+
+.. code-block:: bash
+
+# A100 on CUDA 13 (recommended) — uses the default CUDA 13 base image
+docker buildx build -f docker/Dockerfile --build-arg GPU_TARGET=a100 -t nemo-speech:a100 .
+
+# A100 on CUDA 12 (convenience)
+docker buildx build -f docker/Dockerfile \
+--build-arg BASE_IMAGE=nvcr.io/nvidia/cuda-dl-base:25.06-cuda12.9-devel-ubuntu24.04 \
+--build-arg GPU_TARGET=a100 -t nemo-speech:a100-cu12 .
+
+See the header of ``docker/Dockerfile`` for all build arguments (``BASE_IMAGE``, ``GPU_TARGET``).

 .. _install-from-pypi:

 Install from PyPI with pip (fallback — bring your own versions)
 ---------------------------------------------------------------

-Prefer your own Python/PyTorch/CUDA? Install your preferred PyTorch first (any version ≥ 2.6, built for your CUDA — see `PyTorch's install matrix <https://pytorch.org/get-started/locally/>`_), then add NeMo with the collections you need. Because ``nemo-toolkit`` only requires ``torch>=2.6``, your pre-installed PyTorch is kept, not replaced:
+Prefer your own Python/PyTorch/CUDA? Install your preferred PyTorch first (any version ≥ 2.6 for your CPU/CUDA/ROCm/Apple Silicon target — see `PyTorch's install matrix <https://pytorch.org/get-started/locally/>`_), then add NeMo. Because ``nemo-toolkit`` only requires ``torch>=2.6``, your pre-installed PyTorch is kept, not replaced. ``uv pip`` (uv's fast, pip-compatible installer) works just like ``pip``:

 .. code-block:: bash

+uv venv --python 3.12 # any Python >= 3.10 your PyTorch supports — or use your own env
+source .venv/bin/activate
+
 # 1) Your choice of PyTorch (example: CUDA 12.6 build). Skip if you already have one.
-pip install torch --index-url https://download.pytorch.org/whl/cu126
+uv pip install torch --index-url https://download.pytorch.org/whl/cu126
+
+# 2) NeMo — your PyTorch above is kept (plain `pip install` works identically)
+uv pip install 'nemo-toolkit[asr,tts]' # also: [asr,tts,audio], [speechlm2], etc.

-# 2) NeMo — your PyTorch above is kept
-pip install nemo_toolkit[asr,tts] # also: [asr,tts,audio], [speechlm2], etc.
+.. warning::
+
+Do **not** use ``uv sync --locked`` for a bring-your-own stack — it intentionally applies ``uv.lock`` and replaces your Python/PyTorch/CUDA with the supported container baseline. Use ``uv pip`` (or ``pip``) here; reserve ``uv sync --locked`` for reproducing the supported stack (above).

-To have pip install our pinned PyTorch build instead, add the matching CUDA extra **and** the PyTorch wheel index. pip does not read uv's index configuration, so the ``--extra-index-url`` is required:
+To instead have the installer pull *our* pinned PyTorch build, add the matching CUDA extra **and** the PyTorch wheel index (``pip`` / ``uv pip`` do not read uv's project index config, so ``--extra-index-url`` is required):

 .. code-block:: bash

@@ -167,16 +188,19 @@ To have pip install our pinned PyTorch build instead, add the matching CUDA extr
 Verify Installation
 -------------------

-After installing, verify that NeMo is working:
+After installing, verify that the chosen collection imports:
+
+.. code-block:: bash
+
+python -c "import nemo.collections.asr as nemo_asr; print('NeMo ASR installed')"
+
+If you installed with ``uv sync`` and have not activated ``.venv``, run the check through ``uv run python``. To also exercise a model download:

 .. code-block:: python

 import nemo.collections.asr as nemo_asr
-print("NeMo ASR installed successfully!")
-
-# Quick test: load a pretrained model
 model = nemo_asr.models.ASRModel.from_pretrained("nvidia/parakeet-tdt-0.6b-v2")
-print(f"Model loaded: {model.__class__.__name__}")
+print(f"Loaded: {model.__class__.__name__}")

 What's Next?
 ------------
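
Reviewer sketch (not part of the commit): for the bring-your-own path, it may help to confirm that an existing PyTorch already satisfies the documented `torch>=2.6` requirement before adding NeMo. The snippet below is a minimal illustration using only the standard library; the helper names (`parse_major_minor`, `torch_is_new_enough`, `installed_torch_version`) are hypothetical and nothing here is NeMo's own check. Versions are compared as integer tuples, because a plain string comparison would rank `2.12` below `2.6`.

```python
import re
from importlib import metadata
from typing import Optional, Tuple

MIN_TORCH = (2, 6)  # taken from the documented `torch>=2.6` requirement


def parse_major_minor(version: str) -> Tuple[int, int]:
    """Extract (major, minor) from strings such as '2.6.0+cu126' or '2.12.0'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def torch_is_new_enough(version: str, minimum: Tuple[int, int] = MIN_TORCH) -> bool:
    """Compare numerically so that 2.12 counts as newer than 2.6."""
    return parse_major_minor(version) >= minimum


def installed_torch_version() -> Optional[str]:
    """Return the installed torch distribution version, or None if torch is absent."""
    try:
        return metadata.version("torch")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_torch_version()
    if found is None:
        print("torch is not installed; install a build for your platform first")
    elif torch_is_new_enough(found):
        print(f"torch {found} satisfies torch>=2.6")
    else:
        print(f"torch {found} is older than 2.6")
```

Reading the version through `importlib.metadata` avoids importing `torch` itself, so the check stays fast and works even when the GPU runtime is not usable on the current machine.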

docs/source/tools/nemo_forced_aligner.rst

Lines changed: 1 addition & 1 deletion
@@ -19,7 +19,7 @@ Demos & Tutorials
 Quickstart
 ----------

-1. Install `NeMo <https://github.com/NVIDIA/NeMo#installation>`__.
+1. Install NeMo with the ASR collection. See :ref:`installation`.
 2. Prepare a NeMo-style manifest containing the paths of audio files you would like to proces, and (optionally) their text.
 3. Run NFA's ``align.py`` script with the desired config, e.g.:

docs/source/tts/g2p.rst

Lines changed: 1 addition & 1 deletion
@@ -126,7 +126,7 @@ Using this unknown token forces a G2P model to produce the same masking token as
 Requirements
 ------------

-G2P requires the NeMo ASR collection to be installed (``pip install nemo_toolkit[asr]``).
+G2P requires the NeMo ASR collection to be installed. See :ref:`installation` and include the ``asr`` extra.


 References

docs/source/tts/magpietts-finetuning.rst

Lines changed: 1 addition & 1 deletion
@@ -20,7 +20,7 @@ Before finetuning, you will need:
 - A pretrained Magpie-TTS checkpoint (``pretrained.ckpt`` or ``pretrained.nemo``). Public checkpoints (``https://huggingface.co/nvidia/magpie_tts_multilingual_357m``) are available on Hugging Face.
 - The audio codec model (``https://huggingface.co/nvidia/nemo-nano-codec-22khz-1.89kbps-21.5fps``), available on Hugging Face alongside the TTS checkpoint.
 - A prepared dataset. For faster finetuning audio codec tokens must be pre-extracted from your audio files. See the *Dataset Preparation* section below.
-- NeMo installed from source or via the NeMo container. See the `NeMo GitHub page <https://github.com/NVIDIA/NeMo>`_ for installation instructions.
+- NeMo installed from source or with the local Dockerfile. See :ref:`installation` for installation instructions.


 Dataset Preparation
