4. With the CI/CD process in place, the PR will be accepted and the corresponding issue closed only after adequate manual testing has been completed by the developer and the NVRx engineer reviewing the code.
+#### Documentation Building
+
+When contributing documentation changes, ensure the documentation builds correctly. See the [docs CI workflow](https://github.com/NVIDIA/nvidia-resiliency-ext/blob/main/.github/workflows/build_docs.yml) for up-to-date instructions:
You can then view the locally built documentation under the `public` directory or `docs/build/html` (e.g., `open public/index.html`). Ensure that all documentation changes are properly formatted and that the build completes without warnings or errors.
docs/source/checkpointing/async/usage_guide.rst (0 additions, 1 deletion)
@@ -3,7 +3,6 @@ Usage guide
The :py:class:`nvidia_resiliency_ext.checkpointing.async_ckpt.core.AsyncCallsQueue`
provides application users with an interface to schedule :py:class:`nvidia_resiliency_ext.checkpointing.async_ckpt.core.AsyncRequest`,
which defines the checkpoint routine, its args/kwargs, and the finalization steps to run once the checkpoint routine has finished.
-This class is a singleton, implying each rank will have only one instance of this class.
It is recommended to call the `close()` API on the `AsyncCallsQueue` at the end of training to ensure a clean shutdown of the process that manages async checkpointing.
We also extend the API of `abort_nvrx_checkpoint()` to abort the async processes and cleanly restart the `AsyncCallsQueue` in case of any restarts of the training processes.
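
The schedule, finalize, and close lifecycle described in the usage guide can be illustrated with a small standard-library stand-in. This is only a sketch under stated assumptions: `ToyRequest` and `ToyAsyncQueue` are hypothetical names, they are not the `nvidia_resiliency_ext` classes, and a single worker thread stands in for the separate process that the real queue manages.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Tuple


@dataclass
class ToyRequest:
    """Hypothetical stand-in for a request: routine, args/kwargs, finalization steps."""
    async_fn: Callable[..., Any]
    async_fn_args: Tuple[Any, ...] = ()
    async_fn_kwargs: Dict[str, Any] = field(default_factory=dict)
    finalize_fns: List[Callable[[], None]] = field(default_factory=list)


class ToyAsyncQueue:
    """Hypothetical stand-in for a queue that runs checkpoint routines in the background."""

    def __init__(self) -> None:
        # One worker keeps requests in submission order.
        self._executor = ThreadPoolExecutor(max_workers=1)
        self._pending: List[Tuple[Future, ToyRequest]] = []
        self._closed = False

    def schedule(self, request: ToyRequest) -> None:
        if self._closed:
            raise RuntimeError("queue is closed")
        future = self._executor.submit(
            request.async_fn, *request.async_fn_args, **request.async_fn_kwargs
        )
        self._pending.append((future, request))

    def finalize(self, blocking: bool = False) -> None:
        """Run finalization steps for finished requests, oldest first."""
        while self._pending:
            future, request = self._pending[0]
            if not blocking and not future.done():
                break  # oldest request still running; try again later
            future.result()  # waits if needed and re-raises routine errors
            for fn in request.finalize_fns:
                fn()
            self._pending.pop(0)

    def close(self) -> None:
        """Finish outstanding work and shut the worker down cleanly."""
        if self._closed:
            return
        self.finalize(blocking=True)
        self._executor.shutdown(wait=True)
        self._closed = True


def demo() -> Tuple[Dict[str, Dict[str, float]], List[str]]:
    saved: Dict[str, Dict[str, float]] = {}
    log: List[str] = []

    def save(name: str, state: Dict[str, float]) -> None:
        saved[name] = dict(state)  # pretend this writes a checkpoint

    queue = ToyAsyncQueue()
    queue.schedule(
        ToyRequest(
            async_fn=save,
            async_fn_args=("step_1",),
            async_fn_kwargs={"state": {"w": 1.0}},
            finalize_fns=[lambda: log.append("finalized step_1")],
        )
    )
    queue.close()  # end of training: drain pending work, then shut down
    return saved, log
```

Calling `close()` at the end mirrors the recommendation above: it drains outstanding requests and runs their finalization steps before the worker stops, so no checkpoint is left half-finished.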