Commit a170baa

Merge pull request #686 from NVIDIA/am/workloads-doc
Add documentation for workloads
2 parents cd5ada8 + 3f4f8cf commit a170baa

15 files changed, 756 additions and 2 deletions

.github/workflows/ci.yml

Lines changed: 9 additions & 1 deletion
Original file line number | Diff line number | Diff line change
@@ -18,7 +18,7 @@ jobs:
1818
uses: actions/setup-python@v5
1919

2020
- name: Install dependencies
21-
run: pip install '.[dev]'
21+
run: pip install '.[dev,docs]'
2222

2323
- name: Run ruff linter
2424
run: ruff check
@@ -42,6 +42,14 @@ jobs:
4242
4343
taplo fmt --check --diff
4444
45+
- name: Build documentation
46+
run: |
47+
set -eE
48+
set -o pipefail
49+
50+
cd doc
51+
make html
52+
4553
test:
4654
name: Run pytest
4755

doc/conf.py

Lines changed: 43 additions & 1 deletion
@@ -3,12 +3,41 @@
33
# For the full list of built-in configuration values, see the documentation:
44
# https://www.sphinx-doc.org/en/master/usage/configuration.html
55

6+
import os
7+
import re
8+
import sys
9+
10+
# Add the project source to Python path for autodoc
11+
sys.path.insert(0, os.path.abspath("../src"))
12+
13+
14+
# Custom autodoc processing to clean up Pydantic classes
15+
def autodoc_skip_member(app, what, name, obj, skip, options):
16+
"""Skip unwanted Pydantic and other internal members."""
17+
exclude_patterns = {re.compile(r"model_.*")}
18+
19+
if any(pattern.match(name) for pattern in exclude_patterns):
20+
return True
21+
22+
# Skip private methods starting with underscore (except __init__)
23+
if name.startswith("_") and name != "__init__":
24+
return True
25+
26+
return skip
27+
28+
29+
def setup(app):
30+
app.connect("autodoc-skip-member", autodoc_skip_member)
31+
32+
633
# -- Project information -----------------------------------------------------
734
# https://www.sphinx-doc.org/en/master/usage/configuration.html#project-information
835

936
project = "CloudAI"
1037
copyright = "2025, NVIDIA CORPORATION & AFFILIATES"
1138
author = "NVIDIA CORPORATION & AFFILIATES"
39+
version = "1.4.0-beta"
40+
release = "1.4.0-beta"
1241

1342
# -- General configuration ---------------------------------------------------
1443
# https://www.sphinx-doc.org/en/master/usage/configuration.html#general-configuration
@@ -17,12 +46,25 @@
1746
"sphinx.ext.autodoc",
1847
"sphinx.ext.viewcode",
1948
"sphinx.ext.napoleon",
49+
"sphinx.ext.autosummary",
2050
"myst_parser",
2151
"sphinxcontrib.mermaid",
52+
"sphinx_copybutton",
2253
]
2354

2455
exclude_patterns = ["_build"]
2556

57+
# -- Autodoc configuration ---------------------------------------------------
58+
autodoc_default_options = {
59+
"members": True,
60+
"member-order": "bysource",
61+
"special-members": "__init__",
62+
"undoc-members": False, # Don't show undocumented members
63+
}
64+
65+
# Generate autosummary even if no references
66+
autosummary_generate = True
67+
2668
# -- Options for HTML output -------------------------------------------------
2769
# https://www.sphinx-doc.org/en/master/usage/configuration.html#options-for-html-output
2870

@@ -35,7 +77,7 @@
3577
"html_image",
3678
]
3779

38-
# Configure MyST to handle mermaid code blocks properly
80+
# Configure MyST to handle code blocks as directives
3981
myst_fence_as_directive = ["mermaid"]
4082

4183
# Mermaid configuration
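
Review note: the `autodoc_skip_member` hook added in this file can be exercised outside Sphinx to see which member names it hides. The sketch below copies the function from the diff and calls it with placeholder arguments (`None` for `app`, `obj`, and `options`, which the function body never reads). The helper `is_skipped` and the sample member names are illustrative assumptions, not taken from CloudAI.

```python
import re


def autodoc_skip_member(app, what, name, obj, skip, options):
    """Skip unwanted Pydantic and other internal members (copied from doc/conf.py)."""
    exclude_patterns = {re.compile(r"model_.*")}

    # re.match anchors at the start, so this hides any name beginning with "model_".
    if any(pattern.match(name) for pattern in exclude_patterns):
        return True

    # Skip private methods starting with underscore (except __init__)
    if name.startswith("_") and name != "__init__":
        return True

    # Otherwise defer to the decision Sphinx already made.
    return skip


def is_skipped(name, default=False):
    """Hypothetical helper: call the hook the way Sphinx would for a class member."""
    return autodoc_skip_member(None, "class", name, None, default, None)


# Hypothetical member names, chosen only to hit each branch of the hook.
for member in ["model_dump", "_private", "__init__", "__repr__", "cmd", "modeling"]:
    print(f"{member}: {'skipped' if is_skipped(member) else 'kept'}")
```

With these inputs the hook skips `model_dump`, `_private`, and `__repr__`, and keeps `__init__`, `cmd`, and `modeling`. One consequence worth checking during review: a workload field whose own name starts with `model_` would be hidden by the same pattern.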

doc/index.md

Lines changed: 1 addition & 0 deletions
@@ -172,4 +172,5 @@ For more detailed instructions and guidance, including advanced usage and troubl
172172
DEV
173173
ai_dynamo
174174
reporting
175+
workloads/index
175176
```

doc/workloads/ai_dynamo.rst

Lines changed: 27 additions & 0 deletions
@@ -0,0 +1,27 @@
1+
AI Dynamo
2+
=========
3+
4+
This workload (``test_template_name`` is ``AIDynamo``) runs AI inference benchmarks using the Dynamo framework with distributed prefill and decode workers.
5+
6+
7+
Usage Example
8+
-------------
9+
10+
See :doc:`../ai_dynamo` for details.
11+
12+
API Documentation
13+
-----------------
14+
15+
Command Arguments
16+
~~~~~~~~~~~~~~~~~
17+
18+
.. autoclass:: cloudai.workloads.ai_dynamo.ai_dynamo.AIDynamoCmdArgs
19+
:members:
20+
:show-inheritance:
21+
22+
Test Definition
23+
~~~~~~~~~~~~~~~
24+
25+
.. autoclass:: cloudai.workloads.ai_dynamo.ai_dynamo.AIDynamoTestDefinition
26+
:members:
27+
:show-inheritance:

doc/workloads/bash_cmd.rst

Lines changed: 68 additions & 0 deletions
@@ -0,0 +1,68 @@
1+
Bash Command
2+
============
3+
4+
This workload (``test_template_name`` is ``BashCmd``) executes arbitrary bash commands within the CloudAI framework. It is useful for simple scripts, custom testing commands, or integrating external tools.
5+
6+
The ``cmd`` value specified in the ``cmd_args`` section is added as-is to the generated sbatch script.
7+
8+
Usage Example
9+
-------------
10+
11+
Test TOML example:
12+
13+
.. code-block:: toml
14+
15+
name = "my_bash_test"
16+
description = "Example bash command test"
17+
test_template_name = "BashCmd"
18+
19+
[cmd_args]
20+
cmd = "echo 'Hello from CloudAI!'"
21+
22+
Test Scenario example:
23+
24+
.. code-block:: toml
25+
26+
name = "bash-test"
27+
28+
[[Tests]]
29+
id = "bash.1"
30+
num_nodes = 1
31+
time_limit = "00:05:00"
32+
33+
test_name = "my_bash_test"
34+
35+
Test-in-Scenario example:
36+
37+
.. code-block:: toml
38+
39+
name = "bash-test"
40+
41+
[[Tests]]
42+
id = "bash.1"
43+
num_nodes = 1
44+
time_limit = "00:05:00"
45+
46+
name = "my_bash_test"
47+
description = "Example bash command test"
48+
test_template_name = "BashCmd"
49+
50+
[Tests.cmd_args]
51+
cmd = "echo 'Hello from CloudAI!'"
52+
53+
API Documentation
54+
-----------------
55+
56+
Command Arguments
57+
~~~~~~~~~~~~~~~~~
58+
59+
.. autoclass:: cloudai.workloads.bash_cmd.bash_cmd.BashCmdArgs
60+
:members:
61+
:show-inheritance:
62+
63+
Test Definition
64+
~~~~~~~~~~~~~~~
65+
66+
.. autoclass:: cloudai.workloads.bash_cmd.bash_cmd.BashCmdTestDefinition
67+
:members:
68+
:show-inheritance:

doc/workloads/chakra_replay.rst

Lines changed: 66 additions & 0 deletions
@@ -0,0 +1,66 @@
1+
Chakra Replay
2+
=============
3+
4+
This workload (``test_template_name`` is ``ChakraReplay``) replays traces recorded in the Chakra execution trace format for performance analysis and debugging.
5+
6+
Usage Example
7+
-------------
8+
9+
Test TOML example:
10+
11+
.. code-block:: toml
12+
13+
name = "my_chakra_test"
14+
description = "Example Chakra replay test"
15+
test_template_name = "ChakraReplay"
16+
17+
[cmd_args]
18+
trace_path = "/path/to/trace.et"
19+
20+
Test Scenario example:
21+
22+
.. code-block:: toml
23+
24+
name = "chakra-replay-test"
25+
26+
[[Tests]]
27+
id = "chakra.1"
28+
num_nodes = 1
29+
time_limit = "00:10:00"
30+
31+
test_name = "my_chakra_test"
32+
33+
Test-in-Scenario example:
34+
35+
.. code-block:: toml
36+
37+
name = "chakra-replay-test"
38+
39+
[[Tests]]
40+
id = "chakra.1"
41+
num_nodes = 1
42+
time_limit = "00:10:00"
43+
44+
name = "my_chakra_test"
45+
description = "Example Chakra replay test"
46+
test_template_name = "ChakraReplay"
47+
48+
[Tests.cmd_args]
49+
trace_path = "/path/to/trace.et"
50+
51+
API Documentation
52+
-----------------
53+
54+
Command Arguments
55+
~~~~~~~~~~~~~~~~~
56+
57+
.. autoclass:: cloudai.workloads.chakra_replay.chakra_replay.ChakraReplayCmdArgs
58+
:members:
59+
:show-inheritance:
60+
61+
Test Definition
62+
~~~~~~~~~~~~~~~
63+
64+
.. autoclass:: cloudai.workloads.chakra_replay.chakra_replay.ChakraReplayTestDefinition
65+
:members:
66+
:show-inheritance:

doc/workloads/index.md

Lines changed: 31 additions & 0 deletions
@@ -0,0 +1,31 @@
1+
# Workloads Documentation
2+
3+
This section contains automatically generated documentation for all CloudAI workloads. Each workload provides specific functionality for running different types of tests and benchmarks.
4+
5+
## Available Workloads
6+
7+
```{toctree}
8+
:maxdepth: 1
9+
:caption: Workloads:
10+
11+
ai_dynamo
12+
bash_cmd
13+
chakra_replay
14+
nccl
15+
nemo_run
16+
nixl_bench
17+
nixl_kvbench
18+
nixl_perftest
19+
sleep
20+
slurm_container
21+
```
22+
23+
## Adding New Workloads
24+
25+
To add documentation for a new workload:
26+
27+
1. **Add docstrings** to your Python classes and methods
28+
1. **Create a page** in `doc/workloads/` (e.g., `my_workload.rst`, matching the existing workload pages)
29+
1. **Add it to the toctree** in this index file
30+
31+
The documentation will be automatically generated during the build process!
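
Review note: the workload pages added in this commit share one layout (title, one-line summary, then `autoclass` directives for the command arguments and the test definition). A minimal sketch of a generator for that layout follows. The function `workload_page` and the module and class names passed to it (`cloudai.workloads.my_workload.my_workload`, `MyWorkloadCmdArgs`, `MyWorkloadTestDefinition`) are hypothetical placeholders and not part of CloudAI.

```python
def workload_page(title, template_name, summary, module, cmd_args_cls, test_def_cls):
    """Render an .rst page shaped like the workload pages in this commit."""

    def heading(text, char):
        # An RST section underline must be at least as long as the title.
        return f"{text}\n{char * len(text)}\n"

    def autoclass(cls):
        # Directive options are indented under the directive line.
        return f".. autoclass:: {module}.{cls}\n   :members:\n   :show-inheritance:\n"

    parts = [
        heading(title, "="),
        f"This workload (``test_template_name`` is ``{template_name}``) {summary}\n",
        heading("API Documentation", "-"),
        heading("Command Arguments", "~"),
        autoclass(cmd_args_cls),
        heading("Test Definition", "~"),
        autoclass(test_def_cls),
    ]
    # Each part ends with a newline, so joining on "\n" leaves one blank line between parts.
    return "\n".join(parts)


# Hypothetical workload, used only to show the output shape.
page = workload_page(
    title="My Workload",
    template_name="MyWorkload",
    summary="runs an example benchmark.",
    module="cloudai.workloads.my_workload.my_workload",
    cmd_args_cls="MyWorkloadCmdArgs",
    test_def_cls="MyWorkloadTestDefinition",
)
print(page)
```

The rendered text could be saved as `doc/workloads/my_workload.rst` and listed in the toctree; the usage examples that the existing pages carry would still need to be written by hand.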

doc/workloads/nccl.rst

Lines changed: 68 additions & 0 deletions
@@ -0,0 +1,68 @@
1+
NCCL
2+
====
3+
4+
This workload (``test_template_name`` is ``NcclTest``) runs NCCL benchmarks within the CloudAI framework.
5+
6+
Usage Example
7+
-------------
8+
9+
Test TOML example:
10+
11+
.. code-block:: toml
12+
13+
name = "my_nccl_test"
14+
description = "Example NCCL test"
15+
test_template_name = "NcclTest"
16+
17+
[cmd_args]
18+
docker_image_url = "nvcr.io#nvidia/pytorch:25.06-py3"
19+
20+
Test Scenario example:
21+
22+
.. code-block:: toml
23+
24+
name = "nccl-test"
25+
26+
[[Tests]]
27+
id = "nccl.1"
28+
num_nodes = 1
29+
time_limit = "00:05:00"
30+
31+
test_name = "my_nccl_test"
32+
33+
Test-in-Scenario example:
34+
35+
.. code-block:: toml
36+
37+
name = "nccl-test"
38+
39+
[[Tests]]
40+
id = "nccl.1"
41+
num_nodes = 1
42+
time_limit = "00:05:00"
43+
44+
name = "my_nccl_test"
45+
description = "Example NCCL test"
46+
test_template_name = "NcclTest"
47+
48+
[Tests.cmd_args]
49+
docker_image_url = "nvcr.io#nvidia/pytorch:25.06-py3"
50+
subtest_name = "all_reduce_perf_mpi"
51+
iters = 100
52+
53+
API Documentation
54+
-----------------
55+
56+
Command Arguments
57+
~~~~~~~~~~~~~~~~~
58+
59+
.. autoclass:: cloudai.workloads.nccl_test.nccl.NCCLCmdArgs
60+
:members:
61+
:show-inheritance:
62+
63+
Test Definition
64+
~~~~~~~~~~~~~~~
65+
66+
.. autoclass:: cloudai.workloads.nccl_test.nccl.NCCLTestDefinition
67+
:members:
68+
:show-inheritance:
