Commit 33ab41e

Merge pull request #765 from srivatsankrishnan/m-bridge
M bridge Documentation
2 parents 99f9158 + 5bb5d53 commit 33ab41e

3 files changed: +83 -1 lines changed

conf/experimental/megatron_bridge/test/megatron_bridge_qwen_30b.toml

Lines changed: 1 addition & 1 deletion
@@ -32,4 +32,4 @@ domain = "llm"
 task = "pretrain"
 compute_dtype = "fp8_mx"

-hf_token = "REPLACE_ME_WITH_HF_TOKEN"
+hf_token = ""

doc/workloads/index.rst

Lines changed: 1 addition & 0 deletions
@@ -18,6 +18,7 @@ Available Workloads
 ":doc:`deepep`", "✅", "❌", "❌", "❌"
 ":doc:`jax_toolbox`", "✅", "❌", "❌", "❌"
 "MegatronRun", "✅", "❌", "❌", "❌"
+":doc:`megatron_bridge`", "✅", "❌", "❌", "❌"
 ":doc:`nccl`", "✅", "✅", "✅", "❌"
 ":doc:`nemo_launcher`", "✅", "❌", "❌", "❌"
 ":doc:`nemo_run`", "✅", "❌", "❌", "❌"

doc/workloads/megatron_bridge.rst

Lines changed: 81 additions & 0 deletions
@@ -0,0 +1,81 @@
+MegatronBridge
+==============
+
+This workload (``test_template_name`` is ``MegatronBridge``) submits training and fine-tuning tasks based on the Megatron-Bridge framework.
+
+
+Usage Examples
+--------------
+
+Test TOML example:
+
+.. code-block:: toml
+
+   name = "megatron_bridge_qwen_30b"
+   description = "Megatron-Bridge run via CloudAI SlurmSystem for Qwen3 30B A3B"
+   test_template_name = "MegatronBridge"
+
+   [cmd_args]
+   # Container can be an NGC/enroot URL (nvcr.io#...) or a local .sqsh path.
+   container_image = "nvcr.io#nvidia/nemo:25.11.01"
+
+   model_name = "qwen3"
+   model_size = "30b_a3b"
+   task = "pretrain"
+   domain = "llm"
+   compute_dtype = "fp8_mx"
+
+   hf_token = "hf_xxx"
+
+Test Scenario example:
+
+.. code-block:: toml
+
+   name = "megatron_bridge_qwen_30b"
+
+   [[Tests]]
+   id = "megatron_bridge_qwen_30b"
+   test_name = "megatron_bridge_qwen_30b"
+   num_nodes = "2"
+
+Test-in-Scenario example:
+
+.. code-block:: toml
+
+   name = "megatron-bridge-test"
+
+   [[Tests]]
+   id = "mbridge.1"
+   num_nodes = 2
+   time_limit = "00:30:00"
+
+   name = "megatron_bridge_qwen_30b"
+   description = "Megatron-Bridge run via CloudAI SlurmSystem for Qwen3 30B A3B"
+   test_template_name = "MegatronBridge"
+
+   [Tests.cmd_args]
+   container_image = "nvcr.io#nvidia/nemo:25.11.01"
+   model_name = "qwen3"
+   model_size = "30b_a3b"
+   task = "pretrain"
+   domain = "llm"
+   compute_dtype = "fp8_mx"
+   hf_token = "hf_xxx"
+
+
+API Documentation
+-----------------
+
+Command Arguments
+~~~~~~~~~~~~~~~~~
+
+.. autoclass:: cloudai.workloads.megatron_bridge.megatron_bridge.MegatronBridgeCmdArgs
+   :members:
+   :show-inheritance:
+
+Test Definition
+~~~~~~~~~~~~~~~
+
+.. autoclass:: cloudai.workloads.megatron_bridge.megatron_bridge.MegatronBridgeTestDefinition
+   :members:
+   :show-inheritance:
