A Test Scenario uses the Test description from the previous step. Below is the `myconf…`

```toml
name = "nccl-test"

[[Tests]]
id = "Tests.1"
test_name = "nccl_test_all_reduce_single_node"
time_limit = "00:20:00"

[[Tests]]
id = "Tests.2"
test_name = "nccl_test_all_reduce_single_node"
time_limit = "00:20:00"
[[Tests.dependencies]]
type = "start_post_comp"
id = "Tests.1"
time = 0
```
Notes on the test scenario:
1. `id` is a mandatory field and must be unique for each test.
1. The `test_name` field specifies a test definition from one of the Test TOML files. Node lists and time limits are optional.
1. If needed, `nodes` should be described as a list of node names as shown in a Slurm system. Alternatively, if groups are defined in the system schema, you can ask CloudAI to allocate a specific number of nodes from a specified partition and group. For example, `nodes = ['PARTITION:GROUP:16']` allocates 16 nodes from group `GROUP` in partition `PARTITION`.
1. There are three types of dependencies: `start_post_comp`, `start_post_init` and `end_post_comp`. A dependency is declared in a `[[Tests.dependencies]]` table placed after the dependent test. Its `type` is set to the dependency type and its `id` is set to the `id` of the test it depends on.
   1. `start_post_comp` means that the current test starts a specified delay after the test it depends on completes.
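The rules in these notes can be checked mechanically before a scenario is run. The sketch below is hypothetical: `validate_scenario`, `parse_node_spec` and `DEPENDENCY_TYPES` are illustrative names, not CloudAI APIs. It works on an already parsed scenario, which is a plain `dict` such as the output of `tomllib.loads`. It flags missing or duplicate `id` values and unknown dependency types. It also flags dependencies on ids that are absent from the scenario; treating that as an error is an assumption. Finally, it splits a `PARTITION:GROUP:COUNT` node request into its parts.

```python
from typing import Any

# The three dependency types listed in the notes.
DEPENDENCY_TYPES = {"start_post_comp", "start_post_init", "end_post_comp"}


def validate_scenario(scenario: dict[str, Any]) -> list[str]:
    """Return problems found in a parsed test scenario (hypothetical helper)."""
    problems: list[str] = []
    tests = scenario.get("Tests", [])

    # Every test needs an id, and ids must be unique.
    seen: set[str] = set()
    for index, test in enumerate(tests):
        test_id = test.get("id")
        if test_id is None:
            problems.append(f"Tests[{index}] has no 'id'")
        elif test_id in seen:
            problems.append(f"duplicate id {test_id!r}")
        else:
            seen.add(test_id)

    # Dependencies must use a known type and (assumption) point at an existing id.
    for test in tests:
        for dep in test.get("dependencies", []):
            if dep.get("type") not in DEPENDENCY_TYPES:
                problems.append(f"{test.get('id')}: unknown dependency type {dep.get('type')!r}")
            if dep.get("id") not in seen:
                problems.append(f"{test.get('id')}: depends on unknown id {dep.get('id')!r}")
    return problems


def parse_node_spec(spec: str) -> tuple[str, str, int]:
    """Split a 'PARTITION:GROUP:COUNT' string into its parts (hypothetical helper)."""
    partition, group, count = spec.split(":")
    return partition, group, int(count)


example = {"Tests": [
    {"id": "Tests.1", "test_name": "nccl_test_all_reduce_single_node"},
    {"id": "Tests.1", "test_name": "nccl_test_all_reduce_single_node",
     "dependencies": [{"type": "start_post_comp", "id": "Tests.1", "time": 0}]},
]}
print(validate_scenario(example))           # ["duplicate id 'Tests.1'"]
print(parse_node_spec("PARTITION:GROUP:16"))  # ('PARTITION', 'GROUP', 16)
```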
## Describing a Test Scenario in the Test Scenario Schema

A test scenario is a set of tests with specific dependencies between them. A test scenario is described in a TOML schema file. This is an example of a test scenario file:

```toml
name = "nccl-test"

[[Tests]]
id = "Tests.1"
test_name = "nccl_test_all_reduce"
num_nodes = "2"
time_limit = "00:20:00"

[[Tests]]
id = "Tests.2"
test_name = "nccl_test_all_gather"
num_nodes = "2"
time_limit = "00:20:00"
[[Tests.dependencies]]
type = "start_post_comp"
id = "Tests.1"
time = 0

[[Tests]]
id = "Tests.3"
test_name = "nccl_test_reduce_scatter"
num_nodes = "2"
time_limit = "00:20:00"
[[Tests.dependencies]]
type = "start_post_comp"
id = "Tests.2"
time = 0
```
The `name` field is the test scenario name, which can be any unique identifier for the scenario. Each test is an entry in the `[[Tests]]` array and needs an `id` that is unique within the scenario; the examples use `Tests.1`, `Tests.2`, and so on. The `test_name` field of a test must correspond to an entry in the test schema. If a test in a test scenario is not present in the test schema, CloudAI will not be able to identify it.
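This lookup amounts to resolving each `test_name` against the set of known test definitions and failing on anything missing. The following is a minimal sketch. It assumes the test definitions have already been loaded into a dictionary keyed by test name. `resolve_tests`, `UnknownTestError` and the placeholder definitions are hypothetical and not part of CloudAI.

```python
from typing import Any


class UnknownTestError(LookupError):
    """Raised when a scenario names a test that no test schema defines (hypothetical)."""


def resolve_tests(
    scenario: dict[str, Any], known_tests: dict[str, dict[str, Any]]
) -> dict[str, dict[str, Any]]:
    """Map each scenario test id to the definition named by its test_name."""
    resolved: dict[str, dict[str, Any]] = {}
    for test in scenario["Tests"]:
        name = test["test_name"]
        if name not in known_tests:
            raise UnknownTestError(f"{test['id']}: no test definition named {name!r}")
        resolved[test["id"]] = known_tests[name]
    return resolved


# Hypothetical stand-in for the definitions loaded from the Test TOML files.
known = {
    "nccl_test_all_reduce": {"name": "nccl_test_all_reduce"},
    "nccl_test_all_gather": {"name": "nccl_test_all_gather"},
}
scenario = {"Tests": [
    {"id": "Tests.1", "test_name": "nccl_test_all_reduce"},
    {"id": "Tests.2", "test_name": "nccl_test_all_gather"},
]}
print(sorted(resolve_tests(scenario, known)))  # ['Tests.1', 'Tests.2']

# A test whose test_name is missing from the known definitions is rejected.
scenario["Tests"].append({"id": "Tests.3", "test_name": "nccl_test_reduce_scatter"})
try:
    resolve_tests(scenario, known)
except UnknownTestError as err:
    print(err)  # Tests.3: no test definition named 'nccl_test_reduce_scatter'
```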