Commit ca678b2

Fix USERGUIDE and acceptance test paths
1 parent d2c813d commit ca678b2

2 files changed, +13 -9 lines changed

USER_GUIDE.md

Lines changed: 5 additions & 5 deletions
@@ -48,15 +48,15 @@ CloudAI allows users to package workloads as test templates to facilitate the au
 ```
 
 #### Step 2: Prepare configuration files
-CloudAI is fully configurable via set of TOML configuration files. You can find examples of these files under `conf/`. In this guide, we will use the following configuration files:
+CloudAI is fully configurable via set of TOML configuration files. You can find examples of these files under `conf/common`. In this guide, we will use the following configuration files:
 1. `myconfig/test_templates/nccl_template.toml` - Describes the test template configuration.
 1. `myconfig/system.toml` - Describes the system configuration.
 1. `myconfig/tests/nccl_test.toml` - Describes the test to run.
 1. `myconfig/scenario.toml` - Describes the test scenario configuration.
 
 
 #### Step 3: Test Template
-Test template config describes all arguments of a test. Let's create a test template file for the NCCL test. You can find more examples of test templates under `conf/test_template/`. Our example will be small for demonstration purposes. Below is the `myconfig/test_templates/nccl_template.toml` file:
+Test template config describes all arguments of a test. Let's create a test template file for the NCCL test. You can find more examples of test templates under `conf/common/test_template/`. Our example will be small for demonstration purposes. Below is the `myconfig/test_templates/nccl_template.toml` file:
 ```toml
 name = "NcclTest"
 
@@ -93,7 +93,7 @@ name = "NcclTest"
 Notice that `cmd_args.docker_image_url` uses `nvcr.io/nvidia/pytorch:24.02-py3`, but you can use Docker image from Step 1.
 
 #### Step 3: System Config
-System config describes the system configuration. You can find more examples of system configs under `conf/system/`. Our example will be small for demonstration purposes. Below is the `myconfig/system.toml` file:
+System config describes the system configuration. You can find more examples of system configs under `conf/common/system/`. Our example will be small for demonstration purposes. Below is the `myconfig/system.toml` file:
 ```toml
 name = "my-cluster"
 scheduler = "slurm"
@@ -139,7 +139,7 @@ extra_cmd_args = "--stepfactor 2"
 "iters" = "5"
 "warmup_iters" = "3"
 ```
-You can find more examples under `conf/test`. In a test schema file, you can adjust arguments as shown above. In the `cmd_args` section, you can provide different values other than the default values for each argument. In `extra_cmd_args`, you can provide additional arguments that will be appended after the NCCL test command. You can specify additional environment variables in the `extra_env_vars` section.
+You can find more examples under `conf/common/test`. In a test schema file, you can adjust arguments as shown above. In the `cmd_args` section, you can provide different values other than the default values for each argument. In `extra_cmd_args`, you can provide additional arguments that will be appended after the NCCL test command. You can specify additional environment variables in the `extra_env_vars` section.
 
 #### Step 6: Run Experiments
 Test Scenario uses Test description from the previous step. Below is the `myconfig/scenario.toml` file:
@@ -361,7 +361,7 @@ You can update the fields to adjust the behavior. For example, you can update th
 ### Note: For running Nemo Llama model, it is important to follow these additional steps:
 1. Go to https://huggingface.co/docs/transformers/en/model_doc/llama#usage-tips.
 2. Follow the instructions under 'Usage Tips' on how to download the tokenizer.
-3. Replace "training.model.tokenizer.model=TOKENIZER_MODEL" with "training.model.tokenizer.model=YOUR_TOKENIZER_PATH" (the tokenizer should be a .model file) in conf/general/test/llama.toml.
+3. Replace "training.model.tokenizer.model=TOKENIZER_MODEL" with "training.model.tokenizer.model=YOUR_TOKENIZER_PATH" (the tokenizer should be a .model file) in conf/common/test/llama.toml.
 
 ## Troubleshooting
 In this section, we will guide you through identifying the root cause of issues, determining whether they stem from system infrastructure or a bug in CloudAI. Users should closely follow the USER_GUIDE.md and README.md for installation, adding test templates, tests, and test scenarios.
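
Review note: the hunks above replace several different stale prefixes (`conf/`, `conf/test_template/`, `conf/system/`, `conf/test`, `conf/general/test/`) with locations under `conf/common`. A small check along the following lines might help catch similar leftovers in documentation. It is only a sketch: the function name and the regular expression are hypothetical and not part of this commit, and it assumes that every shipped example is expected to live under `conf/common`.

```python
import re
from typing import List, Tuple

# Assumption: after this commit, example configs are expected under conf/common.
EXPECTED_PREFIX = "conf/common"

# Matches a path starting with "conf/" that is not the tail of a longer
# name or path (so "myconfig/system.toml" and "x/conf/y" are ignored).
CONF_PATH = re.compile(r"(?<![\w/])conf/[\w./-]*")


def find_stale_conf_paths(text: str) -> List[Tuple[int, str]]:
    """Return (line_number, path) for each conf/ reference outside conf/common."""
    stale = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in CONF_PATH.finditer(line):
            # Drop a trailing slash or sentence-ending period before comparing.
            path = match.group(0).rstrip("/.")
            if path != EXPECTED_PREFIX and not path.startswith(EXPECTED_PREFIX + "/"):
                stale.append((lineno, path))
    return stale
```

Running it over the text of `USER_GUIDE.md` would list each remaining reference together with its line number, which could then be reviewed by hand.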

tests/test_acceptance.py

Lines changed: 8 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -27,8 +27,12 @@
 from cloudai.systems.slurm import SlurmNode, SlurmNodeState
 
 SLURM_TEST_SCENARIOS = [
-    {"path": Path("conf/test_scenario/sleep.toml"), "expected_dirs_number": 4, "log_file": "sleep_debug.log"},
-    {"path": Path("conf/test_scenario/ucc_test.toml"), "expected_dirs_number": 5, "log_file": "ucc_test_debug.log"},
+    {"path": Path("conf/common/test_scenario/sleep.toml"), "expected_dirs_number": 4, "log_file": "sleep_debug.log"},
+    {
+        "path": Path("conf/common/test_scenario/ucc_test.toml"),
+        "expected_dirs_number": 5,
+        "log_file": "ucc_test_debug.log",
+    },
 ]
 
 
@@ -39,8 +43,8 @@ def test_slurm(tmp_path: Path, scenario: Dict):
     log_file = scenario.get("log_file")
     log_file_path = tmp_path / str(log_file)
 
-    parser = Parser(Path("conf/system/example_slurm_cluster.toml"), Path("conf/test_template"))
-    system, tests, test_scenario = parser.parse(Path("conf/test"), test_scenario_path)
+    parser = Parser(Path("conf/common/system/example_slurm_cluster.toml"), Path("conf/common/test_template"))
+    system, tests, test_scenario = parser.parse(Path("conf/common/test"), test_scenario_path)
     system.output_path = str(tmp_path)
     assert test_scenario is not None, "Test scenario is None"
     setup_logging(str(log_file_path), "DEBUG")
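
Review note: the `conf/common` prefix now appears five times in this test file, so a future directory move would again touch every line. One possible follow-up is to derive the paths from a single base constant. The sketch below is an illustration only: `CONF_ROOT` and the `scenario` helper are hypothetical names and are not part of this commit. It relies on the two existing entries sharing the pattern `<name>.toml` and `<name>_debug.log`.

```python
from pathlib import Path

# Hypothetical constant, not part of the commit: the directory the configs moved to.
CONF_ROOT = Path("conf/common")


def scenario(name: str, expected_dirs_number: int) -> dict:
    """Build one entry in the shape used by SLURM_TEST_SCENARIOS."""
    return {
        "path": CONF_ROOT / "test_scenario" / f"{name}.toml",
        "expected_dirs_number": expected_dirs_number,
        "log_file": f"{name}_debug.log",
    }


# Same two scenarios as in the diff, built from the shared root.
SLURM_TEST_SCENARIOS = [scenario("sleep", 4), scenario("ucc_test", 5)]
```

The `Parser` arguments could be built the same way, for example `CONF_ROOT / "system" / "example_slurm_cluster.toml"`, which would also keep the long lines under the formatter's length limit.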
