USER_GUIDE.md: 5 additions & 5 deletions (+5 -5)
@@ -48,15 +48,15 @@ CloudAI allows users to package workloads as test templates to facilitate the au
```
#### Step 2: Prepare configuration files
-CloudAI is fully configurable via a set of TOML configuration files. You can find examples of these files under `conf/`. In this guide, we will use the following configuration files:
+CloudAI is fully configurable via a set of TOML configuration files. You can find examples of these files under `conf/common`. In this guide, we will use the following configuration files:
1. `myconfig/test_templates/nccl_template.toml` - Describes the test template configuration.
1. `myconfig/system.toml` - Describes the system configuration.
1. `myconfig/tests/nccl_test.toml` - Describes the test to run.
1. `myconfig/scenario.toml` - Describes the test scenario configuration.
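
Before moving on, it can help to confirm that all four files are in place. The following is a minimal sketch and not part of CloudAI: the helper name `missing_config_files` is hypothetical, and it assumes the `myconfig/` layout listed above.

```python
from pathlib import Path

# Relative paths of the four configuration files used in this guide.
EXPECTED_FILES = [
    "test_templates/nccl_template.toml",
    "system.toml",
    "tests/nccl_test.toml",
    "scenario.toml",
]


def missing_config_files(root: str = "myconfig") -> list[str]:
    """Return the expected files that do not exist under `root` (hypothetical helper)."""
    base = Path(root)
    return [rel for rel in EXPECTED_FILES if not (base / rel).is_file()]


if __name__ == "__main__":
    missing = missing_config_files()
    if missing:
        print("Missing configuration files:", ", ".join(missing))
    else:
        print("All configuration files are present.")
```

Run it from the directory that contains `myconfig/`. An empty result means every file named in this step exists; it does not check that the contents are valid.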
#### Step 3: Test Template
-Test template config describes all arguments of a test. Let's create a test template file for the NCCL test. You can find more examples of test templates under `conf/test_template/`. Our example will be small for demonstration purposes. Below is the `myconfig/test_templates/nccl_template.toml` file:
+Test template config describes all arguments of a test. Let's create a test template file for the NCCL test. You can find more examples of test templates under `conf/common/test_template/`. Our example will be small for demonstration purposes. Below is the `myconfig/test_templates/nccl_template.toml` file:
```toml
name = "NcclTest"
@@ -93,7 +93,7 @@ name = "NcclTest"
Notice that `cmd_args.docker_image_url` uses `nvcr.io/nvidia/pytorch:24.02-py3`, but you can use the Docker image from Step 1 instead.
#### Step 4: System Config
-System config describes the system configuration. You can find more examples of system configs under `conf/system/`. Our example will be small for demonstration purposes. Below is the `myconfig/system.toml` file:
+System config describes the system configuration. You can find more examples of system configs under `conf/common/system/`. Our example will be small for demonstration purposes. Below is the `myconfig/system.toml` file:
-You can find more examples under `conf/test`. In a test schema file, you can adjust arguments as shown above. In the `cmd_args` section, you can provide values that override the defaults for each argument. In `extra_cmd_args`, you can provide additional arguments that will be appended after the NCCL test command. You can specify additional environment variables in the `extra_env_vars` section.
+You can find more examples under `conf/common/test`. In a test schema file, you can adjust arguments as shown above. In the `cmd_args` section, you can provide values that override the defaults for each argument. In `extra_cmd_args`, you can provide additional arguments that will be appended after the NCCL test command. You can specify additional environment variables in the `extra_env_vars` section.
#### Step 6: Run Experiments
The test scenario uses the test description from the previous step. Below is the `myconfig/scenario.toml` file:
@@ -361,7 +361,7 @@ You can update the fields to adjust the behavior. For example, you can update th
### Note: To run the NeMo Llama model, follow these additional steps:
1. Go to https://huggingface.co/docs/transformers/en/model_doc/llama#usage-tips.
2. Follow the instructions under 'Usage Tips' on how to download the tokenizer.
-3. Replace "training.model.tokenizer.model=TOKENIZER_MODEL" with "training.model.tokenizer.model=YOUR_TOKENIZER_PATH" (the tokenizer should be a .model file) in conf/general/test/llama.toml.
+3. Replace "training.model.tokenizer.model=TOKENIZER_MODEL" with "training.model.tokenizer.model=YOUR_TOKENIZER_PATH" (the tokenizer should be a .model file) in conf/common/test/llama.toml.
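
The replacement in step 3 can also be scripted. The sketch below is an illustration under stated assumptions, not CloudAI functionality: `set_tokenizer_path` and `update_config_file` are hypothetical helpers that do a plain text substitution of the placeholder quoted in step 3, and the tokenizer path in the usage note is made up.

```python
from pathlib import Path

# Placeholder string quoted in step 3 of the note above.
PLACEHOLDER = "training.model.tokenizer.model=TOKENIZER_MODEL"


def set_tokenizer_path(config_text: str, tokenizer_path: str) -> str:
    """Replace the tokenizer placeholder with a real path (hypothetical helper)."""
    if PLACEHOLDER not in config_text:
        raise ValueError("placeholder not found; the file may already be edited")
    return config_text.replace(
        PLACEHOLDER, f"training.model.tokenizer.model={tokenizer_path}"
    )


def update_config_file(config_file: str, tokenizer_path: str) -> None:
    """Rewrite `config_file` in place with the tokenizer path filled in."""
    path = Path(config_file)
    path.write_text(set_tokenizer_path(path.read_text(), tokenizer_path))
```

For example, `update_config_file("conf/common/test/llama.toml", "/path/to/tokenizer.model")` would edit the file named in step 3. The helper raises an error instead of silently doing nothing when the placeholder is absent, so a second run on an already edited file is easy to notice.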
## Troubleshooting
This section guides you through identifying the root cause of an issue and determining whether it stems from the system infrastructure or from a bug in CloudAI. Follow USER_GUIDE.md and README.md closely for installation and for adding test templates, tests, and test scenarios.