doc/DEV.md
9 lines changed: 0 additions & 9 deletions
@@ -35,15 +35,6 @@ We use [import-linter](https://github.com/seddonym/import-linter) to ensure no c
 
 `Registry` object is a singleton that holds implementation mappings. Users can register their own implementations to the registry or replace the default implementations.
 
-## Runners
-TBD
-
-## Installers
-TBD
-
-## Systems
-TBD
-
 ## Cache
 Some prerequisites can be installed: docker images, git repos with executable scripts, etc. All such "installables" are kept under System's `install_path`.
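
The `Registry` described in this hunk is a singleton holding implementation mappings that users can extend or override. A minimal Python sketch of that pattern follows, assuming nothing about CloudAI's internals: the `register`/`get` methods and the `replace` flag are hypothetical illustrations, not CloudAI's actual `Registry` API.

```python
from typing import Dict, Optional


class Registry:
    """Hypothetical singleton that maps names to implementation classes."""

    _instance: Optional["Registry"] = None
    _impls: Dict[str, type]

    def __new__(cls) -> "Registry":
        # Create the one shared instance on first use; later calls return it.
        if cls._instance is None:
            instance = super().__new__(cls)
            instance._impls = {}
            cls._instance = instance
        return cls._instance

    def register(self, name: str, impl: type, replace: bool = False) -> None:
        # Add a mapping; overwriting an existing one requires replace=True.
        if name in self._impls and not replace:
            raise ValueError(f"{name!r} is already registered")
        self._impls[name] = impl

    def get(self, name: str) -> type:
        # Look up the implementation currently mapped to `name`.
        return self._impls[name]


# Usage: a default is registered, then a user swaps in their own class.
class DefaultRunner:
    pass


class MyRunner:
    pass


Registry().register("runner", DefaultRunner)
Registry().register("runner", MyRunner, replace=True)
print(Registry().get("runner").__name__)  # prints "MyRunner"
```

Requiring an explicit `replace=True` is one design choice for "replace the default implementations": it keeps an accidental name clash from silently overriding a built-in mapping.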
 Relevant Test Configs should specify `test_template_name = MyTest` to use the custom test definition.
 
-## Step 3: System Configuration
+## Step 4: System Configuration
 System configuration describes the system configuration. You can find more examples of system configs under `conf/common/system/`. Our example will be small for demonstration purposes. Below is the `myconfig/system.toml` file:
 ```toml
 name = "my-cluster"
@@ -90,15 +90,15 @@ name = "partition_1"
 ```
 Replace `<YOUR PARTITION NAME>` with the name of the partition you want to use. You can find the partition name by running `sinfo` on the cluster.
 
-## Step 4: Install Test Requirements
+## Step 5: Install Test Requirements
 Once all configs are ready, it is time to install test requirements. It is done once so that you can run multiple experiments without reinstalling the requirements. This step requires the system config file from the step 3.
 ```bash
 cloudai install \
 --system-config myconfig/system.toml \
 --tests-dir myconfig/tests/
 ```
 
-## Step 5: Test Configuration
+## Step 6: Test Configuration
 Test Configuration describes a particular test configuration to be run. It is based on Test definition and will be used in Test Sceanrio. Below is the `myconfig/tests/nccl_test.toml` file, definition is based on built-in `NcclTest` definition:
 You can find more examples under `conf/common/test`. In a test schema file, you can adjust arguments as shown above. In the `cmd_args` section, you can provide different values other than the default values for each argument. In `extra_cmd_args`, you can provide additional arguments that will be appended after the NCCL test command. You can specify additional environment variables in the `extra_env_vars` section.
 
-## Step 6: Run Experiments
+## Step 7: Run Experiments
 Test Scenario uses Test description from step 5. Below is the `myconfig/scenario.toml` file:
 ```toml
 name = "nccl-test"
@@ -147,7 +147,7 @@ Notes on the test scenario:
 All dependencies are described as a pair of the depending test name and a delay. The name should be taken from the test name as set in the test scenario. The delay is described in the number of seconds.
 
 
-To generate NCCL test commands without actual execution, use the `dry-run` mode. You can review `debug.log` (or other file specifued with `--log-file`) to see the generated commands from CloudAI. Please note that group node allocations are not currently supported in the `dry-run` mode.
+To generate NCCL test commands without actual execution, use the `dry-run` mode. You can review `debug.log` (or other file specified with `--log-file`) to see the generated commands from CloudAI. Please note that group node allocations are not currently supported in the `dry-run` mode.
 ```bash
 cloudai dry-run \
 --test-scenario myconfig/scenario.toml \
@@ -163,7 +163,7 @@ cloudai run \
 --tests-dir myconfig/tests/
 ```
 
-## Step 7: Generate Reports
+## Step 8: Generate Reports
 Once the test scenario is completed, you can generate reports using the following command:
 ```bash
 cloudai generate-report \
@@ -392,9 +392,9 @@ rm_extracted: False # Preprocess script will remove extracted files after prepro
 You can update the fields to adjust the behavior. For example, you can update the file_numbers field to adjust the number of dataset files to download. This will allow you to save disk space.
 
 ## Note: For running Nemo Llama model, it is important to follow these additional steps:
-1. Go to https://huggingface.co/docs/transformers/en/model_doc/llama.
-2. Follow the instructions under 'Usage Tips' on how to download the tokenizer.
-3. Replace "training.model.tokenizer.model=TOKENIZER_MODEL" with "training.model.tokenizer.model=YOUR_TOKENIZER_PATH" (the tokenizer should be a .model file) in conf/common/test/llama.toml.
+1. Go to [🤗 Hugging Face](https://huggingface.co/docs/transformers/en/model_doc/llama).
+2. Follow the instructions on how to download the tokenizer.
+3. Replace `TOKENIZER_MODEL` in `training.model.tokenizer.model=TOKENIZER_MODEL` with your path (the tokenizer should be a `.model` file) in `conf/common/test/llama.toml`.
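
Step 3 of the updated note asks you to swap the `TOKENIZER_MODEL` placeholder in `conf/common/test/llama.toml` for the path to your downloaded tokenizer. If you prefer to script that edit, here is a minimal Python sketch; `set_tokenizer_path` is a hypothetical helper, not part of CloudAI, and it does a plain text substitution (rather than parsing the TOML) on the assumption that the placeholder appears verbatim as `training.model.tokenizer.model=TOKENIZER_MODEL`.

```python
from pathlib import Path

# Placeholder text assumed to appear verbatim in llama.toml.
PLACEHOLDER = "training.model.tokenizer.model=TOKENIZER_MODEL"


def set_tokenizer_path(config_file: Path, tokenizer_path: str) -> None:
    """Hypothetical helper: replace the TOKENIZER_MODEL placeholder in place."""
    # The note says the tokenizer should be a .model file.
    if not tokenizer_path.endswith(".model"):
        raise ValueError(f"expected a .model file, got {tokenizer_path!r}")
    text = config_file.read_text()
    if PLACEHOLDER not in text:
        raise ValueError(f"placeholder not found in {config_file}")
    # Plain string substitution leaves the other settings in the file untouched.
    updated = text.replace(
        PLACEHOLDER, f"training.model.tokenizer.model={tokenizer_path}"
    )
    config_file.write_text(updated)
```

For example, `set_tokenizer_path(Path("conf/common/test/llama.toml"), "/path/to/tokenizer.model")`, where `/path/to/tokenizer.model` stands in for wherever you saved the tokenizer. Editing the file by hand works equally well; the helper only adds the `.model` and placeholder checks.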