Commit 58ebe25

Merge branch 'main' into nsarka/ddlb-integration
2 parents e389674 + 453f185 commit 58ebe25

3 files changed: +52 -85 lines changed

README.md

Lines changed: 7 additions & 33 deletions
@@ -3,31 +3,16 @@
 CloudAI benchmark framework aims to develop an industry standard benchmark focused on grading Data Center (DC) scale AI systems in the Cloud. The primary motivation is to provide automated benchmarking on various systems.
 
 ## Get Started
-**Note**: instructions for installing a custom python version are available [here](#install-custom-python-version).
-
 **Note**: instructions for setting up access for `enroot` are available [here](#set-up-access-to-the-private-ngc-registry).
 
-1. Clone the CloudAI repository to your local machine:
-```bash
-git clone git@github.com:NVIDIA/cloudai.git
-cd cloudai
-```
-
-2. Create a virtual environment:
-```bash
-python -m venv venv
-source venv/bin/activate
-```
-
-3. Next, install the required packages:
-```bash
-pip install .
-```
+Using `uv` tool allows users to run CloudAI without manually managing required Python versions and dependencies.
+```bash
+git clone git@github.com:NVIDIA/cloudai.git
+cd cloudai
+uv run cloudai --help
+```
 
-For development please use the following command:
-```bash
-pip install -e '.[dev]'
-```
+For details and `pip`-based installation, please refer to the [documentation](https://nvidia.github.io/cloudai/#get-started).
 
 ## Key Concepts
 CloudAI operates on four main schemas:
@@ -75,17 +60,6 @@ machine nvcr.io login $oauthtoken password <api-key>
 Replace `<api-key>` with your respective credentials. Keep `$oauthtoken` as is.
 
 
-### Install custom python version
-If your system python version is not supported, you can install a custom version using [uv](https://docs.astral.sh/uv/getting-started/installation/) tool:
-```bash
-curl -LsSf https://astral.sh/uv/install.sh | sh
-source $HOME/.local/bin/env
-uv venv -p 3.10
-source .venv/bin/activate
-# optionally you might need to install pip which is not installed by default:
-uv pip install -U pip
-```
-
 ## CloudAI Modes Usage Examples
 
 CloudAI supports five modes:

doc/USER_GUIDE.md

Lines changed: 21 additions & 19 deletions
@@ -78,7 +78,7 @@ scheduler = "slurm"
 
 install_path = "./install"
 output_path = "./results"
-cache_docker_images_locally = "True"
+cache_docker_images_locally = true
 default_partition = "<YOUR PARTITION NAME>"
 
 mpi = "pmix"
@@ -104,15 +104,15 @@ Test Configuration describes a particular test configuration to be run. It is ba
 name = "nccl_test_all_reduce_single_node"
 description = "all_reduce"
 test_template_name = "NcclTest"
-extra_cmd_args = "--stepfactor 2"
 
 [cmd_args]
-"subtest_name" = "all_reduce_perf_mpi"
-"ngpus" = "1"
-"minbytes" = "8M"
-"maxbytes" = "16G"
-"iters" = "5"
-"warmup_iters" = "3"
+subtest_name = "all_reduce_perf_mpi"
+ngpus = 1
+minbytes = "8M"
+maxbytes = "16G"
+iters = 5
+warmup_iters = 3
+stepfactor = 2
 ```
 You can find more examples under `conf/common/test`. In a test schema file, you can adjust arguments as shown above. In the `cmd_args` section, you can provide different values other than the default values for each argument. In `extra_cmd_args`, you can provide additional arguments that will be appended after the NCCL test command. You can specify additional environment variables in the `extra_env_vars` section.
 
@@ -122,12 +122,14 @@ Test Scenario uses Test description from step 5. Below is the `myconfig/scenario
 name = "nccl-test"
 
 [[Tests]]
-id = "Tests.1"
+id = "allreduce.1"
+num_nodes = 1
 test_name = "nccl_test_all_reduce_single_node"
 time_limit = "00:20:00"
 
 [[Tests]]
-id = "Tests.2"
+id = "allreduce.2"
+num_nodes = 1
 test_name = "nccl_test_all_reduce_single_node"
 time_limit = "00:20:00"
 [[Tests.dependencies]]
@@ -178,7 +180,7 @@ cloudai generate-report \
 # Describing a System in the System Schema
 In this section, we introduce the concept of the system schema, explain the meaning of each field, and describe how the fields should be used. The system schema is a TOML file that allows users to define a system's configuration.
 
-```
+```toml
 name = "example-cluster"
 scheduler = "slurm"
 
@@ -207,14 +209,14 @@ name = "partition_1"
 name = "partition_2"
 
 [global_env_vars]
-# NCCL Specific Configurations
-NCCL_IB_GID_INDEX = "3"
-NCCL_IB_TIMEOUT = "20"
-NCCL_IB_QPS_PER_CONNECTION = "4"
-
-# Device Visibility Configuration
-MELLANOX_VISIBLE_DEVICES = "0,3,4,5,6,9,10,11"
-CUDA_VISIBLE_DEVICES = "0,1,2,3,4,5,6,7"
+# NCCL Specific Configurations
+NCCL_IB_GID_INDEX = "3"
+NCCL_IB_TIMEOUT = "20"
+NCCL_IB_QPS_PER_CONNECTION = "4"
+
+# Device Visibility Configuration
+MELLANOX_VISIBLE_DEVICES = "0,3,4,5,6,9,10,11"
+CUDA_VISIBLE_DEVICES = "0,1,2,3,4,5,6,7"
 ```
 
 ## Field Descriptions

doc/index.md

Lines changed: 24 additions & 33 deletions
@@ -3,31 +3,35 @@
 CloudAI benchmark framework aims to develop an industry standard benchmark focused on grading Data Center (DC) scale AI systems in the Cloud. The primary motivation is to provide automated benchmarking on various systems.
 
 ## Get Started
-**Note**: instructions for installing a custom python version are available [here](install-custom-python-version).
-
 **Note**: instructions for setting up access for `enroot` are available [here](set-up-access-to-the-private-ngc-registry).
 
-1. Clone the CloudAI repository to your local machine:
-```bash
-git clone git@github.com:NVIDIA/cloudai.git
-cd cloudai
-```
+```bash
+git clone git@github.com:NVIDIA/cloudai.git
+cd cloudai
+uv run cloudai --help
+```
 
-2. Create a virtual environment:
-```bash
-python -m venv venv
-source venv/bin/activate
-```
+### `pip`-based installation
+See required Python version in the `.python-version` file, please ensure you have it installed (see how a custom python version [can be installed](#install-custom-python-version)). Follow these steps:
+```bash
+git clone git@github.com:NVIDIA/cloudai.git
+cd cloudai
+python -m venv venv
+source venv/bin/activate
+pip install -e .
+```
 
-3. Next, install the required packages:
-```bash
-pip install .
-```
+(install-custom-python-version)=
+### Install custom python version
+If your system python version is not supported, you can install a custom version using [uv](https://docs.astral.sh/uv/getting-started/installation/) tool:
+```bash
+curl -LsSf https://astral.sh/uv/install.sh | sh
+source $HOME/.local/bin/env
+uv venv --seed # this will pick up the python version from .python-version file
+# --seed will install pip and setuptools
+source .venv/bin/activate
+```
 
-For development please use the following command:
-```bash
-pip install -e '.[dev]'
-```
 
 ## Key Concepts
 CloudAI operates on four main schemas:
@@ -73,19 +77,6 @@ machine nvcr.io login $oauthtoken password <api-key>
 ```
 Replace `<api-key>` with your respective credentials. Keep `$oauthtoken` as is.
 
-
-(install-custom-python-version)=
-### Install custom python version
-If your system python version is not supported, you can install a custom version using [uv](https://docs.astral.sh/uv/getting-started/installation/) tool:
-```bash
-curl -LsSf https://astral.sh/uv/install.sh | sh
-source $HOME/.local/bin/env
-uv venv -p 3.10
-source .venv/bin/activate
-# optionally you might need to install pip which is not installed by default:
-uv pip install -U pip
-```
-
 ## CloudAI Modes Usage Examples
 
 CloudAI supports five modes:
