README.md
Lines changed: 7 additions & 7 deletions
@@ -11,22 +11,22 @@ cd cloudai
 uv run cloudai --help
 ```

-Please refer to the [installation guide](https://nvidia.github.io/cloudai/workloads_requirements_installation.html) for details on setting up workloads' requirements.
+For details on setting up workloads' requirements, please refer to the [installation guide](https://nvidia.github.io/cloudai/workloads_requirements_installation.html).

 For details and `pip`-based installation, please refer to the [documentation](https://nvidia.github.io/cloudai/#get-started).

 ## Key Concepts

-CloudAI operates on four main schemas:
+CloudAI operates on three main schemas:

-- **System Schema**: Describes the system, including the scheduler type, node list, and global environment variables.
-- **Test Schema**: An instance of a test template with custom arguments and environment variables.
-- **Test Scenario Schema**: A set of tests with dependencies and additional descriptions about the test scenario.
+- **System Schema**: Describes the system, including the scheduler type, node list, and global environment variables
+- **Test Schema**: An instance of a test template with custom arguments and environment variables
+- **Test Scenario Schema**: A set of tests with dependencies and additional descriptions about the test scenario

 These schemas enable CloudAI to be flexible and compatible with different systems and configurations.


-## Support matrix
+## Support Matrix
 |Test|Slurm|Kubernetes|RunAI|Standalone|
 |---|---|---|---|---|
 |AI Dynamo|✅|✅|❌|❌|
@@ -47,7 +47,7 @@ These schemas enable CloudAI to be flexible and compatible with different system
 |Triton Inference|✅|❌|❌|❌|
 |UCC|✅|❌|❌|❌|

-*deprecated means that a workload support exists, but we are not maintaining it actively anymore and newer configurations might not work.
+Note: Deprecated means that support for the workload exists, but it is no longer actively maintained, and newer configurations might not work.

 For more detailed information, please refer to the [official documentation](https://nvidia.github.io/cloudai/workloads/index.html).
doc/DEV.md
Lines changed: 2 additions & 1 deletion
@@ -36,7 +36,8 @@ We use [import-linter](https://github.com/seddonym/import-linter) to ensure no c
 `Registry` object is a singleton that holds implementation mappings. Users can register their own implementations to the registry or replace the default implementations.

 ## Cache
-Some prerequisites can be installed: docker images, git repos with executable scripts, etc. All such "installables" are kept under System's `install_path`.
+Some prerequisites can be installed, such as Docker images, git repos with executable scripts, etc.
+All such "installables" are kept under System's `install_path`.

 Installables are shared among all tests. So if any number of tests use the same installable, it is installed only once for a particular System TOML.
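The sharing rule in the Cache section (one install per unique installable for a given System) can be sketched as a cache keyed by the installable itself. This is a minimal illustration under stated assumptions, not CloudAI's code: `Installable`, `InstallCache`, and `install` are hypothetical names, and the real work (pulling an image, cloning a repo) is replaced by a placeholder that only computes a target path under `install_path`.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class Installable:
    """Hypothetical installable, e.g. a Docker image or a git repo."""

    kind: str  # e.g. "docker_image" or "git_repo"
    source: str  # image reference or repository URL
    ref: str = ""  # optional tag or commit


class InstallCache:
    """Hypothetical per-System cache: each unique installable is installed once."""

    def __init__(self, install_path: Path) -> None:
        self.install_path = install_path
        self._installed: dict[Installable, Path] = {}
        self.install_count = 0  # number of real installs performed

    def install(self, item: Installable) -> Path:
        # Reuse the existing location if this installable was already handled.
        if item in self._installed:
            return self._installed[item]
        # Placeholder for the actual work (docker pull, git clone, ...).
        name = item.source.rstrip("/").rsplit("/", 1)[-1]
        target = self.install_path / item.kind / name
        self._installed[item] = target
        self.install_count += 1
        return target


# Three tests that depend on the same (hypothetical) repo trigger one install.
cache = InstallCache(Path("/tmp/cloudai_install"))  # hypothetical install_path
repo = Installable("git_repo", "https://example.com/scripts.git", "main")
paths = {cache.install(repo) for _ in ("test_a", "test_b", "test_c")}
print(len(paths), cache.install_count)  # prints 1 1
```

Keying the cache on a frozen dataclass means two tests that name the same source and ref share one entry. A persistent cache would likely also check whether the target already exists on disk from an earlier run.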
doc/USER_GUIDE.md
Lines changed: 1 addition & 1 deletion
@@ -1,5 +1,5 @@
 # CloudAI User Guide
-This is a CloudAI user guide to help users use CloudAI, covering topics such as adding new tests and downloading datasets for running NeMo-launcher.
+The purpose of this guide is to help users work with CloudAI. It covers topics such as adding new tests and downloading datasets for running NeMo-launcher.
doc/index.md
Lines changed: 8 additions & 8 deletions
@@ -9,10 +9,10 @@ cd cloudai
 uv run cloudai --help
 ```

-**Note**: instructions for setting up access for `enroot` are available [installation guide](./workloads_requirements_installation.rst).
+**Note**: For instructions on setting up access for `enroot`, see the [installation guide](./workloads_requirements_installation.rst).

-### `pip`-based installation
-See required Python version in the `.python-version` file, please ensure you have it installed (see how a custom python version [can be installed](#install-custom-python-version)). Follow these steps:
+### `pip`-based Installation
+Check the `.python-version` file for the required Python version and make sure it is installed (for installation instructions, see [Custom Python version](#install-custom-python-version)). Follow these steps:
 ```bash
 git clone git@github.com:NVIDIA/cloudai.git
 cd cloudai
@@ -22,7 +22,7 @@ pip install -e .
 ```

 (install-custom-python-version)=
-### Install custom python version
+### Install Custom Python Version
 If your system python version is not supported, you can install a custom version using [uv](https://docs.astral.sh/uv/getting-started/installation/) tool:
 ```bash
 curl -LsSf https://astral.sh/uv/install.sh | sh
@@ -34,11 +34,11 @@ source .venv/bin/activate


 ## Key Concepts
-CloudAI operates on four main schemas:
+CloudAI operates on three main schemas:

-- **System Schema**: Describes the system, including the scheduler type, node list, and global environment variables.
-- **Test Schema**: An instance of a test template with custom arguments and environment variables.
-- **Test Scenario Schema**: A set of tests with dependencies and additional descriptions about the test scenario.
+- **System Schema**: Describes the system, including the scheduler type, node list, and global environment variables
+- **Test Schema**: An instance of a test template with custom arguments and environment variables
+- **Test Scenario Schema**: A set of tests with dependencies and additional descriptions about the test scenario

 These schemas enable CloudAI to be flexible and compatible with different systems and configurations.
doc/reporting.md
Lines changed: 13 additions & 9 deletions
@@ -3,42 +3,46 @@ This document describes the reporting system in CloudAI.


 ## Overview
-CloudAI has two reporting levels: per-test (per each case in a test scenario) and per-scenario (per each test scenario). All reports are generated after the test scenario is completed as part of main CloudAI process. For Slurm this means that login node is used to generate reports.
+CloudAI has two reporting levels:
+- per-test (for each case in a test scenario)
+- per-scenario (for each test scenario)
+
+All reports are generated after the test scenario is completed, as part of the main CloudAI process. For Slurm, this means that the login node is used to generate reports.

 Per-test reports are linked to a particular workload type (e.g. `NcclTest`). All per-test reports are implemented as part of `per_test` scenario report and can be enabled/disabled via single configuration option, see [Enable, disable and configure reports](enable-disable-and-configure-reports) section.

-To list all available reports, one can use `cloudai list-reports` command. Use verbose output to also print report configurations.
+To list all available reports, use the `cloudai list-reports` command. Use verbose output to also print report configurations.


-## Notes and general flow
+## Notes and General Flow
 1. All reports should be registered via `Registry()` (`.add_report()` or `.add_scenario_report()`).
 1. Scenario reports are configurable via system config (Slurm-only for now) and scenario config.
 1. Configuration in a scenario config has the highest priority. Next, system config is checked. Then it defaults to report config from the registry.
 1. Then report is generated (or not) according to this final config.


 (enable-disable-and-configure-reports)=
-## Enable, disable and configure reports
-**NOTE** Only scenario-level reports can be configured today.
+## Enable, Disable and Configure Reports
+**NOTE** Only scenario-level reports can be configured.

-To enable or disable a report, one needs to do it via System configuration:
+To enable or disable a report, use the system configuration:
-Each report can define its own configuration which is constructed and passed as an argument to `Registry.add_scenario_report` method. `reports` field is parsed during TOMLs reading and respective Pydantic model is created.
+## Report Configuration Implementation
+Each report can define its own configuration, which is constructed and passed as an argument to the `Registry.add_scenario_report` method. The `reports` field is parsed when the TOML files are read, and the respective Pydantic model is created.

 For example, we can define a custom report configuration:
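The priority order in the **Notes and General Flow** list (scenario config first, then system config, then the report's default from the registry) amounts to a first-match lookup. The sketch here is a hypothetical illustration of that rule only. It is not the custom report configuration referenced in the diff context, and it is not CloudAI's API: `ReportConfig`, `resolve_report_config`, `enabled_reports`, the report name `my_report`, and the plain dictionaries standing in for parsed TOML sections are all assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ReportConfig:
    """Hypothetical stand-in for a report's Pydantic configuration model."""

    enable: bool = False


def resolve_report_config(
    name: str,
    scenario_cfg: dict[str, ReportConfig],
    system_cfg: dict[str, ReportConfig],
    registry_defaults: dict[str, ReportConfig],
) -> Optional[ReportConfig]:
    # Scenario config wins, then system config, then the registered default.
    for source in (scenario_cfg, system_cfg, registry_defaults):
        if name in source:
            return source[name]
    return None  # report is not registered at all


def enabled_reports(
    scenario_cfg: dict[str, ReportConfig],
    system_cfg: dict[str, ReportConfig],
    registry_defaults: dict[str, ReportConfig],
) -> list[str]:
    # A report is generated only if its final (resolved) config enables it.
    names = []
    for name in sorted(registry_defaults):
        cfg = resolve_report_config(name, scenario_cfg, system_cfg, registry_defaults)
        if cfg is not None and cfg.enable:
            names.append(name)
    return names


defaults = {"per_test": ReportConfig(enable=True), "my_report": ReportConfig(enable=False)}
system = {"my_report": ReportConfig(enable=True)}  # system config enables my_report
scenario = {"per_test": ReportConfig(enable=False)}  # scenario config disables per_test
print(enabled_reports(scenario, system, defaults))  # prints ['my_report']
```

Iterating over the registry defaults assumes every configurable report is registered, which matches the rule that all reports go through `Registry()`.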