# Custom VARS

NCCLX CVARS are strongly typed configuration knobs for NCCLX. All
configuration knobs are defined here and can be used in a source file by
including `nccl_cvars.h` and referring to the typed CVAR by its name.

## User Guide

Refer to `nccl_cvars.yaml` for CVAR documentation and default values.

CVARs can be provided to a program in two ways:

1. Environment variable, e.g. `NCCL_DEBUG=warn nccl_allreduce_perf ...`
2. Config variable: define the CVAR in `/etc/nccl.conf` and it will be picked
   up automatically by the program.

An environment variable takes precedence over a config variable. If a CVAR is
specified in neither, its default value is used.
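
The lookup order above can be sketched in shell. `resolve_cvar` is a hypothetical helper, not part of NCCLX: it assumes the config file holds one `NAME=VALUE` pair per line, that lines starting with `#` are comments, and that the first matching line wins. The actual NCCLX parser may behave differently.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of NCCLX) mimicking the documented lookup
# order: environment variable > config file > default.
# Assumes NAME=VALUE lines with '#' comments; the first matching line wins.
resolve_cvar() {
  local name="$1" default="$2" conf="${3:-/etc/nccl.conf}"
  local value line

  # 1. An exported environment variable takes precedence.
  if value=$(printenv "$name"); then
    printf '%s\n' "$value"
    return 0
  fi

  # 2. Otherwise look for NAME=VALUE in the config file.
  if [ -r "$conf" ]; then
    while IFS= read -r line || [ -n "$line" ]; do
      case "$line" in
        "#"*) ;;  # skip comment lines
        "$name="*)
          printf '%s\n' "${line#*=}"
          return 0
          ;;
      esac
    done < "$conf"
  fi

  # 3. Fall back to the default (documented in nccl_cvars.yaml).
  printf '%s\n' "$default"
}
```

For example, with `NCCL_DEBUG=info` in the config file, `resolve_cvar NCCL_DEBUG warn` prints `info`, unless `NCCL_DEBUG` is exported in the environment, in which case the exported value is printed.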

## Developer Guide

All CVARs are defined in `nccl_cvars.yaml`. To add a new CVAR:

1. Add the CVAR definition to `nccl_cvars.yaml`.
2. Build any target that depends on `//comms/utils/cvars:ncclx-cvars`; the
   header and source files are generated automatically by a genrule.
3. Add `#include "comms/utils/cvars/nccl_cvars.h"` to your source file and use
   the CVAR in your program.

**Note:** `nccl_cvars.h` and `nccl_cvars.cc` are generated at build time by a
genrule. They should **NOT** be edited manually or committed to the
repository. Whenever you build a target that depends on the `ncclx-cvars`
library, the genrule generates these files from `nccl_cvars.yaml` using
`extractcvars.py`.
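
Because the generated files must not be committed, a repository could guard against it with a pre-commit check. The sketch below is hypothetical (the function name and how it is wired into a hook are not part of this repository): it reads candidate paths on stdin and fails if `nccl_cvars.h` or `nccl_cvars.cc` is among them, while still allowing the real inputs `nccl_cvars.yaml` and `nccl_cvars.cc.in`.

```bash
#!/usr/bin/env bash
# Hypothetical pre-commit guard: reads paths (one per line) on stdin and
# returns non-zero if a generated CVAR file is among them.
check_no_generated_cvars() {
  local path bad=0
  while IFS= read -r path || [ -n "$path" ]; do
    case "$path" in
      # Matches only the generated outputs, not nccl_cvars.cc.in or the YAML.
      nccl_cvars.h | */nccl_cvars.h | nccl_cvars.cc | */nccl_cvars.cc)
        echo "ERROR: $path is generated from nccl_cvars.yaml; do not commit it" >&2
        bad=1
        ;;
    esac
  done
  return "$bad"
}
```

In a Git checkout the input could come from `git diff --cached --name-only | check_no_generated_cvars`; other version-control systems would need their equivalent status command.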

To regenerate the files manually (for development/testing), you can run:

```bash
cd ~/fbsource/fbcode && buck2 run comms/utils/cvars:extractcvars
```

CVARs are initialized as part of `ncclInit` by `initEnv` in `init.cc`. A CVAR
must not be used before initialization.

## Including CVARs in Build Scripts (OSS/Non-Buck Builds)

For OSS builds or other build scripts that don't use Buck2, you need to generate
the `nccl_cvars.h` and `nccl_cvars.cc` files before building. There are two
approaches:

### Option 1: Using the Buck2 Genrule (when Buck2 is available)

If Buck2 is available in your build environment, you can build the genrule and
copy its outputs:

```bash
CVARS_DIR="$FBCODE_DIR/comms/utils/cvars"  # destination for the generated files
# --show-full-output prints "<target> <absolute path>" on stdout; keep the path
GENRULE_OUTPUT=$(buck2 build fbcode//comms/utils/cvars:generate_nccl_cvars --show-full-output \
  | awk '/generate_nccl_cvars/ {print $2}')
if [ -n "$GENRULE_OUTPUT" ]; then
  cp "$GENRULE_OUTPUT/nccl_cvars.h" "$GENRULE_OUTPUT/nccl_cvars.cc" "$CVARS_DIR/"
else
  echo "ERROR: could not find the generate_nccl_cvars output" >&2
  exit 1
fi
```

### Option 2: Running extractcvars.py Directly (recommended for OSS builds)

For builds outside of Buck2 (e.g., conda/Docker builds), run the
`extractcvars.py` script directly:

```bash
# Set the output directory for the generated files (FBCODE_DIR must be set)
CVARS_DIR="${FBCODE_DIR:?FBCODE_DIR must be set}/comms/utils/cvars"

# Validate that the required source files exist
for f in extractcvars.py nccl_cvars.yaml nccl_cvars.cc.in; do
  if [ ! -f "$CVARS_DIR/$f" ]; then
    echo "ERROR: $f not found in $CVARS_DIR" >&2
    exit 1
  fi
done

# Install ruamel.yaml (required by extractcvars.py)
conda install ruamel.yaml --yes  # or: pip install ruamel.yaml

# Tell extractcvars.py where to write the generated files, then run it
export NCCL_CVARS_OUTPUT_DIR="$CVARS_DIR"
if ! python3 "$CVARS_DIR/extractcvars.py"; then
  echo "ERROR: extractcvars.py failed" >&2
  exit 1
fi

# Verify the files were generated
for f in nccl_cvars.h nccl_cvars.cc; do
  if [ ! -f "$CVARS_DIR/$f" ]; then
    echo "ERROR: failed to generate $f" >&2
    exit 1
  fi
done
```
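
A non-Buck build script may want to skip regeneration when nothing has changed. The helper below is a hypothetical sketch: it treats the outputs as stale when either is missing or is older than `nccl_cvars.yaml`, `nccl_cvars.cc.in`, or `extractcvars.py` (the inputs validated above), using the shell's `-nt` ("newer than") file test. It relies on file modification times being meaningful in your build tree.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeeds (returns 0) when nccl_cvars.h/.cc in "$1"
# are missing or older than any of their inputs.
cvars_need_regen() {
  local dir="$1" out src
  for out in "$dir/nccl_cvars.h" "$dir/nccl_cvars.cc"; do
    [ -f "$out" ] || return 0  # missing output: must generate
    for src in "$dir/nccl_cvars.yaml" "$dir/nccl_cvars.cc.in" "$dir/extractcvars.py"; do
      # -nt is true when $src was modified more recently than $out
      if [ "$src" -nt "$out" ]; then
        return 0
      fi
    done
  done
  return 1  # all outputs are up to date
}

# Example use in a build script:
#   if cvars_need_regen "$CVARS_DIR"; then
#     NCCL_CVARS_OUTPUT_DIR="$CVARS_DIR" python3 "$CVARS_DIR/extractcvars.py"
#   fi
```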

## Changed NCCL CVAR Default Values

| CVAR | Original default | New default |
| --- | --- | --- |
| `NCCL_RAS_ENABLE` | `1` | `0` |
| `NCCL_CTRAN_IB_MAX_QPS` | `1` | `16` |
| `NCCL_CTRAN_IB_QP_MAX_MSGS` | `4` | `128` |
| `NCCL_CTRAN_IB_QP_SCALING_THRESHOLD` | `131072` | `524288` |
| `NCCL_CTRAN_IB_QP_CONFIG_XDC` | `""` | `"1048576,16,spray,128"` |
| `NCCL_CTRAN_IB_QP_CONFIG_XRACK` | `""` | `"1048576,16,spray,128"` |
| `NCCL_CTRAN_IB_QP_CONFIG_XZONE` | `""` | `"1048576,16,spray,128"` |
| `NCCL_CTRAN_IB_VC_MODE` | `"spray"` | `"dqplb"` |
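
Since environment variables take precedence over default values, a job that needs the original behavior could set the old values explicitly. The wrapper below is a hypothetical sketch that runs a command with the original defaults from the table above. It leaves out the three `NCCL_CTRAN_IB_QP_CONFIG_*` CVARs, whose original default is an empty string, because this document does not say whether an empty environment value overrides the new default or is ignored.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper: runs "$@" with the original defaults from the table
# above, passed as environment variables (which take precedence over the
# built-in defaults). NCCL_CTRAN_IB_QP_CONFIG_* are intentionally omitted
# because their original default is an empty string.
run_with_original_defaults() {
  env \
    NCCL_RAS_ENABLE=1 \
    NCCL_CTRAN_IB_MAX_QPS=1 \
    NCCL_CTRAN_IB_QP_MAX_MSGS=4 \
    NCCL_CTRAN_IB_QP_SCALING_THRESHOLD=131072 \
    NCCL_CTRAN_IB_VC_MODE=spray \
    "$@"
}

# Example: run_with_original_defaults nccl_allreduce_perf ...
```

Using `env` keeps the overrides scoped to the launched command instead of exporting them into the calling shell.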

## NCCL Baseline Adapter

The NCCL Baseline Adapter API is designed to provide an interface similar to
the `ncclGetEnv` and `ncclLoadParam` functions of the baseline (third-party)
NCCL library.