Commit 171df87

Ben Carver authored and facebook-github-bot committed
Remove nccl_cvars.(h|cc) in favor of genrule-based generation (#84)
Summary: This diff removes the `nccl_cvars.cc` and `nccl_cvars.h` files now that they'll instead be built/generated/provided by `genrule`.

This diff also updates several build scripts to be compatible with the new `genrule`-based build for the `ncclx-cvars` library:

- MCCL build script, `comms/mccl/build/build.sh`
- `comms/ncclx/v2_27/maint/oss_build.sh`
- `comms/ncclx/v2_28/maint/oss_build.sh`
- `comms/github/build_rcclx.sh`
- `comms/github/build_ncclx.sh`
- `conda/feedstock/nccl/recipe.yaml`

**Note**: this diff was originally published, landed, and reverted as D87668052.

Differential Revision: D88748045
1 parent 0a34bb5 commit 171df87

6 files changed, 196 additions and 7856 deletions

build_ncclx.sh

Lines changed: 36 additions & 0 deletions
@@ -178,6 +178,7 @@ function build_third_party {
     zstd
     conda-forge::zlib
     conda-forge::libopenssl-static
+    ruamel.yaml
     fmt
   )
   conda install "${DEPS[@]}" --yes
@@ -248,6 +249,41 @@ if [[ -z "${NCCL_BUILD_SKIP_DEPS}" ]]; then
   build_comms_tracing_service
 fi
 
+# Generate nccl_cvars files (these are no longer checked into the repo)
+# The files are generated by extractcvars.py which reads nccl_cvars.yaml and nccl_cvars.cc.in
+echo "Generating nccl_cvars files..."
+CVARS_DIR="$BASE_DIR/comms/utils/cvars"
+
+# Validate that the required source files exist
+if [ ! -f "$CVARS_DIR/extractcvars.py" ]; then
+  echo "ERROR: extractcvars.py not found at $CVARS_DIR/extractcvars.py"
+  exit 1
+fi
+if [ ! -f "$CVARS_DIR/nccl_cvars.yaml" ]; then
+  echo "ERROR: nccl_cvars.yaml not found at $CVARS_DIR/nccl_cvars.yaml"
+  exit 1
+fi
+if [ ! -f "$CVARS_DIR/nccl_cvars.cc.in" ]; then
+  echo "ERROR: nccl_cvars.cc.in not found at $CVARS_DIR/nccl_cvars.cc.in"
+  exit 1
+fi
+
+# Install ruamel-yaml if not already installed (required by extractcvars.py)
+if [[ -z "${NCCL_SKIP_CONDA_INSTALL}" ]]; then
+  conda install ruamel.yaml --yes
+fi
+
+# Run the extractcvars.py script directly to generate the files
+export NCCL_CVARS_OUTPUT_DIR="$CVARS_DIR"
+python3 "$CVARS_DIR/extractcvars.py"
+
+# Verify the files were generated
+if [ ! -f "$CVARS_DIR/nccl_cvars.h" ] || [ ! -f "$CVARS_DIR/nccl_cvars.cc" ]; then
+  echo "ERROR: Failed to generate nccl_cvars files"
+  exit 1
+fi
+echo "Successfully generated nccl_cvars files in $CVARS_DIR"
+
 # set up the third-party ldflags
 export PKG_CONFIG_PATH="${CONDA_LIB_DIR}"/pkgconfig
 THRIFT_SERVICE_LDFLAGS=(

build_rcclx.sh

Lines changed: 36 additions & 0 deletions
@@ -169,6 +169,7 @@ function build_third_party {
     xxhash
     zstd
     conda-forge::zlib
+    ruamel.yaml
     fmt
     glog==0.4.0
   )
@@ -216,6 +217,41 @@ if [[ -z "${NCCL_BUILD_SKIP_DEPS}" ]]; then
   build_third_party
 fi
 
+# Generate nccl_cvars files (these are no longer checked into the repo)
+# The files are generated by extractcvars.py which reads nccl_cvars.yaml and nccl_cvars.cc.in
+echo "Generating nccl_cvars files..."
+CVARS_DIR="$BASE_DIR/comms/utils/cvars"
+
+# Validate that the required source files exist
+if [ ! -f "$CVARS_DIR/extractcvars.py" ]; then
+  echo "ERROR: extractcvars.py not found at $CVARS_DIR/extractcvars.py"
+  exit 1
+fi
+if [ ! -f "$CVARS_DIR/nccl_cvars.yaml" ]; then
+  echo "ERROR: nccl_cvars.yaml not found at $CVARS_DIR/nccl_cvars.yaml"
+  exit 1
+fi
+if [ ! -f "$CVARS_DIR/nccl_cvars.cc.in" ]; then
+  echo "ERROR: nccl_cvars.cc.in not found at $CVARS_DIR/nccl_cvars.cc.in"
+  exit 1
+fi
+
+# Install ruamel-yaml if not already installed (required by extractcvars.py)
+if [[ -z "${NCCL_SKIP_CONDA_INSTALL}" ]]; then
+  conda install ruamel.yaml --yes
+fi
+
+# Run the extractcvars.py script directly to generate the files
+export NCCL_CVARS_OUTPUT_DIR="$CVARS_DIR"
+python3 "$CVARS_DIR/extractcvars.py"
+
+# Verify the files were generated
+if [ ! -f "$CVARS_DIR/nccl_cvars.h" ] || [ ! -f "$CVARS_DIR/nccl_cvars.cc" ]; then
+  echo "ERROR: Failed to generate nccl_cvars files"
+  exit 1
+fi
+echo "Successfully generated nccl_cvars files in $CVARS_DIR"
+
 if [ "$CLEAN_BUILD" == 1 ]; then
   rm -rf "$BUILDDIR"
 fi

comms/ncclx/v2_27/maint/oss_build.sh

Lines changed: 35 additions & 0 deletions
@@ -288,6 +288,41 @@ fi
 
 mkdir -p $BUILDDIR
 
+# Generate nccl_cvars files (these are no longer checked into the repo)
+# The files are generated by extractcvars.py which reads nccl_cvars.yaml and nccl_cvars.cc.in
+echo "Generating nccl_cvars files..."
+CVARS_DIR="$FBCODE_DIR/comms/utils/cvars"
+
+# Validate that the required source files exist
+if [ ! -f "$CVARS_DIR/extractcvars.py" ]; then
+  echo "ERROR: extractcvars.py not found at $CVARS_DIR/extractcvars.py"
+  exit 1
+fi
+if [ ! -f "$CVARS_DIR/nccl_cvars.yaml" ]; then
+  echo "ERROR: nccl_cvars.yaml not found at $CVARS_DIR/nccl_cvars.yaml"
+  exit 1
+fi
+if [ ! -f "$CVARS_DIR/nccl_cvars.cc.in" ]; then
+  echo "ERROR: nccl_cvars.cc.in not found at $CVARS_DIR/nccl_cvars.cc.in"
+  exit 1
+fi
+
+# Install ruamel-yaml if not already installed (required by extractcvars.py)
+if [ -z "$SKIP_CONDA_INSTALL" ]; then
+  conda install -p "$CONDA_DIR" ruamel.yaml --yes
+fi
+
+# Run the extractcvars.py script directly to generate the files
+export NCCL_CVARS_OUTPUT_DIR="$CVARS_DIR"
+python3 "$CVARS_DIR/extractcvars.py"
+
+# Verify the files were generated
+if [ ! -f "$CVARS_DIR/nccl_cvars.h" ] || [ ! -f "$CVARS_DIR/nccl_cvars.cc" ]; then
+  echo "ERROR: Failed to generate nccl_cvars files"
+  exit 1
+fi
+echo "Successfully generated nccl_cvars files in $CVARS_DIR"
+
 # Use nccl relative to fbcode dir (configurable for Docker builds)
 export NCCL_HOME=${NCCL_HOME:-$FBCODE_DIR/comms/ncclx/v2_27}
 pushd "${NCCL_HOME}"
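
The three script changes above repeat the same existence checks for `extractcvars.py`, `nccl_cvars.yaml`, and `nccl_cvars.cc.in`. As a minimal sketch only, those checks could be factored into one shell function. The name `check_cvars_inputs` is hypothetical and not part of this commit; the file names and the error message format are taken from the diff.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the commit): verify that the inputs
# extractcvars.py needs are present in the given cvars directory.
# Usage: check_cvars_inputs DIR
check_cvars_inputs() {
  local cvars_dir="$1" f missing=0
  for f in extractcvars.py nccl_cvars.yaml nccl_cvars.cc.in; do
    if [ ! -f "$cvars_dir/$f" ]; then
      # Same message shape as the inline checks in the build scripts.
      echo "ERROR: $f not found at $cvars_dir/$f" >&2
      missing=1
    fi
  done
  # Returns 0 when every file exists, 1 when at least one is missing.
  return "$missing"
}
```

A script would call it as `check_cvars_inputs "$CVARS_DIR" || exit 1` before running `extractcvars.py`. Unlike the inline checks, this version reports every missing file before failing rather than stopping at the first one.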

comms/utils/cvars/README.md

Lines changed: 89 additions & 27 deletions
@@ -1,53 +1,115 @@
 # Custom VARS
 
-NCCLX CVARS - Strongly typed configurable knobs for NCCLX. All
-configuration knobs are defined here, and can be used in source
-file by including `nccl_cvars.h` and use typed CVAR by its name.
+NCCLX CVARS - Strongly typed configurable knobs for NCCLX. All configuration
+knobs are defined here, and can be used in source file by including
+`nccl_cvars.h` and use typed CVAR by its name.
 
 ## User Guide
 
-Refer to `nccl_cvars.yaml` for CVAR documentation and their default
-values.
+Refer to `nccl_cvars.yaml` for CVAR documentation and their default values.
 
 CVAR can be provided to program in two ways
-1) Environment variable - e.g. `NCCL_DEBUG=warn nccl_allreduce_perf ...`
-2) Config Variable - define in `/etc/nccl.conf` and it'll be picked up
+
+1. Environment variable - e.g. `NCCL_DEBUG=warn nccl_allreduce_perf ...`
+2. Config Variable - define in `/etc/nccl.conf` and it'll be picked up
    automatically by program.
 
-Environment variable will take precedence over config variable. If not
-specified in either, the default value will be used.
+Environment variable will take precedence over config variable. If not specified
+in either, the default value will be used.
 
 ## Developer Guide
 
 All CVARs are defined in `nccl_cvars.yaml`. To add a new CVAR:
-1) Add the CVAR definition in `nccl_cvars.yaml`
-2) Build any target that depends on `//comms/utils/cvars:ncclx-cvars` - the files will be auto-generated via genrule
-3) Include `#include "comms/utils/cvars/nccl_cvars.h"` and use your CVAR in program
 
-**Note:** `nccl_cvars.h` and `nccl_cvars.cc` are now generated at build time using a genrule.
-They should **NOT** be manually edited or committed to the repository. The genrule automatically
-generates these files from `nccl_cvars.yaml` using `extractcvars.py` whenever you build a target
-that depends on the `ncclx-cvars` library.
+1. Add the CVAR definition in `nccl_cvars.yaml`
+2. Build any target that depends on `//comms/utils/cvars:ncclx-cvars` - the
+   files will be auto-generated via genrule
+3. Include `#include "comms/utils/cvars/nccl_cvars.h"` and use your CVAR in
+   program
+
+**Note:** `nccl_cvars.h` and `nccl_cvars.cc` are now generated at build time
+using a genrule. They should **NOT** be manually edited or committed to the
+repository. The genrule automatically generates these files from
+`nccl_cvars.yaml` using `extractcvars.py` whenever you build a target that
+depends on the `ncclx-cvars` library.
 
 To regenerate the files manually (for development/testing), you can run:
+
 ```bash
 cd ~/fbsource/fbcode && buck2 run comms/utils/cvars:extractcvars
 ```
 
-The CVAR is initialized as part of ncclInit and it is done by `initEnv` from `init.cc`. CVAR
-must not be used before initialization.
+The CVAR is initialized as part of ncclInit and it is done by `initEnv` from
+`init.cc`. CVAR must not be used before initialization.
+
+## Including CVARs in Build Scripts (OSS/Non-Buck Builds)
+
+For OSS builds or other build scripts that don't use Buck2, you need to generate
+the `nccl_cvars.h` and `nccl_cvars.cc` files before building. There are two
+approaches:
+
+### Option 1: Using Buck2 Genrule (when Buck2 is available)
+
+If Buck2 is available in your build environment, you can use the genrule to
+generate the files:
+
+```bash
+GENRULE_OUTPUT=$(buck2 build fbcode//comms/utils/cvars:generate_nccl_cvars --show-full-output 2>&1 | grep "generate_nccl_cvars" | awk '{print $2}')
+if [ -n "$GENRULE_OUTPUT" ]; then
+  cp "$GENRULE_OUTPUT/nccl_cvars.h" "$CVARS_DIR/nccl_cvars.h"
+  cp "$GENRULE_OUTPUT/nccl_cvars.cc" "$CVARS_DIR/nccl_cvars.cc"
+fi
+```
+
+### Option 2: Running extractcvars.py Directly (recommended for OSS builds)
+
+For builds outside of Buck2 (e.g., conda/Docker builds), run the
+`extractcvars.py` script directly:
+
+```bash
+# Set the output directory for the generated files
+CVARS_DIR="$FBCODE_DIR/comms/utils/cvars"
+
+# Validate that the required source files exist
+if [ ! -f "$CVARS_DIR/extractcvars.py" ]; then
+  echo "ERROR: extractcvars.py not found"
+  exit 1
+fi
+if [ ! -f "$CVARS_DIR/nccl_cvars.yaml" ]; then
+  echo "ERROR: nccl_cvars.yaml not found"
+  exit 1
+fi
+if [ ! -f "$CVARS_DIR/nccl_cvars.cc.in" ]; then
+  echo "ERROR: nccl_cvars.cc.in not found"
+  exit 1
+fi
+
+# Install ruamel-yaml (required by extractcvars.py)
+conda install ruamel.yaml --yes  # or: pip install ruamel.yaml
+
+# Run the script to generate the files
+export NCCL_CVARS_OUTPUT_DIR="$CVARS_DIR"
+python3 "$CVARS_DIR/extractcvars.py"
+
+# Verify the files were generated
+if [ ! -f "$CVARS_DIR/nccl_cvars.h" ] || [ ! -f "$CVARS_DIR/nccl_cvars.cc" ]; then
+  echo "ERROR: Failed to generate nccl_cvars files"
+  exit 1
+fi
+```
 
 ## Changed NCCL CVAR Default values
 
-NCCL_RAS_ENABLE - default value changed from 1 to 0
-NCCL_CTRAN_IB_MAX_QPS - default value changed from 1 to 16
-NCCL_CTRAN_IB_QP_MAX_MSGS - default value changed from 4 to 128
-NCCL_CTRAN_IB_QP_SCALING_THRESHOLD - default value changed from 131072 to 524288
-NCCL_CTRAN_IB_QP_CONFIG_XDC - default value changed from "" to "1048576,16,spray,128"
-NCCL_CTRAN_IB_QP_CONFIG_XRACK - default value changed from "" to "1048576,16,spray,128"
-NCCL_CTRAN_IB_QP_CONFIG_XZONE - default value changed from "" to "1048576,16,spray,128"
-NCCL_CTRAN_IB_VC_MODE - default value changed from "spray" to "dqplb"
+NCCL_RAS_ENABLE - default value changed from 1 to 0 NCCL_CTRAN_IB_MAX_QPS -
+default value changed from 1 to 16 NCCL_CTRAN_IB_QP_MAX_MSGS - default value
+changed from 4 to 128 NCCL_CTRAN_IB_QP_SCALING_THRESHOLD - default value changed
+from 131072 to 524288 NCCL_CTRAN_IB_QP_CONFIG_XDC - default value changed from
+"" to "1048576,16,spray,128" NCCL_CTRAN_IB_QP_CONFIG_XRACK - default value
+changed from "" to "1048576,16,spray,128" NCCL_CTRAN_IB_QP_CONFIG_XZONE -
+default value changed from "" to "1048576,16,spray,128" NCCL_CTRAN_IB_VC_MODE -
+default value changed from "spray" to "dqplb"
 
 ## NCCL Baseline Adapter
 
-The NCCL Baseline Adapter API is designed to provide a similar interface to the baseline/third-party NCCL library's `ncclGetEnv` and `ncclLoadParam` functions.
+The NCCL Baseline Adapter API is designed to provide a similar interface to the
+baseline/third-party NCCL library's `ncclGetEnv` and `ncclLoadParam` functions.
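
The precedence rule in the README's User Guide (environment variable first, then `/etc/nccl.conf`, then the default) can be illustrated with a small shell sketch. This is not how NCCLX resolves CVARs, which happens in the generated C++ code; `resolve_cvar` is a hypothetical helper, and the `NAME=value` config line format is an assumption made for the illustration.

```shell
#!/usr/bin/env bash
# Hypothetical illustration of the documented lookup order:
# environment variable, then config file, then default value.
# Usage: resolve_cvar NAME DEFAULT [CONF_FILE]
resolve_cvar() {
  local name="$1" default="$2" conf="${3:-/etc/nccl.conf}"
  local line
  # 1. An environment variable with this name wins whenever it is set.
  if [ -n "${!name+x}" ]; then
    printf '%s\n' "${!name}"
    return 0
  fi
  # 2. Otherwise take the last NAME=value line from the config file
  #    (assumed format), if the file exists and has such a line.
  if [ -f "$conf" ]; then
    line=$(grep -E "^${name}=" "$conf" | tail -n 1)
    if [ -n "$line" ]; then
      printf '%s\n' "${line#*=}"
      return 0
    fi
  fi
  # 3. Fall back to the default supplied by the caller.
  printf '%s\n' "$default"
}
```

For example, `NCCL_DEBUG=warn resolve_cvar NCCL_DEBUG info` prints the environment value regardless of what the config file contains, which mirrors the README's statement that the environment variable takes precedence.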
