What I'd like:
nvbandwidth uses hwloc-ls to bind to the correct memory for Open MPI workloads. For best performance, hwloc-ls should be available in the OS on EC2 instances such as P6e-GB200.36xlarge.
Any alternatives you've considered:
Without hwloc-ls, the tool falls back to a less performant method that does not bind to shared memory.
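
To tell whether a given image would hit the fallback, a rough check is to see whether hwloc-ls is on the PATH. The sketch below is only an illustration: `find_tool` is a hypothetical helper, and this is not necessarily how nvbandwidth itself detects the tool.

```python
import shutil


def find_tool(name="hwloc-ls"):
    """Return the full path to `name` if it is on PATH, otherwise None."""
    return shutil.which(name)


if __name__ == "__main__":
    path = find_tool()
    if path:
        print(f"hwloc-ls found at {path}")
    else:
        # Assumption: a missing binary means the slower fallback would be used.
        print("hwloc-ls not found on PATH")
```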