Description
rdzv fails to work on a SkyPilot multi-node cluster (probably on k8s).
https://github.com/stas00/ml-engineering/blob/master/network/benchmarks/all_reduce_bench.py
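For context, and assuming "rdzv" here refers to torchrun's rendezvous, below is a minimal sketch of how a multi-node launch of this benchmark might be assembled from the environment variables SkyPilot sets on each node (`SKYPILOT_NODE_IPS`, `SKYPILOT_NUM_NODES`, `SKYPILOT_NUM_GPUS_PER_NODE`). The helper name `build_torchrun_cmd`, the port `29500`, and the `rdzv_id` value are illustrative assumptions, not taken from the failing setup.

```python
import shlex


def build_torchrun_cmd(env, script="all_reduce_bench.py", port=29500):
    """Build a torchrun argv list from SkyPilot-style environment variables.

    `env` is any mapping (e.g. os.environ). The port and rdzv_id are
    hypothetical values chosen for illustration only.
    """
    # SKYPILOT_NODE_IPS holds one IP per line; the first entry is the head node.
    node_ips = env["SKYPILOT_NODE_IPS"].split()
    head_ip = node_ips[0]
    num_nodes = int(env.get("SKYPILOT_NUM_NODES", len(node_ips)))
    gpus_per_node = int(env.get("SKYPILOT_NUM_GPUS_PER_NODE", 1))
    return [
        "torchrun",
        f"--nnodes={num_nodes}",
        f"--nproc_per_node={gpus_per_node}",
        "--rdzv_backend=c10d",
        # Every node must point at the same reachable endpoint (the head node).
        f"--rdzv_endpoint={head_ip}:{port}",
        "--rdzv_id=sky-bench",  # hypothetical job id, identical on all nodes
        script,
    ]


# Example with a made-up two-node environment.
sample_env = {
    "SKYPILOT_NODE_IPS": "10.0.0.1\n10.0.0.2",
    "SKYPILOT_NUM_GPUS_PER_NODE": "8",
}
print(shlex.join(build_torchrun_cmd(sample_env)))
```

If the rendezvous hangs or times out, the endpoint that each node resolves (head IP and port) is one of the first things worth comparing across nodes.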
Version & Commit info:
sky -v: PLEASE_FILL_IN
sky -c: PLEASE_FILL_IN