[Example] Make rdzv work with multi-node SkyPilot clusters #4140

Open
@Michaelvll

Description

rdzv fails to work with a SkyPilot multi-node cluster (probably on k8s).

https://github.com/stas00/ml-engineering/blob/master/network/benchmarks/all_reduce_bench.py
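
For context, here is a minimal sketch of how a rendezvous endpoint might be derived on each node from SkyPilot's node environment variables. It is not a fix for this issue, only an illustration of the setup being discussed. It assumes that `SKYPILOT_NODE_IPS`, `SKYPILOT_NUM_NODES`, and `SKYPILOT_NUM_GPUS_PER_NODE` are set by SkyPilot, that the first listed IP is the head node, and that the script is launched with `torchrun`. The helper name `build_torchrun_cmd`, the port `29500`, and the rendezvous id are hypothetical choices made for this example.

```python
import os
import shlex


def build_torchrun_cmd(env, script="all_reduce_bench.py", port=29500):
    """Build a torchrun argument list from SkyPilot-style environment variables.

    Assumptions (not confirmed by this issue): the first IP in
    SKYPILOT_NODE_IPS is the head node, and `port` is reachable from
    every node in the cluster.
    """
    # SKYPILOT_NODE_IPS holds one IP per line; split() handles any whitespace.
    node_ips = env["SKYPILOT_NODE_IPS"].split()
    head_ip = node_ips[0]
    num_nodes = int(env.get("SKYPILOT_NUM_NODES", len(node_ips)))
    gpus_per_node = int(env.get("SKYPILOT_NUM_GPUS_PER_NODE", 1))
    return [
        "torchrun",
        f"--nnodes={num_nodes}",
        f"--nproc_per_node={gpus_per_node}",
        "--rdzv_backend=c10d",
        f"--rdzv_endpoint={head_ip}:{port}",
        "--rdzv_id=sky_rdzv_example",  # hypothetical id; must match on all nodes
        script,
    ]


if __name__ == "__main__":
    # Only print a command when running inside a SkyPilot task.
    if "SKYPILOT_NODE_IPS" in os.environ:
        print(shlex.join(build_torchrun_cmd(os.environ)))
```

Every node has to compute the same endpoint, which is why the sketch always takes the first entry of the IP list. On k8s, whether that pod IP and port are reachable from the other pods is exactly what would need checking.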

Version & Commit info:

  • sky -v: PLEASE_FILL_IN
  • sky -c: PLEASE_FILL_IN
