Skip to content

Support barrier for in the run section #1762

Open
@Michaelvll

Description

@Michaelvll

A request from our user:

Can sky provide some lightweight alternative to mpirun that achieves the equivalent of “running certain command on all nodes of a cluster initiated from head node, and block until this finishes on all nodes before proceeding with the next bash command”?
Right now I think there is only a barrier between setup and run, but further coordination within run is not supported by sky. I’m hoping that we can get rid of the openmpi dependency, and one of the current usage pattern is in the sky run section do something like

if [ "${SKY_NODE_RANK}" == "0" ]; then
  for _ in `seq max_attempt`; do
    mpirun ... kill_dangling_processes_on_all_machines
    mpirun ... build_target_on_all_machines
    mpirun ... run_target_on_all_machines
  done
fi
A possible solution:
run: |
  for _ in `seq max_attempt`; do
    # kill_dangling_processes_on_all_machines
    sky_barrier
    # build_target_on_all_machines
    sky_barrier
    # run_target_on_all_machines
    sky_barrier
  done

The sky_barrier should be a bash function with the following behavior:

  1. Create an indicator file /tmp/{task_id}-node-{SKYPILOT_NODE_RANK}-stage-{s} on the first node (head node of the task) in the SKYPILOT_NODE_IPS, where s is the counter within the current task.
  2. Keep checking if there are echo "$SKYPILOT_NODE_IPS" | wc -l indicator files on the head node of the task, before we continue the execution.

The user further requested that the returncode is needed for the commands:

to further clarify - ideally there can be a command somewhat like mpirun , where you call it from the head node like:
mpirun some_command arg1 arg2 … and the effect is that it will run some_command arg1 arg2 … on all the machines and wait until it finishes on all machines, and return code 0 only if all of them exited with code 0
I think just providing a sky_barrier won’t be enough since there still needs to be some way to initiate the commands from the head node - where the decision logic of what further commands to execute depends on the return code of previous commands.

A possible solution (confirmed by the user):

run: |
  if [ "${SKY_NODE_RANK}" == "0" ]; then
    for i in `seq max_retry`; do
      sky run_on_nodes command arg1 arg2 # exitcode 0 if succeeded; otherwise, nonzero.
      sky run_on_nodes command arg1 arg2
    done
  fi

implementation details:

  1. We add the run_on_nodes in our clis but not expose it to the sky --help
  2. For the run_on_nodes:
    a. check the environment variable $SKYPILOT_NODE_RANK to make sure it is only executed on the head node.
    b. execute the command on all of SKYPILOT_NODE_IPS with ssh.
    c. aggregate the exit codes from all the nodes.

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions