Description
A request from our user:
Can sky provide some lightweight alternative to
mpirun
that achieves the equivalent of “running certain command on all nodes of a cluster initiated from head node, and block until this finishes on all nodes before proceeding with the next bash command”?
Right now I think there is only a barrier between setup and run, but further coordination within run is not supported by sky. I’m hoping that we can get rid of the openmpi dependency, and one of the current usage pattern is in the sky run section do something like
if [ "${SKY_NODE_RANK}" == "0" ]; then
for _ in `seq max_attempt`; do
mpirun ... kill_dangling_processes_on_all_machines
mpirun ... build_target_on_all_machines
mpirun ... run_target_on_all_machines
done
fi
run: |
for _ in `seq max_attempt`; do
# kill_dangling_processes_on_all_machines
sky_barrier
# build_target_on_all_machines
sky_barrier
# run_target_on_all_machines
sky_barrier
done
The sky_barrier
should be a bash function with the following behavior:
- Create an indicator file
/tmp/{task_id}-node-{SKYPILOT_NODE_RANK}-stage-{s}
on the first node (head node of the task) in theSKYPILOT_NODE_IPS
, wheres
is the counter within the current task. - Keep checking if there are
echo "$SKYPILOT_NODE_IPS" | wc -l
indicator files on the head node of the task, before we continue the execution.
The user further requested that the returncode is needed for the commands:
to further clarify - ideally there can be a command somewhat like mpirun , where you call it from the head node like:
mpirun some_command arg1 arg2 …
and the effect is that it will runsome_command arg1 arg2 …
on all the machines and wait until it finishes on all machines, and return code 0 only if all of them exited with code 0
I think just providing a sky_barrier won’t be enough since there still needs to be some way to initiate the commands from the head node - where the decision logic of what further commands to execute depends on the return code of previous commands.
A possible solution (confirmed by the user):
run: |
if [ "${SKY_NODE_RANK}" == "0" ]; then
for i in `seq max_retry`; do
sky run_on_nodes command arg1 arg2 # exitcode 0 if succeeded; otherwise, nonzero.
sky run_on_nodes command arg1 arg2
done
fi
implementation details:
- We add the
run_on_nodes
in our clis but not expose it to thesky --help
- For the
run_on_nodes
:
a. check the environment variable$SKYPILOT_NODE_RANK
to make sure it is only executed on the head node.
b. execute the command on all ofSKYPILOT_NODE_IPS
with ssh.
c. aggregate the exit codes from all the nodes.