[Rocm][CI] Enable 2-node test #264

Open
charlifu wants to merge 3 commits into vllm-project:main from charlifu:amd/enable_2node_test

charlifu commented Jan 8, 2026

This PR tries to enable the 2-node test by updating the command to use run-multi-node-test.sh when step.num_nodes >= 2.
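
A minimal sketch of the dispatch rule described above, for illustration only. The `Step` class, the `select_runner` helper, and the `"single-container"` label are hypothetical names and are not taken from the CI code; only `run-multi-node-test.sh` and the `num_nodes >= 2` condition come from this description. The arguments the script expects are deliberately left out.

```python
from dataclasses import dataclass, field

# Script named in the PR description.
MULTI_NODE_SCRIPT = "run-multi-node-test.sh"
# Hypothetical label for the standard single-container test path.
SINGLE_CONTAINER_PATH = "single-container"


@dataclass
class Step:
    """Hypothetical stand-in for a CI step definition."""
    label: str
    commands: list = field(default_factory=list)
    num_nodes: int = 1


def select_runner(step: Step) -> str:
    """Pick how a step is executed, based only on its node count."""
    if step.num_nodes >= 2:
        # Multi-node steps go through the multi-node script.
        return MULTI_NODE_SCRIPT
    # Everything else keeps the existing single-container path.
    return SINGLE_CONTAINER_PATH
```

With this rule, a step declared with `num_nodes=2` is routed to the multi-node script, while a step that omits `num_nodes` stays on the single-container path.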

Signed-off-by: charlifu <charlifu@amd.com>
charlifu changed the title from "first try" to "[Rocm][CI] Enable 2-node test" Jan 8, 2026
Alexei-V-Ivanov-AMD self-requested a review January 9, 2026 02:17
Alexei-V-Ivanov-AMD (Collaborator) left a comment

I'm looking at the test group definition, and it looks like a mixture of two kinds of commands: those that should be run "bare-metal" (i.e. through run-multi-node-test.sh, which spawns the provided container image on each of the two nodes), and things like pytest invocations that are supposed to run inside a single container (i.e. the standard testing path we use for all other tests).

Both single-image pytests succeed in my local run, as do the first two 2-node tests; however, the third 2-node test fails for me locally.

We need to change this PR so that the "2 Node Tests (4 GPUs in total)" test group executes correctly.