Open
Description
Describe the bug
When the LNX provider is using multiple NICs, messages can be delivered out of order. The lnx_select_send_endpoints() function round-robins between the NICs, so if message A is sent on NIC 0, message B is sent on NIC 1. The problem is that nothing prevents message B from arriving at the destination and being matched before message A. MPI requires that two messages from the same sender that match the same receive are matched in the order they were sent, so this breaks MPI's non-overtaking guarantee.
To Reproduce
export FI_LNX_PROV_LINKS="shm+cxi"
Run the following MPI program on two nodes with one rank each. I ran this with Open MPI 5.0.8, but the version shouldn't matter.
#include <mpi.h>
#include <stdlib.h>
#include <stdio.h>

#define TRANSFERS (1024 * 1024)

static int sdata[TRANSFERS];
static int rdata[TRANSFERS];

int
main(int argc, char *argv[])
{
    int rank, size, bad_count = 0;

    /* Each send buffer element holds its own index. */
    for (int i = 0; i < TRANSFERS; i++) {
        sdata[i] = i;
    }

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Barrier(MPI_COMM_WORLD);

    /* Rank 0 sends one int per message, all with tag 0; rank 1 receives
     * them in order, so rdata[i] must equal i if ordering is preserved. */
    for (int i = 0; i < TRANSFERS; i++) {
        if (rank == 0) {
            MPI_Send(&sdata[i], 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        } else {
            MPI_Recv(&rdata[i], 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
        }
    }

    /* Count the receives that matched the wrong message. */
    if (rank == 1) {
        for (int i = 0; i < TRANSFERS; i++) {
            if (rdata[i] != i)
                bad_count++;
        }
        fprintf(stderr, "Bad count %d %f\n",
                bad_count, (float)bad_count / TRANSFERS);
    }

    MPI_Finalize();
    return 0;
}
On my current system I'm seeing a 0.25% error rate, but results will vary.