ncmpi_create Stalls When Using High MPI Rank Counts #142

@Masterwater-y

Description

I'm encountering an issue where the ncmpi_create function appears to stall when running my application with a high number of MPI processes. Specifically, the program hangs at the ncmpi_create call when attempting to create a new NetCDF file.

My PnetCDF version is 1.12.1; the build flags are below:

grep "CFLAGS" /home/yhl/green_suite/install/files/pnetcdf-1.12.1/Makefile

CFLAGS = -g -O2 -fPIC
CONFIGURE_ARGS_CLEAN = --prefix=/home/cluster-opt/pnetcdf --enable-shared --enable-fortran --enable-large-file-test CFLAGS="-g -O2 -fPIC" CXXFLAGS="-g -O2 -fPIC" FFLAGS="-g -fPIC" FCFLAGS="-g -fPIC" F90LDFLAGS="-fPIC" FLDFLAGS="-fPIC" LDFLAGS="-fPIC"
FCFLAGS = -g -fPIC
FCFLAGS_F = 
FCFLAGS_F90 = 
FCFLAGS_f = 
FCFLAGS_f90 =

I executed the command below and it stalls at ncmpi_create. There are 4 nodes and each node has 96 cores:

mpirun -n 384 -hosts controller1,compute1,compute2storage,compute3storage ./test ./output.nc

If I reduce the number of ranks, e.g. mpirun -n 256, it works.
I want to know what might be causing this: a network bottleneck, a disk bottleneck, or an OS setting.
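One thing I plan to experiment with is passing MPI-IO hints through the info argument of ncmpi_create instead of MPI_INFO_NULL. This is only a sketch: cb_nodes and romio_no_indep_rw are standard ROMIO hints, but the values here are guesses to experiment with, not a known fix; the other variables are the same ones used in the test program below.

    /* sketch: pass MPI-IO hints to ncmpi_create to experiment with collective I/O */
    MPI_Info info;
    MPI_Info_create(&info);
    MPI_Info_set(info, "cb_nodes", "4");             /* e.g. one collective-buffering aggregator per node */
    MPI_Info_set(info, "romio_no_indep_rw", "true"); /* deferred open: only aggregators open the file */

    ret = ncmpi_create(MPI_COMM_WORLD, argv[1], NC_CLOBBER, info, &ncfile);
    if (ret != NC_NOERR) handle_error(ret, __LINE__);

    MPI_Info_free(&info);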

My code

#include <stdlib.h>
#include <mpi.h>
#include <pnetcdf.h>
#include <stdio.h>

static void handle_error(int status, int lineno)
{
    fprintf(stderr, "Error at line %d: %s\n", lineno, ncmpi_strerror(status));
    MPI_Abort(MPI_COMM_WORLD, 1);
}

int main(int argc, char **argv) {

    int ret, ncfile, nprocs, rank, dimid1, dimid2, varid1, varid2, ndims;
    MPI_Offset start, count=1;
    int t, i;
    int v1_dimid[2];
    MPI_Offset v1_start[2], v1_count[2];
    int v1_data[4];
    char buf[13] = "Hello World\n";
    int data;

    MPI_Init(&argc, &argv);

    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    if (argc != 2) {
        if (rank == 0) printf("Usage: %s filename\n", argv[0]);
        MPI_Finalize();
        exit(-1);
    }

    ret = ncmpi_create(MPI_COMM_WORLD, argv[1],
                       NC_CLOBBER, MPI_INFO_NULL, &ncfile);
    if (ret != NC_NOERR) handle_error(ret, __LINE__);

    ret = ncmpi_def_dim(ncfile, "d1", nprocs, &dimid1);
    if (ret != NC_NOERR) handle_error(ret, __LINE__);

    ret = ncmpi_def_dim(ncfile, "time", NC_UNLIMITED, &dimid2);
    if (ret != NC_NOERR) handle_error(ret, __LINE__);

    v1_dimid[0] = dimid2;
    v1_dimid[1] = dimid1;
    ndims = 2;

    ret = ncmpi_def_var(ncfile, "v1", NC_INT, ndims, v1_dimid, &varid1);
    if (ret != NC_NOERR) handle_error(ret, __LINE__);

    ndims = 1;

    ret = ncmpi_def_var(ncfile, "v2", NC_INT, ndims, &dimid1, &varid2);
    if (ret != NC_NOERR) handle_error(ret, __LINE__);

    ret = ncmpi_put_att_text(ncfile, NC_GLOBAL, "string", 13, buf);
    if (ret != NC_NOERR) handle_error(ret, __LINE__);

    /* all processors defined the dimensions, attributes, and variables,
     * but here in ncmpi_enddef is the one place where metadata I/O
     * happens.  Behind the scenes, rank 0 takes the information and writes
     * the netcdf header.  All processes communicate to ensure they have
     * the same (cached) view of the dataset */

    ret = ncmpi_enddef(ncfile);
    if (ret != NC_NOERR) handle_error(ret, __LINE__);

    start=rank, count=1, data=rank;

    ret = ncmpi_put_vara_int_all(ncfile, varid2, &start, &count, &data);
    if (ret != NC_NOERR) handle_error(ret, __LINE__);

    for (t = 0; t<2; t++){

        v1_start[0] = t, v1_start[1] = rank;
        v1_count[0] = 1, v1_count[1] = 1;
        for (i = 0; i<4; i++){
            v1_data[i] = rank+t;
        }
        
        /* in this simple example every process writes one element (its rank plus
         * the record index t) into the 2-d record variable v1 */
        ret = ncmpi_put_vara_int_all(ncfile, varid1, v1_start, v1_count, v1_data);
        if (ret != NC_NOERR) handle_error(ret, __LINE__);

    }
    
    ret = ncmpi_close(ncfile);
    if (ret != NC_NOERR) handle_error(ret, __LINE__);

    MPI_Finalize();

    return 0;
}
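To check whether all 384 ranks actually reach the create call (rather than some ranks getting stuck earlier), I could wrap it like this. This is only a diagnostic sketch using plain MPI barriers and timing; ret, ncfile, rank, nprocs, argv, and handle_error are the same as in the program above.

    /* diagnostic sketch (not part of the original test): confirm every rank
     * reaches ncmpi_create and measure how long the collective create takes */
    double t_start, t_elapsed, t_max;

    MPI_Barrier(MPI_COMM_WORLD);          /* make sure all ranks arrive before timing */
    t_start = MPI_Wtime();

    ret = ncmpi_create(MPI_COMM_WORLD, argv[1], NC_CLOBBER, MPI_INFO_NULL, &ncfile);
    if (ret != NC_NOERR) handle_error(ret, __LINE__);

    t_elapsed = MPI_Wtime() - t_start;
    MPI_Reduce(&t_elapsed, &t_max, 1, MPI_DOUBLE, MPI_MAX, 0, MPI_COMM_WORLD);
    if (rank == 0)
        printf("ncmpi_create took %.3f seconds (max over %d ranks)\n", t_max, nprocs);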
