Description
Hello.
I've recently been running MPI jobs for some extensive data processing, but I keep getting the error "Can't perform independent write when MPI_File_sync is required by ROMIO driver." in the log.
The symptoms are as follows:
- It works fine on the local (master) machine.
- When run on the compute nodes, it fails with "Can't perform independent write when MPI_File_sync is required by ROMIO driver.".
- With dxpl_mpio=:collective, it gets stuck at the write.
The local machine is directly attached to the disk where I write the HDF5 file, while the compute nodes reach that disk over NFS.
My question is: why does this error appear? Is it because the nodes access the disk over NFS? And if it is avoidable, how?
Also, the article https://www.hdfgroup.org/2015/08/parallel-io-with-hdf5/ mentions the H5Sselect_none operation for collective mode. Does HDF5.jl have equivalent functionality, and if so, how do I use it?
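For context, this is what I imagine the HDF5.jl equivalent would look like, going through the low-level wrappers in HDF5.API; the function name write_nothing_collectively and the whole approach are my own guess, not a confirmed HDF5.jl idiom:

using HDF5

# Guessed equivalent of H5Sselect_none: a rank that has no data still joins
# the collective write, but with an empty selection in both dataspaces.
# `dset` is a dataset created with dxpl_mpio=:collective, as in my test code.
function write_nothing_collectively(dset::HDF5.Dataset)
    filespace = HDF5.dataspace(dset)
    memspace  = HDF5.dataspace(dset)
    HDF5.API.h5s_select_none(filespace)  # zero elements selected in the file
    HDF5.API.h5s_select_none(memspace)   # ...and in memory
    dxpl = HDF5.DatasetTransferProperties(dxpl_mpio=:collective)
    # HDF5 allows a NULL buffer when the selection is empty.
    HDF5.API.h5d_write(dset, HDF5.datatype(dset), memspace, filespace, dxpl, C_NULL)
    close(dxpl); close(memspace); close(filespace)
end

Is something along these lines the intended way, or is there a higher-level API for it?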
Thanks.
Here is my test code.
using HDF5
using MPI

function main()
    @assert HDF5.has_parallel()
    MPI.Init()
    comm = MPI.COMM_WORLD
    info = MPI.Info()
    ff = h5open("test.h5", "w", comm, info)
    MPI.Barrier(comm)
    Nproc = MPI.Comm_size(comm)
    myrank = MPI.Comm_rank(comm)
    M = 10
    A = fill(myrank, M, 2)     # local data
    dims = (M, Nproc*2 + 1)    # dimensions of global data

    # Create dataset
    @show "Create dataset"
    dset = create_dataset(ff, "/data", datatype(eltype(A)), dataspace(dims);
                          chunk=(M, 2), dxpl_mpio=:collective)
    @show "After dataset"

    # Write local data
    dset[:, 2*myrank + 1:2*myrank + 2] = A
    @show "After write dataset"

    close(ff)
    MPI.Finalize()
end

main()
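In case the answer involves MPI-IO hints: as far as I understand, ROMIO hints can be passed through the MPI.Info object that h5open takes. A sketch of what I mean (romio_ds_write is a standard ROMIO hint that disables data sieving for writes; whether any hint actually helps over NFS is part of my question):

# Untested sketch: pass a ROMIO hint to the MPI-IO layer via MPI.Info.
info = MPI.Info(:romio_ds_write => "disable")
ff = h5open("test.h5", "w", comm, info)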
And here is the output of MPIPreferences.use_system_binary():
julia> MPIPreferences.use_system_binary()
┌ Info: MPI implementation identified
│ libmpi = "libmpi"
│ version_string = "MPICH Version: 4.1.2\nMPICH Release date: Wed Jun 7 15:22:45 CDT 2023\nMPICH ABI: 15:1:3\nMPICH Device: ch4:ofi\nMPICH configure: --prefix=/home/---/tools/mpich --with-ucx=/home/---/tools/ucx\nMPICH CC: /home/---/tools/gcc/bin/gcc -O2\nMPICH CXX: /home/hyunwook/tools/gcc/bin/g++ -O2\nMPICH F77: /home/---/tools/gcc/bin/gfortran -O2\nMPICH FC: /home/---/tools/gcc/bin/gfortran -O2\n"
│ impl = "MPICH"
│ version = v"4.1.2"
└ abi = "MPICH"
┌ Info: MPIPreferences unchanged
│ binary = "system"
│ libmpi = "libmpi"
│ abi = "MPICH"
│ mpiexec = "mpiexec"
│ preloads = Any[]
└ preloads_env_switch = nothing
Run script (for sbatch):
#!/bin/bash
#SBATCH -J hdf5_test
#SBATCH -o stdout_log.o%j
#SBATCH -N 1
#SBATCH -n 32
mpiexec.hydra -np $SLURM_NTASKS julia test.jl
My environment:
- CentOS 7.5
- Slurm with Hydra
- HDF5 1.14.1
- GCC 13.2.0
- MPICH 4.1.2
(Yes, I built HDF5, GCC, and MPICH from source.)