Description
I recently encountered a cryptic error message when working with the ParticleHistogram2D reduced diagnostic: I had forgotten to set the parameter value_function(t,x,y,z,ux,uy,uz,w). It would be good to have a check that tells the user clearly which parameter is missing and what to do about it.
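To illustrate the kind of check I have in mind, here is a minimal sketch in Python. It is not WarpX's actual input-parsing code: the names `require_param`, `MissingParameterError`, and `VALUE_FUNCTION_KEY` are hypothetical, and the inputs file is modeled as a plain dictionary of already-parsed key/value pairs. The idea is simply to fail early, on the host, with a message that names the missing parameter instead of faulting later on the GPU.

```python
# Hypothetical sketch: NOT WarpX's real input-parsing code.
# The inputs file is modeled as a dict of already-parsed "name.key" -> value pairs.

VALUE_FUNCTION_KEY = "value_function(t,x,y,z,ux,uy,uz,w)"


class MissingParameterError(ValueError):
    """Raised when a required input parameter is absent."""


def require_param(params, diag_name, key):
    """Return params['<diag_name>.<key>'] or raise a descriptive error."""
    full_key = f"{diag_name}.{key}"
    if full_key not in params:
        # Name the exact missing key and show the user what a valid line looks like.
        raise MissingParameterError(
            f"{diag_name}: required parameter '{full_key}' is not set. "
            f'Add a line such as {full_key} = "w" to the inputs file.'
        )
    return params[full_key]
```

With something like this, calling `require_param(inputs, "PhaseSpaceElectrons", VALUE_FUNCTION_KEY)` on an inputs dictionary that lacks the key would stop with a message pointing at `PhaseSpaceElectrons.value_function(t,x,y,z,ux,uy,uz,w)`, rather than the memory access fault shown in the error output.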
How to reproduce:
Run the example at Examples/Physical_applications/laser_ion, but comment out the line PhaseSpaceElectrons.value_function(t,x,y,z,ux,uy,uz,w) = "w" in inputs_2d before running it.
Tested on one node of Crusher (OLCF).
Error output
Memory access fault by GPU node-9 (Agent handle: 0x2ac0aa0) on address (nil). Reason: Unknown.
SIGABRT
Memory access fault by GPU node-6 (Agent handle: 0x2ac0aa0) on address (nil). Reason: Unknown.
SIGABRT
Memory access fault by GPU node-7 (Agent handle: 0x2ac0aa0) on address (nil). Reason: Unknown.
SIGABRT
Memory access fault by GPU node-8 (Agent handle: 0x2ac0aa0) on address (nil). Reason: Unknown.
SIGABRT
See Backtrace.0 file for details
See Backtrace.2 file for details
See Backtrace.1 file for details
See Backtrace.3 file for details
MPICH ERROR [Rank 3] [job id 424102.0] [Mon Dec 18 21:15:34 2023] [crusher020] - Abort(6) (rank 3 in comm 496): application called MPI_Abort(comm=0x84000001, 6) - process 3
Segfault
MPICH ERROR [Rank 0] [job id 424102.0] [Mon Dec 18 21:15:35 2023] [crusher020] - Abort(6) (rank 0 in comm 496): application called MPI_Abort(comm=0x84000002, 6) - process 0
Segfault
MPICH ERROR [Rank 2] [job id 424102.0] [Mon Dec 18 21:15:35 2023] [crusher020] - Abort(6) (rank 2 in comm 496): application called MPI_Abort(comm=0x84000001, 6) - process 2
Segfault
MPICH ERROR [Rank 1] [job id 424102.0] [Mon Dec 18 21:15:35 2023] [crusher020] - Abort(6) (rank 1 in comm 496): application called MPI_Abort(comm=0x84000001, 6) - process 1
Segfault
See Backtrace.3 file for details
See Backtrace.0 file for details
See Backtrace.2 file for details
See Backtrace.1 file for details
MPICH ERROR [Rank 3] [job id 424102.0] [Mon Dec 18 21:15:38 2023] [crusher020] - Abort(11) (rank 3 in comm 496): application called MPI_Abort(comm=0x84000001, 11) - process 3
MPICH ERROR [Rank 0] [job id 424102.0] [Mon Dec 18 21:15:38 2023] [crusher020] - Abort(11) (rank 0 in comm 496): application called MPI_Abort(comm=0x84000002, 11) - process 0
MPICH ERROR [Rank 2] [job id 424102.0] [Mon Dec 18 21:15:38 2023] [crusher020] - Abort(11) (rank 2 in comm 496): application called MPI_Abort(comm=0x84000001, 11) - process 2
MPICH ERROR [Rank 1] [job id 424102.0] [Mon Dec 18 21:15:38 2023] [crusher020] - Abort(11) (rank 1 in comm 496): application called MPI_Abort(comm=0x84000001, 11) - process 1
srun: error: crusher020: task 0: Segmentation fault
srun: Terminating StepId=424102.0
slurmstepd: error: *** STEP 424102.0 ON crusher020 CANCELLED AT 2023-12-18T21:15:38 ***
srun: error: crusher020: tasks 1-2: Segmentation fault
srun: error: crusher020: tasks 4-7: Terminated
srun: error: crusher020: task 3: Segmentation fault (core dumped)
srun: Force Terminated StepId=424102.0