Description
Brief summary of bug
In what will be ctsm5.3.016, the following two tests fail at the run step, during initialization:
ERP_P64x2_Ld765.f10_f10_mg37.I2000Clm60BgcCrop.derecho_intel.clm-monthly (NLCOMP RUN)
ERS_P128x1_Ld765.f10_f10_mg37.I2000Clm60Fates.derecho_intel.clm-FatesColdNoComp (NLCOMP RUN)
General bug information
CTSM version you are using: ctsm5.3.016
Does this bug cause significantly incorrect results in the model's science? No
Configurations affected: Possibly all ER (exact restart) tests that run for 765 days
Details of bug
Important details of your setup / configuration so we can reproduce the bug
The initial case runs fine; it is the restart step, run in the case2/$CASE directory, that fails.
Important output or errors that show the problem
ERP_P64x2_Ld765.f10_f10_mg37.I2000Clm60BgcCrop.derecho_intel.clm-monthly fails as follows; only the cesm.log file exists.
cesm.log
dec2343.hsn.de.hpc.ucar.edu 0: (t_initf) profile_outpe_stride= 0
dec2343.hsn.de.hpc.ucar.edu 0: (t_initf) profile_single_file= F
dec2343.hsn.de.hpc.ucar.edu 0: (t_initf) profile_global_stats= T
dec2343.hsn.de.hpc.ucar.edu 0: (t_initf) profile_ovhd_measurement= F
dec2343.hsn.de.hpc.ucar.edu 0: (t_initf) profile_add_detail= F
dec2343.hsn.de.hpc.ucar.edu 0: (t_initf) profile_papi_enable= F
dec2343.hsn.de.hpc.ucar.edu 0: ESMF_Finalize: Error closing trace stream
dec2343.hsn.de.hpc.ucar.edu 0: MPICH ERROR [Rank 0] [job id 2dd16cc6-e949-427e-bb59-48726c16f9fa] [Wed Dec 18 15:47:41 2024] [dec2343] - Abort(1) (rank 0 in comm 496): application called MPI_Abort(comm=0x84000002, 1) - process 0
dec2343.hsn.de.hpc.ucar.edu 0:
dec2343.hsn.de.hpc.ucar.edu 0: forrtl: severe (174): SIGSEGV, segmentation fault occurred
dec2343.hsn.de.hpc.ucar.edu 0: Image PC Routine Line Source
dec2343.hsn.de.hpc.ucar.edu 0: libpthread-2.31.s 000015004133C8C0 Unknown Unknown Unknown
dec2343.hsn.de.hpc.ucar.edu 0: libmpi_intel.so.1 000015003F2FBE7E Unknown Unknown Unknown
dec2343.hsn.de.hpc.ucar.edu 0: libmpi_intel.so.1 000015003F10A22F Unknown Unknown Unknown
dec2343.hsn.de.hpc.ucar.edu 0: libmpi_intel.so.1 000015003D7376A8 MPI_Abort Unknown Unknown
dec2343.hsn.de.hpc.ucar.edu 0: libesmf.so 0000150049332277 _ZN5ESMCI3VMK5abo Unknown Unknown
dec2343.hsn.de.hpc.ucar.edu 0: libesmf.so 0000150049330814 _ZN5ESMCI2VM5abor Unknown Unknown
dec2343.hsn.de.hpc.ucar.edu 0: libesmf.so 00001500493476E5 c_esmc_vmabort_ Unknown Unknown
dec2343.hsn.de.hpc.ucar.edu 0: libesmf.so 0000150049B5C7A8 esmf_vmmod_mp_esm Unknown Unknown
dec2343.hsn.de.hpc.ucar.edu 0: libesmf.so 00001500499CC1EE esmf_initmod_mp_e Unknown Unknown
dec2343.hsn.de.hpc.ucar.edu 0: cesm.exe 0000000000433ADA MAIN__ 132 esmApp.F90
dec2343.hsn.de.hpc.ucar.edu 0: cesm.exe 00000000004230FD Unknown Unknown Unknown
dec2343.hsn.de.hpc.ucar.edu 0: libc-2.31.so 000015003C7E129D __libc_start_main Unknown Unknown
dec2343.hsn.de.hpc.ucar.edu 0: cesm.exe 000000000042302A Unknown Unknown Unknown
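It looks like the problem is that DRV_RESTART_POINTER is wrong for case2: it names an rpointer file dated 2001-01-18, while the only coupler rpointer file in the run directory is dated 2001-01-19, as we see here: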
./xmlquery DRV_RESTART_POINTER
DRV_RESTART_POINTER: rpointer.cpl.2001-01-18-00000
(ctsm_pylib) case2/ERP_P64x2_Ld765.f10_f10_mg37.I2000Clm60BgcCrop.derecho_intel.clm-monthly.GC.ctsm5316acl_int> ls ../../run/rpointer.cpl.*
../../run/rpointer.cpl.2001-01-19-00000
The other problem is that there is no graceful error reporting: nothing says that the requested rpointer file does not exist or what needs to be done about it.
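To illustrate the kind of graceful error reporting that is missing, here is a minimal Python sketch of a pre-run check. It is not CIME or CTSM code: the function name `check_restart_pointer` and its arguments are hypothetical, and it only assumes that the pointer file named by DRV_RESTART_POINTER should exist in the run directory and that sibling pointer files share the same prefix before the date stamp.

```python
from pathlib import Path


def check_restart_pointer(run_dir, pointer_name):
    """Return the pointer file path if it exists; otherwise raise a helpful error.

    Hypothetical helper, not part of CIME: run_dir is the case run directory
    and pointer_name is the value of DRV_RESTART_POINTER.
    """
    run_path = Path(run_dir)
    pointer = run_path / pointer_name
    if pointer.is_file():
        return pointer

    # Everything before the date stamp, e.g. "rpointer.cpl." (assumed naming).
    prefix = pointer_name.rsplit(".", 1)[0] + "."
    # List pointer files that do exist so the mismatch is obvious to the user.
    available = sorted(p.name for p in run_path.glob(prefix + "*"))
    raise FileNotFoundError(
        f"Restart pointer '{pointer_name}' not found in {run_path}. "
        f"Available pointer files: {', '.join(available) or 'none'}. "
        "Check that DRV_RESTART_POINTER matches the date of the last restart written."
    )
```

Called with the run directory and `rpointer.cpl.2001-01-18-00000` from this report, a check like this would stop with a message naming `rpointer.cpl.2001-01-19-00000` as the file actually present, rather than ending in a segmentation fault during ESMF initialization.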