
Bad dphi in ne256 eamxx runs #7862

@amametjanov


In a 6-month run of --compset F2010-SCREAMv1 --res ne256pg2_ne256pg2 --machine pm-gpu --compiler gnugpu (script below), jobs are running into errors like

$ tail e3sm.log*
219: Bad dphi, dp3d, or vtheta_dp; label: 'DIRK Newton loop nm1'; see hommexx.errlog.256.219
...

$ head hommexx.errlog.256.219
label: DIRK Newton loop nm1
time-level 0            
lat -2.673614897152932e-01 lon  2.998770420815903e+00
ie 936 igll 3 jgll 2 lev 0: bad dphi
level                   dphi                   dp3d              vtheta_dp
    0                   -nan  6.439032159248143e+01  8.787763796686809e+04
...

The errors appear after ~2 months, at YYYYMMDD 20180214.
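To place the failing column on a map, the `lat`/`lon` pair in the errlog header can be converted to degrees. The sketch below is only a triage helper, not part of Hommexx: it assumes the header values are in radians (the magnitudes in the excerpt above are consistent with that, but the issue does not state the units), and the function name `errlog_latlon_degrees` is hypothetical.

```python
import math
import re

# Matches a header line such as:
#   lat -2.673614897152932e-01 lon  2.998770420815903e+00
LATLON = re.compile(r"lat\s+(\S+)\s+lon\s+(\S+)")


def errlog_latlon_degrees(header_text):
    """Return (lat_deg, lon_deg) from errlog header text, or None if absent.

    Assumption: the errlog reports lat/lon in radians.
    """
    match = LATLON.search(header_text)
    if match is None:
        return None
    lat_rad = float(match.group(1))
    lon_rad = float(match.group(2))
    return math.degrees(lat_rad), math.degrees(lon_rad)


# Header excerpt copied from hommexx.errlog.256.219 above.
HEADER = """label: DIRK Newton loop nm1
time-level 0
lat -2.673614897152932e-01 lon  2.998770420815903e+00
ie 936 igll 3 jgll 2 lev 0: bad dphi
"""

if __name__ == "__main__":
    lat_deg, lon_deg = errlog_latlon_degrees(HEADER)
    print(f"lat {lat_deg:.2f} deg, lon {lon_deg:.2f} deg")
```

If the radians assumption holds, this puts the pm-gpu failure near 15°S in the tropics; the same helper can be pointed at the errlog from any other machine to compare locations.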
Run-dir:

/pscratch/sd/a/azamat/e3sm_scratch/pm-gpu/bench/ppe/ne256pg2_ne256pg2.F2010-SCREAMv1.pm-gpu_gnugpu.20251027.ppe.n64.t2/run/

Run-script: run.ne256pg2_ne256pg2.F2010-SCREAMv1.sh
YAML inputs:

A similar error occurs on --machine aurora --compiler oneapi-ifxgpu:

$ tail e3sm.log*
x4315c4s2b0n0.hsn.cm.aurora.alcf.anl.gov 669: WARNING: Tl1_1 has 1 values <= allowable value.  Resetting to minimum value.
x4314c4s3b0n0.hsn.cm.aurora.alcf.anl.gov 0: bfbhash>           8172 e2675347aabc7a9e (Hommexx)
x4315c4s2b0n0.hsn.cm.aurora.alcf.anl.gov 669: Bad dphi, dp3d, or vtheta_dp; label: 'CaarFunctorImpl::run TagPreExchange'; see hommexx.errlog.768.669
Exiting...

$ head hommexx.errlog.768.669
label: CaarFunctorImpl::run TagPreExchange
time-level 1
lat -2.469408023496295e-01 lon  2.399145952253143e+00
ie 166 igll 1 jgll 3 lev 121: bad dphi
level                   dphi                   dp3d              vtheta_dp
    0 -1.750491872851586e+04  6.500425338745119e+01  8.627967553816583e+04
...
  120 -9.885440405719305e+01  3.290536127386427e+02  8.899586850071557e+04
  121  4.141349216841650e+02  3.199318230615362e+02  7.667957395378580e+04
  122 -2.574021162628712e+02  3.177201924738347e+02  6.866978494951430e+04
...
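For longer errlogs it can help to scan the per-level table mechanically instead of reading it by eye. The following is a minimal sketch, not the check Hommexx itself performs: it assumes a level is suspect when `dphi` is NaN or non-negative, or when `dp3d` or `vtheta_dp` is NaN or non-positive. That criterion is inferred from the two excerpts in this issue (a `-nan` dphi at level 0, and a positive dphi at level 121 between negative neighbors). The names `parse_errlog_rows` and `suspicious_levels` are hypothetical.

```python
import re

# Matches table rows: an integer level followed by three value columns, e.g.
#   121  4.141349216841650e+02  3.199318230615362e+02  7.667957395378580e+04
ROW = re.compile(r"^\s*(\d+)\s+(\S+)\s+(\S+)\s+(\S+)\s*$")


def parse_errlog_rows(text):
    """Return a list of (level, dphi, dp3d, vtheta_dp) tuples from errlog text."""
    rows = []
    for line in text.splitlines():
        match = ROW.match(line)
        if match is None:
            continue  # header, label, or "..." line
        try:
            # float() accepts "nan" and "-nan" as printed in the logs.
            values = [float(v) for v in match.groups()[1:]]
        except ValueError:
            continue
        rows.append((int(match.group(1)), *values))
    return rows


def suspicious_levels(rows):
    """Flag levels under the assumed criterion described above.

    Comparisons are written as `not (x < 0)` / `not (x > 0)` so NaN is flagged too.
    """
    flagged = []
    for level, dphi, dp3d, vtheta_dp in rows:
        reasons = []
        if not (dphi < 0.0):
            reasons.append("dphi")
        if not (dp3d > 0.0):
            reasons.append("dp3d")
        if not (vtheta_dp > 0.0):
            reasons.append("vtheta_dp")
        if reasons:
            flagged.append((level, reasons))
    return flagged


# Rows copied from hommexx.errlog.768.669 above (elided lines kept as "...").
SAMPLE = """level                   dphi                   dp3d              vtheta_dp
    0 -1.750491872851586e+04  6.500425338745119e+01  8.627967553816583e+04
...
  120 -9.885440405719305e+01  3.290536127386427e+02  8.899586850071557e+04
  121  4.141349216841650e+02  3.199318230615362e+02  7.667957395378580e+04
  122 -2.574021162628712e+02  3.177201924738347e+02  6.866978494951430e+04
"""

if __name__ == "__main__":
    print(suspicious_levels(parse_errlog_rows(SAMPLE)))  # [(121, ['dphi'])]
```

On the Aurora excerpt this flags only level 121, which matches the `lev 121: bad dphi` line in the errlog header.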

Run-dir:

/lus/flare/projects/E3SM_Dec/azamatm/scratch/profiling/ppe/20251028/ne256pg2_ne256pg2.F2010-SCREAMv1.aurora_oneapi-ifxgpu.20251028.ppe.n64/run/


Labels: EAMxx (C++ based E3SM atmosphere model, aka SCREAM)
