
For Perlmutter, revert GNU compiler version from 13.2 to 12.3 #7871

@ndkeen

Description


In early October, I updated several module versions on pm-cpu/pm-gpu, including the GNU compiler, which moved to the current machine default, 13.2 (#7740). All of the vanilla e3sm tests passed and were BFB. It turns out there was a failing eamxx conus test on pm-gpu that I must have missed (#7843), and we now also see DEBUG eamxx tests on pm-cpu that fail sporadically (#7842). While debugging these, one theory is that they are related to issues with OpenMP threads (I don't see the failures without threads), and the two failures may be related to each other. After reverting to GNU 12.3 (and leaving all other modules the same), I see these tests pass, along with the other tests. The vanilla e3sm cases are still BFB, but the eamxx cases are not; they were also not BFB when moving to 13.2, so that is expected. I propose we make this change (revert to 12.3) so the tests pass, and then investigate why they fail with 13.2.

Performance does not seem to be affected much, if at all. I ran ne256 without IO for 5 days on 32, 64, and 128 nodes to directly compare the branch built with GNU 12.3 against one built with 13.2. Performance is the same within timing noise.

Tested so far with gnu on pm-cpu: e3sm_developer, e3sm_eamxx_v1, and a test with netcdf-4 input
Tested so far with gnugpu on pm-gpu: e3sm_eamxx_v1, e3sm_eamxx_large, several ne256 performance tests, and a test with netcdf-4 input

Metadata


Labels: EAMxx (C++ based E3SM atmosphere model, aka SCREAM); GNU (GNU compiler related issues); Machine Files; pm-cpu (Perlmutter at NERSC, CPU-only nodes); pm-gpu (Perlmutter machine at NERSC, GPU nodes)
