@ndkeen
Contributor

@ndkeen ndkeen commented Oct 10, 2024

For ne4 cases, use only 96 tasks, since SCREAM requires that the number of MPI tasks not exceed the number of elements.
SMS.ne4pg2_oQU480.F2010-SCREAMv1-MPASSI.pm-cpu_intel

Unrelated: rename a machine file to reflect the machine name for gcp12 builds with SCREAM.
This change fixes E3SM-Project/scream#3036 (at least the build issue).

[bfb]
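
For readers wondering where 96 comes from: a cubed-sphere grid with `ne` elements per cube edge has 6 * ne^2 elements, so ne4 has 96. The sketch below is only an illustration of that arithmetic, not code from E3SM or CIME; the function names are hypothetical, and 128 is used as the per-node task count on pm-cpu.

```python
def cubed_sphere_elements(ne: int) -> int:
    """Number of elements on a cubed-sphere grid with ne elements per cube edge."""
    if ne < 1:
        raise ValueError("ne must be a positive integer")
    # Six cube faces, each tiled with ne x ne elements.
    return 6 * ne * ne


def capped_tasks(ne: int, tasks_per_node: int) -> int:
    """Largest single-node task count that does not exceed the element count.

    Hypothetical helper: it only encodes the 'tasks <= elements' rule.
    """
    return min(tasks_per_node, cubed_sphere_elements(ne))


if __name__ == "__main__":
    print(cubed_sphere_elements(4))  # 96 elements for ne4
    print(capped_tasks(4, 128))      # 96 tasks instead of a full 128-task node
```

At ne4 the cap leaves 32 of the 128 slots on a pm-cpu node unused, which is the cost of staying within the element count.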

Rename a machinefile to reflect machine name
@ndkeen ndkeen added the Machine Files, EAMxx (C++ based E3SM atmosphere model, aka SCREAM), GCP (google cloud platform), and pm-cpu (Perlmutter at NERSC, CPU-only nodes) labels Oct 10, 2024
@ndkeen ndkeen requested a review from mahf708 October 10, 2024 22:36
Contributor

@mahf708 mahf708 left a comment

Thank you!

@github-actions

PR Preview Action v1.4.8
🚀 Deployed preview to https://E3SM-Project.github.io/E3SM/pr-preview/pr-6676/
on branch gh-pages at 2024-10-10 22:37 UTC

@ndkeen ndkeen added the BFB (PR leaves answers BFB) label Oct 10, 2024
@rljacob
Member

rljacob commented Oct 11, 2024

Why would EAMxx complain about this and not EAM?

@ndkeen
Contributor Author

ndkeen commented Oct 11, 2024

I have asked that as well. They must take different code paths, but both repos have the same check:

homme/src/share/prim_driver_base.F90

    ! we want to exit elegantly when we are using too many processors.
    if (nelem < par%nprocs) then
       call abortmp('Error: too many MPI tasks. set dyn_npes <= nelem')
    end if

It was so elegant. The most elegant.
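
To make the condition concrete, here is a rough Python mirror of that guard. It is an illustration only, not code from either repo; `TooManyTasksError` and `check_task_count` are hypothetical names, and only the comparison and the error text come from the Fortran above.

```python
class TooManyTasksError(RuntimeError):
    """Raised when more MPI tasks are requested than there are elements."""


def check_task_count(nelem: int, nprocs: int) -> None:
    """Mirror of the Fortran guard: fail when nelem < nprocs."""
    if nelem < nprocs:
        raise TooManyTasksError(
            "Error: too many MPI tasks. set dyn_npes <= nelem "
            f"(nelem={nelem}, nprocs={nprocs})"
        )


if __name__ == "__main__":
    check_task_count(nelem=96, nprocs=96)  # passes: one element per task
    try:
        check_task_count(nelem=96, nprocs=128)  # a full pm-cpu node on ne4
    except TooManyTasksError as err:
        print(err)
```

With 96 elements, 96 tasks pass the check and 128 tasks trip it, which matches the layout change in this PR.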

ndkeen added a commit that referenced this pull request Oct 14, 2024
…ts' into next (PR #6676)

@rljacob
Member

rljacob commented Oct 15, 2024

This changed the layout for several e3sm_integration tests:
ERP_Ln9.ne4pg2_oQU480.WCYCL20TRNS-MMF1.pm-cpu_intel.allactive-mmf_fixed_subcycle
ERS.ne4pg2_oQU480.WCYCL1850NS.pm-cpu_intel
ERS_Vmoab.ne4pg2_oQU480.WCYCL1850NS.pm-cpu_intel
NCK.ne4pg2_oQU480.WCYCL1850NS.pm-cpu_intel

If you want to keep this, the Vmoab test diffs need to be blessed, and all of these tests have namelist diffs to bless.

@ndkeen
Contributor Author

ndkeen commented Oct 15, 2024

Makes sense. I could try to add another entry in the pelayout XML that matches ne4 with SCREAM only, but it might be best to just run all ne4 cases with 96 MPI tasks on pm-cpu. What do we think? Before this change, ne4 tests would have used a full node, which is 128 MPI tasks on pm-cpu.

@rljacob
Member

rljacob commented Oct 15, 2024

It's fine to just make the change for all cases.

@ndkeen ndkeen merged commit a1c3cb0 into master Oct 15, 2024
9 checks passed
@ndkeen ndkeen deleted the ndk/machinefiles/pm-cpu-pelayout-updates-for-scream-tests branch October 15, 2024 23:49
Development

Successfully merging this pull request may close these issues.

Build error with new test SMS_D_Ln5.ne4pg2_oQU480.F2010-SCREAMv1-MPASSI.gcp12_gnu on gcp12