Skip to content

Conversation

@DeniseWorthen
Copy link
Contributor

@DeniseWorthen DeniseWorthen commented Nov 8, 2024

Pull Request Summary

Removes un-needed allocations and time-interpolation of import fields when CMEPS is used. Note this PR is built on top of the PIO PR. Changes are minimal when compared to that PR.

Description

Introduces a logical flag (use_cmeps) which is false by default. When true, arrays that are not needed at the global domain resolution are minimally allocated and time-interpolation w/in the update routines is skipped.

To test the impact, three cases were run using the same DATM-S2SW configuration used for testing the PIO work on Gaea. Each case was run out 48 hours on 816 WAV tasks, with WW3 restarts written every 2 hours.

The ProfileMemory configuration flag was turned on and memory statistics for the WW3 PETs were pulled out for every 50th PE (18 total traces). Three values were plotted: VmPeak, VmRSS and CommittedPercent. The first is the max resident size, the second is the mean resident size and the third is (as defined here) an interesting number I happened to notice in the output. I'm not sure of this last value, but it seems relevant---for example, if I use too few tasks for WAV, the model crashes and I also see values like CommittedPercent: 166.939 %.

Each of the following figures show the current dev/ufs-weather-model branch (top), the feature/pio4ww3 branch (middle) and this feature branch (bottom):

VmPeak:

vmpeak plot

VmRss:

Note here that the highest trace on the top figure shows the values for the last (naprst) PET.

vmrss plot

CommittedPercent:

Note here that the lowest trace on all figures is the first PET.

pct plot

My interpretation of these results is that a) the PIO feature itself reduces memory usage by reducing the high-memory cost of the naprst task and b) this feature has a measurable impact on the memory footprint of the model. However, no consistent impact on model execution time was found in my limited testing.

Issue(s) addressed

Commit Message

  • minimally allocate unneeded arrays when CMEPS is in use. Skip time-interpolation for imported fields.

Check list

Testing

  • How were these changes tested?
  • Are the changes covered by regression tests? (If not, why? Do new tests need to be added?)
  • Have the matrix regression tests been run (if yes, please note HPC and compiler)?
  • Please indicate the expected changes in the regression test output, (Note the list of known non-identical tests.)
  • Please provide the summary output of matrix.comp (matrix.Diff.txt, matrixCompFull.txt and matrixCompSummary.txt):

I ran this PR against a baseline created with PR #1303. All baselines passed.

DeniseWorthen and others added 30 commits August 28, 2024 08:06
* remove wav_grdout routine, now moved into wav_history_mod
* remove more cap stuff from w3iorsmd. only ww3 native filenaming
is possible w/ binary restarts
* remove ifdef w3_ascii from w3wavemd, since the ascii commit is not
yet present in mesh cap branch
* clean up config variable logic for filenaming
* nrqrs can be non-zero if also using the restart from binary
option
* flpart needs to be set for either historync or not
* move restart and history nc blocks outside of ww3 time testing
block.
* add log print messages for reading and writing restarts
* get logging working correctly for ufs
* fix noclobber variable and file name in wav_history
* clarify some comments
* make binary history files match when restartnc=true
* rework wav_restart_mod, which originally was designed to be able
to read and write restarts for testing purposes from inside wav_comp_nuopc.
* verboselog is true by default, but can be set false by config
* the header text for ww3 logging as it steps through time is now
turned off in w3init and placed into the mesh cap. this allows the
mesh cap to order the logging correctly
* move block where addrstflds was set to before call to w3init
since restarts are read in w3init
* ensure that if nml file lacks a specification of extra fields,
the default value of "unset" will not be returned as a field name
* only ice is added for now
* tab cleanup in w3grid
* need to send explicit array bounds for ice array since
it is 0:nsea
* all baselines b4b against f9531d0
* intialize floutg and floutg2 which are can be unintialized when
waves are in slow loop and historync is true
* local arrays dimension nx or greater are locally allocated
* deallocations added, even though not required for user clarity
* global_input and _output can be allocatable
* minimally allocate fields used in w3updtmd when using cmeps
* use fields provided by cmeps w/o fake time-interpolation
@DeniseWorthen
Copy link
Contributor Author

DeniseWorthen commented May 16, 2025

An update on this PR; I've known for quite some time that I was seeing non-debug gnu test failures with this branch. GNU +debug itself passes, as do all the INTEL,INTEL+DEBUG. I've just tested on Hera w/ the newly re-enabled GNU tests and I get the same result. On Hera, these two tests fail comparison.

cpld_control_p8_gnu
cpld_control_pdlib_p8_gnu

The same two tests pass on Hercules.

EDIT: The Hera+GNU tests do pass against a generated baselines. Since the DEBUG tests pass against the existing baseline, I think the Hera failures are due to a combination of compiler, along w/ order-of-operation differences in the PR branch.

@DeniseWorthen
Copy link
Contributor Author

DeniseWorthen commented May 22, 2025

I repeated the test cases above using the current develop and this feature branch. I ran two test cases on C6, using the DATM-S2SW configuration and the mesh.uglo_15km.nc. All I/O was off for all components. The cases were identical, differing only in the WW3 branch.

The VmRSS for the two cases over a 4-day integration is shown for the develop branch (upper) and this feature branch (lower)

Screenshot 2025-05-22 at 6 30 44 PM Screenshot 2025-05-22 at 6 30 50 PM

The difference in the CommittedPercent values was ~3% over the course of the integration.

@DeniseWorthen DeniseWorthen marked this pull request as ready for review May 27, 2025 14:56
@JessicaMeixner-NOAA
Copy link
Collaborator

@DeniseWorthen -
I have run the standalone WW3 tests and we just get the expected non B4B so I can confirm no unintentional consequences on the standalone side. I also went through the code itself for double checking and thought things looked good.

I'll run these again when it's closer to time, so I'm not putting the full logs but the standalone regtest diffs:

**********************************************************************
********************* non-identical cases ****************************
**********************************************************************
mww3_test_03/./work_PR3_UQ_MPI_e_c                     (1 files differ)
mww3_test_03/./work_PR3_UNO_MPI_e                     (1 files differ)
mww3_test_03/./work_PR2_UQ_MPI_e                     (1 files differ)
mww3_test_03/./work_PR2_UNO_MPI_d2                     (8 files differ)
mww3_test_03/./work_PR1_MPI_d2                     (10 files differ)
mww3_test_03/./work_PR3_UNO_MPI_d2_c                     (13 files differ)
mww3_test_03/./work_PR3_UQ_MPI_d2_c                     (15 files differ)
mww3_test_03/./work_PR3_UNO_MPI_d2                     (8 files differ)
mww3_test_03/./work_PR2_UQ_MPI_d2                     (15 files differ)
mww3_test_03/./work_PR3_UNO_MPI_e_c                     (1 files differ)
mww3_test_03/./work_PR3_UQ_MPI_d2                     (16 files differ)
mww3_test_09/./work_MPI_ASCII                     (0 files differ)
ww3_tp2.10/./work_MPI_OMPH                     (6 files differ)
ww3_tp2.16/./work_MPI_OMPH                     (4 files differ)
ww3_tp2.6/./work_ST4_ASCII                     (0 files differ)
ww3_ufs1.3/./work_a                     (3 files differ)

@DeniseWorthen
Copy link
Contributor Author

@JessicaMeixner-NOAA One thing I noticed (but did not make a change for) is that there seem to be some missing OMP statements in w3updtmd.F90

WW3/model/src/w3updtmd.F90

Lines 999 to 1022 in f27fe03

#ifdef W3_OMPG
!$OMP PARALLEL DO PRIVATE (ISEA,MI2,MXR,MYR,MDARC)
#endif
!
DO ISEA=1, NSEA
!
TAUA(ISEA) = MA0(ISEA) + RD * MAI(ISEA)
#ifdef W3_WNT2
MI2 = SQRT ( RD2 * MA0(ISEA)**2 + &
RD *(MA0(ISEA)+MAI(ISEA))**2 )
TAUA(ISEA) = TAUA(ISEA) * MIN(1.25,MI2/MAX(1.E-7,TAUA(ISEA)))
#endif
TAUADIR(ISEA) = MD0(ISEA) + RD * MDI(ISEA)
#ifdef W3_SMC
!Li Rotate momentum direction by ANGARC for Arctic part cells.
IF( ARCTC .AND. (ISEA .GT. NGLO) ) THEN
MDARC = TAUADIR(ISEA) + ANGARC( ISEA - NGLO )*DERA
TAUADIR(ISEA) = MOD ( TPI + MDARC, TPI )
ENDIF
#endif
!
END DO
!
RETURN

and

WW3/model/src/w3updtmd.F90

Lines 2697 to 2707 in f27fe03

#ifdef W3_OMPG
!$OMP PARALLEL DO PRIVATE (ISEA,RA0,RAI)
#endif
!
DO ISEA=1, NSEA
!
RHOAIR(ISEA) = RA0(ISEA) + RD * RAI(ISEA)
!
END DO
!
RETURN

@JessicaMeixner-NOAA
Copy link
Collaborator

Thanks for pointing that out. I made an issue: #1444 so we can make sure to go back to that.

@JessicaMeixner-NOAA JessicaMeixner-NOAA self-requested a review May 29, 2025 18:35
Copy link
Collaborator

@JessicaMeixner-NOAA JessicaMeixner-NOAA left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

**********************************************************************
********************* non-identical cases ****************************
**********************************************************************
mww3_test_03/./work_PR1_MPI_e                     (1 files differ)
mww3_test_03/./work_PR3_UQ_MPI_e_c                     (1 files differ)
mww3_test_03/./work_PR3_UNO_MPI_e                     (1 files differ)
mww3_test_03/./work_PR2_UQ_MPI_e                     (1 files differ)
mww3_test_03/./work_PR2_UNO_MPI_d2                     (15 files differ)
mww3_test_03/./work_PR1_MPI_d2                     (10 files differ)
mww3_test_03/./work_PR3_UNO_MPI_d2_c                     (16 files differ)
mww3_test_03/./work_PR3_UQ_MPI_d2_c                     (15 files differ)
mww3_test_03/./work_PR3_UNO_MPI_d2                     (15 files differ)
mww3_test_03/./work_PR2_UQ_MPI_d2                     (15 files differ)
mww3_test_03/./work_PR3_UQ_MPI_e                     (1 files differ)
mww3_test_03/./work_PR3_UNO_MPI_e_c                     (1 files differ)
mww3_test_03/./work_PR3_UQ_MPI_d2                     (15 files differ)
mww3_test_09/./work_MPI_ASCII                     (0 files differ)
ww3_tp2.10/./work_MPI_OMPH                     (7 files differ)
ww3_tp2.16/./work_MPI_OMPH                     (4 files differ)
ww3_tp2.6/./work_ST4_ASCII                     (0 files differ)
ww3_ufs1.3/./work_a                     (2 files differ)

matrixCompFull.txt
matrixCompSummary.txt
matrixDiff.txt

Thank you @DeniseWorthen !!

@JessicaMeixner-NOAA JessicaMeixner-NOAA merged commit 9939046 into NOAA-EMC:dev/ufs-weather-model Jul 1, 2025
3 of 6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants