
Conversation

@xylar commented May 30, 2025

This merge brings in Aurora support (e.g. E3SM-Project#7357), among many other changes.

Checklist

  • Testing
    • A comment in the PR documents the testing used to verify the changes, including any tests that are added/modified/impacted.

jgfouca and others added 30 commits May 12, 2025 10:06
…7269)

EAMxx: Using nested parallel_fors/scans in cloud_mod(mo_photo)

Updating the MAM4xx submodule to bring in changes to the photo routines. See PR 435.
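
As a hedged illustration of the nested-parallelism pattern the title refers to (a generic Kokkos sketch with made-up names, not the actual mo_photo/cloud_mod code):

```cpp
#include <Kokkos_Core.hpp>

// Generic sketch: one team per column, a nested parallel_scan over levels.
void column_running_sum(const Kokkos::View<double **> &data) {
  using Policy = Kokkos::TeamPolicy<>;
  const int ncol = data.extent(0);
  const int nlev = data.extent(1);
  Kokkos::parallel_for(
      "column_running_sum", Policy(ncol, Kokkos::AUTO),
      KOKKOS_LAMBDA(const Policy::member_type &team) {
        const int i = team.league_rank();
        // Nested scan: cumulative sum down the column.
        Kokkos::parallel_scan(
            Kokkos::TeamThreadRange(team, nlev),
            [&](const int k, double &partial, const bool is_final) {
              partial += data(i, k);
              if (is_final) data(i, k) = partial;
            });
      });
}
```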

cprnc output for the one differing variable (H2O2):

H2O2   (lev,ncol,time)  t_index =      6     6
         38    15696  (    65,   213,     1) (     2,   196,     1) (    66,   192,     1) (    20,   135,     1)
               15696   2.348843120003608E-09   8.023946418062437E-18 2.8E-17  2.378099939193135E-10 3.3E-10  2.059470215892700E-12
               15696   2.348843120003608E-09   8.023946418062437E-18          2.378100216748891E-10          2.059471300094873E-12
               15696  (    65,   213,     1) (     2,   196,     1)
          avg abs field values:    2.107698182651774E-10    rms diff: 3.0E-19   avg rel diff(npos):  3.3E-10

On Frontier, the test ERS.ne4pg2_oQU480.F2010-SCREAMv1-MPASSI.frontier_craygnu-hipcc.eamxx-mam4xx-all_mam4xx_procs runs fine.

On Aurora, I merged PR 7234 into my local master branch and rebased this branch. The test
SMS_Ln5.ne4pg2_oQU480.F2010-SCREAMv1-MPASSI.aurora_oneapi-ifxgpu.eamxx-mam4xx-all_mam4xx_procs runs fine.

[NBFB]: the baseline test mam4_aero_microphys_standalone_baseline_cmp fails for gcc-cuda/sp for one
variable (H2O2, shown above) with a small difference.

* odiazib/eamxx/mo_photo_cloud_mod:
  EAMxx: Update the MAM4xx submodules with the latest changes in mo_photo.
Adds instructions to handle FAN pools during transient land use and
skip cells that have small column or landunit weight.

Fixes #6743
EAMxx: SYCL compiles MAM for Aurora

Removes logic to disable MAM4xx for SYCL builds.

[BFB]

* singhbalwinder/eamxx/sycl-compiles-mam:
  SYCL compiles MAM for Aurora
…7322)

EAMxx: Modify MAM4xx tests to use MAM4xx compset

Revisited all MAM4xx CIME tests to change their compset to the new MAM4xx
compset.

* Change all SMS_D tests to REP
* Add another test suite for debug (_D) tests
[NBFB] as test names will change, requiring re-blessing

* singhbalwinder/eamxx/change-mama4xx-tests-compsets:
  Modify MAM4xx test to use MAM4xx compset and change test from SMS to REP
  Factors out the defaults in a separate script
  Deletes unused files
  Modified compset and all tests are working on Chrysalis
Port eamxx standalone to lychee

I was getting memory errors in SHOC at first. I think the issue is
that the last kernel in SHOCPreprocess::operator() does not run over
nlev_packs, so I replaced it with a deep copy, since it was just
setting things to zero.

[BFB]
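
A minimal sketch of the workaround (illustrative names, not the actual SHOCPreprocess code), assuming the goal is simply to zero a whole view:

```cpp
#include <Kokkos_Core.hpp>

// If a hand-written zeroing kernel does not iterate over all of
// nlev_packs, some entries stay uninitialized; deep_copy touches every
// element of the view regardless of the packing.
void zero_workspace(const Kokkos::View<double **> &field) {
  Kokkos::deep_copy(field, 0.0);
}
```
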
We cannot look at the smallest GID in the map file to figure out the
base index for GIDs in the map file, since the map need not be
surjective or total, meaning that there may be elements in the codomain
or domain that are not mapped (to or from) by the map.

To fix this, we accept a map-file "base_gid" when building the
remapper data. For now, it defaults to 1, which is the typical value
for ncremap files.
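
A minimal sketch of the idea with a hypothetical helper (the real remapper code differs): map-file GIDs are shifted by an explicit base_gid rather than by the smallest GID that happens to appear in the file:

```cpp
#include <cstdint>

// Hypothetical helper: convert a map-file GID to a 0-based offset.
// base_gid defaults to 1, matching typical ncremap output. Using the
// smallest GID in the file instead would be wrong whenever the map is
// not total or surjective, since the lowest possible GID may simply
// not appear in the file.
inline std::int64_t gid_to_offset(std::int64_t gid, std::int64_t base_gid = 1) {
  return gid - base_gid;
}
```
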
Remove link-flag -Xsycl-target-backend from compile flags
ZM cleanup - Create zm_conv_cape module

This phase of the ZM cleanup was oriented around moving the
buoyan_dilute() and parcel_dilute() subroutines out of the zm_conv
module and into a new zm_conv_cape module. The primary routine was
renamed to compute_dilute_cape and various portions of code were broken
into private subroutines to improve modularity. I also extensively
renamed variables throughout the zm_conv_cape module for better
readability and grep-ability.

As part of the interface cleanup for this I decided that creating
derived types to hold ZM constants and namelist parameters would be
beneficial. I created the zm_conv_types module to contain the
definitions of the derived types zm_const_t and zm_param_t, in addition
to some subroutines for initialization and MPI broadcasting.

[non-BFB] only for GNU on pm-cpu

* whannah/eam/zm-cleanup-08:
  revert "cld" terminology to lcl/eql
  major cleanup and renaming
  update call to compute_dilute_cape
  add default value for zm_const%zvir
  update misc_diagnostics.F90
  update conditional_diag_main.F90
  minor clean up
  add updated zm_conv_util.F90
  add zm_conv_types module
  remove util functions from zm_conv.F90
  update misc_diagnostics.F90
  minor zm_conv_cape.F90 update
  update zm_conv.F90
  move zm_conv_parcel.F90 -> zm_conv_cape.F90
  update misc_diagnostics.F90
  add zm_conv_parcel.F90
gw_project_winds
gw_heating_depth
gw_storm_speed
gw_gw_sources

[BFB]
Use std::set for faster insertion, rather than scanning the list of
unique GIDs at every insertion.
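
A minimal sketch of the data-structure change (illustrative, not the actual code): inserting into a std::set costs O(log n), versus an O(n) scan of the accumulated unique GIDs on every insertion:

```cpp
#include <set>
#include <vector>

// Collect unique GIDs via std::set instead of scanning a vector for
// duplicates at every insertion: O(n log n) total instead of O(n^2).
std::vector<long long> unique_gids(const std::vector<long long> &gids) {
  std::set<long long> uniq(gids.begin(), gids.end());  // dedup + sort
  return std::vector<long long>(uniq.begin(), uniq.end());
}
```
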
Remove YAKL backend support for eamxx/rrtmgp

The rrtmgp standalone submodule will still need YAKL support due to
other parts of E3SM still using YAKL. I will investigate transitioning
these parts to Kokkos at some point.

Change Homme cmake to use known Kokkos cmake vars that indicate the
device type, rather than relying on arbitrary settings for these that
come from eamxx.

[BFB]
Update cube_to_target Makefile for more flexible FC settings

As part of the v3.RRM development, the following error occurred on
Chrysalis when building with the hard-coded gfortran in the Makefile.
This appears to be due to the Chrysalis gfortran version being too old.

gfortran: error: unrecognized command line option ‘-fallow-argument-mismatch’;
did you mean ‘-Wno-argument-mismatch’?

This PR allows users to specify other compilers if needed and keeps
gfortran as the default compiler. On Chrysalis, `export FC=ifort` built
cube_to_target and created the topography file successfully.

A side benefit is that the executable runs much faster when built with ifort than
with gfortran.

[BFB]
EAMxx: Fixing bug in external forcing

@singhbalwinder:
Taufiq has added vertical emissions (external forcing) diagnostics to the code. As we are just reading files and interpolating them, the read-in values should be very close to the diagnostics output. Taufiq is seeing differences that look much larger than expected.

@TaufiqHassan :
The elevated emission flux differences for BC/POM between EAMxx and the prescribed emissions data are >50%. In contrast, the SO2 mean difference between EAMxx and the prescribed emissions data is <1%.

@TaufiqHassan performed a verification of extfrc_vert_sum_dz_weighted (see PR 7284).

  species         prescribed_emis     eamxx_emis
0     so2       3952.337528087315      3919.2534
1  so4_a1       164.1166022679014        81.9451
2  so4_a2      17.967874885317006       8.971543
3  pom_a4      503.64561857829256      251.47543
4   bc_a4      50.720847076634016      25.325436
5  num_a1  1.5081873428440138e+19   7.530533e+18
6  num_a2  1.2723405815977971e+20  6.3529267e+19
7  num_a4  4.2619691274178514e+20  2.1280446e+20
8    soag      2803.8477596519087      1399.9897
After bug fix:

  species         prescribed_emis eamxx_automated eamxx_extfrcInt
0     so2       3952.337528087315       3919.1548       3918.7275
1  so4_a1       164.1166022679014        162.5891       162.57228
2  so4_a2      17.967874885317006       17.966644       17.963781
3  pom_a4      503.64561857829256       431.20523       431.34055
4   bc_a4      50.720847076634016       41.872025       41.888027
5  num_a1  1.5081873428440138e+19   1.4851163e+19   1.4849639e+19
6  num_a2  1.2723405815977971e+20   1.2722535e+20   1.2720508e+20
7  num_a4  4.2619691274178514e+20   3.6416606e+20   3.6428164e+20
8    soag      2803.8477596519087       2742.7026       2742.5098
The bug was that we were using the same TracerTimeState in a loop. This time structure was modified in each iteration, producing incorrect values.
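
A hedged sketch of the bug pattern (TracerTimeState is the name from the PR; the fields and functions below are invented for illustration):

```cpp
#include <vector>

// The state used for time interpolation is advanced by each update call.
struct TracerTimeState {
  double days_since_start = 0.0;  // assumed field, for illustration only
};

double interpolate_forcing(TracerTimeState &ts, const double dt) {
  ts.days_since_start += dt;  // the update mutates the state...
  return ts.days_since_start; // ...and the result depends on it
}

int main() {
  const double dt = 1.0;

  // Buggy: one shared state for all forcings; iteration i sees the
  // mutations made by iterations 0..i-1, so later forcings are wrong.
  TracerTimeState shared;
  std::vector<double> buggy;
  for (int i = 0; i < 3; ++i)
    buggy.push_back(interpolate_forcing(shared, dt));  // 1, 2, 3

  // Fixed: each forcing works on its own copy of the initial state.
  std::vector<double> fixed;
  for (int i = 0; i < 3; ++i) {
    TracerTimeState own;
    fixed.push_back(interpolate_forcing(own, dt));     // 1, 1, 1
  }
  return 0;
}
```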

I expect the baseline tests to fail for all tests where the microphysics interface is involved. Among the single-process standalone tests, mam4_aero_microphys_standalone_baseline_cmp should fail.

The test cpu-gcc / SMS_D_Ln5.ne4pg2_oQU480.F2010-SCREAMv1-MPASSI.eamxx-mam4xx-all_mam4xx_procs should fail with DIFFs.

In addition, I modified the forcing structure to remove an offset parameter that made the code hard to understand.

[NBFB]

* odiazib/eamxx/extfrc_bug_fix:
  EAMxx: Adding field members to the forcing structure instead of using a long array of views.
  EAMxx: Verification of extfrc_vert_sum_dz_weighted shows a discrepancy between the CMIP6 files and the interpolated values in the simulation for extfrc. The issue was that we were using the same TracerTimeState in a loop. This time structure was modified by each iteration, causing errors.
`SMS_D_Lm2.ne4pg2_oQU480.F2010-EAMxx-MAM4xx` was failing in CICE. Updated test to replace CICE with MPASSI to avoid this error.

[BFB]
Add support for coupling ne1024pg2 and RRSwISC6to18E3r5

Adds grid aliases, domain and mapping files to support the following
resolutions:
* ne1024pg2_RRSwISC6to18E3r5
* ne1024pg2_r025_RRSwISC6to18E3r5
* ne1024pg2_r0125_RRSwISC6to18E3r5
Previously the ne1024pg2 grid was only set up to work with v2 HR ocn/ice
meshes.

[BFB]
jonbob and others added 4 commits June 4, 2025 12:31
Revert spun-up ocean IC back to cold-start for SORRME3r3

Changes the ocean spun-up initial condition that was set in #6758 back
to the 'cold-start' initial condition. There was concern that using the
initial condition interpolated from the v2.1 spinup was baking warm
biases into the Southern Ocean.

Ocean initial condition file is staged in public inputdata repo,
world-readable.

[BFB] for all currently tested configurations
[non-BFB] only for B-cases with the SOwISC12to30E3r3 ocean mesh
@xylar (author) commented Jun 5, 2025

We're currently waiting on E3SM-Project#7419, which will bring in SCORPIO 1.8.0.

brhillman and others added 14 commits June 5, 2025 09:08
Bugfix for inconsistent iceberg melt temperature

Currently:
* MPAS-Seaice iceberg heat flux term includes both latent heat
  associated with melting and the heat needed to raise icebergs from
  -4 degC to 0 degC.
* MPAS-Ocean assumes that the iceberg mass flux enters at the local
  freezing point, which is inconsistent with the temperature adjustment
  above.
With this PR:
* MPAS-Seaice iceberg heat flux term includes the latent heat associated
  with melting and the heat needed to raise icebergs from an initial
  temperature (defined in the namelist) to 0 degC
* MPAS-Ocean assumes that iceberg mass flux enters at 0 degC, consistent
  with ice runoff fluxes
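
In equation form (a sketch matching the description above; symbols are illustrative, not MPAS variable names), the seaice-side heat flux now carries the latent heat of melting plus the sensible heat needed to warm the ice from the namelist-defined initial temperature to the melt point:

```latex
Q_{berg} = \dot{m}_{berg}\,\bigl[L_f + c_{p,i}\,(T_{melt} - T_{init})\bigr],
\qquad T_{melt} = 0\,^{\circ}\mathrm{C},
```

where previously $T_{init}$ was hard-coded at $-4\,^{\circ}\mathrm{C}$, and the ocean side now receives the mass flux at $T_{melt}$ rather than at the local freezing point.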

Fixes: #7110

[NML]
[NCC] only for configurations with MPASSI%DIB. BFB otherwise
Bridge all routines in gw_common and gw_front

Change list:
* Add bridge and unit test for...
  - gw_prof
  - momentum_energy_conservation
  - gwd_compute_stress_profiles_and_diffusivities
  - gwd_project_tau
  - gwd_precalc_rhoi
  - gw_drag_prof
  - gw_front_project_winds
  - gw_front_gw_sources
  - gw_cm_src
* Some minor fixes for gen_boiler to be able to handle multiple arrays of different dims on the same row
* Make gen_boiler.py executable so you can run individual doctests
* test-all-eamxx: Add machine.setup calls in thread functions. This was a weird issue I only see on my laptop: the copies of the object that the threads use do not get the initialized class-level data.
* Fix an issue where not enough data was being allocated in C++ for fortran arrays with dims (-foo:foo). These arrays need 2*foo + 1 entries, not just 2*foo (see the sketch after this list).

[BFB]
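
For the last item, the allocation arithmetic in a two-line sketch (hypothetical helper, shown only to pin down the off-by-one):

```cpp
// A Fortran array dimensioned (lo:hi) spans lo..hi inclusive, so its
// extent is hi - lo + 1; for (-foo:foo) that is 2*foo + 1, not 2*foo.
inline int fortran_extent(int lo, int hi) { return hi - lo + 1; }
// fortran_extent(-4, 4) == 9, i.e. 2*4 + 1
```
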
Currently, rad runs on steps 1, rad_freq+1, 2*rad_freq+1, etc.,
which is not what one would expect, and makes it hard to craft
an instant output yaml file to catch rad steps (if freq>1).

This PR makes the frequency logic of rad similar to the output,
with the only exception that we always run it on the 1st step
(regardless of frequency).

[non-BFB] for EAMxx cases
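
A hedged sketch of the scheduling change (assuming, as for output streams, that "every N steps" means steps N, 2N, ...; the function names are illustrative):

```cpp
// Old behavior: rad runs on steps 1, rad_freq+1, 2*rad_freq+1, ...
bool run_rad_old(const int step, const int rad_freq) {
  return (step - 1) % rad_freq == 0;
}

// New behavior: align with the output frequency logic (steps rad_freq,
// 2*rad_freq, ...), except that rad always runs on step 1.
bool run_rad_new(const int step, const int rad_freq) {
  return step == 1 || step % rad_freq == 0;
}
```
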
Put together a style guide that covers:

Formatting

- The clang-format-based autoformatting workflow
- General description of the format standard and configuration
- Summary of the workflow
- How to install/configure/use clang-format on your development machine

Style

- General guidelines
- Specific guidelines for:
  - Types
  - Functions/Methods
  - Variables
  - Templating

[BFB]
Add cosine_solar_zenith_angle to the Computed fields in rad. This allows cosine_solar_zenith_angle to be output as a diagnostic and used for offline rad calculations. In the future, this could probably be added as a diagnostic-only field so as not to pollute the field manager, but since it is already computed within the rad interface for use by rrtmgp, adding it as a computed field should not have much impact on the memory footprint.

[BFB]
[EAMxx]
[rrtmgpxx]
Updates Chicoma machine files to enable intel compiler

This PR updates the cmake macros for intel_chicoma-cpu to enable
functionality of the intel compiler on the LANL chicoma machine. I have
tested this with the V3 HR WCYCL test and it runs successfully.

[BFB]
The land mesh is usually culled, and it is created either from the atm
mesh or from the river mesh, which are full meshes; use their global
size to write/read the restart data associated with the land.
Updating to SCORPIO v1.8.0 from v1.7.0

SCORPIO v1.8.0 includes:
* SCORPIO API tracing support
* Support for deleting files after conversion from ADIOS BP to NC
* Support for C++ versions > 14.0
* Misc bug fixes

[BFB]
philipwjones removed their request for review June 10, 2025 15:26
@philipwjones commented:
Removed myself as reviewer since I'll be out until the 25th. Once the Scorpio merge has happened, @amametjanov or someone else should be able to handle the merge.

@xylar (author) commented Jun 10, 2025

That merge reportedly just happened. I can retest tomorrow.

rljacob and others added 4 commits June 10, 2025 12:03
Some changes to driver-moab.

Add changes to map migration to support reading in all maps. Use iMOAB_MigrateMapMesh
instead of iMOAB_ComputeCoverageMesh, modernizing and simplifying the mesh mapping workflow.

Add more mesh output under MOABDEBUG.

Aream values are used for correction factors. Aream (the area used by
mapping codes) is either computed by tempestremap in the compute-maps
workflow or read from mapping files in the read-map workflows. Aream is
set for both source and target meshes, and the last aream computation
or read wins. For the atm and rof components, aream on the coupler side
is also copied from the area fields during component-coupler init,
before any map initialization; this avoids cases in which atm or rof
participate in no maps and aream would end up uninitialized.
Also, report min/max for the correction factors computed for the moab
driver, in the same way the mct driver reports them.

Several corrections were made to the fractions computations: when set
from mct fractions, the global grid ids from mct are needed too, and
they were missing in some data models from their regular place (the
domain structure in the component type).

Also add new coupling features that were only in the mct driver.

* Changes related to sediflag for mosart were missing in the moab import.
* Changes related to gustiness were missing in the moab driver for land, ice, and flux calculations.
* Several changes were triggered by glc coupling with the ocean in the new paradigm; default driver input is still needed, and build configuration changes are necessary for the code to be set up, compiled, and run.

[BFB] for mct coupler
Coupler: Adjust tolerance in nlmaps test.

This is again caused by the lack of area consistency between atmosphere
and surface in this configuration, due to eps_fraclim. This fixes a
POSTRUN_SCRIPT failure on pm-cpu and Chrysalis.

[BFB]
Fix a couple sources of uninitialized memory errors in cam/gw

vdiff_lu_solver had a couple of places where uninitialized memory was
being used. We just need to be sure to initialize arrays when they are
created, and this problem goes away.

gw_front::gw_cm_src was trickier: the second spread call was producing
an array with an inconsistent size in dim 2, (-ngwv:ngwv), compared to c
and cref, which are (-pgwv:pgwv).

[BFB]
This pull request expands the list of variables in the ELM output; most changes were taken from CLM5.
It adds aerodynamic resistances, friction velocity, and canopy air properties (temperature, humidity, wind speed) to the history file.
These are helpful for assessing the model's representation of turbulent fluxes. All new variables are set as "inactive" by default.

The pull request has two very minor additional edits:

* Also borrowed from CLM5, the indices 1 and 2 for the resistances are replaced with local variables "above_canopy" and "below_canopy", simply to improve code readability.
* The comments and variable output names for the variable obu are revised: "Monin-Obukhov length" is replaced with "Obukhov length", which is the correct name of the variable according to the American Meteorological Society.
[BFB]
xylar assigned xylar and unassigned philipwjones Jun 12, 2025
@xylar (author) commented Jun 12, 2025
xylar commented Jun 12, 2025

Testing

A test merge of develop with today's E3SM master on Aurora worked great along with my changes in E3SM-Project/polaris#318.

I was able to build the code and run CTests without error.

xylar merged commit 1fa63b9 into E3SM-Project:develop Jun 12, 2025
2 of 3 checks passed