Skip to content

Conversation

@stephenprice
Copy link
Contributor

Update existing testing support for IG and BG cases that uses 20km, uniform resolution Greenland mesh, to use newer variable resolution 4-to-40km mesh. All existing and new landice developer tests have been confirmed to pass. Changes to mali section of ./cime_config/config_archive.xml should now allow mali hist files to be compared against a baseline (prev. support used a hist.am file for this). A follow-up PR will fully remove support for Greenland 20km mesh.

Swap out low-res 20km mesh with var. res. 4-to-40km mesh; generalize
shell commands and namelist file changes (not specific to a single mesh
resolution); exclude regional stats generation for MALI (which prohibits
comparison between multiple MALI hist. files); add ERS test for BG
config.
Add reg exp matching to cime archiving script to pull in wider range of
MPAS hist files in testing; Correct pe layouts for Chrys and PM testing.
Updated pes support for testing on Chrys and PM-cpu; reduce no. of days
for ERS tests; remove no longer needed test mod support (due to
default timestepping setting changes); rename test mod dir names;
update default pe layouts for IG cases
@stephenprice stephenprice added MPAS-albany-landice Concerning the MPAS-Albany land ice model Testing Anything related to unit/system tests Chrysalis pm-cpu Perlmutter at NERSC (CPU-only nodes) labels Nov 19, 2025
Copy link
Contributor

@matthewhoffman matthewhoffman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@stephenprice , thanks for tackling this longstanding cleanup work. I have a handful of questions/comments. Some are in files I'm not super familiar with, so in some cases I'm guessing a little bit. I've reviewed by inspection but haven't tried running any tests with the branch.

<grid name="a%ne30np4.pg2_l%.+_g%mpas.gis20km">
<mach name="chrysalis">
<pes compset=".*EAM.+ELM.+MPASSI.+MPASO.+MOSART.+MALI.+SWAV.+" pesize="any">
<comment>GIS 20km (low-res) testing config</comment>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can/should this comment be updated/generalized?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. This section has been removed as part of draft PR 7898. My thought was that the 4-to-40km default layouts would completely replace / supersede these defaults.

</mach>
<mach name="pm-cpu|muller-cpu|alvarez-cpu">
<pes compset=".*EAM.+ELM.+MPASSI.+MPASO.+MOSART.+MALI.+SWAV.+" pesize="any">
<comment>pm-cpu: GIS 20 or 40km (low-res) testing config, 2 nodes, 128x1</comment>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update this comment too?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As per above comment (removed as part of draft PR 7898).

<ntasks_rof>512</ntasks_rof>
<ntasks_ice>512</ntasks_ice>
<ntasks_ocn>512</ntasks_ocn>
<ntasks_glc>256</ntasks_glc>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The gis4to40 mesh should run without issue on 512 tasks. That would be using 156 cells per task, which is less than we normally use, but isn't so small that it will cause problems. There probably isn't a big cost difference to the overall test, but it seems like we might as well use all the cpus if they are going to be idle otherwise.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Oh, I see - this value was inherited from the 20km mesh, which might not have been able to run on 512 procs.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are probably right -- there is no reason glc could not be using 512 pes as well since we are running the components in serial and already requesting that many for the other components. @jonbob -- if this makes sense I can change this to 512.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that makes sense

Copy link
Contributor Author

@stephenprice stephenprice Dec 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This has been updated.

<ntasks_rof>512</ntasks_rof>
<ntasks_ice>512</ntasks_ice>
<ntasks_ocn>512</ntasks_ocn>
<ntasks_glc>256</ntasks_glc>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can also bump this to 512.

Copy link
Contributor Author

@stephenprice stephenprice Dec 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This has been updated.

Comment on lines +127 to +128
<hist_file_ext_regex>\w+</hist_file_ext_regex>
<hist_file_ext_regex>\w+\.\w+</hist_file_ext_regex>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not an expert in regex or familiar with how this file works, but I wonder if there is a more specific search we can employ or if both versions are necessary. Maybe not. @jonbob , do you have experience with this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I was told a while back was that this should work but that there is some CIME bug preventing it from fully working. For the moment, it at least works to generate hist file baselines for MALI, which can then be later compared against newly generated baselines. It still does not work for including MALI hist files in a restart comparison unfortunately.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@stephenprice -- it doesn't include MALI hist files in an ERS test? Or not MALI restart files?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are a few diff. issues here. First, those lines of code are supposed to allow both mali .hist and .am files to be compared against a set of baseline files. As discussed here though (in section titled "Notes from Jason Boutte..."). In my experimenting however, this does not work and only the .am files get compared. This is why I've set the user_nl_mali to exclude .am files from outputs for these tests. But I figure it does not hurt to include that functionality for if / when this problem gets attention and gets fixed (currently, we would at least be comparing newly generated mali .hist files to baselines during SMS testing).

It's possible that this is by design, but from what I can tell, in my ERS tests nothing gets compared aside from coupler history files. It would be nice if mpas component hist files were also compared. I think this is considered not necessary because the mpas components generate and pass fields that populate the cpl hist files. But right now, we pass very little to the cpl, and for short runs, most of those fields are going to contain all zeroes, which does not feel like a very rigorous restart test (for mali at least). It's unclear to me why hist files from other components, including mpas components, are not included in the comparison for ERS tests. Should they be?

<ntasks_ice>128</ntasks_ice>
<ntasks_ocn>128</ntasks_ocn>
<ntasks_glc>128</ntasks_glc>
<ntasks_glc>960</ntasks_glc>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is going to leave a lot of processors idle for all the other components. Also, this is not an even multiple of 128. What's the purpose here? I can't figure out what this logic is meant to match given this file lives in the elm component. I'm still fuzzy on why some config files live in some components, but if this is the pe-layout for B-cases with Greenland, why wouldn't it go in the main cime_config as opposed to the elm-specific version?

Copy link
Contributor Author

@stephenprice stephenprice Dec 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is just supposed to support an IG test that uses the 1-to-10km init. cond. I can certainly add more PEs or change the count, but I didn't want to go so low as to risk running into memory issues. When I first added these test cases, the guidance was that if the test is not fully coupled, it should not go under ./cime_config but under that subdir for the respective leading component. I suppose we could move this to the 'mpas-albany-landice' component subdir instead of under 'elm'.

<ntasks_rof>64</ntasks_rof>
<ntasks_ice>64</ntasks_ice>
<ntasks_ocn>64</ntasks_ocn>
<ntasks_glc>960</ntasks_glc>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as previous comment about danger of having one component with so many more tasks than any other.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See response to above comment.

<grid name="a%ne30np4.pg2_l%.+_g%mpas.gis1to10kmR2">
<mach name="pm-cpu|muller-cpu|alvarez-cpu">
<pes compset=".*EAM.+ELM.+MPASSI.+MPASO.+MOSART.+MALI.+SWAV.+" pesize="M">
<comment>pm-cpu: GIS 1-to-10km (high-res) baseline config</comment>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this comment (and other ones below) be generalized to indicate it's a B-case that includes GIS?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea. @jonbob -- is there any standard formatting that should be followed for the comment section of these entries?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There really isn't any standard formatting, just whatever information seems useful. Sometimes it's expected performance and sometimes just a brief description

Copy link
Contributor Author

@stephenprice stephenprice Dec 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This has been updated for both chrys and pm-cpu layouts.

Comment on lines 1 to 2
config_am_globalstats_enable = .false.
config_am_regionalstats_enable = .false.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean to rename the gis20km testmod to gis but also add a testmod called landiceIG? It looks like the gis is not used in any tests (and is largely redundant). Can it be deleted?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant to remove the 'gis*' subdir altogether since it is redundant w/ the new 'landiceIG' subdir now. I'll see what happened there and remedy that. As we discussed previously, the 'landiceIG' name is just a placeholder for the time being, e.g. if we decide that should be changed during this discussion.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This has been fixed (redundant 'gis' dir has been removed).

…tions

Remove redundant 'gis' testmods dir under MALI component subdir; change
pe layouts for testing to use 512 vs 256 pes; update description of
default pe layout for IG case tests; update 'extra snowlayers' test to
use 4-to-40km Greenland grid rather than no-longer-supported 20km grid
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Chrysalis MPAS-albany-landice Concerning the MPAS-Albany land ice model new baselines needed pm-cpu Perlmutter at NERSC (CPU-only nodes) Testing Anything related to unit/system tests

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants