-
Notifications
You must be signed in to change notification settings - Fork 449
Update testing for configurations with active Greenland ice sheet #7892
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Update testing for configurations with active Greenland ice sheet #7892
Conversation
Swap out low-res 20km mesh with var. res. 4-to-40km mesh; generalize shell commands and namelist file changes (not specific to a single mesh resolution); exclude regional stats generation for MALI (which prohibits comparison between multiple MALI hist. files); add ERS test for BG config.
Add reg exp matching to cime archiving script to pull in wider range of MPAS hist files in testing; Correct pe layouts for Chrys and PM testing.
Updated pes support for testing on Chrys and PM-cpu; reduce no. of days for ERS tests; remove no longer needed test mod support (due to default timestepping setting changes); rename test mod dir names; update default pe layouts for IG cases
matthewhoffman
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@stephenprice , thanks for tackling this longstanding cleanup work. I have a handful of questions/comments. Some are in files I'm not super familiar with, so in some cases I'm guessing a little bit. I've reviewed by inspection but haven't tried running any tests with the branch.
| <grid name="a%ne30np4.pg2_l%.+_g%mpas.gis20km"> | ||
| <mach name="chrysalis"> | ||
| <pes compset=".*EAM.+ELM.+MPASSI.+MPASO.+MOSART.+MALI.+SWAV.+" pesize="any"> | ||
| <comment>GIS 20km (low-res) testing config</comment> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can/should this comment be updated/generalized?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. This section has been removed as part of draft PR 7898. My thought was that the 4-to-40km default layouts would completely replace / supersede these defaults.
| </mach> | ||
| <mach name="pm-cpu|muller-cpu|alvarez-cpu"> | ||
| <pes compset=".*EAM.+ELM.+MPASSI.+MPASO.+MOSART.+MALI.+SWAV.+" pesize="any"> | ||
| <comment>pm-cpu: GIS 20 or 40km (low-res) testing config, 2 nodes, 128x1</comment> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Update this comment too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As per above comment (removed as part of draft PR 7898).
| <ntasks_rof>512</ntasks_rof> | ||
| <ntasks_ice>512</ntasks_ice> | ||
| <ntasks_ocn>512</ntasks_ocn> | ||
| <ntasks_glc>256</ntasks_glc> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The gis4to40 mesh should run without issue on 512 tasks. That would be using 156 cells per task, which is less than we normally use, but isn't so small that it will cause problems. There probably isn't a big cost difference to the overall test, but it seems like we might as well use all the cpus if they are going to be idle otherwise.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Oh, I see - this value was inherited from the 20km mesh, which might not have been able to run on 512 procs.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are probably right -- there is no reason glc could not be using 512 pes as well since we are running the components in serial and already requesting that many for the other components. @jonbob -- if this makes sense I can change this to 512.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that makes sense
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This has been updated.
| <ntasks_rof>512</ntasks_rof> | ||
| <ntasks_ice>512</ntasks_ice> | ||
| <ntasks_ocn>512</ntasks_ocn> | ||
| <ntasks_glc>256</ntasks_glc> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can also bump this to 512.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This has been updated.
| <hist_file_ext_regex>\w+</hist_file_ext_regex> | ||
| <hist_file_ext_regex>\w+\.\w+</hist_file_ext_regex> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not an expert in regex or familiar with how this file works, but I wonder if there is a more specific search we can employ or if both versions are necessary. Maybe not. @jonbob , do you have experience with this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What I was told a while back was that this should work but that there is some CIME bug preventing it from fully working. For the moment, it at least works to generate hist file baselines for MALI, which can then be later compared against newly generated baselines. It still does not work for including MALI hist files in a restart comparison unfortunately.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@stephenprice -- it doesn't include MALI hist files in an ERS test? Or not MALI restart files?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are a few diff. issues here. First, those lines of code are supposed to allow both mali .hist and .am files to be compared against a set of baseline files. As discussed here though (in section titled "Notes from Jason Boutte..."). In my experimenting however, this does not work and only the .am files get compared. This is why I've set the user_nl_mali to exclude .am files from outputs for these tests. But I figure it does not hurt to include that functionality for if / when this problem gets attention and gets fixed (currently, we would at least be comparing newly generated mali .hist files to baselines during SMS testing).
It's possible that this is by design, but from what I can tell, in my ERS tests nothing gets compared aside from coupler history files. It would be nice if mpas component hist files were also compared. I think this is considered not necessary because the mpas components generate and pass fields that populate the cpl hist files. But right now, we pass very little to the cpl, and for short runs, most of those fields are going to contain all zeroes, which does not feel like a very rigorous restart test (for mali at least). It's unclear to me why hist files from other components, including mpas components, are not included in the comparison for ERS tests. Should they be?
| <ntasks_ice>128</ntasks_ice> | ||
| <ntasks_ocn>128</ntasks_ocn> | ||
| <ntasks_glc>128</ntasks_glc> | ||
| <ntasks_glc>960</ntasks_glc> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is going to leave a lot of processors idle for all the other components. Also, this is not an even multiple of 128. What's the purpose here? I can't figure out what this logic is meant to match given this file lives in the elm component. I'm still fuzzy on why some config files live in some components, but if this is the pe-layout for B-cases with Greenland, why wouldn't it go in the main cime_config as opposed to the elm-specific version?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is just supposed to support an IG test that uses the 1-to-10km init. cond. I can certainly add more PEs or change the count, but I didn't want to go so low as to risk running into memory issues. When I first added these test cases, the guidance was that if the test is not fully coupled, it should not go under ./cime_config but under that subdir for the respective leading component. I suppose we could move this to the 'mpas-albany-landice' component subdir instead of under 'elm'.
| <ntasks_rof>64</ntasks_rof> | ||
| <ntasks_ice>64</ntasks_ice> | ||
| <ntasks_ocn>64</ntasks_ocn> | ||
| <ntasks_glc>960</ntasks_glc> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same as previous comment about danger of having one component with so many more tasks than any other.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See response to above comment.
| <grid name="a%ne30np4.pg2_l%.+_g%mpas.gis1to10kmR2"> | ||
| <mach name="pm-cpu|muller-cpu|alvarez-cpu"> | ||
| <pes compset=".*EAM.+ELM.+MPASSI.+MPASO.+MOSART.+MALI.+SWAV.+" pesize="M"> | ||
| <comment>pm-cpu: GIS 1-to-10km (high-res) baseline config</comment> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this comment (and other ones below) be generalized to indicate it's a B-case that includes GIS?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea. @jonbob -- is there any standard formatting that should be followed for the comment section of these entries?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There really isn't any standard formatting, just whatever information seems useful. Sometimes it's expected performance and sometimes just a brief description
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This has been updated for both chrys and pm-cpu layouts.
| config_am_globalstats_enable = .false. | ||
| config_am_regionalstats_enable = .false. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean to rename the gis20km testmod to gis but also add a testmod called landiceIG? It looks like the gis is not used in any tests (and is largely redundant). Can it be deleted?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I meant to remove the 'gis*' subdir altogether since it is redundant w/ the new 'landiceIG' subdir now. I'll see what happened there and remedy that. As we discussed previously, the 'landiceIG' name is just a placeholder for the time being, e.g. if we decide that should be changed during this discussion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This has been fixed (redundant 'gis' dir has been removed).
…tions Remove redundant 'gis' testmods dir under MALI component subdir; change pe layouts for testing to use 512 vs 256 pes; update description of default pe layout for IG case tests; update 'extra snowlayers' test to use 4-to-40km Greenland grid rather than no-longer-supported 20km grid
Update existing testing support for IG and BG cases that uses 20km, uniform resolution Greenland mesh, to use newer variable resolution 4-to-40km mesh. All existing and new landice developer tests have been confirmed to pass. Changes to mali section of ./cime_config/config_archive.xml should now allow mali hist files to be compared against a baseline (prev. support used a hist.am file for this). A follow-up PR will fully remove support for Greenland 20km mesh.