-
Notifications
You must be signed in to change notification settings - Fork 14
Further pcmdi_diags updates #742
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@zhangshixuan1987 This PR and the corresponding zppy-interfaces PR E3SM-Project/zppy-interfaces#41 will be for our fixups to the pcmdi_diags task before we make the next release candidate rc2 for E3SM Unified.
This self-review comments on the test results from my run labeled unique_id_20251009_2. Please take a look at the errors noted. (Also, I gave you write-access to the zppy repo, so you should be able to push commits to this branch pcmdi-updates.)
Other notes before we merge these PRs and make rc2:
- I noticed several of the 74 parameters in
default.initake boolean values despite being typed as strings. That can cause some issues (for example,my_var="False"would evaluate asTruebecause it's a non-empty string. So we should probably convert those to actual boolean types and test to make sure nothing broke. - I mentioned earlier (E3SM-Project/zppy-interfaces#25 (comment)) that the
plevdatancfile doesn't work well with the Jinja templating. I think the best option is to keep that in thediagnostics_base_pathand set that as the default path (as opposed to putting it in theinclusions/subdir where it really couldn't be used.
examples/post.v3.LR.amip.0101.cfg
Outdated
| output = /lcrc/group/e3sm/ac.forsyth2/zppy_pmp_amip/unique_id_20251007_4/v3.LR.amip_0101 | ||
| www = /lcrc/group/e3sm/public_html/diagnostic_output/ac.forsyth2/zppy_pmp_amip/unique_id_20251007_4 | ||
| output = /lcrc/group/e3sm/ac.forsyth2/zppy_pmp_amip/unique_id_20251009_2/v3.LR.amip_0101 | ||
| www = /lcrc/group/e3sm/public_html/diagnostic_output/ac.forsyth2/zppy_pmp_amip/unique_id_20251009_2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| [[ synthetic_plots ]] | ||
| active = True | ||
| # use subset variables for viewer | ||
| clim_vars = "pr,prw,psl,rlds,rldscs,ua-200,zg-500" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ua-200,zg-500 have grayed out links in the viewer.
cd /lcrc/group/e3sm/ac.forsyth2/zppy_pmp_amip/unique_id_20251009_2/v3.LR.amip_0101/post/scripts
grep -in "ua-200" *.o* | grep -i errorgives no results.
| output = /lcrc/group/e3sm/ac.forsyth2/zppy_pmp_hist/unique_id_20251007_4/v3.LR.amip_0101 | ||
| www = /lcrc/group/e3sm/public_html/diagnostic_output/ac.forsyth2/zppy_pmp_hist/unique_id_20251007_4 | ||
| output = /lcrc/group/e3sm/ac.forsyth2/zppy_pmp_hist/unique_id_20251009_2/v3.LR.amip_0101 | ||
| www = /lcrc/group/e3sm/public_html/diagnostic_output/ac.forsyth2/zppy_pmp_hist/unique_id_20251009_2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't do an inline comment in this cfg because it's not diffed, but we have:
[[ mean_climate ]]
clim_regions = "global,ocean,land,Tropics,NHEX,SHEX"
[[ synthetic_plots ]]
clim_regions = "global,ocean,land"
That explains the grayed out links for "ocean" and "land" (per your comment at E3SM-Project/zppy-interfaces#25 (comment)).
I'm not sure why even Global has some grayed out links in the Pattern Coor. and RMSE columns though.
I'm also curious why "Tropics,NHEX,SHEX" don't appear anywhere in the viewer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, I can't do an inline comment in this cfg because it's not diffed, but we have:
[[ mean_climate ]]
clim_vars = "pr,prw,psl,rlds,rldscs,rltcre,rstcre,rsus,rsuscs,rlus,rlut,rlutcs,rsds,rsdscs,rsdt,rsut,rsutcs,rtmt,sfcWind,tas,tauu,tauv,ts,ta-200,ta-850,ua-200,ua-850,va-200,va-850,zg-500"
[[ synthetic_plots ]]
clim_vars = "pr,prw,psl,rlds,rldscs,ua-200,zg-500"
and ua-200,zg-500 have grayed out links in the viewer.
cd /lcrc/group/e3sm/ac.forsyth2/zppy_pmp_hist/unique_id_20251009_2/v3.LR.amip_0101/post/scripts
grep -in "ua-200" *.o* | grep -i errorgives:
pcmdi_diags_synthetic_plots_model_vs_obs.o962636:465:2025-10-09 15:33:38,286 [ERROR]: synthetic_metrics_plotter.py(mean_climate_plot_driver:430) >> KeyError on var_names=['pr', 'prw', 'psl', 'rlds', 'ua-200', 'zg-500']
pcmdi_diags_synthetic_plots_model_vs_obs.o962636:466:2025-10-09 15:33:38,286 [ERROR]: synthetic_metrics_plotter.py(generate:170) >> Failed to handle metric=mean_climate: "['ua-200', 'zg-500'] not in index"
pcmdi_diags_synthetic_plots_model_vs_obs.o962636:494:KeyError: "['ua-200', 'zg-500'] not in index"
| partition = "debug" | ||
| qos = "regular" | ||
| www = "/lcrc/group/e3sm/public_html/diagnostic_output/ac.forsyth2/zppy_weekly_comprehensive_v3_www/unique_id_test_20251007_4" | ||
| www = "/lcrc/group/e3sm/public_html/diagnostic_output/ac.forsyth2/zppy_weekly_comprehensive_v3_www/unique_id_test_20251009_2" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| input_subdir = archive/atm/hist | ||
| mapping_file = "map_ne30pg2_to_cmip6_180x360_aave.20200201.nc" | ||
| output = "/lcrc/group/e3sm/ac.forsyth2/zppy_weekly_comprehensive_v3_output/unique_id_test_20251007_4/v3.LR.historical_0051" | ||
| output = "/lcrc/group/e3sm/ac.forsyth2/zppy_weekly_comprehensive_v3_output/unique_id_test_20251009_2/v3.LR.historical_0051" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cd /lcrc/group/e3sm/ac.forsyth2/zppy_weekly_comprehensive_v3_output/unique_id_test_20251009_2/v3.LR.historical_0051/post/scripts
LR.historical_0051/post/scripts
grep -v "OK" *statusgives:
pcmdi_diags_mean_climate_model_vs_obs_1985-1994.status:ERROR (11)
pcmdi_diags_synthetic_plots_model_vs_obs.status:ERROR (11)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
emacs pcmdi_diags_mean_climate_model_vs_obs_1985-1994.o962625
# FileNotFoundError: No file found for source variable 'rlutcs' in climo
emacs pcmdi_diags_synthetic_plots_model_vs_obs.o962628
# FileNotFoundError: No synthetic mean climate metrics found for model: v3.LR.historical_0051
# RuntimeError: No synthetic metrics plots could be generated.
grep -n "Error:" *.o*gives:
pcmdi_diags_mean_climate_model_vs_obs_1985-1994.o962625:19015:ValueError: ds_out is None
pcmdi_diags_mean_climate_model_vs_obs_1985-1994.o962625:25088:FileNotFoundError: No file found for source variable 'rlutcs' in climo
pcmdi_diags_synthetic_plots_model_vs_obs.o962628:396:FileNotFoundError: No synthetic mean climate metrics found for model: v3.LR.historical_0051
pcmdi_diags_synthetic_plots_model_vs_obs.o962628:415:FileNotFoundError: No Synthetic MoVs Metrics Data For v3.LR.historical_0051, Aborting.
pcmdi_diags_synthetic_plots_model_vs_obs.o962628:425:RuntimeError: No synthetic metrics plots could be generated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@forsyth2: Hi Ryan, again the issues here arise from the problems in "e3sm_to_cmip" run, as I did not see any "ua", "va", "ta" and "zg" that were processed by it in your directory:
- /lcrc/group/e3sm/ac.forsyth2/zppy_pmp_amip/unique_id_20251009_2/v3.LR.amip_0101/post/atm/180x360_aave/cmip_ts/monthly/
and in "/lcrc/group/e3sm/ac.forsyth2/zppy_pmp_amip/unique_id_20251009_2/v3.LR.amip_0101/post/scripts/e3sm_to_cmip_atm_monthly_180x360_aave_2005-2009-0005.o962639":
`2025-10-09 19:55:02,228_228:INFO:_run_parallel:35 of 39 handlers complete
2025-10-09 19:55:02,228 [ERROR]: main.py(_run_parallel:951) >> ta, va, zg, ua failed to complete
2025-10-09 19:55:02,228 [ERROR]: main.py(_run_parallel:951) >> ta, va, zg, ua failed to complete
2025-10-09 19:55:02,228_228:ERROR:_run_parallel:ta, va, zg, ua failed to complete
2025-10-09 19:55:02,228 [ERROR]: main.py(_run_parallel:952) >> 35 of 39 handlers complete
2025-10-09 19:55:02,228 [ERROR]: main.py(_run_parallel:952) >> 35 of 39 handlers complete
2025-10-09 19:55:02,228_228:ERROR:_run_parallel:35 of 39 handlers complete
"No variable named 'plev'. Variables on the dataset include ['lat', 'lon', 'lat_bnds', 'lon_bnds', 'gw', ..., 'hybm', 'ilev', 'lev', 'time', 'time_bnds']"
"No variable named 'plev'. Variables on the dataset include ['lat', 'lon', 'lat_bnds', 'lon_bnds', 'gw', ..., 'hybm', 'ilev', 'lev', 'time', 'time_bnds']"
"No variable named 'plev'. Variables on the dataset include ['lat', 'lon', 'lat_bnds', 'lon_bnds', 'gw', ..., 'hybm', 'ilev', 'lev', 'time', 'time_bnds']"
"No variable named 'plev'. Variables on the dataset include ['lat', 'lon', 'lat_bnds', 'lon_bnds', 'gw', ..., 'hybm', 'ilev', 'lev', 'time', 'time_bnds']"
- '[' 0 '!=' 0 ']'
Apparently, thee3sm_to_cmipfailed to complete the requested jobs, which is still an issue caused by the problems reported in above log files:ERROR: Unable to find specified vertical output grid file /lcrc/group/e3sm/ac.szhang/acme_scratch/e3sm_project/zppy/zppy/templates/inclusions/e3sm_to_cmip/vrt_remap_plev19.nc`
Therefore, the problem is not in zppy-pcmdi itself since e3sm_to_cmip runs as an independent module. In our case, e3sm_to_cmip reported errors, yet the status file still shows OK:
- /lcrc/group/e3sm/ac.forsyth2/zppy_pmp_amip/unique_id_20251009_2/v3.LR.amip_0101/post/scripts/e3sm_to_cmip_atm_monthly_180x360_aave_2005-2009-0005.status
Because that status remained OK, the zppy-pcmdi jobs that depend on e3sm_to_cmip proceeded using incomplete data. We should make the wrapper mark the task as ERROR (non-zero exit) when e3sm_to_cmip fails or produces incomplete outputs, and add a post-run validation step so downstream tasks won’t start on partial files.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because that status remained OK, the
zppy-pcmdijobs that depend one3sm_to_cmipproceeded using incomplete data. We should make the wrapper mark the task as ERROR (non-zero exit) when e3sm_to_cmip fails or produces incomplete outputs, and add a post-run validation step so downstream tasks won’t start on partial files.
This e3sm_to_cmip PR is to address the exit code problem.E3SM-Project/e3sm_to_cmip#272 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ERROR: Unable to find specified vertical output grid file /lcrc/group/e3sm/ac.szhang/acme_scratch/e3sm_project/zppy/zppy/templates/inclusions/e3sm_to_cmip/vrt_remap_plev19.nc`
Ok yes, so this is exactly because the nc file was missing. That should be resolved by the change described in #742 (comment)
We should make the wrapper mark the task as ERROR (non-zero exit) when e3sm_to_cmip fails
We have other error exit codes in that file. I'll add one to check that that task completes successfully.
@forsyth2: I would like to request an online meeting discussion on these, as I can not get a full picture from the comments you raised above. |
There is one verison for vertical grid file here: https://web.lcrc.anl.gov/public/e3sm/diagnostics/e3sm_to_cmip_data/maps/vrt_remap_plev19.nc this is used by Tony for cmorization data for CMIP publicaiton, and can be accessed from the |
Sure, I can schedule something for tomorrow. I want to see if I can get those
@zhangshixuan1987 Do you have the original file? The |
@forsyth2 @chengzhuzhang : This is exactly the file I used for all my |
Thank you @forsyth2: I have two meetings from 9:30AM to 12PM PDT, while I am flexible for any other time. |
I might be able to explain these sufficiently here:
if c["component"] == "atm":
default_cmip_plevdata = f"{c['diagnostics_base_path']}/vrt_remap_plev19.nc"
set_value_of_parameter_if_undefined(
c,
"cmip_plevdata",
default_cmip_plevdata,
ParameterInferenceType.PATH_INFERENCE,
)
So based on this, we really just need to change the line above to be: (Note that Then, we won't need to explicitly specify this parameter. |
This is a great idea for me. |
Hi @forsyth2 : I think that I understand what you are referring to. I was trying to be more flexible rather than strict on these, i.e. I was trying to allow "yes", "y", "True", "true" etc. to be all interpreted as "True", while allowing "no", "n", "flase", "False" to be all interpreted as "False" in the pcmdi-interface module. However, as you suggested, it could be better to use a strict way to avoid any potential issues. I will do some modifications and will let you review again for those flags. |
That's fine if that's behavior we want to support. We just then have to be careful on our end to actually evaluate the parameters as booleans rather than strings. That is, we have to avoid something like evaluating |
|
@zhangshixuan1987 I'm running a new test with a working |
@forsyth2 : please let me know if new problems raised, and I can try to further adjust the code. |
|
@zhangshixuan1987 Ok my latest commit has the example cfgs' viewers looking good, but I just can't get the test case cfg to work. I don't know if I'm specifying incorrect parameters or there isn't data available or something else. The viewers:
cd /lcrc/group/e3sm/ac.forsyth2/zppy_weekly_comprehensive_v3_output/unique_id_test_20251009_3/v3.LR.historical_0051/post/scripts
grep -n "Error:" *.o*gives: # Let's rerun just an edited version of the test cfg
# Remove rlutcs
# Edit tests/integration/utils.py
# "unique_id": "unique_id_test_20251009_3a",
python tests/integration/utils.py
git add -A
pre-commit run --all-files
python -m pip install .
zppy -c tests/integration/generated/test_weekly_comprehensive_v3_chrysalis.cfg
cd /lcrc/group/e3sm/ac.forsyth2/zppy_weekly_comprehensive_v3_output/unique_id_test_20251009_3a/v3.LR.historical_0051/post/scripts
grep -v "OK" *status
# pcmdi_diags_mean_climate_model_vs_obs_1985-1994.status:ERROR (11)
# pcmdi_diags_synthetic_plots_model_vs_obs.status:ERROR (11)
grep -n "Error:" *.o*
# pcmdi_diags_mean_climate_model_vs_obs_1985-1994.o963441:17067:ValueError: ds_out is None
# pcmdi_diags_mean_climate_model_vs_obs_1985-1994.o963441:23140:FileNotFoundError: No file found for source variable 'rlutcs' in climo
# pcmdi_diags_synthetic_plots_model_vs_obs.o963444:396:FileNotFoundError: No synthetic mean climate metrics found for model: v3.LR.historical_0051
# pcmdi_diags_synthetic_plots_model_vs_obs.o963444:415:FileNotFoundError: No Synthetic MoVs Metrics Data For v3.LR.historical_0051, Aborting.
# pcmdi_diags_synthetic_plots_model_vs_obs.o963444:425:RuntimeError: No synthetic metrics plots could be generated.
cd ~/ez/zppy
git grep -n rlutcs
# examples/post.v3.LR.amip.0101.cfg:71: clim_vars = "pr,prw,psl,rlds,rldscs,rltcre,rstcre,rsus,rsuscs,rlus,rlut,rlutcs,rsds,rsdscs,rsdt,rsut,rsutcs,rtmt,sfcWind,tas,tauu,tauv,ts,ta-200,ta-850,ua-200,ua-850,va-200,va-850,zg-500"
# examples/post.v3.LR.historical.0101.cfg:64: clim_vars = "pr,prw,psl,rlds,rldscs,rltcre,rstcre,rsus,rsuscs,rlus,rlut,rlutcs,rsds,rsdscs,rsdt,rsut,rsutcs,rtmt,sfcWind,tas,tauu,tauv,ts,ta-200,ta-850,ua-200,ua-850,va-200,va-850,zg-500"
# tests/integration/generated/test_weekly_comprehensive_v3_chrysalis.cfg:104: # Remove rlutcs
# tests/integration/generated/test_weekly_comprehensive_v3_chrysalis.cfg:338: # Remove rlutcs
# tests/integration/template_weekly_comprehensive_v3.cfg:104: # Remove rlutcs
# tests/integration/template_weekly_comprehensive_v3.cfg:338: # Remove rlutcs
# zppy/defaults/default.ini:466:cmip_vars = string(default="pr,prw,psl,rlds,rldscs,rlut,rlutcs,rsut,rsutcs,rsds,rsdscs,rsdt,rsus,rsuscs,rlus,rtmt,sfcWind,tas,tauu,tauv,ts,ta,ua,va,zg")
# zppy/defaults/default.ini:483:# mean_climate AND synthetic_plots: clim_vars = "pr,prw,psl,rlds,rldscs,rltcre,rstcre,rlut,rlutcs,rsds,rsdscs,rsdt,rsus,rsuscs,rlus,rsut,rtmt,sfcWind,tas,tauu,tauv,ts,ta-200,ta-850,ua-200,ua-850,va-200,va-850,zg-500"
# zppy/defaults/default.ini:484:clim_vars = string(default="pr,prw,psl,rlds,rldscs,rltcre,rstcre,rlut,rlutcs,rsds,rsdscs,rsdt,rsus,rsuscs,rlus,rsut,rtmt,sfcWind,tas,tauu,tauv,ts,ta-200,ta-850,ua-200,ua-850,va-200,va-850,zg-500")
# zppy/e3sm_to_cmip.py:67: "ua, va, ta, wa, zg, hur, tas, ts, psl, ps, sfcWind, huss, pr, prc, prsn, evspsbl, tauu, tauv, hfls, clt, rlds, rlus, rsds, rsus, hfss, clivi, clwvi, prw, rldscs, rlut, rlutcs, rsdt, rsuscs, rsut, rsutcs, rtmt, abs550aer, od550aer, rsdscs, tasmax, tasmin"
# zppy/templates/inclusions/pcmdi_diags/reference_alias.json:103: "rlutcs" : { |
|
@forsyth2 : Thank you for sharing, the results from historical example viewer and amip example viewer look good to me, and the greyed out figures in historical example are what I explained before: the pre-generated CMIP6 metrics data does not have valid metrics data for Ocean and Land. Therefore, the parallel coordinate plots can not be constructed. I am confused by the test case -- no viewer generated, what is this test for? I did a test on my side using the
I have pushed the fixes I described above to the commit 58942229617. Please take a look and let me know if you have further questions. |
@zhangshixuan1987 That's the key cfg that tests every single
Will do, thanks! |
@zhangshixuan1987 No need to worry about this. I reviewed the parameters and I think all will work just fine as strings. My review
cd ~/ez/zppy
git grep -n "string(default=True)" zppy/defaults/default.ini
# zppy/defaults/default.ini:421:generate_sftlf = string(default=True)
# zppy/defaults/default.ini:424:mov_plot_obs = string(default=True)
# zppy/defaults/default.ini:425:mov_plot_model = string(default=True)
# zppy/defaults/default.ini:426:mov_nc_out_obs = string(default=True)
# zppy/defaults/default.ini:427:mov_nc_out_model = string(default=True)
# zppy/defaults/default.ini:437:save_test_clims = string(default=True)
# zppy/defaults/default.ini:496:CBF = string(default=True)
# zppy/defaults/default.ini:497:ConvEOF = string(default=True)
# zppy/defaults/default.ini:507:RmDomainMean = string(default=True)
# zppy/defaults/default.ini:547:save_all_data = string(default=True)
git grep -n "string(default=False)" zppy/defaults/default.ini
# zppy/defaults/default.ini:435:pcmdi_debug = string(default=False)
# zppy/defaults/default.ini:499:EofScaling = string(default=False)
# zppy/defaults/default.ini:502:landmask = string(default=False)Using These parameters are only used as parameters to call These parameters are only used as parameters to |
380328e to
afbddca
Compare
forsyth2
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have visually inspected these changes and confirmed the output is expected. The test case produces this viewer
|
This PR and its corresponding @chengzhuzhang Please let me know if you would like to review/test before merging. Otherwise we can do testing as part of the Unified testing period. @zhangshixuan1987 Please approve both PRs if there are no remaining issues on your end. Thank you for the effort on this very large new |
Thank you! @forsyth2. I think that currently, the workflow works as expected, which achieves the goal of this pull request. However, I am working on optimizing the synthetic plotting parts and will push them as soon as possible for your review and consideration if the timing is still allowed. |
Depending on 1) how large/complex the change is and 2) how quickly @xylar & @chengzhuzhang would like a
If you're pushing changes, please push to this new branch instead of the original branch for #732. It was a bit of a process to cherry-pick your earlier commit on to this branch. You should have write access now to push to |
@zhangshixuan1987 I talked to @chengzhuzhang and we determined it would be best to merge what we have and do this optimization work as a distinct pull request. |
This sounds good to me. I think there will be no truly “final” code for this part anyway, as we’ll likely refine and optimize the synthetic plotting continuously. It makes sense to merge the current version first and handle further improvements in a separate PR. |
|
I did not find an approve button, and I may have clicked the wrong one @forsyth2 @chengzhuzhang. Apologize. |
@zhangshixuan1987 Ok, thanks for the context. In this case, what will likely happen is active development will temporarily stop on
This is fine. I would have preferred to squash these many small commits by merging with "Squash & Merge" rather than a merge commit, but it is not a major issue. I also would have preferred @chengzhuzhang to sign off before merging, but in this case, I will go ahead and merge the corresponding |
@chengzhuzhang I made #745 to safeguard against an accidental merge like this, but the three options all have tradeoffs associated with them. |
I think that we are on the same page. The optimization I mentioned above is just the improvement of the existing code rather than adding new features. Hopefully that is considered as "testing and ironing out any issues in existing code".
Thank you for understanding. I will pay more attention next time. |
Summary
Objectives:
zppy-interfacesPR: Further pcmdi_diags updates zppy-interfaces#41Select one: This pull request is...
Small Change