[Bug]: Failures on the Weekly test run #625

@forsyth2

What happened?

I originally ran the 3 weekly tests (bundles, comprehensive_v2, comprehensive_v3) after merging #604/#617 on 7/31. The intent was to re-run these tests every week, but I thought it was reasonable to run them only in weeks when pull requests with code changes (rather than, say, doc changes) were merged into zppy. After all, if no changes had been merged, what would be the point of running such lengthy tests?

For #598, I was testing the latest sets (tc_analysis, enso_diags, streamflow [but apparently missing qbo]) added to the E3SM Diags CDAT migration (https://github.com/E3SM-Project/e3sm_diags/commits/cdat-migration-fy24), using min_case_e3sm_diags_cdat_migrated_sets. However, these three sets didn't show up on the viewer.

Upon testing on main, using weekly_comprehensive_v3, I found that tc_analysis still wasn't plotting (though enso_diags and streamflow were). I then ran all the weekly tests, yielding the following results:

weekly_comprehensive_v3

Sets rendered

| sub-task | sets included in viewer | sets that were specified but were not included |
| --- | --- | --- |
| e3sm_diags > atm_monthly_180x360_aave | "lat_lon","enso_diags","diurnal_cycle","streamflow" | "tc_analysis","tropical_subseasonal" |
| e3sm_diags > atm_monthly_180x360_aave_mvm | "lat_lon", | N/A |
| e3sm_diags > lnd_monthly_mvm_lnd | "lat_lon_land" | N/A |
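The "specified but not included" column is a set difference between what the cfg requested and what the viewer rendered. A minimal sketch of that check follows; `missing_sets` is a hypothetical helper (not part of zppy), and the `specified` list is assumed to be the union of the two columns in the model-vs-obs row above.

```python
def missing_sets(specified, rendered):
    """Return the requested sets that are absent from the viewer, in request order."""
    rendered_lookup = set(rendered)
    return [name for name in specified if name not in rendered_lookup]


# Values copied from the atm_monthly_180x360_aave row above.
# Assumption: "specified" is the union of the two table columns.
specified = ["lat_lon", "enso_diags", "diurnal_cycle", "streamflow",
             "tc_analysis", "tropical_subseasonal"]
rendered = ["lat_lon", "enso_diags", "diurnal_cycle", "streamflow"]

print(missing_sets(specified, rendered))
# → ['tc_analysis', 'tropical_subseasonal']
```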

Why are tc_analysis and tropical_subseasonal missing in the rendering for model-vs-obs?

$ cd /lcrc/group/e3sm/ac.forsyth2/zppy_weekly_comprehensive_v3_output/test-main-20240907/v3.LR.historical_0051/post/scripts/
$ grep -in tc_analysis e3sm_diags_atm_monthly_180x360_aave_model_vs_obs_1987-1988.o596145  

gives:

IndexError: list index out of range

and

$ grep -in tropical_subseasonal e3sm_diags_atm_monthly_180x360_aave_model_vs_obs_1987-1988.o596145 

gives:

OSError: no files to open

Neither of these error messages is particularly enlightening.
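Since a grep on the set name shows only the final exception line, it may help to pull out the whole traceback around it. The sketch below assumes the job output contains standard Python tracebacks written without a logging prefix on each line; `extract_tracebacks` and `tracebacks_mentioning` are hypothetical helpers, and the sample log is synthetic (only the `IndexError` line is taken from the output above).

```python
TRACEBACK_START = "Traceback (most recent call last):"


def extract_tracebacks(log_text):
    """Return each traceback in the log as one string, ending at its exception line."""
    blocks, current = [], None
    for line in log_text.splitlines():
        if line.startswith(TRACEBACK_START):
            current = [line]
        elif current is not None:
            current.append(line)
            # Frame lines are indented; the first non-indented line is the exception.
            if line and not line[0].isspace():
                blocks.append("\n".join(current))
                current = None
    return blocks


def tracebacks_mentioning(log_text, keyword):
    """Keep only the tracebacks whose text contains the keyword."""
    return [block for block in extract_tracebacks(log_text) if keyword in block]


# Synthetic log for illustration; the file name and frame are made up.
sample_log = "\n".join([
    "starting set",
    "Traceback (most recent call last):",
    '  File "example_driver.py", line 10, in run_tc_analysis',
    "    first = files[0]",
    "IndexError: list index out of range",
    "finished",
])

for block in tracebacks_mentioning(sample_log, "tc_analysis"):
    print(block)
```

To use it on a real job log, read the `.o` file into a string (for example with `pathlib.Path(...).read_text()`) and pass it in place of `sample_log`.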

Image check failures

Test failed due to image check failures:
https://web.lcrc.anl.gov/public/e3sm/diagnostic_output/ac.forsyth2/zppy_weekly_comprehensive_v3_www/test-main-20240907/v3.LR.historical_0051/image_check_failures_comprehensive_v3/

$ cd /lcrc/group/e3sm/public_html/diagnostic_output/ac.forsyth2/zppy_weekly_comprehensive_v3_www/test-main-20240907/v3.LR.historical_0051/image_check_failures_comprehensive_v3
$ ls *_diff* | wc -l
136

There are 136 differences from the expected results. These appear to be 7 MERRA2-related E3SM Diags diffs, 1 global-time-series diff, and 128 MPAS-Analysis diffs.
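A breakdown like the one above could be tallied automatically rather than by eye. The sketch below mirrors `ls *_diff*` (top-level entries only) and buckets file names by keyword. `tally_diffs` is a hypothetical helper, and the keywords and file names in the demo are made up; the real diff file names may not contain these substrings.

```python
import tempfile
from collections import Counter
from pathlib import Path


def tally_diffs(directory, keywords):
    """Count top-level files matching *_diff*, bucketed by the first keyword in the name."""
    counts = Counter()
    for path in sorted(Path(directory).glob("*_diff*")):
        if not path.is_file():
            continue
        label = next((k for k in keywords if k in path.name), "other")
        counts[label] += 1
    return counts


# Demo on a throwaway directory with made-up file names.
with tempfile.TemporaryDirectory() as tmp:
    for name in ["MERRA2_a_diff.png", "mpas_x_diff.png", "mpas_y_diff.png", "notes.txt"]:
        (Path(tmp) / name).touch()
    demo_counts = tally_diffs(tmp, ["MERRA2", "mpas"])

print(dict(demo_counts))
```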

weekly_comprehensive_v2

Sets rendered

| sub-task | sets included in viewer | sets that were specified but were not included |
| --- | --- | --- |
| e3sm_diags > atm_monthly_180x360_aave | "lat_lon","diurnal_cycle","streamflow","tc_analysis" | "enso_diags" |
| e3sm_diags > atm_monthly_180x360_aave_mvm | "lat_lon", | N/A |
| e3sm_diags > lnd_monthly_mvm_lnd | "lat_lon_land" | N/A |

Why is enso_diags missing in the rendering for model-vs-obs?

$ cd /lcrc/group/e3sm/ac.forsyth2/zppy_weekly_comprehensive_v2_output/test-main-20240907/v2.LR.historical_0201/post/scripts/
$ grep -in enso_diags e3sm_diags_atm_monthly_180x360_aave_model_vs_obs_1852-1853.o596170

That does give:

IndexError: index 0 is out of bounds for axis 0 with size 0
RuntimeError: Requested years are outside of available sst obs records.

But didn't these years work before? Why would they fail now?

Image check failures

Test failed due to image check failures:
https://web.lcrc.anl.gov/public/e3sm/diagnostic_output/ac.forsyth2/zppy_weekly_comprehensive_v2_www/test-main-20240907/v2.LR.historical_0201/image_check_failures_comprehensive_v2/

$ cd /lcrc/group/e3sm/public_html/diagnostic_output/ac.forsyth2/zppy_weekly_comprehensive_v2_www/test-main-20240907/v2.LR.historical_0201/image_check_failures_comprehensive_v2
$ ls *_diff* | wc -l
12

There are 12 differences from the expected results, all of which are MERRA2-related E3SM Diags diffs.

weekly_bundles

Sets rendered

| sub-task | sets included in viewer | sets that were specified but were not included |
| --- | --- | --- |
| e3sm_diags > atm_monthly_180x360_aave | "polar","enso_diags","diurnal_cycle", | N/A |
| e3sm_diags > atm_monthly_180x360_aave_mvm | "polar","enso_diags","streamflow", | "tc_analysis" |

Why is tc_analysis missing in the rendering for mvm?

$ cd /lcrc/group/e3sm/ac.forsyth2/zppy_weekly_bundles_output/test-main-20240907/v3.LR.historical_0051/post/scripts
$ grep -in tc_analysis bundle3.o597964 

This gives error messages that are not particularly enlightening:

RuntimeError: Neither does AODMOM nor the variables in [('AODMOM',)] exist in the file
IndexError: list index out of range

Image check failures

Test failed due to image check failures:
https://web.lcrc.anl.gov/public/e3sm/diagnostic_output/ac.forsyth2/zppy_weekly_bundles_www/test-main-20240907/v3.LR.historical_0051/image_check_failures_bundles/

$ cd /lcrc/group/e3sm/public_html/diagnostic_output/ac.forsyth2/zppy_weekly_bundles_www/test-main-20240907/v3.LR.historical_0051/image_check_failures_bundles
$ ls *_diff* | wc -l
1

There is 1 difference from the expected results: a global-time-series diff.

Possible reasons for image check failures

As far as I can recall, the expected results were generated after the 7/31 merging of #604/#617. There have been no code changes merged to zppy main since then. The only thing that could be different is the E3SM Diags environment used, but even that wouldn't account for the MPAS-Analysis diffs or global-time-series diffs. And even then, I was using conda activate e3sm_diags_20240731, so the diags environment should have been identical to when the expected results were generated.

Lesson learned: always run these tests on a weekly basis (even if no code changes have been merged!) to catch environmental changes, e.g., build versions or new releases of e3sm_diags and other packages.
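One way to make such environmental drift visible is to save the installed package versions with each run and diff them against the snapshot taken when the expected results were generated. The sketch below is a rough illustration: `snapshot_environment` and `diff_snapshots` are hypothetical helpers, the package names and versions in the example are placeholders, and `importlib.metadata` only sees Python distributions, so non-Python conda packages would need a separate check.

```python
from importlib import metadata


def snapshot_environment():
    """Map each installed Python distribution name to its version string."""
    snapshot = {}
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        if name:  # skip distributions with broken or missing metadata
            snapshot[name] = dist.version
    return snapshot


def diff_snapshots(old, new):
    """Return {package: (old_version, new_version)} for every package that changed."""
    changes = {}
    for name in sorted(set(old) | set(new)):
        before, after = old.get(name), new.get(name)
        if before != after:
            changes[name] = (before, after)
    return changes


# Placeholder snapshots; a real run would store snapshot_environment() as JSON.
expected_env = {"pkg_a": "1.0", "pkg_b": "2.0"}
current_env = {"pkg_a": "1.0", "pkg_b": "2.1", "pkg_c": "0.1"}

print(diff_snapshots(expected_env, current_env))
# → {'pkg_b': ('2.0', '2.1'), 'pkg_c': (None, '0.1')}
```

An empty result would suggest the Python packages are unchanged, which points the search toward data, machine, or module changes instead.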

What machine were you running on?

Chrysalis

Environment

zppy main as of 9/24.

What command did you run?

zppy -c tests/integration/generated/test_weekly_comprehensive_v3_chrysalis.cfg
zppy -c tests/integration/generated/test_weekly_comprehensive_v2_chrysalis.cfg
zppy -c tests/integration/generated/test_weekly_bundles_chrysalis.cfg # Run 1
zppy -c tests/integration/generated/test_weekly_bundles_chrysalis.cfg # Run 2 (for second part)

Copy your cfg file

N/A

What jobs are failing?

N/A

What stack trace are you encountering?

N/A


Labels

semver: bug (Bug fix; will increment patch version)
