-
Notifications
You must be signed in to change notification settings - Fork 2
Performance improvement attempt #19
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
- Only create multi-figure PDFs for 'original' component - For non-original components, create individual PNGs for each plot - This removes the bottleneck of creating large multi-page PDFs for components that are primarily used for individual image display in the viewer - Reduce DPI from 150 to 100 for faster rendering and smaller files - Improve code organization and readability
This significant optimization restructures the data loading pipeline to load NetCDF files only once for all regions, rather than multiple times (once per region). Since each NetCDF file already contains data for all regions (global, northern hemisphere, southern hemisphere), this optimization: 1. Reduces I/O operations dramatically - files are read just once 2. Improves memory efficiency - data is loaded once and reused 3. Decreases processing time - particularly for multi-region analyses Implementation: - Added new load_all_region_data function to load data for all regions at once - Created extract_region_data function to extract specific region data - Restructured the data flow in coupled_global to process regions after data load - Maintains backward compatibility with existing API - Better error handling and reporting for problematic variables This is a significant performance enhancement for use cases that process multiple regions from the same dataset.
forsyth2
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @chengzhuzhang. Visual review looks mostly ok. This is a significant refactor, so I want to test the integration tests + the min cases from E3SM-Project/zppy#697 (comment) + the example cfg in E3SM-Project/zppy#694.
| # For non-original components: Create individual PNGs (one plot per file) | ||
| else: | ||
| # Process each plot individually | ||
| for i, plot_name in enumerate(plot_list): | ||
| # Create single-plot figure | ||
| fig = plt.figure(figsize=[13.5 / 2, 16.5 / 4]) | ||
| ax = fig.add_subplot(111) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is different than the design spec as described in E3SM-Project/zppy#697. In the current code, users can specify make_viewer or not. If they don't, then they can specify nrows and ncols to make custom m x n plot PDFs. The change in this PR appears to dictate that for the user. Original = multi-plot PDF, component = individual plot.
forsyth2
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@chengzhuzhang I ran the min-case cfg files from E3SM-Project/zppy#697, and a reduced version of the example cfg in E3SM-Project/zppy#694. The latter is still running (so far, for 2h12m). The results of the former I have summarized below.
Testing steps
cd ~/ez/zppy-interfaces
git status # Make sure no changes will persist
git fetch upstream
git checkout -b load-data-once-for-all-regions upstream/load-data-once-for-all-regions
git rebase upstream/main # Make sure these commits are off latest `main`
git log
conda clean --all --y
conda env create -f conda/dev.yml -n zi_pr19_20250402
conda activate zi_pr19_20250402 && pip install . # Make sure to `pip install` only after activating the correct env!
pytest tests/unit/global_time_series/test_*.py
# 10 passed in 24.25s
cd ~/ez/zppy
git status # Make sure no changes will persist
git fetch upstream main
git checkout -b test_zppy_zi_pr19_20250402 upstream/main
git log
conda clean --all --y
conda env create -f conda/dev.yml -n zppy_dev_zi_pr19_20250402
conda activate zppy_dev_zi_pr19_20250402 && pip install . # Make sure to `pip install` only after activating the correct env!
pytest tests/test_*.py
# 25 passed in 0.27s
# Edit tests/integration/utils.py
# UNIQUE_ID = "test_zppy_zi_pr19_20250402"
# "diags_environment_commands": "",
# "environment_commands_test": "source /lcrc/soft/climate/e3sm-unified/test_e3sm_unified_1.11.0rc13_chrysalis.sh",
# "global_time_series_environment_commands": "source /gpfs/fs1/home/ac.forsyth2/miniforge3/etc/profile.d/conda.sh; conda activate zi_pr19_20250402",
# Set: generate_cfgs(unified_testing=True, dry_run=False)
# Keep:
# "user_input_v2": "/lcrc/group/e3sm/ac.forsyth2/",
# "user_input_v3": "/lcrc/group/e3sm2/ac.wlin/",
python tests/integration/utils.py
git add -A # To make it clear if there are later changes
# Run zppy
# Same display order as in https://github.com/E3SM-Project/zppy/pull/697#issuecomment-2762855613
zppy -c tests/integration/generated/test_min_case_global_time_series_viewers_chrysalis.cfg
zppy -c tests/integration/generated/test_min_case_global_time_series_viewers_all_land_variables_chrysalis.cfg
zppy -c tests/integration/generated/test_min_case_global_time_series_viewers_undefined_variables_chrysalis.cfg
zppy -c tests/integration/generated/test_min_case_global_time_series_original_8_missing_ocn_chrysalis.cfg
zppy -c tests/integration/generated/test_min_case_global_time_series_viewers_original_8_chrysalis.cfg
zppy -c tests/integration/generated/test_min_case_global_time_series_viewers_original_atm_plus_land_chrysalis.cfg
emacs examples/post.v3.LR.historical.zppy_v3.cfg
# Copy cfg from https://github.com/E3SM-Project/zppy/pull/694/files
# Update output, www
# Remove irrelevant tasks/subtasks
zppy -c examples/post.v3.LR.historical.zppy_v3.cfg
# Examine output
cd /lcrc/group/e3sm/ac.forsyth2/zppy_min_case_global_time_series_viewers_output/test_zppy_zi_pr19_20250402/v3.LR.historical_0051/post/scripts
grep -v "OK" *status
cd /lcrc/group/e3sm/ac.forsyth2/zppy_min_case_global_time_series_viewers_all_land_variables_output/test_zppy_zi_pr19_20250402/v3.LR.historical_0051/post/scripts
grep -v "OK" *status
cd /lcrc/group/e3sm/ac.forsyth2/zppy_min_case_global_time_series_viewers_undefined_variables_output/test_zppy_zi_pr19_20250402/v3.LR.historical_0051/post/scripts
grep -v "OK" *status
cd /lcrc/group/e3sm/ac.forsyth2/zppy_min_case_global_time_series_original_8_missing_ocn_output/test_zppy_zi_pr19_20250402/v3.LR.historical_0051/post/scripts
grep -v "OK" *status
cd /lcrc/group/e3sm/ac.forsyth2/zppy_min_case_global_time_series_viewers_original_8_output/test_zppy_zi_pr19_20250402/v3.LR.historical_0051/post/scripts
grep -v "OK" *status
cd /lcrc/group/e3sm/ac.forsyth2/zppy_min_case_global_time_series_viewers_original_atm_plus_land_output/test_zppy_zi_pr19_20250402/v3.LR.historical_0051/post/scripts
grep -v "OK" *status
cd /lcrc/group/e3sm/ac.forsyth2/E3SMv3_20250402_try1/v3.LR.historical_0051/post/scripts
grep -v "OK" *status
# Still runningSo far, only viewers_original_8 appears to have problems. The behavior on main is to generate a 8-plot PDF and link it under "Output Sets". The behavior on this PR is to generate an 8-plot PDF (2 of which are blank) and link it under a directory listing.
Full min-case cfg results
I copied the table from E3SM-Project/zppy#697 (comment) and added a column for any changes in behavior:
| Test cfg | ts:lnd_monthly_glb vars= |
plots_original= |
plots_lnd= |
Viewer link | Original Behavior | New Behavior |
|---|---|---|---|---|---|---|
| viewers | 20+ vars | "" |
20+ vars | viewer | atm, lnd link to tables. No original link. |
No change |
| viewers_all_land_variables | 20+ vars | "" |
"all" |
viewer | atm, lnd link to tables. No original link. That is, with the available variables constricted to the same 20+ vars as in the viewers test cfg, the same plot set generated. |
No change |
| viewers_undefined_variables | "" |
"" |
"all" |
viewer | atm, lnd link to tables. Very large number of lnd variables listed. |
No change |
| original_8_missing_ocn | N/A | default 8 plots | default "" |
N/A | Viewer not requested. No results page either, because mpas_analysis task is missing in the cfg and thus global_time_series gets skipped. This is expected behavior. |
No change |
| viewers_original_8 | N/A | default 8 plots | default "" |
viewer | In this case, make_viewer=True but the only requested component is the original 8 plots. This ends up looking a little weird, but it works. Clicking the link brings us to an 8-plot single-page PDF |
There is now a directory list instead of a Viewer-type link to a PDF. Only 6/8 plots appear on the single-page PDF now |
| viewers_original_atm_plus_land | "" |
5 atm plots | 4 vars | viewer | atm, lnd link to tables. original links to a 5-plot PDF |
No Change |
I think both behaviors are acceptable, the behavior of the this PR preserves what is in the version in current e3sm-unified? The only potential concern is where are the other single plots (png) will be saved? if they are in the same directory, than it can be very crowded to find the original 8-panel plots. |
Well, we do need to resolve the 2 missing plots (this PR skips "Change in ocean heat content" and "Change in sea level").
Yes, they get stored in the same directory. I recall getting the viewers to correctly find the images in the first place was a major difficulty, so if we did change the directory, we'd need to make sure the Viewers can still find the source images.
Per #19 (comment), I have confirmed this behavior change does come up in testing. Additional test# This tests plots_lnd without the viewer (4x2 PDF)
zppy -c tests/integration/generated/test_min_case_global_time_series_custom_chrysalis.cfg
cd /lcrc/group/e3sm/ac.forsyth2/zppy_min_case_global_time_series_custom_output/test_zppy_zi_pr19_20250402/v3.LR.historical_0051/post/scripts
grep -v "OK" *status The www is a directory listing of PNGs, no PDF. Let's take a look at behavior on # Comment out environment_commands for `global_time_series`, so that Unified rc13 is used
#environment_commands = "source /gpfs/fs1/home/ac.forsyth2/miniforge3/etc/profile.d/conda.sh; conda activate zi_pr19_20250402"
# Set output and www:
output = "/lcrc/group/e3sm/ac.forsyth2/zppy_min_case_global_time_series_custom_output/test_zppy_zi_pr19_20250402_change_off/v3.LR.historical_0051"
www = "/lcrc/group/e3sm/public_html/diagnostic_output/ac.forsyth2/zppy_min_case_global_time_series_custom_www/test_zppy_zi_pr19_20250402_change_off"Then run: zppy -c tests/integration/generated/test_min_case_global_time_series_custom_chrysalis.cfg
# RUNNING
cd /lcrc/group/e3sm/ac.forsyth2/zppy_min_case_global_time_series_custom_output/test_zppy_zi_pr19_20250402_change_off/v3.LR.historical_0051/post/scripts
grep -v "OK" *statusThe www is a directory listing of multi-plot PNGs and PDFs that group together those PNGs. |
|
In this case, we can consider remove this commit d92f093 that changes plotting behavior. My test shows the other commit (load all regions) primarily contribute to the performance gain. |
|
@forsyth2 Another option we can propose is to not fix the performance issue for now to not further delay e3sm-unified. In this case, we provide a separate cfg for enable all land variables. This is a fallback, but is perhaps more reasonable in this situation? |
We need to have a clear decision on the design specification that we can refer back to. As of
As of this PR, the behavior is:
@chengzhuzhang Are either of those the desired design specification? Which "min-case" testing cfg corresponds with which cell in the above tables?
|
Thanks for the summary. Sounds good. We can probably get more input from science groups once this feature is released and exercised more in evaluation. |
@chengzhuzhang I think this makes sense. For now, we'll have the expected output from the first table (
I am going to let this finish running to see if any other issues pop up. Currently at 3h38m |
Never mind, it wasn't even running in the right environment. I forgot that had to be updated manually, unlike the auto-generated min-case cfg's... > cd /lcrc/group/e3sm/ac.forsyth2/E3SMv3_20250402_try1/v3.LR.historical_0051/post/scripts
> emacs global_time_series_1985-2014.settings
'environment_commands': 'source '
'/gpfs/fs1/home/ac.forsyth2/miniforge3/etc/profile.d/conda.sh; '
'conda activate zi_plots_lnd_20250326',Which is obviously not the environment |
|
I do want to see the actual results though: |
|
Ok actually using the correct environment, we do indeed see a faster runtime: cd /lcrc/group/e3sm/ac.forsyth2/E3SMv3_20250402_try1/v3.LR.historical_0051/post/scripts
grep -v "OK" *status
# No errors
grep -in "Elapsed time" global_time_series_1985-2014.o*
# global_time_series_1985-2014.o721350:3479:Elapsed time: 8292 secondsThat's 8,292 seconds = 138.2 minutes = 2.3 hours to run: @chengzhuzhang It appears your changes do make the code faster. I think it was taking 6.7 hours before, so this PR is a 2.9x speedup. www results however suggest there is no viewer created despite So, I think we're definitely on the right track here with the performance, but there are a few issues to work out first. |
1. Fix issue with missing plots in original component for non-global regions - Handle global-only plots in non-global regions by creating empty plots - Ensures all plots appear in multi-plot pages 2. Fix viewer creation when make_viewer=True - Create viewers for all components with requested plots - Properly link to generated PNGs in the viewer 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]>
This reverts commit 619b2fa.
|
Good to know this PR is likely on the right track to improve performance. I tried to fix the changed behavior for the original plots and viewer last night but I think it needs clear separation of original component from other realms. Given that I'm not very familiar with the code base and I'm not confident to make deep refactoring with the time constraint. I think this should be a future development item after this zppy/zppy-interfaces release. |
|
completed by #26 |
Summary
Objectives: performance improvement by:
When analyses the 30 year log that including all land variables: /lcrc/group/e3sm/ac.forsyth2/E3SMv3_20250331_try2/v3.LR.historical_0051/post/scripts/global_time_series_1985-2014.o719959
I tried to understand and analyze the zppy-interface results and code, and I found that there are 2 hours spent each time before processing a new region (n,s,glb). This PR attempted to reduce time by loading all regions at the same time. The total time for zi-interface to produce land vars can reduce about 50% (more than 6 hours -> 3hours)
@forsyth2 please review and see if this is acceptable. I did a full tests on Chrysalis, but not including the original plots, following the below commandline:
There should be more room for improvement of the code and performance.
Issue resolution:
Select one: This pull request is...
Please fill out either the "Small Change" or "Big Change" section (the latter includes the numbered subsections), and delete the other.
Small Change
Big Change
1. Does this do what we want it to do?
Required:
2. Are the implementation details accurate & efficient?
Required:
If applicable:
zppy-interfaces/conda, not just animportstatement.3. Is this well documented?
Required:
4. Is this code clean?
Required:
If applicable: