Fix land variable performance issue by eagerly loading data #40

chengzhuzhang · 2025-10-08T21:06:35Z

This pull request builds on the initial performance work in #26. When testing 165-year historical runs, I found that land variables took ~18 minutes each, versus ~5 seconds for atmosphere variables. The cause was dask lazy evaluation: when the area scaling arrays (total_land_area, north_land_area, south_land_area) remained lazy dask arrays, the multiplication triggered loading all of the data from disk, causing the massive delay.
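As a rough illustration of the failure mode, here is a toy, standard-library-only model. It is not dask and not the code in this PR: `LazyField`, `scale_lazy`, and `scale_eager` are hypothetical names, and the read counter stands in for disk access. It only shows the general pattern that a lazy operand is re-evaluated each time it is used, while a loaded one is evaluated once.

```python
class LazyField:
    """Toy stand-in for a lazy array: every compute() re-reads the source."""

    def __init__(self, values):
        self._values = list(values)
        self.reads = 0  # how many times the pretend "disk" was touched

    def compute(self):
        self.reads += 1
        return list(self._values)


def scale_lazy(variables, area):
    # The area field is re-evaluated for every variable it is multiplied with.
    return [[v * a for v, a in zip(var, area.compute())] for var in variables]


def scale_eager(variables, area):
    # Evaluate the area field once, then reuse the in-memory copy.
    loaded = area.compute()
    return [[v * a for v, a in zip(var, loaded)] for var in variables]


variables = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

lazy_area = LazyField([10.0, 20.0])
lazy_result = scale_lazy(variables, lazy_area)

eager_area = LazyField([10.0, 20.0])
eager_result = scale_eager(variables, eager_area)

# Same numbers either way; only the number of reads differs.
print(lazy_area.reads, eager_area.reads)
```

The results are identical in both cases; the difference is only how often the underlying data is fetched, which is the cost that grows with long runs.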

Solution: Eagerly load both area fields and computed data arrays into memory before performing operations. This ensures all operations work with numpy arrays instead of lazy dask arrays.

Changes:

- For TOTAL metric variables, call .load() on the area fields (valid_area_per_gridcell, area, landfrac) after opening the dataset
- Call .load() on the annual average data_array after computation
- Reduces land variable processing from ~18 minutes to ~5-10 seconds
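The pattern described in the list above might look roughly like the following xarray sketch. It is not the code from this PR: the synthetic dataset, the `GPP` variable, the grid size, the `year` coordinate, and the way `valid_area_per_gridcell` is derived from `area` and `landfrac` are all assumptions made for illustration. It requires numpy, xarray, and dask.

```python
import numpy as np
import xarray as xr

# Hypothetical stand-in for a land history file: 24 monthly steps on a 4x8 grid.
# .chunk() makes the data variables dask-backed (lazy), as they would be after
# opening files with chunking enabled.
ds = xr.Dataset(
    {
        "area": (("lat", "lon"), np.full((4, 8), 100.0)),
        "landfrac": (("lat", "lon"), np.full((4, 8), 0.5)),
        "GPP": (("time", "lat", "lon"), np.ones((24, 4, 8))),
    }
).chunk({"lat": 2})

# Label each monthly step with its year (added after chunking so the
# coordinate itself stays numpy-backed).
ds = ds.assign_coords(year=("time", np.repeat([2000, 2001], 12)))

# Step 1: load the small, time-independent area fields right away.
area = ds["area"].load()
landfrac = ds["landfrac"].load()
valid_area_per_gridcell = area * landfrac  # assumed definition; numpy-backed

# Step 2: compute the annual average, then load the result into memory.
annual = ds["GPP"].groupby("year").mean().load()

# Step 3: area-weighted total; both operands are now plain numpy arrays.
total = (annual * valid_area_per_gridcell).sum(dim=("lat", "lon"))
print(total.values)
```

Checking `.chunks` is a quick way to confirm what is lazy: it is `None` for a numpy-backed array and a tuple of chunk sizes for a dask-backed one. Loading is only appropriate when the loaded arrays fit comfortably in memory, as the small area fields and annual averages are assumed to here.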

🤖 Generated with Claude Code

Select one: This pull request is...

a bug fix: increment the patch version
a small improvement: increment the minor version
a new feature: increment the minor version
an incompatible (non-backwards compatible) API change: increment the major version

Small Change

To merge, I will use "Squash and merge". That is, this change should be a single commit.
Logic: I have visually inspected the entire pull request myself.
Pre-commit checks: All the pre-commit checks have passed.

Land variables were taking ~18 minutes each vs ~5 seconds for atmosphere variables. The issue was dask lazy evaluation - when area scaling arrays (total_land_area, north_land_area, south_land_area) remained as lazy dask arrays, the multiplication operation triggered loading all data from disk, causing the massive delay.

Solution: Eagerly load both area fields and computed data arrays into memory before performing operations. This ensures all operations work with numpy arrays instead of lazy dask arrays.

Changes:
- For TOTAL metric variables, call .load() on area fields (valid_area_per_gridcell, area, landfrac) after opening dataset
- Call .load() on annual average data_array after computation
- Reduces land variable processing from ~18 minutes to ~5-10 seconds

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

forsyth2

This looks reasonable from visual inspection, and I see the GitHub Actions are passing.

Since we're in a heavy testing period for the Unified release anyway, we can check the integration test results on main after merging / in the next rc.

chengzhuzhang · 2025-10-10T16:15:18Z

Thanks @forsyth2.

chengzhuzhang marked this pull request as ready for review October 9, 2025 23:37

chengzhuzhang requested a review from forsyth2 October 9, 2025 23:38

forsyth2 added the "semver: small improvement" label (Small improvement; will increment patch version) Oct 9, 2025

forsyth2 approved these changes Oct 9, 2025

chengzhuzhang merged commit 2554bd4 into main Oct 10, 2025
5 checks passed

chengzhuzhang deleted the one-more-fix-land-performance branch October 10, 2025 16:15

forsyth2 mentioned this pull request Oct 16, 2025

v0.2.0rc2 conda-forge/zppy-interfaces-feedstock#5

Merged

5 tasks
