Fix land variable performance issue by eagerly loading data #40
This pull request builds on the initial performance improvement work in #26. When testing 165-year historical runs, I found that land variables took ~18 minutes each, versus ~5 seconds for atmosphere variables. The cause was dask lazy evaluation: while the area scaling arrays (total_land_area, north_land_area, south_land_area) remained lazy dask arrays, the multiplication triggered loading all of the data from disk, which caused the massive delay.
Solution: Eagerly load both area fields and computed data arrays into memory before performing operations. This ensures all operations work with numpy arrays instead of lazy dask arrays.
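For illustration, the eager-loading pattern can be sketched as below. This is a minimal sketch rather than the code in this PR: it assumes the fields are xarray DataArrays backed by dask, and the `scale_by_area` helper, the array shapes, and the synthetic data are all hypothetical. Only the `total_land_area` name comes from the description above.

```python
import numpy as np
import xarray as xr

# Hypothetical stand-ins for the real inputs: a (time, lat, lon) land field and
# a (lat, lon) area field. .chunk() makes both lazy, dask-backed arrays.
n_time, n_lat, n_lon = 12, 4, 8
field = xr.DataArray(
    np.ones((n_time, n_lat, n_lon)), dims=("time", "lat", "lon")
).chunk({"time": 1})
total_land_area = xr.DataArray(
    np.full((n_lat, n_lon), 2.0), dims=("lat", "lon")
).chunk({"lat": 2})

# Both inputs are lazy at this point (chunks is None for numpy-backed arrays).
was_lazy = field.chunks is not None and total_land_area.chunks is not None


def scale_by_area(data, area):
    """Area-weighted sum over lat/lon, computed on in-memory numpy arrays.

    Hypothetical helper: .load() pulls each array into memory once (in place),
    so the multiplication below runs on numpy arrays instead of building and
    evaluating a dask graph.
    """
    data = data.load()
    area = area.load()
    return (data * area).sum(dim=("lat", "lon"))


result = scale_by_area(field, total_land_area)
```

Note that `.load()` modifies the array in place, so `field` and `total_land_area` stay in memory afterwards. `.compute()` returns a new in-memory object and leaves the original lazy, which may be preferable when memory is tight.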
Changes:
🤖 Generated with Claude Code
Select one: This pull request is...
Small Change