Fix land variable performance issue by eagerly loading data
Land variables were taking ~18 minutes each, versus ~5 seconds for
atmosphere variables. The cause was dask lazy evaluation: the area
scaling arrays (total_land_area, north_land_area, south_land_area) were
still lazy dask arrays, so multiplying by them triggered loading all of
the data from disk, causing the massive delay.
Solution: Eagerly load both area fields and computed data arrays into
memory before performing operations. This ensures all operations work with
numpy arrays instead of lazy dask arrays.
Changes:
- For TOTAL metric variables, call .load() on area fields
  (valid_area_per_gridcell, area, landfrac) after opening the dataset
- Call .load() on annual average data_array after computation
- Reduces land variable processing from ~18 minutes to ~5-10 seconds
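
A minimal sketch of the eager-loading pattern described above, assuming
xarray. The dataset contents and the variable name "flux" are
hypothetical stand-ins, not the project's actual code. The toy dataset is
already in memory, so .load() is a no-op here; on a dask-backed dataset
opened from disk it would read the values into NumPy arrays.

```python
import numpy as np
import xarray as xr

# Hypothetical stand-in for a dataset opened from disk.
ds = xr.Dataset(
    {
        "area": (("lat", "lon"), np.full((2, 3), 2.0)),
        "landfrac": (("lat", "lon"), np.full((2, 3), 0.5)),
        "flux": (("time", "lat", "lon"), np.ones((4, 2, 3))),
    }
)

# Load the area fields right after opening the dataset, so later
# arithmetic works on in-memory NumPy arrays rather than lazy arrays.
area = ds["area"].load()
landfrac = ds["landfrac"].load()
valid_area_per_gridcell = (area * landfrac).load()

# Load the averaged data after computing it (a plain time mean stands in
# for the annual average here), then apply the area scaling.
data_array = ds["flux"].mean("time").load()
total = float((data_array * valid_area_per_gridcell).sum())
print(total)
```

Loading before the multiplication keeps the expensive disk read in one
explicit place instead of triggering it implicitly inside later
operations.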
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <[email protected]>