Commit 229d6da

chengzhuzhang and claude committed
Fix land variable performance issue by eagerly loading data
Land variables were taking ~18 minutes each vs ~5 seconds for atmosphere variables. The issue was dask lazy evaluation - when area scaling arrays (total_land_area, north_land_area, south_land_area) remained as lazy dask arrays, the multiplication operation triggered loading all data from disk, causing the massive delay.

Solution: Eagerly load both area fields and computed data arrays into memory before performing operations. This ensures all operations work with numpy arrays instead of lazy dask arrays.

Changes:
- For TOTAL metric variables, call .load() on area fields (valid_area_per_gridcell, area, landfrac) after opening dataset
- Call .load() on annual average data_array after computation
- Reduces land variable processing from ~18 minutes to ~5-10 seconds

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
1 parent 98fa8dc commit 229d6da
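
The lazy-versus-eager distinction described in the commit message can be illustrated with a toy model. This is a minimal sketch and not dask itself: `LazyField`, `scale_all`, and the sample numbers are hypothetical names and values invented for illustration, and the "one read per access" behaviour is a deliberate simplification of what a lazy task graph may do.

```python
class LazyField:
    """Toy stand-in for a lazily evaluated array (NOT dask itself)."""

    def __init__(self, read_from_disk):
        self._read = read_from_disk  # callable simulating an expensive read
        self._cache = None           # filled in by load()
        self.reads = 0               # how many times the "disk" was touched

    def values(self):
        # A loaded field answers from memory.
        if self._cache is not None:
            return self._cache
        # An unloaded field repeats the expensive read on every access.
        self.reads += 1
        return self._read()

    def load(self):
        # Read once, keep the result in memory, and return self.
        if self._cache is None:
            self.reads += 1
            self._cache = self._read()
        return self


def scale_all(data_rows, area):
    """Multiply each row by the area field, element by element."""
    return [[d * a for d, a in zip(row, area.values())] for row in data_rows]


rows = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

# Lazy: the area field is re-read for every row that gets scaled.
lazy_area = LazyField(lambda: [10.0, 20.0])
lazy_result = scale_all(rows, lazy_area)

# Eager: the area field is read once up front, then reused from memory.
eager_area = LazyField(lambda: [10.0, 20.0]).load()
eager_result = scale_all(rows, eager_area)

print(lazy_area.reads, eager_area.reads)  # 3 1
```

Both paths produce the same scaled values; only the number of simulated reads differs. Whether real dask arrays repeat work in this way depends on the task graph, so the sketch shows the general pattern rather than a diagnosis of this particular slowdown.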

1 file changed: +12 -0 lines changed

zppy_interfaces/global_time_series/coupled_global/utils.py

Lines changed: 12 additions & 0 deletions
@@ -178,11 +178,23 @@ def process_variable(
     # 2. Load only this variable's data
     dataset = xcdat.open_mfdataset(file_paths, center_times=True)
 
+    # For TOTAL metrics, eagerly load area fields to avoid lazy computation issues
+    if var.metric == Metric.TOTAL:
+        if "valid_area_per_gridcell" in dataset:
+            dataset["valid_area_per_gridcell"].load()
+        if "area" in dataset:
+            dataset["area"].load()
+        if "landfrac" in dataset:
+            dataset["landfrac"].load()
+
     try:
         # 3. Compute annual average
         annual_dataset = dataset.temporal.group_average(var.variable_name, "year")
         data_array = annual_dataset.data_vars[var.variable_name]
 
+        # Eagerly load the result to avoid lazy computation issues
+        data_array.load()
+
         # 4. Apply area scaling if needed
         data_array = apply_scaling(data_array, var.metric, dataset)
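
The three `if ... in dataset` checks added in this diff follow one pattern, which could be folded into a small helper. The sketch below is one possible refactoring, not code from the repository: `load_if_present`, `AREA_FIELDS`, and `_FakeField` are hypothetical names. It relies only on the mapping-style access the diff already uses (`name in dataset`, `dataset[name]`, `.load()`), so a plain dict of stand-in objects is enough to exercise it without xarray.

```python
def load_if_present(dataset, names):
    """Call .load() on each named field that exists; return the names loaded."""
    loaded = []
    for name in names:
        if name in dataset:
            dataset[name].load()
            loaded.append(name)
    return loaded


# Field names taken from the diff; the tuple itself is a hypothetical constant.
AREA_FIELDS = ("valid_area_per_gridcell", "area", "landfrac")


class _FakeField:
    """Stand-in for an array object exposing .load(), for demonstration only."""

    def __init__(self):
        self.loaded = False

    def load(self):
        self.loaded = True
        return self


# A dict plays the role of the dataset here; "some_variable" is a made-up name.
fake_dataset = {
    "area": _FakeField(),
    "landfrac": _FakeField(),
    "some_variable": _FakeField(),
}
loaded_names = load_if_present(fake_dataset, AREA_FIELDS)
print(loaded_names)  # ['area', 'landfrac']
```

In `process_variable`, the guarded block would then reduce to a single call such as `load_if_present(dataset, AREA_FIELDS)` inside the `var.metric == Metric.TOTAL` branch, assuming the same in-place `.load()` behaviour the commit already depends on.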
