Skip to content

Conversation

@tomvothecoder
Copy link
Collaborator

@tomvothecoder tomvothecoder commented Nov 11, 2025

Description

Closes #1012

This PR preserves legacy Xarray merge behavior in response to breaking changes introduced in newer versions of Xarray. The changes ensure backward compatibility by explicitly setting compat="no_conflicts" and join="outer" when merging datasets, which were the default behaviors before Xarray v2025.08.0.

Key Changes:

  • Introduced a global constant LEGACY_XARRAY_MERGE_KWARGS to centralize merge configuration
  • Applied legacy merge settings to xr.merge() calls in dataset utilities
  • Refactored duplicate code in mp_partition_driver.py by extracting a reusable _open_mfdataset() function
  • Applied legacy merge settings to xr.open_mfdataset() in the new _open_mfdataset() helper function

Checklist

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • My changes generate no new warnings
  • Any dependent changes have been merged and published in downstream modules

If applicable:

  • New and existing unit tests pass with my changes (locally and CI/CD build)
  • I have added tests that prove my fix is effective or that my feature works
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have noted that this is a breaking change for a major release (fix or feature that would cause existing functionality to not work as expected)

@tomvothecoder tomvothecoder marked this pull request as ready for review November 11, 2025 22:00
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR preserves legacy Xarray merge behavior in response to breaking changes introduced in newer versions of Xarray. The changes ensure backward compatibility by explicitly setting compat="no_conflicts" and join="outer" when merging datasets, which were the default behaviors before Xarray v2025.08.0.

Key Changes:

  • Introduced a global constant LEGACY_XARRAY_MERGE_KWARGS to centralize merge configuration
  • Applied legacy merge settings to xr.merge() calls in dataset utilities
  • Applied legacy merge settings to xr.open_mfdataset() in the new _open_mfdataset() helper function
  • Refactored duplicate code in mp_partition_driver.py by extracting a reusable _open_mfdataset() function

Reviewed Changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

File Description
e3sm_diags/__init__.py Defines LEGACY_XARRAY_MERGE_KWARGS constant with legacy Xarray merge settings
e3sm_diags/driver/utils/dataset_xr.py Applies legacy merge kwargs to two xr.merge() calls
e3sm_diags/driver/tropical_subseasonal_driver.py Adds join="outer" to existing xr.merge() call already using compat="override"
e3sm_diags/driver/mp_partition_driver.py Extracts duplicated code into _open_mfdataset() helper and applies legacy merge settings; improves docstring formatting

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copy link
Collaborator Author

@tomvothecoder tomvothecoder left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @mahf708 and @chengzhuzhang, this PR is ready for review. It sets legacy Xarray settings on remaining xr.open_mfdataset() and xr.merge() calls to prevent a FutureWarning about changing kwargs.

@mahf708 Can you try pip installing the build from this branch in your env and see if the FutureWarnings are now gone?

Comment on lines 100 to 150
try:
# xr.open_mfdataset() can accept an explicit list of files.
landfrac = xr.open_mfdataset(glob.glob(f"{test_data_path}/LANDFRAC_*")).sel(
lat=slice(-70, -30), time=slice(f"{start_year}-01-01", f"{end_year}-12-31")
)["LANDFRAC"]
temp = xr.open_mfdataset(glob.glob(f"{test_data_path}/T_*.nc")).sel(
lat=slice(-70, -30), time=slice(f"{start_year}-01-01", f"{end_year}-12-31")
)["T"]
cice = xr.open_mfdataset(glob.glob(f"{test_data_path}/CLDICE_*.nc")).sel(
lat=slice(-70, -30), time=slice(f"{start_year}-01-01", f"{end_year}-12-31")
)["CLDICE"]
cliq = xr.open_mfdataset(glob.glob(f"{test_data_path}/CLDLIQ_*.nc")).sel(
lat=slice(-70, -30), time=slice(f"{start_year}-01-01", f"{end_year}-12-31")
)["CLDLIQ"]
landfrac = _open_mfdataset(test_data_path, "LANDFRAC", start_year, end_year)
temp = _open_mfdataset(test_data_path, "T", start_year, end_year)
cice = _open_mfdataset(test_data_path, "CLDICE", start_year, end_year)
cliq = _open_mfdataset(test_data_path, "CLDLIQ", start_year, end_year)
except OSError:
logger.info(
f"No files to open for variables within {start_year} and {end_year} from {test_data_path}."
)
raise

parameter.test_name_yrs = test_data.get_name_yrs_attr(season)

metrics_dict["test"] = {}
metrics_dict["test"]["T"], metrics_dict["test"]["LCF"] = compute_lcf(
cice, cliq, temp, landfrac
)

if run_type == "model-vs-model":
ref_data = Dataset(parameter, data_type="ref")

ref_data_path = parameter.reference_data_path
start_year = parameter.ref_start_yr
end_year = parameter.ref_end_yr
# xr.open_mfdataset() can accept an explicit list of files.

try:
landfrac = xr.open_mfdataset(glob.glob(f"{ref_data_path}/LANDFRAC_*")).sel(
lat=slice(-70, -30),
time=slice(f"{start_year}-01-01", f"{end_year}-12-31"),
)["LANDFRAC"]
temp = xr.open_mfdataset(glob.glob(f"{ref_data_path}/T_*.nc")).sel(
lat=slice(-70, -30),
time=slice(f"{start_year}-01-01", f"{end_year}-12-31"),
)["T"]
cice = xr.open_mfdataset(glob.glob(f"{ref_data_path}/CLDICE_*.nc")).sel(
lat=slice(-70, -30),
time=slice(f"{start_year}-01-01", f"{end_year}-12-31"),
)["CLDICE"]
cliq = xr.open_mfdataset(glob.glob(f"{ref_data_path}/CLDLIQ_*.nc")).sel(
lat=slice(-70, -30),
time=slice(f"{start_year}-01-01", f"{end_year}-12-31"),
)["CLDLIQ"]
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I extracted common code into an _open_mfdataset() function at the bottom of this file to make the code DRY.

Comment on lines +148 to +189
def _open_mfdataset(
data_path: str, var: str, start_year: int, end_year: int
) -> xr.DataArray:
"""
Open multiple NetCDF files as a single xarray Dataset and subset by time
and latitude.
This function reads multiple NetCDF files matching the specified variable
name and combines them into a single xarray Dataset. The data is then
subsetted based on the specified time range and latitude bounds.
Parameters
----------
data_path : str
The path to the directory containing the NetCDF files.
var : str
The variable name to match in the file pattern.
start_year : int
The starting year for the time subsetting.
end_year : int
The ending year for the time subsetting.
Returns
-------
xr.DataArray
The subsetted DataArray for the specified variable, filtered by time
and latitude.
"""
file_pattern = f"{data_path}/{var}_*.nc"
ds = xr.open_mfdataset(
glob.glob(file_pattern),
data_vars="minimal",
**LEGACY_XARRAY_MERGE_KWARGS, # type: ignore[ arg-type ]
)

ds_sub = ds.sel(
lat=slice(-70, -30), time=slice(f"{start_year}-01-01", f"{end_year}-12-31")
)

da_var = ds_sub[var]

return da_var
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The DRY function.

Comment on lines +15 to +22
# Settings to preserve legacy Xarray behavior when merging datasets.
# See https://xarray.pydata.org/en/stable/user-guide/io.html#combining-multiple-files
# and https://xarray.pydata.org/en/stable/whats-new.html#id14
LEGACY_XARRAY_MERGE_KWARGS = {
# "override", "exact" are the new defaults as of Xarray v2025.08.0
"compat": "no_conflicts",
"join": "outer",
}
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The constant variable for legacy xarray merge settings.

@chengzhuzhang
Copy link
Contributor

@tomvothecoder The logs got messy after this xarray update. Thanks for working on this PR to fix this. The code change looks good to me.

@tomvothecoder tomvothecoder merged commit e65bf2a into main Nov 12, 2025
5 checks passed
@tomvothecoder tomvothecoder deleted the devops/1012-xr-settings branch November 12, 2025 21:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Feature]: Preserve Xarray default behaviors for compat and join

4 participants