Creates a mask for 2d datasets before regridding #974
Conversation
Re-ran a full set run. The SST metrics are now closer to what we have in v2.
[Figures: V2 and V3 SST metrics]
@tomvothecoder I think this PR is in good shape, please review when you get a chance, thanks!
tomvothecoder
left a comment
Hi @chengzhuzhang, I left some questions and suggestions. This PR is coming together and almost done. Thanks.
e3sm_diags/driver/utils/regrid.py
Outdated
| if tool == "regrid2" or not any(dim in dims_to_check for dim in var_dims): |
| if tool == "regrid2" or not any(dim in dims_to_check for dim in var_dims): | |
| # Create a mask for xESMF if the variable is 2D (does not include specified dimensions) | |
| if tool == "xesmf" and not any(dim in dims_to_check for dim in var_dims): |
Are we supposed to add the mask variable whenever the tool is xesmf, rather than regrid2?
I think the logic is right without changing it, assuming regrid2 can take a 3D variable as input for regridding?
- Add unit tests for `_add_mask`
tomvothecoder
left a comment
Hi Jill, I pushed some final suggestions. I think this PR is good to go if you agree with the changes.
Let me know what you think.
e3sm_diags/driver/utils/regrid.py
Outdated
| def _add_mask(ds: xr.Dataset, var_key: str, tool: str) -> xr.Dataset: | ||
| """Add a mask variable to the dataset. | ||
| This function creates a mask variable for the specified variable key in | ||
| the dataset if the tool is "regrid2" (which supports 3D variables) or if | ||
| the variable is 2D with only spatial dimensions (e.g., "X" and "Y"). |
I renamed _add_mask_for_2d to _add_mask since it also supports 3D variables with regrid2.
I also expanded the docstring and removed the dims_to_check parameter.
| ds_new = ds.copy() | ||
| var = ds_new[var_key] | ||
| var_dims = var.dims | ||
| if tool == "regrid2" or not any(dim in dims_to_check for dim in var_dims): | ||
| spatial_dims = {"x", "y", "lon", "lat", "longitude", "latitude"} | ||
| is_spatial_var = len(var_dims) == 2 and all( | ||
| str(dim).lower() in spatial_dims for dim in var_dims | ||
| ) | ||
| if tool == "regrid2" or is_spatial_var: | ||
| logger.debug(f"Creating mask for {var_key} with dimensions {var_dims}") | ||
| ds["mask"] = xr.where(~np.isnan(ds[var_key]), 1, 0) | ||
| if "mask" in ds_new: | ||
| logger.warning("Overwriting existing 'mask' variable in the dataset.") | ||
| ds_new["mask"] = xr.where(~np.isnan(var), 1, 0) | ||
| else: | ||
| logger.debug( | ||
| f"Skipping mask creation for variable {var_key} with dimensions {var_dims}" | ||
| ) | ||
| return ds | ||
| return ds_new |
I updated the logic of _add_mask to make it easier to understand. Now, it checks if the variable is a 2D spatial variable (if it is, it allows regridding using xESMF).
This is simpler than the old approach, which checked whether any of the variable’s dimensions weren't in a list of vertical and time dimensions. That method was harder to follow and skipped regridding with xESMF if any non-spatial dimension was present.
Also, it logs a warning via logger.warning if an existing mask variable will be overwritten.
| np.testing.assert_array_equal(result["mask"].values, np.array([[1, 0], [1, 1]])) | ||
| # Ensure a warning is logged | ||
| assert "Overwriting existing 'mask' variable in the dataset" in caplog.text |
I added unit tests to cover various cases with _add_mask().
It is easier to test _add_mask() directly because align_grids_to_lower_res() removes the mask variable by the end of regridding, which makes it harder to detect whether a mask was used unless we write a more comprehensive test that checks for np.nan.
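The direct-testing approach described here can be sketched roughly as follows. This is not the PR's test code: the real tests use pytest's `caplog` fixture and an `xr.Dataset`, while this self-contained sketch assumes a stand-in `add_mask` function operating on a dict of NumPy arrays, a hypothetical logger name, and a small `ListHandler` class to capture log messages.

```python
import logging

import numpy as np

logger = logging.getLogger("mask_sketch")  # hypothetical logger name


def add_mask(data: dict, var_key: str) -> dict:
    """Stand-in for _add_mask that uses a dict of arrays, not an xr.Dataset."""
    new = dict(data)  # shallow copy so the caller's dict is not modified
    if "mask" in new:
        logger.warning("Overwriting existing 'mask' variable in the dataset.")
    # Same expression as in the PR: 1 where data is valid, 0 where it is NaN.
    new["mask"] = np.where(~np.isnan(new[var_key]), 1, 0)
    return new


class ListHandler(logging.Handler):
    """Collects log messages so the test can inspect them."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())


handler = ListHandler()
logger.addHandler(handler)

# Input already contains a 'mask' entry, so a warning is expected.
data = {"ts": np.array([[1.0, np.nan], [3.0, 4.0]]), "mask": np.zeros((2, 2))}
result = add_mask(data, "ts")

np.testing.assert_array_equal(result["mask"], np.array([[1, 0], [1, 1]]))
assert any("Overwriting existing 'mask'" in m for m in handler.messages)
assert data["mask"].sum() == 0  # the input dict still holds the old mask
```

Testing the helper in isolation keeps the assertion on the mask values themselves, rather than inferring mask usage from NaN patterns in a regridded output.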
e3sm_diags/driver/utils/regrid.py
Outdated
| var_dims = var.dims | ||
| spatial_dims = {"x", "y", "lon", "lat", "longitude", "latitude"} | ||
| is_spatial_var = len(var_dims) == 2 and all( |
I feel this is a little riskier than the old logic for cases where extra dims are present. I'm not sure whether those are completely dropped at some point. I can test a full run again to see if all variables are produced.
I originally understood the logic for xESMF regridding as: ignore any dimensions that are Z (vertical) or T (time), which effectively means we expect to regrid a 2D variable with just X and Y (horizontal spatial dimensions).
As far as I can recall, I haven't encountered any variables with more than these four dimensions (X, Y, Z, T), but there might be some that I don't remember. If such variables do exist and we want to proceed with regridding them anyway, this logic would end up ignoring those variables entirely, which is problematic.
Also, based on what I'm reading, xESMF does support regridding variables with more than two dimensions by broadcasting. It applies the 2D regridding independently across each slice along the non-spatial dimensions. Do we want to continue regridding variables that have >2 dimensions minus Z and T? If so, then we'll need to revert the logic back to the old one.
I think there are COSP-related dimensions, but I'm not sure whether those are dropped in the course of data ingestion.
xESMF does support 3D variables, as in our original implementation, but as I understand it, it doesn't support 3D masks. Here is the line of code that throws an error when a 3D mask is passed in: https://github.com/pangeo-data/xESMF/blob/30e3ecb094d39a0c6f1edb22b60d8d608a19f7d9/xesmf/backend.py#L138. Until xESMF relaxes this constraint, we can only get more accurate regridding for 2D variables.
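The constraint described here can be illustrated without calling xESMF. In the NumPy sketch below, the array shapes and values are made up: a 3D variable has a different missing cell on each level, so the mask built with the same `where(~isnan(...))` expression is itself 3D, and no single 2D mask describes every level.

```python
import numpy as np

# Hypothetical variable with dims (lev, lat, lon); each level has a
# different missing cell.
var_3d = np.array(
    [
        [[1.0, np.nan], [3.0, 4.0]],
        [[1.0, 2.0], [np.nan, 4.0]],
    ]
)

mask_3d = np.where(~np.isnan(var_3d), 1, 0)
print(mask_3d.shape)  # one 2D mask per level

# The per-level masks differ, so they cannot be collapsed into one 2D mask
# without losing information.
levels_share_mask = bool((mask_3d[0] == mask_3d[1]).all())
print(levels_share_mask)

# A 2D variable, by contrast, yields exactly one 2D mask.
mask_2d = np.where(~np.isnan(var_3d[0]), 1, 0)
print(mask_2d)
```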
I do like the refactored and refined code from your change. Could you only revert the logic that handles exclusion of dims? Thanks!
Yeah sounds good!
| dims_to_skip = {"lev", "plev", "z", "time", "t"} | ||
| has_skipped_dims = any(str(dim).lower() in dims_to_skip for dim in var_dims) | ||
| if tool == "regrid2" or is_spatial_var: | ||
| if tool == "regrid2" or not has_skipped_dims: |
Revert the logic for checking dimensions back to the old logic in 7bd8f29 (#974)
Thank you!
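To make the difference between the two dimension checks in this thread concrete, here is a small pure-Python sketch. The set contents are taken from the diff above; the helper names `is_2d_spatial` and `has_no_skipped_dims` and the `cosp_tau` dimension name are hypothetical and used only for illustration.

```python
SPATIAL_DIMS = {"x", "y", "lon", "lat", "longitude", "latitude"}
DIMS_TO_SKIP = {"lev", "plev", "z", "time", "t"}


def is_2d_spatial(var_dims):
    """Refactored check: exactly two dims, both horizontal."""
    return len(var_dims) == 2 and all(
        str(dim).lower() in SPATIAL_DIMS for dim in var_dims
    )


def has_no_skipped_dims(var_dims):
    """Reverted check: no vertical or time dims present."""
    return not any(str(dim).lower() in DIMS_TO_SKIP for dim in var_dims)


examples = {
    "2D lat/lon": ("lat", "lon"),
    "with time": ("time", "lat", "lon"),
    "with extra dim": ("cosp_tau", "lat", "lon"),  # hypothetical dim name
}

results = {
    name: (is_2d_spatial(dims), has_no_skipped_dims(dims))
    for name, dims in examples.items()
}

for name, (refactored, reverted) in results.items():
    print(f"{name}: is_2d_spatial={refactored}, has_no_skipped_dims={reverted}")
```

The two checks disagree only in the third case: a variable with a dimension that is in neither set passes the reverted check but fails the refactored one.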
Description
This PR addresses the regridding difference identified here: #964
lat_lon_HadISST_CL-SST-metrics-diff.ipynb:
This PR explicitly creates a mask variable before regridding to prevent missing data from affecting the regridding results.
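As a rough illustration of why missing data can affect a regridded value, the NumPy sketch below averages four fine-grid cells into one coarse cell. It is not xESMF or regrid2 code: the sample values are made up, and real regridding uses interpolation or area weights rather than a plain mean.

```python
import numpy as np

# Made-up SST values on a fine grid; one cell is missing (e.g. land).
fine = np.array([290.0, 292.0, np.nan, 294.0])

# Same construction as in this PR: 1 where data is valid, 0 where it is NaN.
mask = np.where(~np.isnan(fine), 1, 0)

# Without a mask, the missing cell contaminates the coarse value.
naive_mean = fine.mean()

# With a mask, only valid cells contribute and the weights are renormalized.
masked_mean = (np.nan_to_num(fine, nan=0.0) * mask).sum() / mask.sum()

print(naive_mean, masked_mean)
```

The mask tells the averaging step which source cells to ignore, so the coarse value is based only on valid data.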
Checklist
If applicable: