-
Notifications
You must be signed in to change notification settings - Fork 0
merge to latest main #6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR enhances MOM6 (ocean model) variable processing capabilities by adding support for both native and regridded ocean grids, alongside improvements to logging consistency and PBS resource allocation.
Key Changes:
- Added support for processing ocean variables on native MOM6 grid vs. regridded grid based on dimension checks
- Replaced print statements with proper logger calls throughout the codebase
- Enhanced
process_one_varto handle multiple dimension configurations per variable
Reviewed Changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 9 comments.
Show a summary per file
| File | Description |
|---|---|
| scripts/monthly_cmor.py | Major refactor to support multiple dimension configurations per variable, improved logging, and ocean-specific grid handling |
| scripts/fullamon.sh | Dynamic PBS CPU allocation instead of hardcoded worker count |
| scripts/fullLmon.sh | Dynamic PBS CPU allocation instead of hardcoded worker count |
| cmip7_prep/regrid.py | Added realm-specific regridding map selection and ocean variable handling logic |
| cmip7_prep/pipeline.py | Added MOM6 grid support in the pipeline with proper auxiliary variable handling |
| cmip7_prep/mapping_compat.py | Added dims field to VarConfig for dimension specification |
| cmip7_prep/data/cesm_to_cmip7.yaml | Updated sos variable to support multiple dimension configurations |
cmip7_prep/pipeline.py
Outdated
| from pathlib import Path | ||
| from typing import Optional, Sequence, Union, Dict, List | ||
| import re | ||
| from venv import logger |
Copilot
AI
Nov 4, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The logger is being imported from the venv module, which is incorrect. The venv module is for creating virtual environments and does not provide a logger. This should likely be from logging import getLogger or use the existing logging setup pattern from other files in the project.
| from venv import logger | |
| import logging |
scripts/monthly_cmor.py
Outdated
| logger.error(f"Exception while processing {varname}: {e!r}") | ||
| return (varname, f"ERROR: {e!r}") | ||
| results.append((varname, f"ERROR: {e!r}")) | ||
| return results if len(results) > 1 else results[0] |
Copilot
AI
Nov 4, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The function signature declares it returns list[tuple[str, str]], but this line returns either a list or a single tuple based on length. This breaks the type contract and will cause issues for callers expecting a consistent return type. Always return a list to match the declared signature.
| return results if len(results) > 1 else results[0] | |
| return results |
cmip7_prep/regrid.py
Outdated
| if hdim == "lndgrid": | ||
| out_norm = regridder(da2_2d, **kwargs) | ||
| out = _denormalize_land_field(out_norm, ds_in, spec.path) | ||
| else: | ||
| # default path (atm or no landfrac/sftlf available) | ||
| # default path (atm, ocn or no landfrac/sftlf available) | ||
| out = regridder(da2_2d, **kwargs) |
Copilot
AI
Nov 4, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The regridder is called inside the if hdim == 'lndgrid' block but not before it. For ocean realm processing (lines 640-644), out_norm is never assigned before the denormalization check at line 649, which would cause an UnboundLocalError. The regridder call should be moved before the if hdim == 'lndgrid' check or the logic restructured to handle all realms consistently.
cmip7_prep/pipeline.py
Outdated
| ds_native = open_native_for_cmip_vars( | ||
| [cmip_var], ds_or_glob, mapping, **open_kwargs | ||
| ) | ||
| if not "landfrac" in ds_native and not "ncol" in ds_native: |
Copilot
AI
Nov 4, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use the idiomatic in operator format: if 'landfrac' not in ds_native and 'ncol' not in ds_native: for better readability and consistency with Python conventions.
| if not "landfrac" in ds_native and not "ncol" in ds_native: | |
| if "landfrac" not in ds_native and "ncol" not in ds_native: |
| if "ncol" not in var_da.dims and "lndgrid" not in var_da.dims: | ||
| logger.info("Variable has no 'ncol' or 'lndgrid' dim; assuming ocn variable.") | ||
| hdim = "tripolar" | ||
| da2 = var_da # Use the DataArray, not the whole Dataset | ||
| non_spatial = [d for d in da2.dims if d not in ("yh", "xh")] | ||
| realm = "ocn" | ||
| method = method or "bilinear" # force bilinear for ocn | ||
| else: | ||
| da2, non_spatial, hdim = _ensure_ncol_last(var_da) | ||
| if hdim == "lndgrid": | ||
| da2 = _normalize_land_field(da2, ds_in) | ||
| realm = "lnd" | ||
| elif hdim == "ncol": | ||
| realm = "atm" |
Copilot
AI
Nov 4, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This condition assumes that any variable without 'ncol' or 'lndgrid' is an ocean variable. This logic is fragile and could fail if new dimension types are added. Consider checking explicitly for expected ocean dimensions ('xh', 'yh') or adding a realm parameter instead of inferring from absence of dimensions.
| if "ncol" not in var_da.dims and "lndgrid" not in var_da.dims: | |
| logger.info("Variable has no 'ncol' or 'lndgrid' dim; assuming ocn variable.") | |
| hdim = "tripolar" | |
| da2 = var_da # Use the DataArray, not the whole Dataset | |
| non_spatial = [d for d in da2.dims if d not in ("yh", "xh")] | |
| realm = "ocn" | |
| method = method or "bilinear" # force bilinear for ocn | |
| else: | |
| da2, non_spatial, hdim = _ensure_ncol_last(var_da) | |
| if hdim == "lndgrid": | |
| da2 = _normalize_land_field(da2, ds_in) | |
| realm = "lnd" | |
| elif hdim == "ncol": | |
| realm = "atm" | |
| if "xh" in var_da.dims and "yh" in var_da.dims: | |
| logger.info("Variable has 'xh' and 'yh' dims; treating as ocean variable.") | |
| hdim = "tripolar" | |
| da2 = var_da # Use the DataArray, not the whole Dataset | |
| non_spatial = [d for d in da2.dims if d not in ("yh", "xh")] | |
| realm = "ocn" | |
| method = method or "bilinear" # force bilinear for ocn | |
| elif "ncol" in var_da.dims or "lndgrid" in var_da.dims: | |
| da2, non_spatial, hdim = _ensure_ncol_last(var_da) | |
| if hdim == "lndgrid": | |
| da2 = _normalize_land_field(da2, ds_in) | |
| realm = "lnd" | |
| elif hdim == "ncol": | |
| realm = "atm" | |
| else: | |
| raise ValueError( | |
| f"Cannot determine grid/realm for variable '{varname}': dims={var_da.dims!r}. " | |
| "Expected 'ncol', 'lndgrid', or both 'xh' and 'yh'." | |
| ) |
cmip7_prep/regrid.py
Outdated
| force_method="conservative", | ||
| realm=realm, | ||
| ) # fx always conservative | ||
| print(f"using fx map: {spec.path}") |
Copilot
AI
Nov 4, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This print statement should be replaced with a logger call for consistency with the rest of the codebase changes in this PR, which systematically replaced print statements with proper logging.
| print(f"using fx map: {spec.path}") | |
| logger.info(f"using fx map: {spec.path}") |
cmip7_prep/regrid.py
Outdated
| print( | ||
| f"Regridding {varname} using {spec.method_label} map: {spec.path} for realm {realm}" | ||
| ) |
Copilot
AI
Nov 4, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This print statement should be replaced with a logger call for consistency with the rest of the codebase changes in this PR.
cmip7_prep/regrid.py
Outdated
| print( | ||
| f"da2_2d range: {da2_2d['lat'].min().item()} to {da2_2d['lat'].max().item()} lat, " | ||
| f"{da2_2d['lon'].min().item()} to {da2_2d['lon'].max().item()} lon" | ||
| ) |
Copilot
AI
Nov 4, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This print statement should be replaced with a logger call (likely logger.debug() given it's diagnostic information) for consistency with the rest of the codebase changes in this PR.
cmip7_prep/pipeline.py
Outdated
| return ( | ||
| cmip_var, | ||
| f"ERROR: MOM6 grid information is required for variable {cmip_var} " | ||
| "but was not provided.", |
Copilot
AI
Nov 4, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function should return xr.Dataset according to its signature, but this error path returns a tuple. This type mismatch will cause issues for callers expecting a Dataset. Consider raising an exception instead of returning an error tuple.
| return ( | |
| cmip_var, | |
| f"ERROR: MOM6 grid information is required for variable {cmip_var} " | |
| "but was not provided.", | |
| raise ValueError( | |
| f"MOM6 grid information is required for variable {cmip_var} but was not provided." |
Ongoing development of MOM6 prep.