Skip to content

Conversation

@jedwards4b
Copy link
Contributor

Ongoing development of MOM6 prep.

@jedwards4b jedwards4b requested a review from Copilot November 4, 2025 15:32
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR enhances MOM6 (ocean model) variable processing capabilities by adding support for both native and regridded ocean grids, alongside improvements to logging consistency and PBS resource allocation.

Key Changes:

  • Added support for processing ocean variables on native MOM6 grid vs. regridded grid based on dimension checks
  • Replaced print statements with proper logger calls throughout the codebase
  • Enhanced process_one_var to handle multiple dimension configurations per variable

Reviewed Changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 9 comments.

Show a summary per file
File Description
scripts/monthly_cmor.py Major refactor to support multiple dimension configurations per variable, improved logging, and ocean-specific grid handling
scripts/fullamon.sh Dynamic PBS CPU allocation instead of hardcoded worker count
scripts/fullLmon.sh Dynamic PBS CPU allocation instead of hardcoded worker count
cmip7_prep/regrid.py Added realm-specific regridding map selection and ocean variable handling logic
cmip7_prep/pipeline.py Added MOM6 grid support in the pipeline with proper auxiliary variable handling
cmip7_prep/mapping_compat.py Added dims field to VarConfig for dimension specification
cmip7_prep/data/cesm_to_cmip7.yaml Updated sos variable to support multiple dimension configurations

from pathlib import Path
from typing import Optional, Sequence, Union, Dict, List
import re
from venv import logger
Copy link

Copilot AI Nov 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The logger is being imported from the venv module, which is incorrect. The venv module is for creating virtual environments and does not provide a logger. This should likely be from logging import getLogger or use the existing logging setup pattern from other files in the project.

Suggested change
from venv import logger
import logging

Copilot uses AI. Check for mistakes.
logger.error(f"Exception while processing {varname}: {e!r}")
return (varname, f"ERROR: {e!r}")
results.append((varname, f"ERROR: {e!r}"))
return results if len(results) > 1 else results[0]
Copy link

Copilot AI Nov 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The function signature declares it returns list[tuple[str, str]], but this line returns either a list or a single tuple based on length. This breaks the type contract and will cause issues for callers expecting a consistent return type. Always return a list to match the declared signature.

Suggested change
return results if len(results) > 1 else results[0]
return results

Copilot uses AI. Check for mistakes.
Comment on lines 649 to 654
if hdim == "lndgrid":
out_norm = regridder(da2_2d, **kwargs)
out = _denormalize_land_field(out_norm, ds_in, spec.path)
else:
# default path (atm or no landfrac/sftlf available)
# default path (atm, ocn or no landfrac/sftlf available)
out = regridder(da2_2d, **kwargs)
Copy link

Copilot AI Nov 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The regridder is called inside the if hdim == 'lndgrid' block but not before it. For ocean realm processing (lines 640-644), out_norm is never assigned before the denormalization check at line 649, which would cause an UnboundLocalError. The regridder call should be moved before the if hdim == 'lndgrid' check or the logic restructured to handle all realms consistently.

Copilot uses AI. Check for mistakes.
ds_native = open_native_for_cmip_vars(
[cmip_var], ds_or_glob, mapping, **open_kwargs
)
if not "landfrac" in ds_native and not "ncol" in ds_native:
Copy link

Copilot AI Nov 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use the idiomatic in operator format: if 'landfrac' not in ds_native and 'ncol' not in ds_native: for better readability and consistency with Python conventions.

Suggested change
if not "landfrac" in ds_native and not "ncol" in ds_native:
if "landfrac" not in ds_native and "ncol" not in ds_native:

Copilot uses AI. Check for mistakes.
Comment on lines 593 to 606
if "ncol" not in var_da.dims and "lndgrid" not in var_da.dims:
logger.info("Variable has no 'ncol' or 'lndgrid' dim; assuming ocn variable.")
hdim = "tripolar"
da2 = var_da # Use the DataArray, not the whole Dataset
non_spatial = [d for d in da2.dims if d not in ("yh", "xh")]
realm = "ocn"
method = method or "bilinear" # force bilinear for ocn
else:
da2, non_spatial, hdim = _ensure_ncol_last(var_da)
if hdim == "lndgrid":
da2 = _normalize_land_field(da2, ds_in)
realm = "lnd"
elif hdim == "ncol":
realm = "atm"
Copy link

Copilot AI Nov 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This condition assumes that any variable without 'ncol' or 'lndgrid' is an ocean variable. This logic is fragile and could fail if new dimension types are added. Consider checking explicitly for expected ocean dimensions ('xh', 'yh') or adding a realm parameter instead of inferring from absence of dimensions.

Suggested change
if "ncol" not in var_da.dims and "lndgrid" not in var_da.dims:
logger.info("Variable has no 'ncol' or 'lndgrid' dim; assuming ocn variable.")
hdim = "tripolar"
da2 = var_da # Use the DataArray, not the whole Dataset
non_spatial = [d for d in da2.dims if d not in ("yh", "xh")]
realm = "ocn"
method = method or "bilinear" # force bilinear for ocn
else:
da2, non_spatial, hdim = _ensure_ncol_last(var_da)
if hdim == "lndgrid":
da2 = _normalize_land_field(da2, ds_in)
realm = "lnd"
elif hdim == "ncol":
realm = "atm"
if "xh" in var_da.dims and "yh" in var_da.dims:
logger.info("Variable has 'xh' and 'yh' dims; treating as ocean variable.")
hdim = "tripolar"
da2 = var_da # Use the DataArray, not the whole Dataset
non_spatial = [d for d in da2.dims if d not in ("yh", "xh")]
realm = "ocn"
method = method or "bilinear" # force bilinear for ocn
elif "ncol" in var_da.dims or "lndgrid" in var_da.dims:
da2, non_spatial, hdim = _ensure_ncol_last(var_da)
if hdim == "lndgrid":
da2 = _normalize_land_field(da2, ds_in)
realm = "lnd"
elif hdim == "ncol":
realm = "atm"
else:
raise ValueError(
f"Cannot determine grid/realm for variable '{varname}': dims={var_da.dims!r}. "
"Expected 'ncol', 'lndgrid', or both 'xh' and 'yh'."
)

Copilot uses AI. Check for mistakes.
force_method="conservative",
realm=realm,
) # fx always conservative
print(f"using fx map: {spec.path}")
Copy link

Copilot AI Nov 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This print statement should be replaced with a logger call for consistency with the rest of the codebase changes in this PR, which systematically replaced print statements with proper logging.

Suggested change
print(f"using fx map: {spec.path}")
logger.info(f"using fx map: {spec.path}")

Copilot uses AI. Check for mistakes.
Comment on lines 623 to 625
print(
f"Regridding {varname} using {spec.method_label} map: {spec.path} for realm {realm}"
)
Copy link

Copilot AI Nov 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This print statement should be replaced with a logger call for consistency with the rest of the codebase changes in this PR.

Copilot uses AI. Check for mistakes.
Comment on lines 645 to 648
print(
f"da2_2d range: {da2_2d['lat'].min().item()} to {da2_2d['lat'].max().item()} lat, "
f"{da2_2d['lon'].min().item()} to {da2_2d['lon'].max().item()} lon"
)
Copy link

Copilot AI Nov 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This print statement should be replaced with a logger call (likely logger.debug() given it's diagnostic information) for consistency with the rest of the codebase changes in this PR.

Copilot uses AI. Check for mistakes.
Comment on lines 237 to 240
return (
cmip_var,
f"ERROR: MOM6 grid information is required for variable {cmip_var} "
"but was not provided.",
Copy link

Copilot AI Nov 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function should return xr.Dataset according to its signature, but this error path returns a tuple. This type mismatch will cause issues for callers expecting a Dataset. Consider raising an exception instead of returning an error tuple.

Suggested change
return (
cmip_var,
f"ERROR: MOM6 grid information is required for variable {cmip_var} "
"but was not provided.",
raise ValueError(
f"MOM6 grid information is required for variable {cmip_var} but was not provided."

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants