2 changes: 1 addition & 1 deletion docs/tools/cmor.rst
@@ -20,7 +20,7 @@ workflows are supported. Available subcommands:
* Minimal Syntax: ``fre cmor run -d [indir] -l [varlist] -r [table_config] -p [exp_config] -o [outdir] [options]``
* Required Options:
- ``-d, --indir TEXT`` - Input directory with netCDF files
- ``-l, --varlist TEXT`` - Variable list dictionary mapping local to MIP variable names
- ``-l, --varlist TEXT`` - Variable list dictionary mapping modeler variable names to MIP table variable names
- ``-r, --table_config TEXT`` - MIP table JSON configuration
- ``-p, --exp_config TEXT`` - Experiment/model metadata JSON
- ``-o, --outdir TEXT`` - Output directory prefix
26 changes: 21 additions & 5 deletions docs/usage/cmor_cookbook.rst
@@ -42,17 +42,33 @@ You will need to split the platform-target string appropriately to extract the i
Creating Variable Lists
~~~~~~~~~~~~~~~~~~~~~~~~

Variable lists map your local variable names to MIP table variable names. Generate a variable list from a directory of netCDF files:
Variable lists are JSON files mapping modeler variable names to MIP table variable names. Each
entry is a key/value pair where:

* The **key** is the modeler's variable name -- used for targeting filenames in the input directory
AND expected as the name of the data array inside those files (they must match).
* The **value** is the corresponding MIP table variable name -- used for CMIP metadata lookups.

In many cases, the key and value are identical (e.g., ``"sos": "sos"``), but they may differ when
the modeler uses a different name than the MIP standard (e.g., ``"sst_model": "tos"``).

.. important::

The variable name in the filename **must** match the variable name inside the file. If they
differ, ``fre cmor run`` will raise an error with a helpful message listing the variables
found in the file.

Generate a variable list from a directory of netCDF files:

.. code-block:: bash

fre cmor varlist \
-d /path/to/component/output \
-o generated_varlist.json

This tool examines filenames to extract variable names. It assumes FRE-style naming conventions
(e.g., ``component.YYYYMMDD.variable.nc``). Review the generated file and edit as needed to map
local variable names to target MIP variable names.
This tool examines filenames to extract variable names. It assumes FRE-style naming conventions
(e.g., ``component.YYYYMMDD.variable.nc``). Review the generated file and edit values as needed
to map modeler variable names to the correct MIP table variable names.

To verify variables exist in MIP tables, search for variable definitions:

@@ -113,7 +129,7 @@ For processing individual directories or debugging specific issues, use ``fre cm
Required arguments:

* ``--indir``: Directory containing netCDF files to CMORize
* ``--varlist``: JSON file mapping local variable names to target variable names
* ``--varlist``: JSON file mapping modeler variable names to MIP table variable names
* ``--table_config``: MIP table JSON file (e.g., ``CMIP6_Omon.json``)
* ``--exp_config``: Experiment configuration JSON with metadata
* ``--outdir``: Output directory root for CMORized files
21 changes: 14 additions & 7 deletions fre/cmor/README.md
@@ -30,8 +30,12 @@ repository, and the user directly.

### Required User Configuration

- A variable list as a JSON dictionary is required, to assist with targeting the right input files for CMORization. An example
in the repository is included [here](https://github.com/NOAA-GFDL/fre-cli/blob/main/fre/tests/test_files/CMORbite_var_list.json).
- A variable list as a JSON dictionary is required, to assist with targeting the right input files for CMORization. Each entry
is a key/value pair where the **key** is the modeler's variable name (used in the filename AND expected inside the file) and
the **value** is the corresponding MIP table variable name. The key and value are often the same, but may differ when the
modeler uses a different name than the MIP standard. The variable name in the filename **must** match the variable name
inside the file -- if they differ, `fre cmor run` will raise an error. An example in the repository is included
[here](https://github.com/NOAA-GFDL/fre-cli/blob/main/fre/tests/test_files/CMORbite_var_list.json).
Additionally, see `fre cmor varlist` and the `--opt_var_name` flag for more information.

- An experiment configuration file as a JSON dictionary, an example provided by PCMDI is included in the repository
@@ -103,7 +107,8 @@ effectively contain this exact example, which is run automatically in as a unit-
Here, `fre cmor run` will process one file before exiting (`--run_one`), use the input gridding information metadata provided by the
`--grid_label`, `--grid_desc`, and `--nom_res` arguments. `--table_config` is pointing to a specific external configuration table known
as a MIP table, while `--exp_config` will contain the requisite information on output directory structure, calendar, and more. `--varlist`
specifies which files in `--indir` will be processed. The output directory structure's final location will be at `--outdir`.
specifies a JSON dictionary mapping modeler variable names (keys, used for filename targeting) to MIP table variable names (values, used
for CMIP metadata lookups). The output directory structure's final location will be at `--outdir`.
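
As a rough sketch of the key/value roles described above (not the code `fre cmor run` itself uses), the following assumes FRE-style filenames ending in `.<variable>.nc`. The helper name `files_for_varlist` and the sample filenames are hypothetical and exist only for illustration.

```python
from pathlib import Path


def files_for_varlist(varlist: dict, filenames: list) -> dict:
    """Group filenames by varlist key (hypothetical helper, not part of fre-cli).

    Assumes FRE-style names ending in '.<modeler variable>.nc'.
    Returns {modeler_name: (mip_name, [matching filenames])}.
    """
    targeted = {}
    for modeler_name, mip_name in varlist.items():
        # the key targets files by name; the value is only carried along
        # for the later MIP table lookup
        suffix = f'.{modeler_name}.nc'
        matches = [f for f in filenames if Path(f).name.endswith(suffix)]
        targeted[modeler_name] = (mip_name, matches)
    return targeted


# made-up filenames, for illustration only
example_files = [
    'ocean_monthly.199301-199712.sos.nc',
    'ocean_monthly.199301-199712.sst_model.nc',
    'ocean_monthly.199301-199712.so.nc',
]
example_varlist = {'sos': 'sos', 'sst_model': 'tos'}
result = files_for_varlist(example_varlist, example_files)
```

Matching on the full `.<variable>.nc` suffix, rather than a bare substring, keeps `so` from accidentally matching the `sos` file.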


### `fre cmor yaml`
@@ -154,8 +159,9 @@ for `sos` within will be printed to screen by this call.

### `fre cmor varlist`

Generate a variable list of NetCDF files in a target directory. Only works if the targeted files have names containing the
variable in the right spot. Each entry in the output list should be unique.
Generate a variable list from NetCDF files in a target directory. Only works if the targeted files have names containing the
variable in the right spot. Each entry in the output list is a key/value pair mapping the modeler's variable name to itself
by default (e.g., `"sos": "sos"`). Edit the values as needed to map to the correct MIP table variable names.
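
The self-mapping default can be pictured with a minimal sketch like the one below. It is not the `fre cmor varlist` implementation; it assumes FRE-style names of the form `component.YYYYMMDD.variable.nc`, and the function name `varlist_from_filenames` and the sample filenames are hypothetical.

```python
import json
from pathlib import Path


def varlist_from_filenames(filenames):
    """Map each variable name found in FRE-style filenames to itself (illustrative only)."""
    varlist = {}
    for name in filenames:
        parts = Path(name).name.split('.')
        # expect at least component.datetime.variable.nc; skip anything else
        if len(parts) < 4 or parts[-1] != 'nc':
            continue
        var = parts[-2]
        varlist[var] = var  # repeated variables collapse into one unique key
    return varlist


example = varlist_from_filenames([
    'ocean_monthly.199301-199712.sos.nc',
    'ocean_monthly.199801-200212.sos.nc',
    'ocean_monthly.199301-199712.sosV2.nc',
    'README.txt',
])
print(json.dumps(example, indent=4))
```

After generation, a value would be edited by hand wherever the modeler's name differs from the MIP table name.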


#### Example and Description
@@ -165,8 +171,9 @@ fre cmor varlist --dir_targ fre/tests/test_files/ocean_sos_var_file/ \
cat simple_varlist.txt # shows the result
```

Here, `simple_varlist.txt` will be a simple JSON file, containing a dictionary with the variable(s) `sos` and `sosV2` listed.
Note that `sosV2` is made-up variable for software testing purposes only.
Here, `simple_varlist.txt` will be a simple JSON file, containing a dictionary with the variable(s) `sos` and `sosV2` listed,
each mapping to itself (e.g., `"sos": "sos"`). Review and edit the values to map to the correct MIP table variable names.
Note that `sosV2` is a made-up variable for software testing purposes only.

Optionally, pass `--mip_table` with a path to a MIP table JSON file to filter the generated variable list so that only variables
present in the MIP table are included.
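
A minimal sketch of that kind of filtering follows. It assumes a CMIP6-style table JSON whose variables sit under a top-level `variable_entry` key; the helper name `filter_varlist_by_table` and the tiny inline table are hypothetical, and the real `--mip_table` logic may differ.

```python
import json


def filter_varlist_by_table(varlist, mip_table):
    """Keep only entries whose MIP name appears in the table (illustrative only)."""
    # assumption: CMIP6-style layout with variables under 'variable_entry'
    entries = mip_table.get('variable_entry', {})
    return {key: value for key, value in varlist.items() if value in entries}


# tiny stand-in for a MIP table JSON file, not a real table
mip_table = json.loads(
    '{"Header": {"table_id": "Table Omon"},'
    ' "variable_entry": {"sos": {}, "tos": {}}}'
)
filtered = filter_varlist_by_table({'sos': 'sos', 'sosV2': 'sosV2'}, mip_table)
```

Here the made-up test variable `sosV2` would be dropped, since it has no entry in the table.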
2 changes: 2 additions & 0 deletions fre/cmor/cmor_finder.py
@@ -56,6 +56,8 @@ def print_var_content(table_config_file: IO[str],
table_name = proj_table_vars["Header"].get('table_id').split(' ')[1]
except KeyError:
fre_logger.warning("couldn't get header and table_name field")
except IndexError:
fre_logger.warning("couldn't get header and table_name, probably not a variable table")

if table_name is not None:
fre_logger.info('looking for %s data in table %s!', var_name, table_name)
49 changes: 25 additions & 24 deletions fre/cmor/cmor_mixer.py
@@ -64,11 +64,11 @@ def rewrite_netcdf_file_var( mip_var_cfgs: dict = None,

:param mip_var_cfgs: Variable table, as loaded from the MIP table JSON config.
:type mip_var_cfgs: dict
:param local_var: Variable name used for finding files locally.
:param local_var: Modeler's variable name, used for finding files and reading data from them.
:type local_var: str
:param netcdf_file: Path to the input NetCDF file to be CMORized.
:type netcdf_file: str
:param target_var: Name of the variable to be processed.
:param target_var: MIP table variable name for metadata lookups.
:type target_var: str
:param json_exp_config: Path to experiment configuration JSON file (for dataset metadata).
:type json_exp_config: str
@@ -86,16 +86,16 @@ def rewrite_netcdf_file_var( mip_var_cfgs: dict = None,
ocean grids.
"""
fre_logger.info("input data:")
fre_logger.info(" local_var = %s", local_var)
fre_logger.info(" target_var = %s", target_var)
fre_logger.info(" local_var = %s (modeler variable name, in filename and file)", local_var)
fre_logger.info(" target_var = %s (MIP table variable name)", target_var)

# open the input file
fre_logger.info("opening %s", netcdf_file)
ds = nc.Dataset(netcdf_file, 'r+')

# read the input variable data
fre_logger.info('attempting to read variable data, %s', target_var)
var = from_dis_gimme_dis(from_dis=ds, gimme_dis=target_var)
# read the input variable data using the modeler's variable name (local_var)
fre_logger.info('attempting to read variable data, %s', local_var)
var = from_dis_gimme_dis(from_dis=ds, gimme_dis=local_var)

## var type
#var_dtype = var.dtype
@@ -141,12 +141,14 @@ def rewrite_netcdf_file_var( mip_var_cfgs: dict = None,
var_brand = filter_brands(
brands, target_var, mip_var_cfgs,
has_time_bnds = 'time_bnds' in ds.variables,
input_vert_dim = get_vertical_dimension(ds, target_var)
input_vert_dim = get_vertical_dimension(ds, local_var)
)

else:
fre_logger.error('cmip7 case detected, but dimensions of input data do not match '
'any of those found for the associated brands.')
raise ValueError
raise ValueError('no variable brand was able to be identified for this CMIP7 case')
fre_logger.debug('cmip7 case, filtered possible brands to %s', var_brand)
else:
fre_logger.debug('non-cmip7 case detected, skipping variable brands')

@@ -215,8 +217,8 @@ def rewrite_netcdf_file_var( mip_var_cfgs: dict = None,
time_bnds = from_dis_gimme_dis(from_dis=ds, gimme_dis='time_bnds')

# determine the vertical dimension by looping over netcdf variables
vert_dim = get_vertical_dimension(ds, target_var) # returns int(0) if not present
fre_logger.info("Vertical dimension of %s: %s", target_var, vert_dim)
vert_dim = get_vertical_dimension(ds, local_var) # returns int(0) if not present
fre_logger.info("Vertical dimension of %s: %s", local_var, vert_dim)

# Check var_dim and vert_dim and assign lev if relevant.
lev, lev_units = None, "1"
@@ -534,7 +536,7 @@ def rewrite_netcdf_file_var( mip_var_cfgs: dict = None,

elif vert_dim in ALT_HYBRID_SIGMA_COORDS:
# find the ps file nearby
ps_file = netcdf_file.replace(f'.{target_var}.nc', '.ps.nc')
ps_file = netcdf_file.replace(f'.{local_var}.nc', '.ps.nc')
ds_ps = nc.Dataset(ps_file)
ps = from_dis_gimme_dis(ds_ps, 'ps')

@@ -694,9 +696,9 @@ def cmorize_target_var_files(indir: str = None,

:param indir: Path to the directory containing NetCDF files to process.
:type indir: str
:param target_var: Name of the variable to process in each file.
:param target_var: MIP table variable name for metadata lookups.
:type target_var: str
:param local_var: Local/filename variable name (often identical to target_var).
:param local_var: Modeler's variable name, used for file-targeting and reading data from files.
:type local_var: str
:param iso_datetime_range_arr: List of ISO datetime strings, each identifying a specific file.
:type iso_datetime_range_arr: list of str
@@ -721,9 +723,8 @@ def cmorize_target_var_files(indir: str = None,
.. note:: Copies files to a temporary directory, runs CMORization, moves results to output, cleans up temp files.
"""

fre_logger.info("local_var = %s to be used for file-targeting.\n"
"target_var = %s to be used for reading the data \n"
"from the file\n"
fre_logger.info("local_var = %s to be used for file-targeting and reading data.\n"
"target_var = %s to be used for MIP table lookups.\n"
"outdir = %s", local_var, target_var, outdir)

# determine a tmp dir for working on files.
@@ -851,7 +852,7 @@ def cmorize_all_variables_in_dir(vars_to_run: Dict[str, Any],
"""
CMORize all variables in a directory according to a variable mapping.

:param vars_to_run: Mapping of local variable names (in filenames) to target variable names (in NetCDF).
:param vars_to_run: Mapping of modeler variable names to MIP table variable names.
:type vars_to_run: dict
:param indir: Directory containing NetCDF files to process.
:type indir: str
@@ -875,15 +876,15 @@ def cmorize_all_variables_in_dir(vars_to_run: Dict[str, Any],
.. note:: Errors for individual variables are logged and processing continues (except for run_one_mode).
"""

# loop over local-variable:target-variable pairs in vars_to_run
# loop over modeler-variable:mip-variable pairs in vars_to_run
return_status = -1
for local_var in vars_to_run:
# if the target-variable is "good", get the name of the data inside the netcdf file.
target_var = vars_to_run[local_var] # often equiv to local_var but not necessarily.
if local_var != target_var:
fre_logger.warning('local_var == %s != %s == target_var\n'
'i am expecting %s to be in the filename, and i expect the variable\n'
'in that file to be named %s', local_var, target_var, local_var, target_var)
fre_logger.info('local_var == %s != %s == target_var\n'
'modeler variable name differs from MIP table variable name.\n'
'i am expecting %s in both the filename and the file, and will map it\n'
'to MIP table variable %s', local_var, target_var, local_var, target_var)

fre_logger.info('........beginning CMORization for %s/%s..........', local_var, target_var)
try:
@@ -923,7 +924,7 @@ def cmor_run_subtool(indir: str = None,

:param indir: Directory containing NetCDF files to process.
:type indir: str
:param json_var_list: Path to JSON file with variable mapping (local to target names).
:param json_var_list: Path to JSON file with variable mapping (modeler names to MIP table names).
:type json_var_list: str
:param json_table_config: Path to MIP table JSON file (per-variable metadata).
:type json_table_config: str
12 changes: 7 additions & 5 deletions fre/cmor/frecmor.py
@@ -12,9 +12,10 @@
"matching that variable name. I.e., this string help target local_vars, not " + \
"target_vars."
VARLIST_HELP="path pointing to a json file containing directory of key/value pairs. " + \
"the keys are the \'local\' names used in the filename, and the values " + \
"pointed to by those keys are strings representing the name of the variable " + \
"contained in targeted files. the key and value are often the same, " + \
"the keys are the modeler\'s variable names used in the filename and " + \
"expected as the variable name within the targeted files. the values " + \
"pointed to by those keys are strings representing the corresponding " + \
"MIP table variable name. the key and value are often the same, " + \
"but it is not required."
RUN_ONE_HELP="process only one file, then exit. mostly for debugging and isolating issues."
DRY_RUN_HELP="don't call the cmor_mixer subtool, just printout what would be called and move on until natural exit"
@@ -156,11 +157,10 @@ def find(varlist, table_config_dir, opt_var_name): #uncovered
required = False)
def run(indir, varlist, table_config, exp_config, outdir, run_one, opt_var_name,
grid_label, grid_desc, nom_res, start, stop, calendar):
# pylint: disable=unused-argument
"""
Rewrite climate model output files with CMIP-compliant metadata for down-stream publishing
"""
cmor_run_subtool(
result = cmor_run_subtool(
indir = indir,
json_var_list = varlist,
json_table_config = table_config,
@@ -175,6 +175,8 @@ def run(indir, varlist, table_config, exp_config, outdir, run_one, opt_var_name,
stop = stop,
calendar_type = calendar
)
if result != 0:
raise click.ClickException(f'cmor_run_subtool returned non-zero status: {result}')


@cmor_cli.command()