Merged
Changes from 14 commits
174 changes: 173 additions & 1 deletion diag_manager/diag_yaml_format.md
@@ -16,6 +16,7 @@ The purpose of this document is to explain the diag_table yaml format.
- [3. More examples](diag_yaml_format.md#3-more-examples)
- [4. Schema](diag_yaml_format.md#4-schema)
- [5. Ensemble and Nest Support](diag_yaml_format.md#5-ensemble-and-nest-support)
- [6. Reducing Diag Table Yaml Length](diag_yaml_format.md#6-reducing-diag-table-yaml-length)

### 1. Converting from legacy ascii diag_table format

@@ -352,4 +353,175 @@ found in the [gfdl_msd_schemas](https://github.com/NOAA-GFDL/gfdl_msd_schemas)
repository on Github.

### 5. Ensemble and Nest Support
When using nests, it may be desired for a nest to have a different file frequency or number of variables from the parent grid. This may allow users to save disk space and reduce simulations time. In order to supports, FMS allows each nest to have a different diag_table.yaml from the parent grid. For example, if running with 1 test FMS will use diag_table.yaml for the parent grid and diag_table.nest_01.yaml for the first nest Similary, each ensemble member can have its own diag_table (diag_table_ens_XX.yaml, where XX is the ensemble number). However, for the ensemble case if both the diag_table.yaml and the diag_table_ens_* files are present, the code will crash as only 1 option is allowed.
When using nests, a nest may need a different file frequency or number of variables from the parent grid, which can save disk space and reduce simulation time. To support this, FMS allows each nest to have a different diag_table.yaml from the parent grid. For example, when running with 1 nest, FMS will use diag_table.yaml for the parent grid and diag_table.nest_01.yaml for the first nest. Similarly, each ensemble member can have its own diag_table (diag_table_ens_XX.yaml, where XX is the ensemble number). However, in the ensemble case, if both diag_table.yaml and the diag_table_ens_* files are present, the code will crash because only one option is allowed.

### 6. Reducing Diag Table Yaml Length
There may be scenarios where the diag_table.yaml becomes long and contains a lot of repeated content.

For example, the keys `module`, `reduction`, and `kind` often have the same values across many variables.

```yaml
title: test_none
base_date: 2 1 1 0 0 0
diag_files:
- file_name: test_4xdaily
  freq: 6 hours
  time_units: hours
  unlimdim: time
  varlist:
  - var_name: var0
    module: ocn_mod
    reduction: none
    kind: r4
  - var_name: var1
    module: ocn_mod
    reduction: none
    kind: r4
  - var_name: var2
    module: ocn_mod
    reduction: none
    kind: r4
```

To reduce size and improve readability, you can **define these keys at the file level, and override them at the variable level if needed**:

```yaml
title: test_none
base_date: 2 1 1 0 0 0
diag_files:
- file_name: test_4xdaily
  freq: 6 hours
  time_units: hours
  unlimdim: time
  module: ocn_mod
  reduction: none
  kind: r4
  varlist:
  - var_name: var0
  - var_name: var1
  - var_name: var2
```
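
As a sketch of the override behavior described above, the following hypothetical file sets `module`, `reduction`, and `kind` at the file level and overrides them for individual variables. The module name `atm_mod` and the values `average` and `r8` are chosen for illustration only and are assumed to be valid for your model:

```yaml
title: test_none
base_date: 2 1 1 0 0 0
diag_files:
- file_name: test_4xdaily
  freq: 6 hours
  time_units: hours
  unlimdim: time
  module: ocn_mod
  reduction: none
  kind: r4
  varlist:
  - var_name: var0          # uses the file-level module, reduction, and kind
  - var_name: var1
    reduction: average      # illustrative override of the file-level reduction
  - var_name: var2
    module: atm_mod         # hypothetical module name, overrides ocn_mod
    kind: r8                # illustrative override of the file-level kind
```

Keys that a variable does not set fall back to the file-level values, so only the exceptions need to be spelled out.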

However, there may be cases where a file contains a large number of variables from different modules, requiring duplication of the module key across multiple lines. For example:

```yaml
title: test_none
base_date: 2 1 1 0 0 0
diag_files:
- file_name: test_4xdaily
  freq: 6 hours
  time_units: hours
  unlimdim: time
  module: radiation_mod
  reduction: none
  kind: r4
  varlist:
  - var_name: var0
  - var_name: var1
  - var_name: var2
  - var_name: var3
    module: some_other_mod
  - var_name: var4
    module: some_other_mod
  - var_name: var5
    module: some_other_mod
```

To address this, you can group variables by module:
```yaml
title: test_none
base_date: 2 1 1 0 0 0
diag_files:
- file_name: test_4xdaily
  freq: 6 hours
  time_units: hours
  unlimdim: time
  reduction: none
  kind: r4
  modules:
  - module: radiation_mod
    varlist:
    - var_name: var0
    - var_name: var1
    - var_name: var2
  - module: some_other_mod
    varlist:
    - var_name: var3
    - var_name: var4
    - var_name: var5
```
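
Based on the check added to `diag_yaml_object_init` in this change, a file appears to be rejected with a fatal error when it combines a `modules` block with a `varlist` placed directly at the file level. The accompanying error message describes the conflict as a file-level `module` key, so it is probably safest to avoid both alongside `modules`. A sketch of a layout to avoid, reusing the names from the example above:

```yaml
title: test_none
base_date: 2 1 1 0 0 0
diag_files:
- file_name: test_4xdaily
  freq: 6 hours
  time_units: hours
  unlimdim: time
  reduction: none
  kind: r4
  modules:
  - module: radiation_mod
    varlist:
    - var_name: var0
  varlist:                  # file-level varlist next to 'modules': expected to be rejected
  - var_name: var3
    module: some_other_mod
```

Moving `var3` into its own entry under `modules` (as in the grouped example above) avoids the conflict.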

Another option to reduce the size of the diag_table.yaml and improve readability is to **use yaml anchors**. For example, instead of writing:
```yaml
title: test_none
base_date: 2 1 1 0 0 0
diag_files:
- file_name: test_4xdaily
  freq: 6 hours
  time_units: hours
  unlimdim: time
  module: ocn_mod
  reduction: none
  kind: r4
  varlist:
  - var_name: var0
  - var_name: var1
  - var_name: var2
  - var_name: var3
  - var_name: var4
  - var_name: var3
    output_name: var3_Z
    zbounds: 2. 3.
- file_name: test_daily
  freq: 1 days
  time_units: hours
  unlimdim: time
  module: ocn_mod
  reduction: none
  kind: r4
  varlist:
  - var_name: var0
  - var_name: var1
  - var_name: var2
  - var_name: var3
  - var_name: var4
  - var_name: var3
    output_name: var3_Z
    zbounds: 2. 3.
```

You can define an anchor and reuse it:
```yaml
name: &name
- var_name: var0
- var_name: var1
- var_name: var2
- var_name: var3
- var_name: var4
- var_name: var3
  output_name: var3_Z
  zbounds: 2. 3.

title: test_none
base_date: 2 1 1 0 0 0
diag_files:
- file_name: test_4xdaily
  freq: 6 hours
  time_units: hours
  unlimdim: time
  module: ocn_mod
  reduction: none
  kind: r4
  varlist: *name
- file_name: test_daily
  freq: 1 days
  time_units: hours
  unlimdim: time
  module: ocn_mod
  reduction: none
  kind: r4
  varlist:
  - *name
  - var_name: var773
```
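
Every alias (`*name`) must refer to an anchor (`&name`). The parser change in this pull request reports an alias without a matching anchor as a fatal error. A minimal sketch of such a file, where `*missing` is a hypothetical alias name with no anchor:

```yaml
title: test_none
base_date: 2 1 1 0 0 0
diag_files:
- file_name: test_daily
  freq: 1 days
  time_units: hours
  unlimdim: time
  module: ocn_mod
  reduction: none
  kind: r4
  varlist: *missing   # no '&missing' anchor is defined anywhere in the file
```

Defining the anchor at the top of the file, as in the example above, keeps every alias resolvable.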

95 changes: 81 additions & 14 deletions diag_manager/fms_diag_yaml.F90
@@ -390,6 +390,14 @@ subroutine diag_yaml_object_init(diag_subset_output)
logical :: is_instantaneous !< .True. if the file is instantaneous (i.e no averaging)
character(len=FMS_FILE_LEN) :: yamlfilename !< Name of the expected diag_table.yaml

integer :: nmods !< Number of module blocks in a file
integer, allocatable :: mod_ids(:) !< Ids for each module block in a file
integer, allocatable :: nvars_per_file(:) !< Number of variables in each file
logical, allocatable :: has_module_block(:)!< True if each file is using the module block
character(len=FMS_FILE_LEN), allocatable :: mod_name(:) !< Buffer to store module name
character(len=FMS_FILE_LEN) :: buffer !< Buffer to store string variables
integer :: istart, iend !< Starting and ending indices of the file block

if (diag_yaml_module_initialized) return

! If doing and ensemble or nest run add the filename appendix (ens_XX or nest_XX) to the filename
@@ -432,14 +440,40 @@ subroutine diag_yaml_object_init(diag_subset_output)

!< Determine how many files are in the diag_yaml, ignoring those with write_file = False
actual_num_files = 0
allocate(nvars_per_file(nfiles))
allocate(has_module_block(nfiles))
do i = 1, nfiles
write_file = .true.
call get_value_from_key(diag_yaml_id, diag_file_ids(i), "write_file", write_file, is_optional=.true.)
if(.not. write_file) ignore(i) = .true.

!< If ignoring the file, ignore the fields in that file too!
if (.not. ignore(i)) then
nvars = get_total_num_vars(diag_yaml_id, diag_file_ids(i))
! Determine if the file has defined a module block
nmods = 0
nmods = get_num_blocks(diag_yaml_id, "modules", parent_block_id=diag_file_ids(i))

nvars_per_file(i) = get_num_blocks(diag_yaml_id, "varlist", parent_block_id=diag_file_ids(i))
if (nmods .ne. 0) then
has_module_block(i) = .true.
! Get the total number of variables in each module block, ignoring those with write_var = .false.
if (nvars_per_file(i) .ne. 0) &
call mpp_error(FATAL, "diag_manager_mod:: the file:"//trim(filename)//" has a 'modules' block defined "//&
"and a 'module' key defined at the file level. This is not allowed!")

allocate(mod_ids(nmods))
call get_block_ids(diag_yaml_id, "modules", mod_ids, parent_block_id=diag_file_ids(i))

nvars = 0
do j = 1, nmods
nvars_per_file(i) = nvars_per_file(i) + get_num_blocks(diag_yaml_id, "varlist", parent_block_id=mod_ids(j))
nvars = nvars + get_total_num_vars(diag_yaml_id, mod_ids(j))
enddo
deallocate(mod_ids)
else
nvars = get_total_num_vars(diag_yaml_id, diag_file_ids(i))
has_module_block(i) = .false.
endif
total_nvars = total_nvars + nvars
if (nvars .ne. 0) then
actual_num_files = actual_num_files + 1
@@ -472,16 +506,41 @@ subroutine diag_yaml_object_init(diag_subset_output)
file_list%file_name(file_count) = trim(diag_yaml%diag_files(file_count)%file_fname)//c_null_char
file_list%diag_file_indices(file_count) = file_count

nvars = 0
nvars = get_num_blocks(diag_yaml_id, "varlist", parent_block_id=diag_file_ids(i))
allocate(var_ids(nvars))
call get_block_ids(diag_yaml_id, "varlist", var_ids, parent_block_id=diag_file_ids(i))
allocate(var_ids(nvars_per_file(i)))
allocate(mod_name(nvars_per_file(i)))
if (has_module_block(i)) then
nmods = get_num_blocks(diag_yaml_id, "modules", parent_block_id=diag_file_ids(i))
allocate(mod_ids(nmods))
call get_block_ids(diag_yaml_id, "modules", mod_ids, parent_block_id=diag_file_ids(i))

istart = 1
nvars_per_file(i) = 0
do j = 1, nmods
iend = istart + get_num_blocks(diag_yaml_id, "varlist", parent_block_id=mod_ids(j)) - 1
call get_block_ids(diag_yaml_id, "varlist", var_ids(istart:iend), parent_block_id=mod_ids(j))

! Update nvars_per_file to only include those that are actually being written
nvars_per_file(i) = nvars_per_file(i) + get_total_num_vars(diag_yaml_id, mod_ids(j))

call get_value_from_key(diag_yaml_id, mod_ids(j), "module", buffer)
mod_name(istart:iend) = trim(buffer)

istart = iend + 1
enddo

deallocate(mod_ids)
else
call get_block_ids(diag_yaml_id, "varlist", var_ids, parent_block_id=diag_file_ids(i))
nvars_per_file(i) = get_total_num_vars(diag_yaml_id, diag_file_ids(i))
endif

file_var_count = 0
allocate(diag_yaml%diag_files(file_count)%file_varlist(get_total_num_vars(diag_yaml_id, diag_file_ids(i))))
allocate(diag_yaml%diag_files(file_count)%file_outlist(get_total_num_vars(diag_yaml_id, diag_file_ids(i))))
allocate(diag_yaml%diag_files(file_count)%file_varlist(nvars_per_file(i)))
allocate(diag_yaml%diag_files(file_count)%file_outlist(nvars_per_file(i)))

allow_averages = .not. diag_yaml%diag_files(file_count)%file_freq(1) < 1
is_instantaneous = .false.
nvars_loop: do j = 1, nvars
nvars_loop: do j = 1, nvars_per_file(i)
write_var = .true.
call get_value_from_key(diag_yaml_id, var_ids(j), "write_var", write_var, is_optional=.true.)
if (.not. write_var) cycle
@@ -497,7 +556,7 @@ subroutine diag_yaml_object_init(diag_subset_output)
diag_yaml%diag_fields(var_count)%var_file_is_subregional = diag_yaml%diag_files(file_count)%has_file_sub_region()

call fill_in_diag_fields(diag_yaml_id, diag_yaml%diag_files(file_count), var_ids(j), &
diag_yaml%diag_fields(var_count), allow_averages)
diag_yaml%diag_fields(var_count), allow_averages, has_module_block(i), mod_name(j))

!> Save the variable name in the diag_file type
diag_yaml%diag_files(file_count)%file_varlist(file_var_count) = diag_yaml%diag_fields(var_count)%var_varname
@@ -515,6 +574,7 @@ subroutine diag_yaml_object_init(diag_subset_output)
variable_list%diag_field_indices(var_count) = var_count
enddo nvars_loop
deallocate(var_ids)
deallocate(mod_name)
enddo nfiles_loop

!> Sort the file list in alphabetical order
@@ -650,12 +710,15 @@ subroutine fill_in_diag_files(diag_yaml_id, diag_file_id, yaml_fileobj)

!> @brief Fills in a diagYamlFilesVar_type with the contents of a variable block in
!! diag_table.yaml
subroutine fill_in_diag_fields(diag_file_id, yaml_fileobj, var_id, field, allow_averages)
subroutine fill_in_diag_fields(diag_file_id, yaml_fileobj, var_id, field, allow_averages, &
has_module_block, mod_name)
integer, intent(in) :: diag_file_id !< Id of the file block in the yaml file
type(diagYamlFiles_type), intent(in) :: yaml_fileobj !< The yaml file obj for the variables
integer, intent(in) :: var_id !< Id of the variable block in the yaml file
type(diagYamlFilesVar_type), intent(inout) :: field !< diagYamlFilesVar_type obj to read the contents into
logical, intent(in) :: allow_averages !< .True. if averages are allowed for this file
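logical, intent(in) :: has_module_block !< .True. if the file is using the modules block
character(len=*), intent(in) :: mod_name !< Module name from the modules block, used if the variable does not set its own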

integer :: natt !< Number of attributes in variable
integer :: var_att_id(1) !< Id of the variable attribute block
@@ -685,13 +748,17 @@ subroutine fill_in_diag_fields(diag_file_id, yaml_fileobj, var_id, field, allow_
"Check your diag_table.yaml for the field:"//trim(field%var_varname))
endif

if (yaml_fileobj%default_var_module .eq. "") then
if (yaml_fileobj%default_var_module .eq. "" .and. .not. has_module_block) then
call diag_get_value_from_key(diag_file_id, var_id, "module", field%var_module)
else
else
call diag_get_value_from_key(diag_file_id, var_id, "module", buffer, is_optional=.true.)
!! If the module was not set for the variable, fall back to the module block or the file default
if (trim(buffer) .eq. "") then
field%var_module = yaml_fileobj%default_var_module
if (has_module_block) then
field%var_module = trim(mod_name)
else
field%var_module = yaml_fileobj%default_var_module
endif
else
field%var_module = trim(buffer)
endif
@@ -1742,7 +1809,7 @@ subroutine fms_diag_yaml_out(ntimes, ntiles, ndistributedfiles)

!! This is the number of distributed files
!! If the diag files were not combined, the name of the diag file is going to be
!! filename_tileXX.nc.YY, where YY is the distributed file number
!! filename_tileXX.nc.YY, where YY is the distributed file number
!! (1 to the number of distributed files)
call fms_f2c_string(keys2(i)%key13, 'number_of_distributed_files')

8 changes: 8 additions & 0 deletions parser/yaml_parser.F90
@@ -66,6 +66,8 @@ module yaml_parser_mod
integer, parameter :: MISSING_FILE = -1 !< Error code if the yaml file is missing
integer, parameter :: PARSER_INIT_ERROR = -2 !< Error code if unable to create a parser object
integer, parameter :: INVALID_YAML = -3 !< Error code if unable to parse a yaml file
integer, parameter :: INVALID_ALIAS = -4 !< Error code if an invalid alias was passed in
integer, parameter :: MAX_LEVELS_REACH = -5 !< Error code if MAX_LEVELS is reached
integer, parameter :: SUCCESSFUL = 1 !< "Error" code if the parsing was successful

!> @brief c functions binding
@@ -279,6 +281,12 @@ subroutine check_error_code(error_code, filename)
call mpp_error(FATAL, "Error initializing the parser for the file:"//trim(filename))
case (INVALID_YAML)
call mpp_error(FATAL, "Error parsing the file:"//trim(filename)//". Check that your yaml file is valid")
case (INVALID_ALIAS)
call mpp_error(FATAL, "An alias (*alias_name) in your file:"//trim(filename)//" is invalid. "//&
"Make sure that all aliases correspond to an anchor (&anchor_name)!")
case (MAX_LEVELS_REACH)
call mpp_error(FATAL, "The file:"//trim(filename)//" has reached the maximum number of levels! "//&
"Try setting -DMAX_LEVELS to a number greater than the current limit.")
Contributor

Defining the macro requires a recompile and the message should say that. Is there a way to make MAX_LEVELS a runtime option?

Contributor Author

Thanks, I’ve updated the error message.

Yes, making MAX_LEVELS a runtime option would be cleaner.

I didn’t pursue it yet since none of our current YAMLs come close to that limit, but I've opened an issue to work on that in a future release: #1755

end select
end subroutine check_error_code
