NOAA-GFDL · vithikashah001 · Aug 21, 2025 · Jun 23, 2025 · Jun 23, 2025 · Jul 2, 2025
@@ -16,6 +16,7 @@ The purpose of this document is to explain the diag_table yaml format.
 - [3. More examples](diag_yaml_format.md#3-more-examples)
 - [4. Schema](diag_yaml_format.md#4-schema)
 - [5. Ensemble and Nest Support](diag_yaml_format.md#5-ensemble-and-nest-support)
+- [6. Reducing Diag Table Yaml Length](diag_yaml_format.md#6-reducing-diag-table-yaml-length)
 
 ### 1. Converting from legacy ascii diag_table format
 
@@ -352,4 +353,175 @@ found in the [gfdl_msd_schemas](https://github.com/NOAA-GFDL/gfdl_msd_schemas)
 repository on Github.
 
 ### 5. Ensemble and Nest Support
-When using nests, it may be desired for a nest to have a different file frequency or number of variables from the parent grid. This may allow users to save disk space and reduce simulations time. In order to supports, FMS allows each nest to have a different diag_table.yaml from the parent grid. For example, if running with 1 test FMS will use diag_table.yaml for the parent grid and diag_table.nest_01.yaml for the first nest Similary, each ensemble member can have its own diag_table (diag_table_ens_XX.yaml, where XX is the ensemble number). However, for the ensemble case if both the diag_table.yaml and the diag_table_ens_* files are present, the code will crash as only 1 option is allowed.
+When using nests, it may be desirable for a nest to have a different file frequency or number of variables from the parent grid. This may allow users to save disk space and reduce simulation time. To support this, FMS allows each nest to have a different diag_table.yaml from the parent grid. For example, if running with 1 nest, FMS will use diag_table.yaml for the parent grid and diag_table.nest_01.yaml for the first nest. Similarly, each ensemble member can have its own diag_table (diag_table_ens_XX.yaml, where XX is the ensemble number). However, for the ensemble case, if both the diag_table.yaml and the diag_table_ens_* files are present, the code will crash, as only one option is allowed.
+
+### 6. Reducing Diag Table Yaml Length
+There may be scenarios where the diag_table.yaml becomes long and contains a lot of repeated content.
+
+For example, the keys `module`, `reduction`, and `kind` often have the same values across many variables.
+
+```yaml
+title: test_none
+base_date: 2 1 1 0 0 0
+diag_files:
+- file_name: test_4xdaily
+  freq: 6 hours
+  time_units: hours
+  unlimdim: time
+  varlist:
+  - var_name: var0
+    module: ocn_mod
+    reduction: none
+    kind: r4
+  - var_name: var1
+    module: ocn_mod
+    reduction: none
+    kind: r4
+  - var_name: var2
+    module: ocn_mod
+    reduction: none
+    kind: r4
+```
+
+To reduce size and improve readability, you can **define these keys at the file level, and override them at the variable level if needed**:
+
+```yaml
+title: test_none
+base_date: 2 1 1 0 0 0
+diag_files:
+- file_name: test_4xdaily
+  freq: 6 hours
+  time_units: hours
+  unlimdim: time
+  module: ocn_mod
+  reduction: none
+  kind: r4
+  varlist:
+  - var_name: var0
+  - var_name: var1
+  - var_name: var2
+```
+
+However, there may be cases where a file contains a large number of variables from different modules, requiring duplication of the module key across multiple lines. For example:
+
+```yaml
+title: test_none
+base_date: 2 1 1 0 0 0
+diag_files:
+- file_name: test_4xdaily
+  freq: 6 hours
+  time_units: hours
+  unlimdim: time
+  module: radiation_mod
+  reduction: none
+  kind: r4
+  varlist:
+  - var_name: var0
+  - var_name: var1
+  - var_name: var2
+  - var_name: var3
+    module: some_other_mod
+  - var_name: var4
+    module: some_other_mod
+  - var_name: var5
+    module: some_other_mod
+```
+
+To address this, you can group variables by module:
+```yaml
+title: test_none
+base_date: 2 1 1 0 0 0
+diag_files:
+- file_name: test_4xdaily
+  freq: 6 hours
+  time_units: hours
+  unlimdim: time
+  reduction: none
+  kind: r4
+  modules:
+  - module: radiation_mod
+    varlist:
+    - var_name: var0
+    - var_name: var1
+    - var_name: var2
+  - module: some_other_mod
+    varlist:
+    - var_name: var3
+    - var_name: var4
+    - var_name: var5
+```
+
+Another option to reduce the size of the diag_table.yaml and improve readability is to **use yaml anchors**. For example, instead of writing:
+```yaml
+title: test_none
+base_date: 2 1 1 0 0 0
+diag_files:
+- file_name: test_4xdaily
+  freq: 6 hours
+  time_units: hours
+  unlimdim: time
+  module: ocn_mod
+  reduction: none
+  kind: r4
+  varlist:
+  - var_name: var0
+  - var_name: var1
+  - var_name: var2
+  - var_name: var3
+  - var_name: var4
+  - var_name: var3
+    output_name: var3_Z
+    zbounds: 2. 3.
+- file_name: test_daily
+  freq: 1 days
+  time_units: hours
+  unlimdim: time
+  module: ocn_mod
+  reduction: none
+  kind: r4
+  varlist:
+  - var_name: var0
+  - var_name: var1
+  - var_name: var2
+  - var_name: var3
+  - var_name: var4
+  - var_name: var3
+    output_name: var3_Z
+    zbounds: 2. 3.
+```
+
+You can define an anchor and reuse it:
+```yaml
+name: &name
+  - var_name: var0
+  - var_name: var1
+  - var_name: var2
+  - var_name: var3
+  - var_name: var4
+  - var_name: var3
+    output_name: var3_Z
+    zbounds: 2. 3.
+
+title: test_none
+base_date: 2 1 1 0 0 0
+diag_files:
+- file_name: test_4xdaily
+  freq: 6 hours
+  time_units: hours
+  unlimdim: time
+  module: ocn_mod
+  reduction: none
+  kind: r4
+  varlist: *name
+- file_name: test_daily
+  freq: 1 days
+  time_units: hours
+  unlimdim: time
+  module: ocn_mod
+  reduction: none
+  kind: r4
+  varlist:
+  - *name
+  - var_name: var773
+```
+
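The equivalence between the grouped `modules` form and the flat form shown in section 6 can be sketched in Python. This is only an illustration on plain dictionaries that mirror the yaml structure; `flatten_modules` is a hypothetical helper, not part of FMS, and it assumes the yaml has already been loaded into Python objects.

```python
from copy import deepcopy


def flatten_modules(diag_file):
    """Expand a 'modules' block of one diag_files entry into a flat 'varlist'.

    Hypothetical helper for illustration only; it is not an FMS routine.
    """
    if "modules" not in diag_file:
        return deepcopy(diag_file)
    if "varlist" in diag_file:
        # Mirrors the FATAL check in the diff: a file-level varlist
        # cannot be combined with a 'modules' block.
        raise ValueError("file has both 'modules' and a file-level 'varlist'")

    flat = {k: deepcopy(v) for k, v in diag_file.items() if k != "modules"}
    flat["varlist"] = []
    for block in diag_file["modules"]:
        for var in block["varlist"]:
            entry = dict(var)
            # A module set on the variable itself wins over the block's module.
            entry.setdefault("module", block["module"])
            flat["varlist"].append(entry)
    return flat


grouped = {
    "file_name": "test_4xdaily",
    "reduction": "none",
    "kind": "r4",
    "modules": [
        {"module": "radiation_mod",
         "varlist": [{"var_name": "var0"}, {"var_name": "var1"}, {"var_name": "var2"}]},
        {"module": "some_other_mod",
         "varlist": [{"var_name": "var3"}, {"var_name": "var4"}, {"var_name": "var5"}]},
    ],
}
flat = flatten_modules(grouped)
```

After the call, `flat["varlist"]` holds six entries, each with an explicit `module` key, which is the long form that the `modules` grouping is meant to avoid.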
@@ -390,6 +390,14 @@ subroutine diag_yaml_object_init(diag_subset_output)
   logical              :: is_instantaneous !< .True. if the file is instantaneous (i.e no averaging)
   character(len=FMS_FILE_LEN)   :: yamlfilename     !< Name of the expected diag_table.yaml
 
+  integer                                  :: nmods              !< Number of module blocks in a file
+  integer,                     allocatable :: mod_ids(:)         !< Ids for each module block in a file
+  integer,                     allocatable :: nvars_per_file(:)  !< Number of variables in each file
+  logical,                     allocatable :: has_module_block(:)!< True if each file is using the module block
+  character(len=FMS_FILE_LEN), allocatable :: mod_name(:)        !< Buffer to store module name
+  character(len=FMS_FILE_LEN)              :: buffer             !< Buffer to store string variables
+  integer                                  :: istart, iend       !< Starting and ending indices of the file block
+
   if (diag_yaml_module_initialized) return
 
   ! If doing and ensemble or nest run add the filename appendix (ens_XX or nest_XX) to the filename
@@ -432,14 +440,40 @@ subroutine diag_yaml_object_init(diag_subset_output)
 
   !< Determine how many files are in the diag_yaml, ignoring those with write_file = False
   actual_num_files = 0
+  allocate(nvars_per_file(nfiles))
+  allocate(has_module_block(nfiles))
   do i = 1, nfiles
     write_file = .true.
     call get_value_from_key(diag_yaml_id, diag_file_ids(i), "write_file", write_file, is_optional=.true.)
     if(.not. write_file) ignore(i) = .true.
 
     !< If ignoring the file, ignore the fields in that file too!
     if (.not. ignore(i)) then
-        nvars = get_total_num_vars(diag_yaml_id, diag_file_ids(i))
+        ! Determine if the file has defined a module block
+        nmods = 0
+        nmods = get_num_blocks(diag_yaml_id, "modules", parent_block_id=diag_file_ids(i))
+
+        nvars_per_file(i) = get_num_blocks(diag_yaml_id, "varlist", parent_block_id=diag_file_ids(i))
+        if (nmods .ne. 0) then
+          has_module_block(i) = .true.
+          ! Get the total number of variables in each module block, ignoring those with write_var = .false.
+          if (nvars_per_file(i) .ne. 0) &
+            call mpp_error(FATAL, "diag_manager_mod:: the file:"//trim(filename)//" has a 'modules' block defined "//&
+                                  "and a 'varlist' defined at the file level. This is not allowed!")
+
+          allocate(mod_ids(nmods))
+          call get_block_ids(diag_yaml_id, "modules", mod_ids, parent_block_id=diag_file_ids(i))
+
+          nvars = 0
+          do j =  1, nmods
+            nvars_per_file(i) = nvars_per_file(i) + get_num_blocks(diag_yaml_id, "varlist", parent_block_id=mod_ids(j))
+            nvars = nvars + get_total_num_vars(diag_yaml_id, mod_ids(j))
+          enddo
+          deallocate(mod_ids)
+        else
+          nvars = get_total_num_vars(diag_yaml_id, diag_file_ids(i))
+          has_module_block(i) = .false.
+        endif
         total_nvars = total_nvars + nvars
         if (nvars .ne. 0) then
           actual_num_files = actual_num_files + 1
@@ -472,16 +506,41 @@ subroutine diag_yaml_object_init(diag_subset_output)
     file_list%file_name(file_count) = trim(diag_yaml%diag_files(file_count)%file_fname)//c_null_char
     file_list%diag_file_indices(file_count) = file_count
 
-    nvars = 0
-    nvars = get_num_blocks(diag_yaml_id, "varlist", parent_block_id=diag_file_ids(i))
-    allocate(var_ids(nvars))
-    call get_block_ids(diag_yaml_id, "varlist", var_ids, parent_block_id=diag_file_ids(i))
+    allocate(var_ids(nvars_per_file(i)))
+    allocate(mod_name(nvars_per_file(i)))
+    if (has_module_block(i)) then
+      nmods = get_num_blocks(diag_yaml_id, "modules", parent_block_id=diag_file_ids(i))
+      allocate(mod_ids(nmods))
+      call get_block_ids(diag_yaml_id, "modules", mod_ids, parent_block_id=diag_file_ids(i))
+
+      istart = 1
+      nvars_per_file(i) = 0
+      do j = 1, nmods
+        iend = istart + get_num_blocks(diag_yaml_id, "varlist", parent_block_id=mod_ids(j)) - 1
+        call get_block_ids(diag_yaml_id, "varlist", var_ids(istart:iend), parent_block_id=mod_ids(j))
+
+        ! Update nvars_per_file to only include those that are actually being written
+        nvars_per_file(i) = nvars_per_file(i) + get_total_num_vars(diag_yaml_id, mod_ids(j))
+
+        call get_value_from_key(diag_yaml_id, mod_ids(j), "module", buffer)
+        mod_name(istart:iend) = trim(buffer)
+
+        istart  = iend + 1
+      enddo
+
+      deallocate(mod_ids)
+    else
+      call get_block_ids(diag_yaml_id, "varlist", var_ids, parent_block_id=diag_file_ids(i))
+      nvars_per_file(i) = get_total_num_vars(diag_yaml_id, diag_file_ids(i))
+    endif
+
     file_var_count = 0
-    allocate(diag_yaml%diag_files(file_count)%file_varlist(get_total_num_vars(diag_yaml_id, diag_file_ids(i))))
-    allocate(diag_yaml%diag_files(file_count)%file_outlist(get_total_num_vars(diag_yaml_id, diag_file_ids(i))))
+    allocate(diag_yaml%diag_files(file_count)%file_varlist(nvars_per_file(i)))
+    allocate(diag_yaml%diag_files(file_count)%file_outlist(nvars_per_file(i)))
+
     allow_averages = .not. diag_yaml%diag_files(file_count)%file_freq(1) < 1
     is_instantaneous = .false.
-    nvars_loop: do j = 1, nvars
+    nvars_loop: do j = 1, nvars_per_file(i)
       write_var = .true.
       call get_value_from_key(diag_yaml_id, var_ids(j), "write_var", write_var, is_optional=.true.)
       if (.not. write_var) cycle
@@ -497,7 +556,7 @@ subroutine diag_yaml_object_init(diag_subset_output)
       diag_yaml%diag_fields(var_count)%var_file_is_subregional = diag_yaml%diag_files(file_count)%has_file_sub_region()
 
       call fill_in_diag_fields(diag_yaml_id, diag_yaml%diag_files(file_count), var_ids(j), &
-        diag_yaml%diag_fields(var_count), allow_averages)
+        diag_yaml%diag_fields(var_count), allow_averages, has_module_block(i), mod_name(j))
 
       !> Save the variable name in the diag_file type
       diag_yaml%diag_files(file_count)%file_varlist(file_var_count) = diag_yaml%diag_fields(var_count)%var_varname
@@ -515,6 +574,7 @@ subroutine diag_yaml_object_init(diag_subset_output)
       variable_list%diag_field_indices(var_count) = var_count
     enddo nvars_loop
     deallocate(var_ids)
+    deallocate(mod_name)
   enddo nfiles_loop
 
   !> Sort the file list in alphabetical order
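The `istart`/`iend` bookkeeping in the hunk above concatenates the variable ids of every module block into one array while recording, in a parallel array, which module each entry came from. A minimal Python sketch of that idea follows; `collect_var_ids` and its input layout are assumptions made for illustration (the integer ids stand in for the ids returned by the yaml parser), and the indices are 0-based with an exclusive end, unlike the 1-based inclusive Fortran loop.

```python
def collect_var_ids(module_blocks):
    """Flatten per-module variable ids into one list plus a parallel module-name list.

    module_blocks: list of (module_name, [var_id, ...]) pairs.
    Hypothetical stand-in for the parser calls in the diff.
    """
    total = sum(len(ids) for _, ids in module_blocks)
    var_ids = [None] * total
    mod_name = [None] * total

    istart = 0  # 0-based here; the Fortran code starts at 1
    for name, ids in module_blocks:
        iend = istart + len(ids)  # exclusive end of this block's slice
        var_ids[istart:iend] = ids
        mod_name[istart:iend] = [name] * len(ids)
        istart = iend  # the next block begins where this one stopped
    return var_ids, mod_name


ids, names = collect_var_ids([
    ("radiation_mod", [10, 11, 12]),
    ("some_other_mod", [20, 21]),
])
```

Keeping the two lists the same length means entry `j` of `names` can be handed to the per-variable routine together with entry `j` of `ids`, which is how `mod_name(j)` is used in the call to `fill_in_diag_fields`.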
@@ -650,12 +710,15 @@ subroutine fill_in_diag_files(diag_yaml_id, diag_file_id, yaml_fileobj)
 
 !> @brief Fills in a diagYamlFilesVar_type with the contents of a variable block in
 !! diag_table.yaml
-subroutine fill_in_diag_fields(diag_file_id, yaml_fileobj, var_id, field, allow_averages)
+subroutine fill_in_diag_fields(diag_file_id, yaml_fileobj, var_id, field, allow_averages, &
+                               has_module_block, mod_name)
   integer,                        intent(in)  :: diag_file_id !< Id of the file block in the yaml file
   type(diagYamlFiles_type),       intent(in)  :: yaml_fileobj !< The yaml file obj for the variables
   integer,                        intent(in)  :: var_id       !< Id of the variable block in the yaml file
   type(diagYamlFilesVar_type), intent(inout)  :: field        !< diagYamlFilesVar_type obj to read the contents into
   logical,                        intent(in)  :: allow_averages !< .True. if averages are allowed for this file
+  logical,                        intent(in)  :: has_module_block !< .True. if the file uses a 'modules' block
+  character(len=*),               intent(in)  :: mod_name     !< Module name from the enclosing 'modules' block
 
   integer :: natt          !< Number of attributes in variable
   integer :: var_att_id(1) !< Id of the variable attribute block
@@ -685,13 +748,17 @@ subroutine fill_in_diag_fields(diag_file_id, yaml_fileobj, var_id, field, allow_
         "Check your diag_table.yaml for the field:"//trim(field%var_varname))
   endif
 
-  if (yaml_fileobj%default_var_module .eq. "") then
+  if (yaml_fileobj%default_var_module .eq. "" .and. .not. has_module_block) then
     call diag_get_value_from_key(diag_file_id, var_id, "module", field%var_module)
-  else
+  else
     call diag_get_value_from_key(diag_file_id, var_id, "module", buffer, is_optional=.true.)
     !! If the module was set for the variable, override it with the default
     if (trim(buffer) .eq. "") then
-      field%var_module = yaml_fileobj%default_var_module
+      if (has_module_block) then
+        field%var_module = trim(mod_name)
+      else
+        field%var_module = yaml_fileobj%default_var_module
+      endif
     else
       field%var_module = trim(buffer)
     endif
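The branch above gives the module name a precedence order: a `module` key on the variable wins, then the name of the enclosing `modules` block, then the file-level default, and it is an error if none is available. A small Python sketch of that order, under the assumption that an empty string means "not set" (as in the Fortran code); `resolve_module` is a hypothetical name used only here.

```python
def resolve_module(var_module, file_default, has_module_block, mod_name):
    """Pick the module name for one variable, following the order in the diff.

    Empty strings mean "not set". Hypothetical helper for illustration only.
    """
    if var_module:
        # A module given on the variable itself always wins.
        return var_module
    if has_module_block:
        # Otherwise use the name of the enclosing 'modules' block.
        return mod_name
    if file_default:
        # Otherwise fall back to the file-level 'module' key.
        return file_default
    # No variable key, no block and no file default: 'module' is required.
    raise ValueError("'module' must be set for this variable")


from_var = resolve_module("ice_mod", "ocn_mod", True, "radiation_mod")
from_block = resolve_module("", "", True, "radiation_mod")
from_file = resolve_module("", "ocn_mod", False, "")
```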
@@ -1742,7 +1809,7 @@ subroutine fms_diag_yaml_out(ntimes, ntiles, ndistributedfiles)
 
     !! This is the number of distributed files
     !! If the diag files were not combined, the name of the diag file is going to be
-    !! filename_tileXX.nc.YY, where YY is the distributed file number 
+    !! filename_tileXX.nc.YY, where YY is the distributed file number
     !! (1 to the number of distributed files)
     call fms_f2c_string(keys2(i)%key13, 'number_of_distributed_files')
 

@@ -66,6 +66,8 @@ module yaml_parser_mod
 integer, parameter :: MISSING_FILE = -1       !< Error code if the yaml file is missing
 integer, parameter :: PARSER_INIT_ERROR = -2  !< Error code if unable to create a parser object
 integer, parameter :: INVALID_YAML = -3       !< Error code if unable to parse a yaml file
+integer, parameter :: INVALID_ALIAS = -4      !< Error code if an invalid alias was passed in
+integer, parameter :: MAX_LEVELS_REACH = -5   !< Error code if MAX_LEVELS is reached
 integer, parameter :: SUCCESSFUL = 1          !< "Error" code if the parsing was successful
 
 !> @brief c functions binding
@@ -279,6 +281,12 @@ subroutine check_error_code(error_code, filename)
       call mpp_error(FATAL, "Error initializing the parser for the file:"//trim(filename))
    case (INVALID_YAML)
       call mpp_error(FATAL, "Error parsing the file:"//trim(filename)//". Check that your yaml file is valid")
+   case (INVALID_ALIAS)
+      call mpp_error(FATAL, "An alias (*alias_name) in your file:"//trim(filename)//" is invalid. "//&
+                            "Make sure that all aliases correspond to an anchor (&anchor_name)!")
+   case (MAX_LEVELS_REACH)
+      call mpp_error(FATAL, "The file:"//trim(filename)//" has reached the maximum number of levels! "//&
+                            "Try setting -DMAX_LEVELS to a number greater than the current limit.")
    end select
 end subroutine check_error_code
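The `INVALID_ALIAS` error fires when an alias has no matching anchor. A rough pre-flight check for that condition can be sketched in Python with a regular expression. This is an approximation and not how the FMS parser works: `undefined_aliases` is a hypothetical helper, it scans the text line by line rather than parsing yaml, and it can misreport `&` or `*` tokens that appear inside quoted strings or comments.

```python
import re

# An anchor (&name) or alias (*name) token at the start of a line or right
# after whitespace, '[' or ','. Simplified on purpose; see the caveats above.
TOKEN = re.compile(r"(?:^|(?<=[\s\[,]))([&*])([A-Za-z0-9_-]+)")


def undefined_aliases(yaml_text):
    """Return (line_number, alias_name) for aliases with no earlier anchor."""
    defined = set()
    missing = []
    for lineno, line in enumerate(yaml_text.splitlines(), start=1):
        for match in TOKEN.finditer(line):
            kind, name = match.group(1), match.group(2)
            if kind == "&":
                defined.add(name)
            elif name not in defined:
                missing.append((lineno, name))
    return missing


sample = """\
name: &name
  - var_name: var0
diag_files:
- file_name: test_4xdaily
  varlist: *name
- file_name: test_daily
  varlist: *nmae
"""
problems = undefined_aliases(sample)
```

In this sample the second alias is misspelled, so `problems` contains one entry pointing at line 7.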