areacell naming #111

@taylor13

The long-planned approach (initiated more than 3 years ago) to making the variable register for CMIP7 serve multiple projects without undue, duplicative inflation of the register's contents requires that it be independent of reporting interval ("frequency"), region, and grid. The plan, therefore, has been to remove frequency-, region-, and grid-specific information from the variable definitions in the register. This has already been implemented for frequency and region. It should now be done for "grid" by removing the distinction between the variable names "areacella" and "areacello" and replacing both with simply "areacell".

Although the distinction between the names "areacella" and "areacello" superficially seems useful in determining which grid a variable should be reported on, it is not. Even for a single model, a variable may be reported on multiple grids (e.g., on a very high resolution native grid of complex description and on a more manageable, lower resolution regular latitude x longitude grid). So knowing the name of the areacell variable in CMIP6 (whether areacella or areacello) did not give a user the definitive information needed to obtain the cell areas they should use to analyze CMIP variables of interest.

For CMIP7 it became clear that we should register each grid used to report model output and assign it a unique grid label. That plan is being implemented. Some of the more recent discussion of this (initiated in May of this year) can be found here: WCRP-CMIP/CMIP7-CVs#202. This grid label will appear as part of the filename uniquely identifying data, and it will also be recorded as a global attribute in every file. This means that when users download a variable that they want to analyze, they can definitively learn (from either the filename or the grid_label global attribute) which of the areacell variables in the archive they should obtain in conjunction with the file of interest. There is absolutely no need to assign different names to the cell areas depending on whether they are usually associated with an atmospheric grid or with an ocean grid. (The same grid may, in fact, be used to report both atmospheric and ocean variables.)
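To illustrate the lookup described above, here is a minimal sketch in Python. It assumes each dataset can be summarized by its variable name and its grid_label global attribute; the `Dataset` class, the `find_cell_areas` helper, and the grid labels "grid-A" and "grid-B" are hypothetical names invented for this example, not registered CMIP7 values or any existing tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dataset:
    variable: str    # variable name, e.g. "tas" or "areacell"
    grid_label: str  # value of the grid_label global attribute (hypothetical labels below)


def find_cell_areas(target, archive):
    """Return the areacell dataset that shares the target's grid_label, or None."""
    for candidate in archive:
        if candidate.variable == "areacell" and candidate.grid_label == target.grid_label:
            return candidate
    return None


# Hypothetical archive: one model reporting on two registered grids.
archive = [
    Dataset("areacell", "grid-A"),
    Dataset("areacell", "grid-B"),
    Dataset("tas", "grid-A"),
    Dataset("tos", "grid-A"),
    Dataset("tos", "grid-B"),
]

for ds in archive:
    if ds.variable != "areacell":
        print(ds.variable, ds.grid_label, "->", find_cell_areas(ds, archive))
```

In this sketch the match depends only on the grid label, so the atmospheric variable and the ocean variable reported on "grid-A" resolve to the same cell areas without any need for a realm-specific name.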

I have spent countless hours trying to come up with a use-case where it would be important to filter datasets based on the name of areacell (i.e., whether it was areacella or areacello). I have been unsuccessful, and no one has provided a use-case requiring the use of different names.

So, I think there is no reason to continue to give different names to the areacell variable based on whether it might happen to be used to report atmospheric variables or ocean variables. (When both atmospheric and ocean variables are reported on a common grid, we face a dilemma. Should we assign the name "areacella" or "areacello"? In fact, we would have to report the same cell areas twice under two different names; otherwise the cell_measures would point to a non-existent variable.)
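The dilemma can be made concrete with a small Python sketch. It assumes cell_measures strings in the CF form "area: <name>" and a common grid for which the cell areas have been written only once; the helper names `area_variable` and `dangling_references` are hypothetical and exist only for this illustration.

```python
def area_variable(cell_measures):
    """Extract the variable name that follows 'area:' in a cell_measures string."""
    tokens = cell_measures.split()
    # cell_measures is a sequence of "measure: name" pairs
    for key, name in zip(tokens[::2], tokens[1::2]):
        if key == "area:":
            return name
    return None


def dangling_references(cell_measures_by_variable, available):
    """Map each variable to the area variable it references but that is not available."""
    missing = {}
    for var, cell_measures in cell_measures_by_variable.items():
        name = area_variable(cell_measures)
        if name is not None and name not in available:
            missing[var] = name
    return missing


# tas (atmosphere) and tos (ocean) reported on one common grid.
split_names = {"tas": "area: areacella", "tos": "area: areacello"}
single_name = {"tas": "area: areacell", "tos": "area: areacell"}

# Cell areas written once as "areacella": the ocean variable's reference dangles.
print(dangling_references(split_names, {"areacella"}))  # {'tos': 'areacello'}
# Cell areas written once as "areacell": every reference resolves.
print(dangling_references(single_name, {"areacell"}))   # {}
```

With two names, the only way to clear the dangling reference is to write the same cell areas a second time under the other name.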

Somewhat tangentially, it should be noted that the current plan for the data request is to include a grid directive, which would indicate to modeling groups what grid a variable is requested on. The options will include: the native atmospheric grid, the native ocean grid, a preferred atmospheric grid (i.e., a grid that a modeling group chooses to regrid its data to), and a preferred ocean grid (similar to the atmospheric case). In the latest release of the data request, a modeling group must infer this information from the cell_measures and by reading the "processing notes". Inclusion of explicit "requested grid" information will ensure that modeling groups are aware of which grid they should report their data on. (The "requested grid" is, of course, simply a "request" and not a requirement.)

Apparently, objections were raised just a few days ago to the above well-reasoned plan by one or two members of the Data Request Task Team (no one has publicly raised any objections). I'm pretty sure their objections arose from a lack of awareness of how grids are to be uniquely identified in CMIP7. Whether or not that's the case, I would request that they publicly provide a use-case indicating why it is important to create two different variable names to identify the area of a grid cell.

Time is of the essence now, so I hope any objections to identifying the variable containing cell areas by the name "areacell" will be made known within the next day or two. I am told that some groups plan to start writing model output in the next week or so, and they need to know how cell_measures should be recorded.

Please provide any feedback on this by responding to the corresponding GitHub issue.
(The issue has been raised on the "WCRP-CMIP/Variable Registry" repository (https://github.com/WCRP-CMIP/Variable-Registry) because this register cannot be implemented as originally conceived without correcting this problem; it should be noted, however, that there are also important implications for the CV repository (https://github.com/WCRP-CMIPCV) and the data request's airtables.)

Please share this issue with anyone you think might have an interest.
