CMIP7 Controlled Vocabularies

Core Controlled Vocabularies (CVs) for use in CMIP7

Caution

For information on what is in the CVs please visit the ESGVOC repository at: insert link here

THIS REPOSITORY IS CURRENTLY UNDER ACTIVE DEVELOPMENT

This repository maintains the data that defines the core controlled vocabularies (CVs) for use in CMIP7. However, at present, the only recommended way to access these values is via the esgvoc package. If you are looking for further details on ESGF metadata handling, please follow this issue, as that question goes beyond the CVs alone.

Overview

At present, this repository should not be used as your source of truth. As stated above, the only recommended way to access the CVs is via the esgvoc package. This will change in future, but for now the CVs Task Team (CVs TT) does not have the resources to maintain more than one interface to the CVs.

Details

If you would like to understand how this works in more detail, then here is some further information. There are a few key concepts to be aware of.

How esgvoc works

esgvoc works by using an esgvoc branch in each of its source data repositories (we'll get to what those are in a minute). Each esgvoc branch contains the data esgvoc needs, in the format that esgvoc needs it. When you query data via esgvoc, it effectively (although it is smarter than this in practice) grabs the information from the esgvoc branch of every repository it needs to query, parses the information, and serves it to you following esgvoc's defined API (see the esgvoc docs for the form of this API).
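
The fetch, parse and serve flow described above can be sketched roughly as follows. This is an illustrative model only, not esgvoc's implementation or API: the repository names, the in-memory `FAKE_REMOTE` store and the functions `fetch_branch`, `parse_terms` and `query` are all hypothetical.

```python
import json

# Hypothetical stand-in for remote repositories: (repo, branch) -> raw JSON text.
# The real package reads from git repositories; nothing here is its actual API.
FAKE_REMOTE = {
    ("universe", "esgvoc"): json.dumps(
        [{"id": "historical", "data_descriptor": "experiment"}]
    ),
    ("cmip7-cvs", "esgvoc"): json.dumps(
        [{"id": "historical", "data_descriptor": "experiment"}]
    ),
    ("cmip7-cvs", "main"): json.dumps([]),  # main deliberately holds no CV data
}


def fetch_branch(repo, branch="esgvoc"):
    """Return the raw text stored on one branch of one repository."""
    return FAKE_REMOTE[(repo, branch)]


def parse_terms(raw):
    """Parse raw JSON text into a list of term dictionaries."""
    return json.loads(raw)


def query(repos, data_descriptor):
    """Collect, per repository, the ids of all terms for one data descriptor."""
    result = {}
    for repo in repos:
        terms = parse_terms(fetch_branch(repo))
        result[repo] = [
            term["id"] for term in terms if term["data_descriptor"] == data_descriptor
        ]
    return result


print(query(["universe", "cmip7-cvs"], "experiment"))
```

The point of the sketch is only that the data always comes from the esgvoc branch, never from main.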

Key nouns

Data descriptor: a known metadata category used in CMIP
- for example, "experimentDD" is the data descriptor which defines the experiment to which a given dataset belongs, "areaLabelDD" is the data descriptor which defines the area label given to a dataset and "productTypeDD" is the data descriptor which (loosely) describes what kind of product a given dataset is (e.g. model-based, observations, reanalysis).
Term: an individual entry in the CVs for a given data descriptor
- for example, the entry for the "experimentDD" data descriptor which defines the historical experiment and its associated metadata
Collection: set of terms for a given data descriptor which are included in a given CMIP phase's CVs
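
As a rough mental model, the three nouns above can be written as small data structures. This is a sketch under assumptions, not how esgvoc stores anything: the class names, field names and the placeholder metadata are hypothetical, and only the "historical" experiment comes from the text above.

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    """One entry in the CVs for a given data descriptor."""

    id: str
    data_descriptor: str  # the metadata category this term belongs to
    metadata: dict = field(default_factory=dict)


@dataclass
class Collection:
    """The terms of one data descriptor that a given CMIP phase includes."""

    project: str
    data_descriptor: str
    terms: list = field(default_factory=list)

    def add(self, term):
        """Add a term, rejecting terms that belong to another data descriptor."""
        if term.data_descriptor != self.data_descriptor:
            raise ValueError(
                f"term {term.id!r} belongs to {term.data_descriptor!r}, "
                f"not {self.data_descriptor!r}"
            )
        self.terms.append(term)

    def ids(self):
        """Return the ids of all terms in this collection."""
        return [term.id for term in self.terms]


# The metadata content below is a placeholder, not real CV content.
historical = Term("historical", "experiment", {"description": "placeholder"})
cmip7_experiments = Collection(project="CMIP7", data_descriptor="experiment")
cmip7_experiments.add(historical)
print(cmip7_experiments.ids())
```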

Note: the naming convention for how data descriptors appear in different places (e.g. whether they have the trailing DD, whether they are camelCase or snake_case) is still being ironed out, so expect to see a few different variants in the short term, not always with a clearly defined logic. To follow discussion on this issue, please see esgvoc#56.
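
Until the convention settles, code that compares data descriptor names may want to normalize them first. The helper below is a minimal sketch of one way to do that. The function name and the choice of snake_case without the DD suffix as the target form are assumptions made for illustration, not a convention agreed in esgvoc#56.

```python
import re


def normalize_descriptor_name(name):
    """Map variants such as 'experimentDD' or 'areaLabel' to snake_case without 'DD'."""
    # Drop a trailing 'DD' (camelCase variant) or '_dd' (snake_case variant).
    name = re.sub(r"(_dd|DD)$", "", name)
    # Insert an underscore at each lowercase-to-uppercase boundary, then lowercase.
    return re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name).lower()


for variant in ["experimentDD", "areaLabelDD", "area_label", "productTypeDD"]:
    print(variant, "->", normalize_descriptor_name(variant))
```

Note that the sketch would also strip a trailing DD that is genuinely part of a name, so it is only suitable for names known to follow one of the variants listed above.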

Source repositories

esgvoc uses two different sources of information. The first is the WCRP-universe repository (specifically its esgvoc branch). This is the main/baseline/canonical repository containing all known terms used in all supported phases of CMIP (CMIP6, CMIP6-cordex, input4MIPs, CMIP7 etc.). In the universe, terms are defined in full, i.e. all metadata is supplied next to each term.

The second source of information, when it comes to CMIP7 CVs, is this repository (again, specifically its esgvoc branch). esgvoc uses this repository to define which terms belong to CMIP7 (not all terms in the universe are relevant to CMIP7). A term is declared part of CMIP7 by including it in the esgvoc branch of this repository. Normally, that entry is simply a link back to the universe, so there is no need to duplicate metadata. However, where needed, this repository can also define overrides of the information in the universe. This is rare, but it allows metadata to differ from what is in the universe, specifically for CMIP7.
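
The relationship between the two sources can be illustrated with a small sketch. It assumes a simple "copy the universe entry, then apply project overrides" rule. The dictionaries, field names (`description`, `tier`, `overrides`), the `example-experiment` term and the `resolve` function are hypothetical and do not reflect the actual file layout or esgvoc's resolution logic.

```python
# Hypothetical universe: every known term, defined in full.
UNIVERSE = {
    ("experiment", "historical"): {
        "id": "historical",
        "description": "placeholder description held in the universe",
        "tier": 1,
    },
    ("experiment", "example-experiment"): {
        "id": "example-experiment",
        "description": "a term that is not part of CMIP7",
        "tier": 2,
    },
}

# Hypothetical CMIP7 entries: presence means "this term belongs to CMIP7".
# An entry is normally just a link back to the universe, with optional overrides.
CMIP7 = {
    ("experiment", "historical"): {
        "overrides": {"description": "CMIP7-specific placeholder"},
    },
}


def resolve(data_descriptor, term_id):
    """Return the CMIP7 view of a term: universe metadata plus any overrides."""
    key = (data_descriptor, term_id)
    if key not in CMIP7:
        raise KeyError(
            f"{term_id!r} is not in the CMIP7 {data_descriptor!r} collection"
        )
    resolved = dict(UNIVERSE[key])  # copy, so the universe entry is untouched
    resolved.update(CMIP7[key].get("overrides", {}))
    return resolved


print(resolve("experiment", "historical"))
```

Fields that are not overridden (here `tier`) come straight from the universe, which is why the project repository does not need to duplicate them.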

Branches you may see

There are a few branches in this repository. Given there are so many relevant ones, here is a quick guide. In future, we hope to return to 'normal' and just have main, but for now it's like this:

main: just this README
- deliberately zero content. If you want to know about CVs, use esgvoc.
esgvoc: source data for esgvoc
- unless you are a developer, you shouldn't need to look at this. If you want to know about CVs, use esgvoc.
esgvoc_dev: if you want to make a change to the esgvoc branch, target your merge request at this branch (it is the development branch where all changes are added before being merged to esgvoc for releases).
- unless you are a developer, you shouldn't need to look at this. If you want to know about CVs, use esgvoc.

All other branches can be ignored. They are being used by devs and are not intended to be long-lived.

JSON branch structure (ignore these and use esgvoc for now)

Required
`main`: The landing page directing users to the relevant content.
`docs`: Contains the documentation and is version-controlled. This is the branch where documentation edits are made. Actions and automations (e.g., workflows that update docs or summaries) are also configured from this branch.
`src-data`: Stores the JSONLD content used to link all files. Updates here trigger automated workflows that identify changed JSON files and update documentation or summaries accordingly.
`production`: Not for user digestion. Hosts the compiled documentation and JSONLD files, as well as the static pages site. Updated automatically via workflows when changes in `src-data` or `docs` are processed.

Optional
`dev_*`: Other branches used for updating things.
`*`: All other branches are usually ones containing submissions to update the content.

Contributors

Thanks to our contributors!

Acknowledgement

The repository content has been collected from many contributors representing the Coupled Model Intercomparison Project phase 7 (CMIP7), including those from climate modeling groups and model intercomparison projects (MIPs) worldwide. The structure of the content, and the tools required to maintain it, were developed by climate and computer scientists from the Program for Climate Model Diagnosis and Intercomparison (PCMDI) at Lawrence Livermore National Laboratory (LLNL), with assistance from colleagues at the UK Met Office, the UK Centre for Environmental Data Analysis (CEDA), the Deutsches Klimarechenzentrum (DKRZ) in Germany and the members of the Infrastructure for the European Network for Earth System Modelling (IS-ENES) consortium.

This work is sponsored by the Regional and Global Model Analysis (RGMA) program of the Earth and Environmental Systems Sciences Division (EESSD) in the Office of Biological and Environmental Research (BER) within the Department of Energy's (DOE) Office of Science (OS). The work at PCMDI is performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344.
