Commit ec8ec1d

Merge pull request #1 from CMIP-Data-Request/ja
Initial content merge from JA
2 parents f116af6 + 8408c0a commit ec8ec1d

2 files changed: +182 -4 lines changed

docs/Content/index.md

Lines changed: 97 additions & 2 deletions
@@ -1,3 +1,98 @@
1-
# DReq Content
1+
# DReq Content: technical details
2+
3+
The information used by the data request is version controlled [here](https://github.com/CMIP-Data-Request/CMIP7_DReq_Content).
4+
This information is referred to as the data request "content".
5+
It includes all information linking scientific goals to the output variables requested from particular CMIP experiments, and the CF metadata required to completely characterize the output variables.
6+
The content is versioned separately from the [data request software](https://github.com/CMIP-Data-Request/CMIP7_DReq_Software), which provides an interface to query and utilize the content.
7+
**Users should not interact with the content files directly**, as the software is intended for this purpose.
8+
9+
Airtable databases ("bases") maintained by the CMIP IPO and Data Request Task Team are the primary source of the content.
10+
These Airtable bases are used to manage the information gathered by the extensive community consultation undertaken to develop the CMIP7 data request.
11+
The content repository stores exports of the information from Airtable in `json` files.
12+
Although `json` files can be viewed in any text editor, these exported content files are not formatted for readability: users should interact with data request content either through the software or through the Airtable online interface.
13+
14+
While users should not interact directly with the exported content files, for reference we document here some basic aspects of these files.
15+
There are two flavours of exported content file:
16+
17+
- `dreq_release_export.json` contains the content of an official data request release. New versions of this file correspond to tags in this repository with the names of official releases (e.g. `v1.0beta` or `v1.0`).
18+
19+
- `dreq_raw_export.json` contains the content of the "working" bases used by the Data Request Task Team, CMIP IPO, and Thematic Author Teams to develop the data request. It is updated on an ongoing basis (i.e., there can be updates between official releases). Its format differs slightly from that of `dreq_release_export.json`, but for tagged versions its information content should be consistent with `dreq_release_export.json`.
20+
21+
The basic structure of an export file is:
22+
```
23+
{
24+
'base name 1' : {
25+
'table name 1' : {
26+
...
27+
'records' : { # dict to contain all records (rows) in the table, indexed by each record's unique id string
28+
record id 1 : {record info}
29+
record id 2 : {record info}
30+
...
31+
},
32+
'fields' : { # dict to contain schema info about the fields found in each record
33+
field id 1 : {field info}
34+
field id 2 : {field info}
35+
...
36+
},
37+
'table name 2' : {...}
38+
...
39+
}
40+
}
41+
'base name 2' : {...}
42+
...
43+
}
44+
```
45+
For example:
46+
```json
47+
{
48+
"Data Request Opportunities (Public)": {
49+
"Comment": {
50+
"base_id": "appbrFryP1MhstOS3",
51+
"base_name": "Data Request Opportunities (Public)",
52+
"id": "tblQqiAzywOppDNvj",
53+
"name": "Comment",
54+
"description": "",
55+
"fields": {
56+
"fld5PnZpNhaifVJ8z": {
57+
"description": "Comment Title",
58+
"name": "Comment Title",
59+
"type": "singleLineText"
60+
},
61+
"fldKYZsaRAapA58NG": {
62+
"description": "Variable groups relevant to the comment.",
63+
"linked_table_id": "tbl4x1RxPwKRZ0VXY",
64+
"name": "Variable Groups",
65+
"type": "multipleRecordLinks"
66+
},
67+
...
68+
"records": {
69+
"rec5E9oBVZsxdxHKN": {
70+
"Comment": "The reference to Omon.sltbasin (Omon.slftbasin) is wrong and must be changed to Omon.sltbasin.\n",
71+
"Comment Title": "Update description",
72+
"Opportunities": [
73+
"reczXng420cBQ08hg"
74+
],
75+
"Status": "Done",
76+
"Theme": [
77+
"Ocean & Sea-Ice"
78+
],
79+
"Variable Groups": [
80+
"recPohW0nDzLULHye"
81+
]
82+
},
83+
...
84+
```
85+
86+
Each base is a separate top-level entry ("base" is Airtable's term for database).
87+
This is necessary to ensure the integrity of links between different tables in each base.
88+
Links are self-consistent within a base, but not across different bases.
89+
Note however that release versions (`dreq_release_export.json`) contain only one base.
90+
To create release versions, tables from public views of the three "working bases" (Opportunities, Physical Parameters, and Variables) at the release time are consolidated in Airtable into a single, static self-consistent base.
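
The nesting described above (bases, then tables, then `records` and `fields`) can be walked with plain dict operations. The following is a minimal sketch for reference only and is not part of the data request software: `summarize_export` is a hypothetical helper name, and the small `export` dict is a hand-made stand-in with the same shape as an export file, not real content.

```python
def summarize_export(export):
    """Return {base name: {table name: (number of records, number of fields)}}."""
    summary = {}
    for base_name, tables in export.items():
        summary[base_name] = {
            table_name: (len(table.get("records", {})), len(table.get("fields", {})))
            for table_name, table in tables.items()
        }
    return summary


# Hand-made stand-in with the same shape as an export file (ids are invented).
export = {
    "Example Base": {
        "Comment": {
            "id": "tblExampleA",
            "name": "Comment",
            "fields": {
                "fldExample1": {"name": "Comment Title", "type": "singleLineText"},
            },
            "records": {
                "recExample1": {"Comment Title": "Update description"},
                "recExample2": {"Comment Title": "Fix units"},
            },
        }
    }
}

summary = summarize_export(export)
print(summary)  # {'Example Base': {'Comment': (2, 1)}}

# A release export is expected to contain exactly one base.
is_single_base = len(export) == 1
```

With a real export, the same function could be applied to the dict obtained by parsing the `json` file, although the software remains the intended way to access the content.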
91+
92+
In any export, links from a record to one or more other records in other tables appear as lists of record id strings.
93+
In the above example, "Opportunities" and "Variable Groups" are both links (in this instance the lists have length = 1).
94+
The entry for a field in `fields` indicates which table a link points to; in the above example, "Variable Groups" points to the table whose id string is given by `linked_table_id`.
95+
(In this instance it's obvious which table is linked to because the field name is the same as the table name, but that's not required and isn't always the case.)
96+
Note that the unique ids of records, such as `"recPohW0nDzLULHye"`, or of other entities in the export (tables, fields, and bases), are **not equivalent to the persistent unique identifiers (uid) of data request objects**.
97+
These ids are generated internally when the information is exported from Airtable. They are self-consistent within the export file, but they are not persistent identifiers.
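
Following a link therefore takes two steps: look up `linked_table_id` in the field's schema entry, then find the table with that `id` in the same base. The sketch below illustrates this under stated assumptions: `find_field` and `resolve_links` are hypothetical helpers (not functions of the data request software), the stand-in data only mimics the shape of the example above (the ids and the "Title" field are invented), and the lookup assumes every table dict carries an `id` entry as in that example.

```python
def find_field(table, field_name):
    """Return the schema entry of the field called field_name in this table."""
    for field in table["fields"].values():
        if field["name"] == field_name:
            return field
    raise KeyError(field_name)


def resolve_links(base, table_name, record_id, field_name):
    """Return the records that a link field of one record points to."""
    table = base[table_name]
    linked_table_id = find_field(table, field_name)["linked_table_id"]
    # Tables are keyed by name, so search for the table with the linked id.
    linked_table = next(t for t in base.values() if t["id"] == linked_table_id)
    linked_ids = table["records"][record_id].get(field_name, [])
    return [linked_table["records"][rid] for rid in linked_ids]


# Invented stand-in for a single base containing two linked tables.
base = {
    "Comment": {
        "id": "tblA",
        "fields": {
            "fld1": {
                "name": "Variable Groups",
                "type": "multipleRecordLinks",
                "linked_table_id": "tblB",
            },
        },
        "records": {
            "recA1": {
                "Comment Title": "Update description",
                "Variable Groups": ["recB1"],
            },
        },
    },
    "Variable Groups": {
        "id": "tblB",
        "fields": {},
        "records": {"recB1": {"Title": "example group"}},
    },
}

linked = resolve_links(base, "Comment", "recA1", "Variable Groups")
print(linked)  # [{'Title': 'example group'}]
```

Because record ids are only meaningful inside one export file, results of such a lookup should not be stored as if they were persistent identifiers.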
2 98

3-
Information about the Data Request and what it contains

docs/Software/index.md

Lines changed: 85 additions & 2 deletions
@@ -1,3 +1,86 @@
1-
# Information of the Software and how it works.
1+
# Data request software
2 2

3-
Some information here.
3+
The [data request software](https://github.com/CMIP-Data-Request/CMIP7_DReq_Software) provides Python code allowing users to interact programmatically with the [CMIP7 data request](https://wcrp-cmip.org/cmip7/cmip7-data-request/).
4+
It will provide an API and scripts that can produce lists of the variables requested for each CMIP7 experiment, information about the requested variables, and in general support different ways of querying and utilizing the information in the data request.
5+
6+
7+
## Overview
8+
9+
The CMIP7 data request **Software** and **Content** are version controlled in separate GitHub repositories.
10+
Official releases of the data request correspond to a tag in each of these repositories (e.g., `v1.0beta`).
11+
However, the Software can interact with different versions of the Content, for example to examine changes that have occurred when a new version of the data request is issued.
12+
13+
The data request **Content**, which is version controlled [here](https://github.com/CMIP-Data-Request/CMIP7_DReq_Content), refers to the information comprising the data request.
14+
This includes descriptions of Opportunities and their lists of requested variables, definitions of the variables, etc.
15+
The Content is stored as `json` and read by the data request Software as its input, but users should not interact with this `json` directly and its structure is not designed for readability.
16+
Users do not need to manually download the Content as this is done automatically by the Software (see "Getting Started", below, for further details).
17+
18+
The data request Content is an automatic export from Airtable, a cloud platform used by the Data Request Task Team to facilitate ongoing community engagement in developing the data request.
19+
Airtable provides users with a browseable web interface to explore data request information contained in relational databases that are referred to as "bases".
20+
These Airtable bases contain interlinked tables that constitute the primary source of data request information.
21+
22+
The Content of each official release of the data request can be explored online using the [Airtable interface](https://bit.ly/CMIP7-DReq-v1_0beta).
23+
This provides a browseable web view of the Content, allowing users to follow links between different elements of the data request - for example, to view the variables requested by a given Opportunity, or to view the Opportunities that request a given variable.
24+
This view is complementary to the access to the Content that is provided via the Software, and both access methods (Airtable and Software) are based on the same underlying information about the data request.
25+
26+
27+
Using the data request **Software** provides a way to interact programmatically with the data request Content, such as to:
28+
29+
- Given a list of supported opportunities and their priorities, produce lists of variables to output for each experiment (see Getting Started section to test this functionality),
30+
- Output the CF-compliant metadata characterizing each variable (an example file with some of the metadata for each requested variable is available in v1.0beta),
31+
- Compare the requested output of CMIP7 experiments to a given model's published CMIP6 output (not yet available in v1.0beta; planned for a commit before v1.0),
32+
- Estimate output volumes (not yet available in v1.0beta; planned for a commit before v1.0).
33+
34+
The Software should facilitate integration of the data request into modelling workflows.
35+
Suggestions for functionality are welcome in the [GitHub discussion forum](https://github.com/CMIP-Data-Request/CMIP7_DReq_Software/discussions).
36+
37+
38+
During development, the Software and Content repositories reside in the GitHub organisation https://github.com/CMIP-Data-Request.
39+
Stable releases will eventually be migrated into the https://github.com/WCRP-CMIP organisation.
40+
41+
42+
## Getting started
43+
44+
To begin, clone the Software and navigate to the `scripts/` directory:
45+
```
46+
git clone git@github.com:CMIP-Data-Request/CMIP7_DReq_Software.git
47+
cd CMIP7_DReq_Software/scripts
48+
```
49+
The `env.yml` file can be used to create a conda environment in which to run the Software:
50+
```
51+
conda env create -n my_dreq_env --file env.yml
52+
```
53+
replacing `my_dreq_env` with your preferred environment name.
54+
Activate this environment:
55+
```
56+
conda activate my_dreq_env
57+
```
58+
and run the example script:
59+
```
60+
python workflow_example.py
61+
```
62+
This script contains a workflow to access the data request Content, specify a list of Opportunities and priority levels of variables, and output the lists of variables requested from each experiment in the specified Opportunities.
63+
Each listed variable is currently identified by a unique "compound name" using CMIP6-era table names and short variable names (`Amon.tas`, `Omon.tos`, etc.).
64+
Variable names may change in upcoming releases, but in any case a mapping to CMIP6-era variable names will be retained in the data request so as to allow comparison with CMIP6 output (for those variables that were defined in CMIP6).
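
Since a compound name joins a CMIP6-era table name and a short variable name with a dot, it can be split with ordinary string handling. This is a minimal sketch, assuming every compound name contains exactly one dot as in the examples above; `split_compound_name` and `group_by_table` are hypothetical helpers, not functions of the Software.

```python
def split_compound_name(compound_name):
    """Split a compound name such as 'Amon.tas' into (table, variable)."""
    table, sep, variable = compound_name.partition(".")
    if not sep or not table or not variable or "." in variable:
        raise ValueError(f"not a 'table.variable' compound name: {compound_name!r}")
    return table, variable


def group_by_table(compound_names):
    """Group short variable names by table name, keeping the input order."""
    grouped = {}
    for name in compound_names:
        table, variable = split_compound_name(name)
        grouped.setdefault(table, []).append(variable)
    return grouped


grouped = group_by_table(["Amon.tas", "Omon.tos", "Amon.pr"])
print(grouped)  # {'Amon': ['tas', 'pr'], 'Omon': ['tos']}
```

If variable names change in a later release, the assumption of one dot per name would need to be checked again.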
65+
66+
To access the data request Content, the example script first needs to identify the version of the data request Content that is being used.
67+
This is done by specifying a tag in the Content repo and calling the retrieval function.
68+
For example:
69+
```
70+
dc.retrieve('v1.0beta')
71+
```
72+
downloads `v1.0beta` of the Content into the local cache, if it is not already there.
73+
The script can then access it by loading it into a python dict variable:
74+
```
75+
content = dc.load('v1.0beta')
76+
```
77+
Currently a single version of the Content `json` file is roughly 20 MB in size.
78+
The size of the local cache can be managed by deleting unused versions.
79+
For example, to remove a specific version:
80+
```
81+
dc.delete('v1.0beta')
82+
```
83+
Or to remove all locally cached versions:
84+
```
85+
dc.delete()
86+
```
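
The retrieve and load calls above can be combined into one step. This is a minimal sketch, not the Software's own code: `load_version` is a hypothetical wrapper that uses only the `retrieve` and `load` calls shown above, and `FakeContent` is an invented in-memory stand-in for `dc` so that the sketch runs without the Software installed. Its caching behaviour is an assumption based on the description above. With the real module, `dc` would be passed instead.

```python
class FakeContent:
    """Invented stand-in that mimics the retrieve/load/delete calls shown above."""

    def __init__(self):
        self.cache = {}     # version tag -> content dict
        self.downloads = 0  # counts simulated downloads

    def retrieve(self, version):
        # Only "download" when the version is not already cached.
        if version not in self.cache:
            self.downloads += 1
            self.cache[version] = {"Example Base": {}}

    def load(self, version):
        return self.cache[version]

    def delete(self, version=None):
        # Without an argument, remove all cached versions.
        if version is None:
            self.cache.clear()
        else:
            self.cache.pop(version, None)


def load_version(content_module, version):
    """Make sure a Content version is cached, then return it as a dict."""
    content_module.retrieve(version)
    return content_module.load(version)


fake_dc = FakeContent()
content = load_version(fake_dc, "v1.0beta")
load_version(fake_dc, "v1.0beta")  # second call reuses the cached copy
fake_dc.delete()                   # clear the cache when finished
```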
