
Commit c72911f

Addressing Issue #41 (#72)
Making the tool HPC-agnostic and optimized, and providing Read the Docs documentation
1 parent cd06ea0 commit c72911f


81 files changed (+6346, -548 lines)

.gitignore

+5 -2

@@ -4,5 +4,8 @@
 .DS_Store
 *.swp
 
-# WIP folders
-scripts/ouranos-crcm5-cmip6/
+# test folders
+ignore-tests/
+
+# docs stuff
+docs/build/

.readthedocs.yaml

+31

@@ -0,0 +1,31 @@
+# Read the Docs configuration file for Sphinx projects
+# See https://docs.readthedocs.io/en/stable/config-file/v2.html for details
+
+# Required
+version: 2
+
+# Set the OS, Python version and other tools you might need
+build:
+  os: ubuntu-22.04
+  tools:
+    python: "3.12"
+
+# Build documentation in the "docs/" directory with Sphinx
+sphinx:
+  configuration: docs/conf.py
+  # You can configure Sphinx to use a different builder, for instance use the dirhtml builder for simpler URLs
+  # builder: "dirhtml"
+  # Fail on all warnings to avoid broken references
+  # fail_on_warning: true
+
+# Optionally build your docs in additional formats such as PDF and ePub
+formats:
+  - pdf
+  - epub
+
+# Optional but recommended, declare the Python requirements required
+# to build your documentation
+# See https://docs.readthedocs.io/en/stable/guides/reproducible-builds.html
+python:
+  install:
+    - requirements: docs/requirements.txt
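The `docs/requirements.txt` file declared above is not shown in this commit excerpt. Since `docs/conf.py` (below) imports `sphinx_rtd_theme`, a minimal sketch of what it plausibly contains (hypothetical, for illustration only):

```console
foo@bar:~$ cat docs/requirements.txt
sphinx
sphinx-rtd-theme
```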

README.md

+3 -99

@@ -1,107 +1,11 @@
-# Description
-This repository contains scripts to process meteorological datasets in NetCDF file format. The general usage of the script (i.e., `./extract-dataset.sh`) is as follows:
-
-```console
-Usage:
-  extract-dataset [options...]
-
-Script options:
-  -d, --dataset                     Meteorological forcing dataset of interest
-  -i, --dataset-dir=DIR             The source path of the dataset file(s)
-  -v, --variable=var1[,var2[...]]   Variables to process
-  -o, --output-dir=DIR              Writes processed files to DIR
-  -s, --start-date=DATE             The start date of the data
-  -e, --end-date=DATE               The end date of the data
-  -l, --lat-lims=REAL,REAL          Latitude's upper and lower bounds;
-                                    optional; within the [-90, +90] limits
-  -n, --lon-lims=REAL,REAL          Longitude's upper and lower bounds;
-                                    optional; within the [-180, +180] limits
-  -a, --shape-file=PATH             Path to the ESRI shapefile; optional
-  -m, --ensemble=ens1,[ens2,[...]]  Ensemble members to process; optional
-                                    Leave empty to extract all ensemble members
-  -M, --model=model1,[model2,[...]] Models that are part of a dataset,
-                                    only applicable to climate datasets, optional
-  -S, --scenario=scn1,[scn2,[...]]  Climate scenarios to process, only applicable
-                                    to climate datasets, optional
-  -j, --submit-job                  Submit the data extraction process as a job
-                                    on the SLURM system; optional
-  -k, --no-chunk                    No parallelization, recommended for small domains
-  -p, --prefix=STR                  Prefix prepended to the output files
-  -b, --parsable                    Parsable SLURM message mainly used
-                                    for chained job submissions
-  -c, --cache=DIR                   Path of the cache directory; optional
-  -E, [email protected]         E-mail user when job starts, ends, or
-                                    fails; optional
-  -u, --account                     Digital Research Alliance of Canada's sponsor's
-                                    account name; optional, defaults to 'rpp-kshook'
-  -L, --list-datasets               List all the available datasets and the
-                                    corresponding keywords for '--dataset' option
-  -V, --version                     Show version
-  -h, --help                        Show this screen and exit
-
-```
-# Available Datasets
-|# |Dataset |Time Period |DOI |Description |
-|--|---------------------------|--------------------------------|--------------------------|-------------------------------------|
-|1 |GWF-NCAR WRF-CONUS I |Hourly (Oct 2000 - Dec 2013) |10.1007/s00382-016-3327-9 |[link](./scripts/gwf-ncar-conus_i) |
-|2 |GWF-NCAR WRF-CONUS II[^1] |Hourly (Jan 1995 - Dec 2015) |10.5065/49SN-8E08 |[link](./scripts/gwf-ncar-conus_ii) |
-|3 |ECMWF ERA5[^2] |Hourly (Jan 1950 - Dec 2020) |10.24381/cds.adbb2d47 and [link](https://cds.climate.copernicus.eu/cdsapp#!/dataset/reanalysis-era5-single-levels-preliminary-back-extension?tab=overview)|[link](./scripts/ecmwf-era5)|
-|4 |ECCC RDRSv2.1 |Hourly (Jan 1980 - Dec 2018) |10.5194/hess-25-4917-2021 |[link](./scripts/eccc-rdrs) |
-|5 |CCRN CanRCM4-WFDEI-GEM-CaPA|3-Hourly (Jan 1951 - Dec 2100) |10.5194/essd-12-629-2020 |[link](./scripts/ccrn-canrcm4_wfdei_gem_capa)|
-|6 |CCRN WFDEI-GEM-CaPA |3-Hourly (Jan 1979 - Dec 2016) |10.20383/101.0111 |[link](./scripts/ccrn-wfdei_gem_capa)|
-|7 |ORNL Daymet |Daily (Jan 1980 - Dec 2022)[^3] |10.3334/ORNLDAAC/2129 |[link](./scripts/ornl-daymet) |
-|8 |Alberta Gov Climate Dataset|Daily (Jan 1950 - Dec 2100) |10.5194/hess-23-5151-2019 |[link](./scripts/ab-gov) |
-|9 |Ouranos ESPO-G6-R2 |Daily (Jan 1950 - Dec 2100) |10.1038/s41597-023-02855-z|[link](./scripts/ouranos-espo-g6-r2) |
-|10|Ouranos MRCC5-CMIP6 |Hourly (Jan 1950 - Dec 2100) |TBD[^4] |[link](./scripts/ouranos-mrcc5-cmip6)|
-|11|NASA NEX-GDDP-CMIP6 |Daily (Jan 1950 - Dec 2100) |10.1038/s41597-022-01393-4|[link](./scripts/nasa-nex-gddp-cmip6)|
-
-[^1]: For access to the files on the Graham cluster, please contact [Stephen O'Hearn](mailto:[email protected]).
-[^2]: ERA5 data from 1950-1979 are based on [ERA5 preliminary extension](https://cds.climate.copernicus.eu/cdsapp#!/dataset/reanalysis-era5-single-levels-preliminary-back-extension?tab=overview) and 1979 onwards are based on [ERA5 1979-present](https://doi.org/10.24381/cds.adbb2d47).
-[^3]: For the Puerto Rico domain of the dataset, data are available from January 1950 until December 2022.
-[^4]: Data are not publicly available yet. The DOI is to be determined once the relevant paper is published.
-
-# General Example
-As an example, follow the code block below. Please remember that you MUST have access to Digital Research Alliance of Canada (DRA) clusters (specifically `Graham`) and have access to `RDRSv2.1` model outputs. Also, remember to generate a [Personal Access Token](https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/creating-a-personal-access-token) with GitHub in advance. Enter the following commands in your Graham shell as a test case:
-
-```console
-foo@bar:~$ git clone https://github.com/kasra-keshavarz/datatool # clone the repository
-foo@bar:~$ cd ./datatool/ # move to the repository's directory
-foo@bar:~$ ./extract-dataset.sh -h # view the usage message
-foo@bar:~$ ./extract-dataset.sh \
-  --dataset="rdrs" \
-  --dataset-dir="/project/rpp-kshook/Climate_Forcing_Data/meteorological-data/rdrsv2.1" \
-  --output-dir="$HOME/scratch/rdrs_outputs/" \
-  --start-date="2001-01-01 00:00:00" \
-  --end-date="2001-12-31 23:00:00" \
-  --lat-lims=49,51 \
-  --lon-lims=-117,-115 \
-  --variable="RDRS_v2.1_A_PR0_SFC,RDRS_v2.1_P_HU_09944" \
-  --cache='$SLURM_TMPDIR' \
-  --prefix="testing_";
-```
-See the [examples](./examples) directory for real-world scripts for each meteorological dataset included in this repository.
-
-# Logs
-The dataset logs are generated under the `$HOME/.datatool` directory,
-but only when jobs are submitted to a cluster's scheduler. If
-processing is not submitted as a job, the logs are printed to the screen.
-
-# New Datasets
-If you would like a new dataset added to the data
-repository, and the associated scripts added here,
-you can open a new ticket on the **Issues** tab of this
-repository. Alternatively, you can make a
-[Pull Request](https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/proposing-changes-to-your-work-with-pull-requests/creating-a-pull-request)
-with your own script.
-
-# Support
-Please open a new ticket on the **Issues** tab of this repository for
-support.
+# Documentation
+The relevant documentation is available on the [Read the Docs](https://datatool.readthedocs.io/en/latest/) website.
 
 # License
 Meteorological Data Processing Workflow - datatool <br>
 Copyright (C) 2022-2023, University of Saskatchewan<br>
 Copyright (C) 2023-2024, University of Calgary<br>
+Copyright (C) 2022-2024, datatool developers
 
 This program is free software: you can redistribute it and/or modify
 it under the terms of the GNU General Public License as published by

VERSION

+1 -1

@@ -1 +1 @@
-0.5.2-dev
+0.7.0
File renamed without changes.
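With VERSION bumped to 0.7.0, the new release number should be what the CLI reports through the `-V`/`--version` flag listed in the usage message; a quick sanity check might look like this (output assumed to mirror the VERSION file):

```console
foo@bar:~$ ./extract-dataset.sh --version
0.7.0
```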

docs/Makefile

+20

@@ -0,0 +1,20 @@
+# Minimal makefile for Sphinx documentation
+#
+
+# You can set these variables from the command line, and also
+# from the environment for the first two.
+SPHINXOPTS    ?=
+SPHINXBUILD   ?= sphinx-build
+SOURCEDIR     = ./
+BUILDDIR      = build
+
+# Put it first so that "make" without argument is like "make help".
+help:
+	@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
+
+.PHONY: help Makefile
+
+# Catch-all target: route all unknown targets to Sphinx using the new
+# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS).
+%: Makefile
+	@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
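Because the catch-all `%: Makefile` target forwards any goal to `sphinx-build -M`, the documentation can be built locally with the usual Sphinx make targets, for example:

```console
foo@bar:~$ cd docs/
foo@bar:~$ make html   # equivalent to: sphinx-build -M html "./" "build"
```

The output lands under `docs/build/`, matching the `docs/build/` entry added to `.gitignore` above.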

docs/conf.py

+29

@@ -0,0 +1,29 @@
+# Configuration file for the Sphinx documentation builder.
+#
+# For the full list of built-in configuration values, see the documentation:
+# https://www.sphinx-doc.org/en/master/usage/configuration.html
+
+# -- Project information -----------------------------------------------------
+# https://www.sphinx-doc.org/en/master/usage/configuration.html#project-information
+
+project = 'datatool'
+copyright = '2022-2024, University of Calgary'
+author = 'Kasra Keshavarz'
+release = '0.7.0'
+
+# -- General configuration ---------------------------------------------------
+# https://www.sphinx-doc.org/en/master/usage/configuration.html#general-configuration
+
+import sphinx_rtd_theme
+
+extensions = [
+    'sphinx_rtd_theme',
+]
+
+templates_path = ['_templates']
+exclude_patterns = []
+
+# -- Options for HTML output -------------------------------------------------
+# https://www.sphinx-doc.org/en/master/usage/configuration.html#options-for-html-output
+
+html_theme = 'sphinx_rtd_theme'
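Building the docs locally requires the theme that `conf.py` imports at module level; assuming the `docs/requirements.txt` contents sketched earlier, the setup step would be:

```console
foo@bar:~$ pip install -r docs/requirements.txt   # sphinx and sphinx-rtd-theme
```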

docs/datasets.rst

+68

@@ -0,0 +1,68 @@
+.. Copyright 2022-2024 University of Calgary, University of Saskatchewan
+   and other datatool developers.
+
+   SPDX-License-Identifier: (GPL-3.0-or-later)
+
+.. _datatool-datasets:
+
+========
+Datasets
+========
+This page details the dataset recipes available with ``datatool``.
+
+-------
+Summary
+-------
+The following table lists the available datasets and their DOIs; detailed descriptions follow below.
+
++----+------------------------------+---------------------------------------------------------------------------------------+
+| #  | Dataset                      | DOI                                                                                   |
++====+==============================+=======================================================================================+
+| 1  | GWF-NCAR WRF-CONUS I         | 10.1007/s00382-016-3327-9                                                             |
++----+------------------------------+---------------------------------------------------------------------------------------+
+| 2  | GWF-NCAR WRF-CONUS II [#f1]_ | 10.5065/49SN-8E08                                                                     |
++----+------------------------------+---------------------------------------------------------------------------------------+
+| 3  | ECMWF ERA5 [#f2]_            | 10.24381/cds.adbb2d47 and `ERA5 preliminary extension <era5_preliminary_extension_>`_ |
++----+------------------------------+---------------------------------------------------------------------------------------+
+| 4  | ECCC RDRSv2.1                | 10.5194/hess-25-4917-2021                                                             |
++----+------------------------------+---------------------------------------------------------------------------------------+
+| 5  | CCRN CanRCM4-WFDEI-GEM-CaPA  | 10.5194/essd-12-629-2020                                                              |
++----+------------------------------+---------------------------------------------------------------------------------------+
+| 6  | CCRN WFDEI-GEM-CaPA          | 10.20383/101.0111                                                                     |
++----+------------------------------+---------------------------------------------------------------------------------------+
+| 7  | ORNL Daymet [#f3]_           | 10.3334/ORNLDAAC/2129                                                                 |
++----+------------------------------+---------------------------------------------------------------------------------------+
+| 8  | Alberta Gov Climate Dataset  | 10.5194/hess-23-5151-2019                                                             |
++----+------------------------------+---------------------------------------------------------------------------------------+
+| 9  | Ouranos ESPO-G6-R2           | 10.1038/s41597-023-02855-z                                                            |
++----+------------------------------+---------------------------------------------------------------------------------------+
+| 10 | Ouranos MRCC5-CMIP6          | 10.5281/zenodo.11061924                                                               |
++----+------------------------------+---------------------------------------------------------------------------------------+
+| 11 | NASA NEX-GDDP-CMIP6          | 10.1038/s41597-022-01393-4                                                            |
++----+------------------------------+---------------------------------------------------------------------------------------+
+
+.. [#f1] For access to the files on the Graham cluster, please contact `Stephen O'Hearn <mailto:[email protected]>`_.
+.. [#f2] ERA5 data from 1950-1979 are based on `ERA5 preliminary extension <https://cds.climate.copernicus.eu/cdsapp#!/dataset/reanalysis-era5-single-levels-preliminary-back-extension?tab=overview>`_ and 1979 onwards are based on `ERA5 1979-present <https://doi.org/10.24381/cds.adbb2d47>`_.
+.. [#f3] For the Puerto Rico domain of the dataset, data are available from January 1950 until December 2022.
+
+.. _era5_preliminary_extension: https://cds.climate.copernicus.eu/cdsapp#!/dataset/reanalysis-era5-single-levels-preliminary-back-extension?tab=overview
+
+---------------------
+Detailed Descriptions
+---------------------
+.. toctree::
+   :maxdepth: 2
+   :caption: Contents:
+
+   scripts/ab-gov.rst
+   scripts/ccrn-canrcm4_wfdei_gem_capa.rst
+   scripts/ccrn-wfdei_gem_capa.rst
+   scripts/eccc-rdrs.rst
+   scripts/ecmwf-era5.rst
+   scripts/gwf-ncar-conus_i.rst
+   scripts/gwf-ncar-conus_ii.rst
+   scripts/nasa-nex-gddp-cmip6.rst
+   scripts/ornl-daymet.rst
+   scripts/ouranos-espo-g6-r2.rst
+   scripts/ouranos-mrcc5-cmip6.rst
+
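The `--dataset` keywords corresponding to the entries in this table can be listed from the CLI itself via the `-L`/`--list-datasets` flag documented in the usage message; the exact output is not part of this commit, so it is omitted here:

```console
foo@bar:~$ ./extract-dataset.sh --list-datasets
```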

docs/index.rst

+85

@@ -0,0 +1,85 @@
+.. Copyright 2022-2024 University of Calgary, University of Saskatchewan
+   and other datatool developers.
+
+   SPDX-License-Identifier: (GPL-3.0-or-later)
+
+.. _main-datatool:
+
+========================================
+Welcome to ``datatool``'s documentation!
+========================================
+``datatool`` is an HPC-independent workflow that enables end-users to
+extract subsets from community meteorological datasets through a simple
+command-line interface (CLI). The tool mainly works with NetCDF files,
+but is not limited to any particular file format, structure, or dataset.
+
+Through crowdsourcing, ``datatool`` aims to enable end-users to extract
+subsets from any dataset available to community members.
+
+--------------
+User Interface
+--------------
+This repository contains scripts to process meteorological datasets in NetCDF
+file format. The general usage of the script (i.e., ``./extract-dataset.sh``)
+is as follows:
+
+.. code-block:: console
+
+   Usage:
+     extract-dataset [options...]
+
+   Script options:
+     -d, --dataset                     Meteorological forcing dataset of interest
+     -i, --dataset-dir=DIR             The source path of the dataset file(s)
+     -v, --variable=var1[,var2[...]]   Variables to process
+     -o, --output-dir=DIR              Writes processed files to DIR
+     -s, --start-date=DATE             The start date of the data
+     -e, --end-date=DATE               The end date of the data
+     -l, --lat-lims=REAL,REAL          Latitude's upper and lower bounds;
+                                       optional; within the [-90, +90] limits
+     -n, --lon-lims=REAL,REAL          Longitude's upper and lower bounds;
+                                       optional; within the [-180, +180] limits
+     -a, --shape-file=PATH             Path to the ESRI shapefile; optional
+     -m, --ensemble=ens1,[ens2,[...]]  Ensemble members to process; optional
+                                       Leave empty to extract all ensemble members
+     -M, --model=model1,[model2,[...]] Models that are part of a dataset,
+                                       only applicable to climate datasets, optional
+     -S, --scenario=scn1,[scn2,[...]]  Climate scenarios to process, only applicable
+                                       to climate datasets, optional
+     -j, --submit-job                  Submit the data extraction process as a job
+                                       on the SLURM system; optional
+     -k, --no-chunk                    No parallelization, recommended for small domains
+     -p, --prefix=STR                  Prefix prepended to the output files
+     -b, --parsable                    Parsable SLURM message mainly used
+                                       for chained job submissions
+     -c, --cache=DIR                   Path of the cache directory; optional,
+                                       defaults to $HOME/scratch
+     -E, [email protected]         E-mail user when job starts, ends, or
+                                       fails; optional
+     -C, --cluster=JSON                JSON file detailing cluster-specific details
+     -L, --list-datasets               List all the available datasets and the
+                                       corresponding keywords for '--dataset' option
+     -V, --version                     Show version
+     -h, --help                        Show this screen and exit
+
+
+Use the navigation menu on the left to explore ``datatool``'s
+documentation!
+
+.. toctree::
+   :maxdepth: 2
+   :caption: User Manual
+
+   index
+   quick_start
+   json
+
+.. toctree::
+   :maxdepth: 3
+   :caption: Datasets
+
+   datasets
+
+.. toctree::
+   :maxdepth: 1
+   :caption: License
+
+   license
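The new `-C`/`--cluster` option is what makes the workflow HPC-agnostic: scheduler- and cluster-specific settings move into a JSON file (its schema is covered by the `json` page referenced in the toctree above). A sketch of an invocation, with hypothetical paths and a hypothetical JSON file name:

```console
foo@bar:~$ ./extract-dataset.sh \
    --dataset="rdrs" \
    --dataset-dir="/path/to/rdrsv2.1" \
    --output-dir="$HOME/rdrs_outputs/" \
    --start-date="2001-01-01 00:00:00" \
    --end-date="2001-12-31 23:00:00" \
    --lat-lims=49,51 \
    --lon-lims=-117,-115 \
    --variable="RDRS_v2.1_A_PR0_SFC" \
    --cluster="./my-cluster.json" \
    --prefix="testing_";
```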
