
Commit ada260d

Setting up MDS-ToolBox V1.0.0 (#1)
1 parent 2dc511d commit ada260d

29 files changed (+2019, -1 lines changed)

.github/workflows/ruff.yml

Lines changed: 10 additions & 0 deletions
@@ -0,0 +1,10 @@
```yaml
name: Ruff
on: [ pull_request ]
jobs:
  ruff:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: astral-sh/ruff-action@v3
      - name: Ruff format
        run: ruff format --diff
```

.gitignore

Lines changed: 10 additions & 0 deletions
@@ -0,0 +1,10 @@
```gitignore
# directories
__pycache__
.idea
.vscode

# extensions
*.nc

# files
poetry.lock
```

README.md

Lines changed: 279 additions & 1 deletion
@@ -1 +1,279 @@
Removed: `# mds-toolbox`

# Marine Data Store ToolBox

This Python package provides a command-line interface (CLI) for downloading datasets from the Marine Data Store (MDS) using the
[copernicusmarine toolbox](https://help.marine.copernicus.eu/en/collections/4060068-copernicus-marine-toolbox)
or [boto3](https://boto3.amazonaws.com/v1/documentation/api/latest/index.html).

[![boto3](https://img.shields.io/badge/boto3->1.34-blue.svg)](https://boto3.amazonaws.com/v1/documentation/api/latest/index.html)
[![copernicusmarine](https://img.shields.io/badge/copernicusmarine->1.06-blue.svg)](https://help.marine.copernicus.eu/en/collections/4060068-copernicus-marine-toolbox)
[![Ruff](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/astral-sh/ruff/main/assets/badge/v2.json)](https://github.com/astral-sh/ruff)

<!-- TOC -->
* [Marine Data Store ToolBox](#marine-data-store-toolbox)
* [How to Install it](#how-to-install-it)
  * [Uninstall](#uninstall)
* [Usage](#usage)
  * [S3 direct access](#s3-direct-access)
    * [s3-get](#s3-get)
    * [s3-list](#s3-list)
  * [Wrapper for copernicusmarine](#wrapper-for-copernicusmarine)
    * [Subset](#subset)
    * [Get](#get)
    * [File List](#file-list)
    * [Etag](#etag)
  * [Authors](#authors)
<!-- TOC -->

---
# How to Install it

Create the conda environment:

```shell
mamba env create -f environment.yml
mamba activate mdsenv

pip install .
```

## Uninstall

To uninstall it:

```shell
mamba activate mdsenv

pip uninstall mds-toolbox
```

---

# Usage

The `mds` command provides several subcommands for different download operations:

```shell
Usage: mds [OPTIONS] COMMAND [ARGS]...

Options:
  -h, --help  Show this message and exit.

Commands:
  etag       Get the etag of a given S3 file
  file-list  Wrapper to copernicus marine toolbox file list
  get        Wrapper to copernicusmarine get
  s3-get     Download files with direct access to MDS using S3
  s3-list    List files on MDS using S3
  subset     Wrapper to copernicusmarine subset
```

---

## S3 direct access

Because the copernicusmarine toolbox adds significant overhead to S3 requests, two commands (`s3-get` and `s3-list`) have been developed to:

* make very fast S3 requests
* provide thread-safe access to the S3 client
79+
### s3-get
80+
81+
```shell
82+
Usage: mds s3-get [OPTIONS]
83+
84+
Options:
85+
-b, --bucket TEXT Bucket name [required]
86+
-f, --filter TEXT Filter on the online files [required]
87+
-o, --output-directory TEXT Output directory [required]
88+
-p, --product TEXT The product name [required]
89+
-i, --dataset-id TEXT Dataset Id [required]
90+
-g, --dataset-version TEXT Dataset version or tag
91+
-r, --recursive List recursive all s3 files
92+
--threads INTEGER Downloading file using threads
93+
-s, --subdir TEXT Dataset directory on mds (i.e. {year}/{month})
94+
- If present boost the connection
95+
--overwrite Force overwrite of the file
96+
--keep-timestamps After the download, set the correct timestamp
97+
to the file
98+
--sync-time Update the file if it changes on the server
99+
using last update information
100+
--sync-etag Update the file if it changes on the server
101+
using etag information
102+
--help Show this message and exit.
103+
```
104+
105+
**Example**
106+
107+
```shell
108+
mds s3-get -i cmems_obs-ins_med_phybgcwav_mynrt_na_irr -b mdl-native-03 -g 202311 -p INSITU_MED_PHYBGCWAV_DISCRETE_MYNRT_013_035 -o "/work/antonio/20240320" -s latest/$(date -du +"%Y%m%d") --keep-timestamps --sync-etag -f $(date -du +"%Y%m%d")
109+
```
110+
111+
**Example using threads**
112+
113+
```shell
114+
mds s3-get --threads 10 -i cmems_obs-ins_med_phybgcwav_mynrt_na_irr -b mdl-native-03 -g 202311 -p INSITU_MED_PHYBGCWAV_DISCRETE_MYNRT_013_035 -o "." -s latest/$(date -du +"%Y%m%d") --keep-timestamps --sync-etag -f $(date -du +"%Y%m%d")
115+
```
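
The `--keep-timestamps` and `--sync-time` options can be illustrated with the sketch below. It assumes something this README does not state: the remote time is the S3 object's `LastModified` value, which boto3 returns as a timezone-aware `datetime` (for example from `head_object`). `apply_remote_timestamp` and `needs_time_sync` are hypothetical names, not the toolbox's own code.

```python
import os
import tempfile
from datetime import datetime, timezone


def apply_remote_timestamp(local_path: str, last_modified: datetime) -> None:
    """Copy the remote LastModified time onto the local file (atime and mtime)."""
    ts = last_modified.timestamp()
    os.utime(local_path, (ts, ts))


def needs_time_sync(local_path: str, last_modified: datetime) -> bool:
    """True if the local file is missing or older than the remote object."""
    if not os.path.exists(local_path):
        return True
    local_mtime = datetime.fromtimestamp(os.path.getmtime(local_path), tz=timezone.utc)
    return last_modified > local_mtime


if __name__ == "__main__":
    remote_time = datetime(2024, 3, 20, 12, 0, tzinfo=timezone.utc)
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "example.nc")
        print(needs_time_sync(path, remote_time))  # True: file not downloaded yet
        open(path, "wb").close()  # stand-in for the actual download
        apply_remote_timestamp(path, remote_time)
        print(needs_time_sync(path, remote_time))  # False: timestamps now match
```

Copying the server time onto the file right after download is what makes the later comparison meaningful. Without it, the local mtime would be the download time and would always look newer.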

### s3-list

```shell
Usage: mds s3-list [OPTIONS]

Options:
  -b, --bucket TEXT           Bucket name  [required]
  -f, --filter TEXT           Filter on the online files  [required]
  -p, --product TEXT          The product name  [required]
  -i, --dataset-id TEXT       Dataset Id
  -g, --dataset-version TEXT  Dataset version or tag
  -s, --subdir TEXT           Dataset directory on mds (i.e. {year}/{month}) -
                              If present boost the connection
  -r, --recursive             List recursive all s3 files
  --help                      Show this message and exit.
```
133+
134+
**Example**
135+
136+
```shell
137+
mds s3-list -b mdl-native-01 -p INSITU_GLO_PHYBGCWAV_DISCRETE_MYNRT_013_030 -i cmems_obs-ins_glo_phybgcwav_mynrt_na_irr -g 202311 -s "monthly/BO/202401" -f "*" | tr " " "\n"
138+
```
139+
140+
**Example recursive**
141+
142+
```shell
143+
mds s3-list -b mdl-native-12 -p MEDSEA_ANALYSISFORECAST_PHY_006_013 -f '*' -r | tr " " "\n"
144+
```
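
The sketch below shows what an `s3-list`-style listing could look like with boto3's `list_objects_v2` paginator. It rests on three assumptions, none of them stated in this README:

* the key layout `native/<product>/<dataset-id>_<version>/<subdir>/` matches the `s3://` path in the Etag example;
* the filter is an `fnmatch` glob on the file name;
* non-recursive listing uses `Delimiter="/"`.

`build_prefix`, `list_files` and `FakeS3Client` are hypothetical names. The fake client only lets the sketch run offline; a real `boto3.client("s3")` exposes the same `get_paginator` call.

```python
import fnmatch
from typing import Any, Dict, Iterator, List, Optional


def build_prefix(product: str, dataset_id: Optional[str] = None,
                 version: Optional[str] = None, subdir: Optional[str] = None) -> str:
    """Hypothetical key layout inferred from the s3:// path in the Etag example."""
    prefix = f"native/{product}/"
    if not dataset_id:
        return prefix
    if not version:
        # Plain string prefix: matches any "<dataset_id>_<version>/" folder.
        return prefix + dataset_id
    prefix += f"{dataset_id}_{version}/"
    if subdir:
        prefix += subdir.strip("/") + "/"
    return prefix


def list_files(client: Any, bucket: str, prefix: str, pattern: str = "*",
               recursive: bool = False) -> Iterator[str]:
    """Yield object keys under `prefix` whose file name matches the glob `pattern`."""
    kwargs: Dict[str, str] = {"Bucket": bucket, "Prefix": prefix}
    if not recursive:
        # With a delimiter, S3 returns only objects directly under the prefix.
        kwargs["Delimiter"] = "/"
    paginator = client.get_paginator("list_objects_v2")
    for page in paginator.paginate(**kwargs):
        for obj in page.get("Contents", []):
            key = obj["Key"]
            if fnmatch.fnmatch(key.rsplit("/", 1)[-1], pattern):
                yield key


class FakeS3Client:
    """Tiny stand-in for a boto3 S3 client, used only by the demo below."""

    def __init__(self, keys: List[str]) -> None:
        self._keys = keys

    def get_paginator(self, operation_name: str) -> "FakeS3Client":
        assert operation_name == "list_objects_v2"
        return self

    def paginate(self, Bucket: str, Prefix: str, Delimiter: Optional[str] = None):
        keys = [k for k in self._keys if k.startswith(Prefix)]
        if Delimiter:
            keys = [k for k in keys if Delimiter not in k[len(Prefix):]]
        yield {"Contents": [{"Key": k} for k in keys]}


if __name__ == "__main__":
    s3 = FakeS3Client([
        "native/PRODUCT/dataset_202311/monthly/BO/202401/a.nc",
        "native/PRODUCT/dataset_202311/monthly/BO/202401/b.txt",
        "native/PRODUCT/dataset_202311/monthly/BO/202402/c.nc",
    ])
    prefix = build_prefix("PRODUCT", "dataset", "202311", "monthly/BO/202401")
    # Space-separated output, as in the `| tr " " "\n"` examples above.
    print(" ".join(list_files(s3, "mdl-native-01", prefix, "*.nc")))
```

A longer prefix (the `-s` subdir) narrows the server-side listing, which plausibly explains the help text's note that it boosts the connection.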

---

## Wrapper for copernicusmarine

**The following commands rely on the copernicusmarine implementation; their results depend strictly on the installed version.**

### Subset

```shell
Usage: mds subset [OPTIONS]

Options:
  -o, --output-directory TEXT    Output directory  [required]
  -f, --output-filename TEXT     Output filename  [required]
  -i, --dataset-id TEXT          Dataset Id  [required]
  -v, --variables TEXT           Variables to download. Can be used multiple times
  -x, --minimum-longitude FLOAT  Minimum longitude for the subset.
  -X, --maximum-longitude FLOAT  Maximum longitude for the subset.
  -y, --minimum-latitude FLOAT   Minimum latitude for the subset. Requires a
                                 float within this range: [-90<=x<=90]
  -Y, --maximum-latitude FLOAT   Maximum latitude for the subset. Requires a
                                 float within this range: [-90<=x<=90]
  -z, --minimum-depth FLOAT      Minimum depth for the subset. Requires a
                                 float within this range: [x>=0]
  -Z, --maximum-depth FLOAT      Maximum depth for the subset. Requires a
                                 float within this range: [x>=0]
  -t, --start-datetime TEXT      Start datetime as:
                                 %Y|%Y-%m-%d|%Y-%m-%dT%H:%M:%S|%Y-%m-%d
                                 %H:%M:%S|%Y-%m-%dT%H:%M:%S.%fZ
  -T, --end-datetime TEXT        End datetime as:
                                 %Y|%Y-%m-%d|%Y-%m-%dT%H:%M:%S|%Y-%m-%d
                                 %H:%M:%S|%Y-%m-%dT%H:%M:%S.%fZ
  -r, --dry-run                  Dry run
  -g, --dataset-version TEXT     Dataset version or tag
  -n, --username TEXT            Username
  -w, --password TEXT            Password
  --help                         Show this message and exit.
```

**Example**

```shell
mds subset -f output.nc -o . -i cmems_mod_glo_phy-thetao_anfc_0.083deg_P1D-m -x -18.16667 -X 1.0 -y 30.16 -Y 46.0 -z 0.493 -Z 5727.918000000001 -t 2025-01-01 -T 2025-01-01 -v thetao
```
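
The `subset` help lists the accepted datetime formats and the valid latitude and depth ranges. The sketch below implements equivalent client-side checks using hypothetical helpers, `parse_datetime` and `check_bounds`. copernicusmarine performs its own validation, so this only illustrates the rules stated in the help text; it is not what the wrapper does.

```python
from datetime import datetime
from typing import Optional

# Formats listed in the `subset` help text, tried in order.
DATETIME_FORMATS = (
    "%Y",
    "%Y-%m-%d",
    "%Y-%m-%dT%H:%M:%S",
    "%Y-%m-%d %H:%M:%S",
    "%Y-%m-%dT%H:%M:%S.%fZ",
)


def parse_datetime(value: str) -> datetime:
    """Return the first successful parse of `value` among DATETIME_FORMATS."""
    for fmt in DATETIME_FORMATS:
        try:
            return datetime.strptime(value, fmt)
        except ValueError:
            continue
    raise ValueError(f"unsupported datetime format: {value!r}")


def check_bounds(min_lat: Optional[float], max_lat: Optional[float],
                 min_depth: Optional[float], max_depth: Optional[float]) -> None:
    """Apply the ranges stated in the help text: latitude in [-90, 90], depth >= 0."""
    for name, lat in (("minimum latitude", min_lat), ("maximum latitude", max_lat)):
        if lat is not None and not -90 <= lat <= 90:
            raise ValueError(f"{name} must be within [-90, 90], got {lat}")
    for name, depth in (("minimum depth", min_depth), ("maximum depth", max_depth)):
        if depth is not None and depth < 0:
            raise ValueError(f"{name} must be >= 0, got {depth}")


if __name__ == "__main__":
    # Values taken from the `mds subset` example above.
    check_bounds(30.16, 46.0, 0.493, 5727.918000000001)
    print(parse_datetime("2025-01-01"))  # 2025-01-01 00:00:00
```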

### Get

**Command**:

```shell
Usage: mds get [OPTIONS]

Options:
  -f, --filter TEXT            Filter on the online files
  -o, --output-directory TEXT  Output directory  [required]
  -i, --dataset-id TEXT        Dataset Id  [required]
  -g, --dataset-version TEXT   Dataset version or tag
  -s, --service TEXT           Force download through one of the available
                               services using the service name among
                               ['original-files', 'ftp'] or its short name
                               among ['files', 'ftp'].
  -d, --dry-run                Dry run
  -u, --update                 If the file does not exist, download it,
                               otherwise update it if it changed on mds
  -v, --dataset-version TEXT   Dry run
  -nd, --no-directories TEXT   Option to not recreate folder hierarchy in
                               output directory
  --force-download TEXT        Flag to skip confirmation before download
  --disable-progress-bar TEXT  Flag to hide progress bar
  -n, --username TEXT          Username
  -w, --password TEXT          Password
  --help                       Show this message and exit.
```

**Example**

```shell
mds get -f '20250210*_d-CMCC--TEMP-MFSeas9-MEDATL-b20250225_an-sv10.00.nc' -o . -i cmems_mod_med_phy-tem_anfc_4.2km_P1D-m
```

### File List

To retrieve a list of files, use:

```shell
Usage: mds file-list [OPTIONS] DATASET_ID MDS_FILTER

Options:
  -g, --dataset-version TEXT  Dataset version or tag
  --help                      Show this message and exit.
```

**Example**

```shell
mds file-list cmems_mod_med_phy-cur_anfc_4.2km_PT15M-i '*b20250225*' -g 202411
```

### Etag

```shell
Usage: mds etag [OPTIONS]

Options:
  -e, --s3_file TEXT     Path to a specific s3 file - if present, other
                         parameters are ignored.
  -p, --product TEXT     The product name
  -d, --dataset_id TEXT  The datasetID
  -v, --version TEXT     Force the selection of a specific dataset version
  -s, --subdir TEXT      Subdir structure on mds (i.e. {year}/{month})
  -f, --mds_filter TEXT  Pattern to filter data (no regex)
  --help                 Show this message and exit.
```

**Example**

With a specific file:

```shell
mds etag -e s3://mdl-native-12/native/MEDSEA_ANALYSISFORECAST_PHY_006_013/cmems_mod_med_phy-tem_anfc_4.2km_P1D-m_202411/2023/08/20230820_d-CMCC--TEMP-MFSeas9-MEDATL-b20240607_an-sv10.00.nc
```

Or:

```shell
mds etag -p MEDSEA_ANALYSISFORECAST_PHY_006_013 -d cmems_mod_med_phy-cur_anfc_4.2km_PT15M-i -v 202411 -f '*b20241212*' -s 2024/12
```
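
For `--sync-etag` and the `etag` command, a local copy can be checked against the server by recomputing the object's ETag. The sketch below relies on commonly observed S3 behaviour rather than anything stated in this README:

* A single-part upload's ETag is the hex MD5 of the content.
* A multipart upload's ETag is the MD5 of the concatenated part digests, followed by `-<number of parts>`. It can only be reproduced if the upload part size is known; the 8 MiB default here is an assumption.
* Objects encrypted with SSE-KMS do not follow either rule.

`local_etag` and `is_up_to_date` are hypothetical helpers, not the toolbox's implementation.

```python
import hashlib
import io
import os
import tempfile
from typing import BinaryIO

# Assumption: part size used by the uploader for multipart objects (unknown for MDS).
DEFAULT_PART_SIZE = 8 * 1024 * 1024


def local_etag(stream: BinaryIO, multipart: bool = False,
               part_size: int = DEFAULT_PART_SIZE) -> str:
    """Compute an S3-style ETag for local data."""
    if not multipart:
        # Single-part upload: the ETag is the hex MD5 of the whole content.
        md5 = hashlib.md5()
        for chunk in iter(lambda: stream.read(1024 * 1024), b""):
            md5.update(chunk)
        return md5.hexdigest()
    # Multipart upload: MD5 of the concatenated binary part digests, plus "-<parts>".
    digests = [hashlib.md5(part).digest()
               for part in iter(lambda: stream.read(part_size), b"")]
    return f"{hashlib.md5(b''.join(digests)).hexdigest()}-{len(digests)}"


def is_up_to_date(path: str, remote_etag: str, part_size: int = DEFAULT_PART_SIZE) -> bool:
    """Compare a local file with the ETag reported by S3 (usually wrapped in quotes)."""
    remote = remote_etag.strip('"')
    with open(path, "rb") as f:
        return local_etag(f, multipart="-" in remote, part_size=part_size) == remote


if __name__ == "__main__":
    print(local_etag(io.BytesIO(b"hello")))  # 5d41402abc4b2a76b9719d911017c592
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "example.nc")
        with open(path, "wb") as f:
            f.write(b"hello")
        print(is_up_to_date(path, '"5d41402abc4b2a76b9719d911017c592"'))  # True
```

Unlike timestamp comparison, this check detects content changes even when the server's modification time is unreliable. The cost is reading the whole local file.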

---

## Authors

* Antonio Mariani - [email protected]

environment.yml

Lines changed: 9 additions & 0 deletions
@@ -0,0 +1,9 @@
```yaml
name: mdsenv
channels:
  - conda-forge
  - defaults
dependencies:
  - python=3.10
  - copernicusmarine=1.2.3
  - boto3 >=1.37.4
  - click >=8.1.8
```

mds/__init__.py

Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
```python
from mds.conf import settings
from mds.core import mds_s3
from mds.core import wrapper
from mds.utils.log import configure_logging


def setup(**kwargs) -> None:
    """
    General mds-toolbox setup

    Args:
        **kwargs: extra arguments to apply as app settings
    """
    settings.configure(**kwargs)
    configure_logging(settings.LOGGING_CONFIG, settings.LOGGING, settings.LOG_LEVEL)


# __all__ must list attribute names of this package, not dotted module paths
# (mds_s3.__name__ would be "mds.core.mds_s3", which breaks `from mds import *`).
__all__ = [
    "mds_s3",
    "wrapper",
]
```

mds/conf/__init__.py

Lines changed: 34 additions & 0 deletions
@@ -0,0 +1,34 @@
```python
from mds.conf import global_settings

# List of modules to load settings from
TO_LOAD = [global_settings]


class Settings:
    def __init__(self, *modules):
        """
        Initialize the Settings instance with the provided modules.

        Args:
            *modules: Variable length argument list of modules to load settings from.
        """
        for module in modules:
            for setting in dir(module):
                if setting.isupper():
                    setattr(self, setting, getattr(module, setting))

    def configure(self, **ext_settings):
        """
        Configure the settings instance by setting new values or overriding existing ones.

        Args:
            **ext_settings: Arbitrary keyword arguments representing settings to be configured.
                Only capital keywords are considered.
        """
        for key, value in ext_settings.items():
            if key.isupper():
                setattr(self, key, value)


# Create a Settings instance as unique entry point to the app settings
settings = Settings(*TO_LOAD)
```

mds/conf/global_settings.py

Lines changed: 11 additions & 0 deletions
@@ -0,0 +1,11 @@
```python
########################
# LOG
########################

# The callable to use to configure logging
LOGGING_CONFIG = "logging.config.dictConfig"

# Custom logging configuration.
LOGGING = {}

LOG_LEVEL = "INFO"
```
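
`mds.utils.log.configure_logging` is not among the files shown in this commit view. `setup()` passes it `LOGGING_CONFIG`, `LOGGING` and `LOG_LEVEL`, names that resemble Django's logging settings. Based on that, one plausible shape is sketched below. It is entirely hypothetical and loosely modelled on Django's `configure_logging`:

* resolve the dotted path to a callable;
* apply the custom dict if it is non-empty, otherwise fall back to `logging.basicConfig()`;
* set the root level.

```python
import importlib
import logging
import logging.config
from typing import Any, Callable, Dict


def import_string(dotted_path: str) -> Callable[..., Any]:
    """Resolve a dotted path such as "logging.config.dictConfig" to the named object."""
    module_path, _, attr_name = dotted_path.rpartition(".")
    return getattr(importlib.import_module(module_path), attr_name)


def configure_logging(logging_config: str, logging_settings: Dict[str, Any], log_level: str) -> None:
    """Hypothetical sketch; the real mds.utils.log.configure_logging is not shown here."""
    if logging_config and logging_settings:
        # Apply the custom configuration with the configured callable (dictConfig by default).
        import_string(logging_config)(logging_settings)
    else:
        # No custom configuration: fall back to a basic console handler.
        logging.basicConfig()
    logging.getLogger().setLevel(log_level)


if __name__ == "__main__":
    # Mirrors the defaults above: LOGGING_CONFIG, LOGGING = {}, LOG_LEVEL = "INFO".
    configure_logging("logging.config.dictConfig", {}, "INFO")
    logging.getLogger("mds").info("logging configured")
```

Storing the configuring callable as a dotted string keeps `global_settings` import-free and lets a caller swap it via `mds.setup(LOGGING_CONFIG=...)`.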

mds/core/__init__.py

Whitespace-only changes.
