|
1 | | -# mds-toolbox |
| 1 | +# Marine Data Store ToolBox |
| 2 | + |
| 3 | +This Python script provides a command-line interface (CLI) for downloading datasets using |
| 4 | +[copernicusmarine toolbox](https://help.marine.copernicus.eu/en/collections/4060068-copernicus-marine-toolbox) |
| 5 | +or [boto3](https://boto3.amazonaws.com/v1/documentation/api/latest/index.html). |
| 6 | + |
| 7 | +[](https://boto3.amazonaws.com/v1/documentation/api/latest/index.html) |
| 8 | +[](https://help.marine.copernicus.eu/en/collections/4060068-copernicus-marine-toolbox) |
| 9 | +[](https://github.com/astral-sh/ruff) |
| 10 | + |
| 11 | +<!-- TOC --> |
| 12 | +* [Marine Data Store ToolBox](#marine-data-store-toolbox) |
| 13 | +* [How to Install it](#how-to-install-it) |
| 14 | + * [Uninstall](#uninstall) |
| 15 | +* [Usage](#usage) |
| 16 | + * [S3 direct access](#s3-direct-access) |
| 17 | + * [s3-get](#s3-get) |
| 18 | + * [s3-list](#s3-list) |
| 19 | + * [Wrapper for copernicusmarine](#wrapper-for-copernicusmarine) |
| 20 | + * [Subset](#subset) |
| 21 | + * [Get](#get) |
| 22 | + * [File List](#file-list) |
| 23 | + * [Etag](#etag) |
| 24 | + * [Authors](#authors) |
| 25 | +<!-- TOC --> |
| 26 | + |
| 27 | +--- |
| 28 | +# How to Install it |
| 29 | + |
| 30 | +Create the conda environment: |
| 31 | + |
| 32 | +```shell |
| 33 | +mamba env create -f environment.yml |
| 34 | +mamba activate mdsenv |
| 35 | + |
| 36 | +pip install . |
| 37 | +``` |
| 38 | + |
| 39 | +## Uninstall |
| 40 | + |
| 41 | +To uninstall it: |
| 42 | + |
| 43 | +```shell |
| 44 | +mamba activate mdsenv |
| 45 | + |
| 46 | +pip uninstall mds-toolbox |
| 47 | +``` |
| 48 | + |
| 49 | +--- |
| 50 | + |
| 51 | +# Usage |
| 52 | + |
| 53 | +The script provides several commands for different download operations: |
| 54 | + |
| 55 | +```shell |
| 56 | +Usage: mds [OPTIONS] COMMAND [ARGS]... |
| 57 | + |
| 58 | +Options: |
| 59 | + -h, --help Show this message and exit. |
| 60 | + |
| 61 | +Commands: |
| 62 | + etag Get the etag of a give S3 file |
| 63 | + file-list Wrapper to copernicus marine toolbox file list |
| 64 | + get Wrapper to copernicusmarine get |
| 65 | + s3-get Download files with direct access to MDS using S3 |
| 66 | + s3-list Listing file on MDS using S3 |
| 67 | + subset Wrapper to copernicusmarine subset |
| 68 | +``` |
| 69 | + |
| 70 | +--- |
| 71 | + |
| 72 | +## S3 direct access |
| 73 | + |
| 74 | +Because the copernicusmarine toolbox adds significant overhead to each S3 request, two commands have been developed to: |
| 75 | + |
| 76 | +* make fast S3 requests |
| 77 | +* provide thread-safe access to the S3 client |
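
One common way to get thread-safe access to a client is to give every worker thread its own instance. The sketch below illustrates that pattern only; it is not taken from the mds-toolbox source. `PerThreadClient`, `download_all`, `factory` and `fetch` are hypothetical names introduced for the illustration.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class PerThreadClient:
    """Lazily create one client object per thread (hypothetical helper)."""

    def __init__(self, factory):
        self._factory = factory          # callable that builds a new client
        self._local = threading.local()  # storage private to each thread

    def get(self):
        client = getattr(self._local, "client", None)
        if client is None:
            # First call on this thread: build and remember a client.
            client = self._factory()
            self._local.client = client
        return client


def download_all(keys, fetch, factory, threads=4):
    """Run fetch(client, key) for every key on a pool of worker threads."""
    clients = PerThreadClient(factory)
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # pool.map keeps the results in the same order as the input keys.
        return list(pool.map(lambda key: fetch(clients.get(), key), keys))
```

With boto3, `factory` could be something like `lambda: boto3.session.Session().client("s3")`, and `fetch` a function that calls `client.download_file(...)`. Both are assumptions about how the pattern might be wired up, not a description of what `mds s3-get --threads` does internally.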
| 78 | + |
| 79 | +### s3-get |
| 80 | + |
| 81 | +```shell |
| 82 | +Usage: mds s3-get [OPTIONS] |
| 83 | + |
| 84 | +Options: |
| 85 | + -b, --bucket TEXT Bucket name [required] |
| 86 | + -f, --filter TEXT Filter on the online files [required] |
| 87 | + -o, --output-directory TEXT Output directory [required] |
| 88 | + -p, --product TEXT The product name [required] |
| 89 | + -i, --dataset-id TEXT Dataset Id [required] |
| 90 | + -g, --dataset-version TEXT Dataset version or tag |
| 91 | + -r, --recursive List recursive all s3 files |
| 92 | + --threads INTEGER Downloading file using threads |
| 93 | + -s, --subdir TEXT Dataset directory on mds (i.e. {year}/{month}) |
| 94 | + - If present boost the connection |
| 95 | + --overwrite Force overwrite of the file |
| 96 | + --keep-timestamps After the download, set the correct timestamp |
| 97 | + to the file |
| 98 | + --sync-time Update the file if it changes on the server |
| 99 | + using last update information |
| 100 | + --sync-etag Update the file if it changes on the server |
| 101 | + using etag information |
| 102 | + --help Show this message and exit. |
| 103 | +``` |
| 104 | +
|
| 105 | +**Example** |
| 106 | +
|
| 107 | +```shell |
| 108 | +mds s3-get -i cmems_obs-ins_med_phybgcwav_mynrt_na_irr -b mdl-native-03 -g 202311 -p INSITU_MED_PHYBGCWAV_DISCRETE_MYNRT_013_035 -o "/work/antonio/20240320" -s latest/$(date -du +"%Y%m%d") --keep-timestamps --sync-etag -f $(date -du +"%Y%m%d") |
| 109 | +``` |
| 110 | +
|
| 111 | +**Example using threads** |
| 112 | +
|
| 113 | +```shell |
| 114 | +mds s3-get --threads 10 -i cmems_obs-ins_med_phybgcwav_mynrt_na_irr -b mdl-native-03 -g 202311 -p INSITU_MED_PHYBGCWAV_DISCRETE_MYNRT_013_035 -o "." -s latest/$(date -du +"%Y%m%d") --keep-timestamps --sync-etag -f $(date -du +"%Y%m%d") |
| 115 | +``` |
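
To give an idea of what an ETag-based check such as `--sync-etag` may involve, here is a minimal sketch. It relies on the usual S3 behaviour: for a single-part, unencrypted upload the ETag is the MD5 hex digest of the content, while for a multipart upload it is the MD5 of the concatenated part digests followed by `-<number of parts>`. How mds-toolbox actually performs the comparison is not stated here; `local_etag`, `needs_download` and `part_size` are hypothetical names.

```python
import hashlib
from pathlib import Path


def local_etag(path, part_size=None):
    """Compute the ETag S3 would typically report for a local file.

    part_size=None models a single-part upload (plain MD5 hex digest).
    Otherwise the file is hashed in part_size-byte parts, multipart style.
    """
    data_path = Path(path)
    if part_size is None:
        md5 = hashlib.md5()
        with data_path.open("rb") as handle:
            # Read in 1 MiB chunks so large files are not loaded at once.
            for chunk in iter(lambda: handle.read(1024 * 1024), b""):
                md5.update(chunk)
        return md5.hexdigest()

    digests = []
    with data_path.open("rb") as handle:
        for part in iter(lambda: handle.read(part_size), b""):
            digests.append(hashlib.md5(part).digest())
    combined = hashlib.md5(b"".join(digests)).hexdigest()
    return f"{combined}-{len(digests)}"


def needs_download(path, remote_etag, part_size=None):
    """Return True if the file is missing or its ETag differs from the remote one."""
    target = Path(path)
    if not target.exists():
        return True
    # S3 returns the ETag wrapped in double quotes, so strip them first.
    return local_etag(target, part_size) != remote_etag.strip('"')
```

The multipart branch only matches the server value when `part_size` equals the part size used for the original upload, which the client generally has to know or guess.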
| 116 | +
|
| 117 | +### s3-list |
| 118 | +
|
| 119 | +```shell |
| 120 | +Usage: mds.py s3-list [OPTIONS] |
| 121 | + |
| 122 | +Options: |
| 123 | + -b, --bucket TEXT Filter on the online files [required] |
| 124 | + -f, --filter TEXT Filter on the online files [required] |
| 125 | + -p, --product TEXT The product name [required] |
| 126 | + -i, --dataset-id TEXT Dataset Id |
| 127 | + -g, --dataset-version TEXT Dataset version or tag |
| 128 | + -s, --subdir TEXT Dataset directory on mds (i.e. {year}/{month}) - |
| 129 | + If present boost the connection |
| 130 | + -r, --recursive List recursive all s3 files |
| 131 | + --help Show this message and exit. |
| 132 | +``` |
| 133 | +
|
| 134 | +**Example** |
| 135 | +
|
| 136 | +```shell |
| 137 | +mds s3-list -b mdl-native-01 -p INSITU_GLO_PHYBGCWAV_DISCRETE_MYNRT_013_030 -i cmems_obs-ins_glo_phybgcwav_mynrt_na_irr -g 202311 -s "monthly/BO/202401" -f "*" | tr " " "\n" |
| 138 | +``` |
| 139 | +
|
| 140 | +**Example recursive** |
| 141 | +
|
| 142 | +```shell |
| 143 | +mds s3-list -b mdl-native-12 -p MEDSEA_ANALYSISFORECAST_PHY_006_013 -f '*' -r | tr " " "\n" |
| 144 | +``` |
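
The `-f` values used above (`"*"`, `'*'`) look like shell-style wildcards rather than regular expressions. Assuming that reading is right, the sketch below shows how such a filter could be applied to a list of object keys. Matching on the file name only (not the full key) is an assumption, and `filter_keys` is a hypothetical helper.

```python
from fnmatch import fnmatchcase
from posixpath import basename


def filter_keys(keys, pattern):
    """Keep the keys whose file name matches a shell-style wildcard pattern."""
    # fnmatchcase is case-sensitive on every platform, like S3 keys themselves.
    return [key for key in keys if fnmatchcase(basename(key), pattern)]


keys = [
    "native/PRODUCT/DATASET_202411/2023/08/20230820_d-CMCC--TEMP-MFSeas9-MEDATL-b20240607_an-sv10.00.nc",
    "native/PRODUCT/DATASET_202411/2023/08/20230821_d-CMCC--TEMP-MFSeas9-MEDATL-b20240608_an-sv10.00.nc",
]
matching = filter_keys(keys, "*b20240607*")
```

Here `matching` contains only the first key, since only its file name includes `b20240607`.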
| 145 | +
|
| 146 | +--- |
| 147 | +
|
| 148 | +## Wrapper for copernicusmarine |
| 149 | +
|
| 150 | +**The following commands rely on the copernicusmarine implementation, so their behavior depends on the installed |
| 151 | +version.** |
| 152 | +
|
| 153 | +### Subset |
| 154 | +
|
| 155 | +```shell |
| 156 | +Usage: mds.py subset [OPTIONS] |
| 157 | + |
| 158 | +Options: |
| 159 | + -o, --output-directory TEXT Output directory [required] |
| 160 | + -f, --output-filename TEXT Output filename [required] |
| 161 | + -i, --dataset-id TEXT Dataset Id [required] |
| 162 | + -v, --variables TEXT Variables to download. Can be used multiple times |
| 163 | + -x, --minimum-longitude FLOAT Minimum longitude for the subset. |
| 164 | + -X, --maximum-longitude FLOAT Maximum longitude for the subset. |
| 165 | + -y, --minimum-latitude FLOAT Minimum latitude for the subset. Requires a |
| 166 | + float within this range: [-90<=x<=90] |
| 167 | + -Y, --maximum-latitude FLOAT Maximum latitude for the subset. Requires a |
| 168 | + float within this range: [-90<=x<=90] |
| 169 | + -z, --minimum-depth FLOAT Minimum depth for the subset. Requires a |
| 170 | + float within this range: [x>=0] |
| 171 | + -Z, --maximum-depth FLOAT Maximum depth for the subset. Requires a |
| 172 | + float within this range: [x>=0] |
| 173 | + -t, --start-datetime TEXT Start datetime as: |
| 174 | + %Y|%Y-%m-%d|%Y-%m-%dT%H:%M:%S|%Y-%m-%d |
| 175 | + %H:%M:%S|%Y-%m-%dT%H:%M:%S.%fZ |
| 176 | + -T, --end-datetime TEXT End datetime as: |
| 177 | + %Y|%Y-%m-%d|%Y-%m-%dT%H:%M:%S|%Y-%m-%d |
| 178 | + %H:%M:%S|%Y-%m-%dT%H:%M:%S.%fZ |
| 179 | + -r, --dry-run Dry run |
| 180 | + -g, --dataset-version TEXT Dataset version or tag |
| 181 | + -n, --username TEXT Username |
| 182 | + -w, --password TEXT Password |
| 183 | + --help Show this message and exit. |
| 184 | +``` |
| 185 | +
|
| 186 | +**Example** |
| 187 | +
|
| 188 | +```shell |
| 189 | +mds subset -f output.nc -o . -i cmems_mod_glo_phy-thetao_anfc_0.083deg_P1D-m -x -18.16667 -X 1.0 -y 30.16 -Y 46.0 -z 0.493 -Z 5727.918000000001 -t 2025-01-01 -T 2025-01-01 -v thetao |
| 190 | +``` |
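
When `mds subset` is called from another Python program, it can help to build the argument list in code and check the documented ranges (latitude within [-90, 90], depth >= 0) before launching the process. The following is a sketch under those assumptions; `build_subset_command` is a hypothetical helper, and the `min <= max` check is an extra sanity check that the help text does not state.

```python
def build_subset_command(dataset_id, output_directory, output_filename,
                         variables=(), latitude=None, depth=None,
                         start=None, end=None):
    """Return the argv list for `mds subset`, validating ranges first."""
    command = ["mds", "subset", "-o", str(output_directory),
               "-f", output_filename, "-i", dataset_id]

    for name in variables:
        command += ["-v", name]  # -v can be used multiple times

    if latitude is not None:
        low, high = latitude
        if not (-90 <= low <= high <= 90):
            raise ValueError("latitude must satisfy -90 <= min <= max <= 90")
        command += ["-y", str(low), "-Y", str(high)]

    if depth is not None:
        shallow, deep = depth
        if not (0 <= shallow <= deep):
            raise ValueError("depth must satisfy 0 <= min <= max")
        command += ["-z", str(shallow), "-Z", str(deep)]

    if start is not None:
        command += ["-t", start]
    if end is not None:
        command += ["-T", end]
    return command
```

The returned list can be passed to `subprocess.run(command, check=True)`; using a list rather than a single string avoids shell quoting problems.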
| 191 | +
|
| 192 | +### Get |
| 193 | +
|
| 194 | +**Command**: |
| 195 | +
|
| 196 | +```shell |
| 197 | +Usage: mds.py get [OPTIONS] |
| 198 | + |
| 199 | +Options: |
| 200 | + -f, --filter TEXT Filter on the online files |
| 201 | + -o, --output-directory TEXT Output directory [required] |
| 202 | + -i, --dataset-id TEXT Dataset Id [required] |
| 203 | + -g, --dataset-version TEXT Dataset version or tag |
| 204 | + -s, --service TEXT Force download through one of the available |
| 205 | + services using the service name among |
| 206 | + ['original-files', 'ftp'] or its short name |
| 207 | + among ['files', 'ftp']. |
| 208 | + -d, --dry-run Dry run |
| 209 | + -u, --update If the file not exists, download it, otherwise |
| 210 | + update it it changed on mds |
| 211 | + -v, --dataset-version TEXT Dry run |
| 212 | + -nd, --no-directories TEXT Option to not recreate folder hierarchy in |
| 213 | + output directory |
| 214 | + --force-download TEXT Flag to skip confirmation before download |
| 215 | + --disable-progress-bar TEXT Flag to hide progress bar |
| 216 | + -n, --username TEXT Username |
| 217 | + -w, --password TEXT Password |
| 218 | + --help Show this message and exit. |
| 219 | +``` |
| 220 | +
|
| 221 | +**Example** |
| 222 | +
|
| 223 | +```shell |
| 224 | +mds get -f '20250210*_d-CMCC--TEMP-MFSeas9-MEDATL-b20250225_an-sv10.00.nc' -o . -i cmems_mod_med_phy-tem_anfc_4.2km_P1D-m |
| 225 | +``` |
| 226 | +
|
| 227 | +### File List |
| 228 | +
|
| 229 | +To retrieve a list of files, use: 
| 230 | +
|
| 231 | +```shell |
| 232 | +Usage: mds.py file-list [OPTIONS] DATASET_ID MDS_FILTER |
| 233 | + |
| 234 | +Options: |
| 235 | + -g, --dataset-version TEXT Dataset version or tag |
| 236 | + --help Show this message and exit. |
| 237 | +``` |
| 238 | +
|
| 239 | +**Example** |
| 240 | +
|
| 241 | +```shell |
| 242 | +mds file-list cmems_mod_med_phy-cur_anfc_4.2km_PT15M-i '*b20250225*' -g 202411 |
| 243 | +``` |
| 244 | +
|
| 245 | +### Etag |
| 246 | +
|
| 247 | +```shell |
| 248 | +Usage: mds.py etag [OPTIONS] |
| 249 | + |
| 250 | +Options: |
| 251 | + -e, --s3_file TEXT Path to a specific s3 file - if present, other |
| 252 | + parameters are ignored. |
| 253 | + -p, --product TEXT The product name |
| 254 | + -d, --dataset_id TEXT The datasetID |
| 255 | + -v, --version TEXT Force the selection of a specific dataset version |
| 256 | + -s, --subdir TEXT Subdir structure on mds (i.e. {year}/{month}) |
| 257 | + -f, --mds_filter TEXT Pattern to filter data (no regex) |
| 258 | + --help Show this message and exit. |
| 259 | +``` |
| 260 | +
|
| 261 | +**Example** |
| 262 | +
|
| 263 | +With a specific file: |
| 264 | +
|
| 265 | +```shell |
| 266 | +mds etag -e s3://mdl-native-12/native/MEDSEA_ANALYSISFORECAST_PHY_006_013/cmems_mod_med_phy-tem_anfc_4.2km_P1D-m_202411/2023/08/20230820_d-CMCC--TEMP-MFSeas9-MEDATL-b20240607_an-sv10.00.nc |
| 267 | +``` |
| 268 | +
|
| 269 | +Or: |
| 270 | +
|
| 271 | +```shell |
| 272 | +mds etag -p MEDSEA_ANALYSISFORECAST_PHY_006_013 -d cmems_mod_med_phy-cur_anfc_4.2km_PT15M-i -v 202411 -f '*b20241212*' -s 2024/12 |
| 273 | +``` |
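
The `s3://` path in the first example suggests the layout `native/<product>/<dataset_id>_<version>/<subdir>/<file>` inside the bucket. The sketch below splits such a URL and rebuilds the prefix from the individual parameters. The layout is inferred from that single example and may not hold for every dataset; `split_s3_url` and `dataset_prefix` are hypothetical helpers.

```python
from urllib.parse import urlsplit


def split_s3_url(url):
    """Split an s3://bucket/key URL into (bucket, key)."""
    parts = urlsplit(url)
    if parts.scheme != "s3" or not parts.netloc:
        raise ValueError(f"not an s3:// URL: {url!r}")
    return parts.netloc, parts.path.lstrip("/")


def dataset_prefix(product, dataset_id, version, subdir=""):
    """Build the key prefix under which the files of a dataset are stored."""
    prefix = f"native/{product}/{dataset_id}_{version}/"
    if subdir:
        prefix += subdir.strip("/") + "/"
    return prefix


bucket, key = split_s3_url(
    "s3://mdl-native-12/native/MEDSEA_ANALYSISFORECAST_PHY_006_013/"
    "cmems_mod_med_phy-tem_anfc_4.2km_P1D-m_202411/2023/08/"
    "20230820_d-CMCC--TEMP-MFSeas9-MEDATL-b20240607_an-sv10.00.nc"
)
prefix = dataset_prefix(
    "MEDSEA_ANALYSISFORECAST_PHY_006_013",
    "cmems_mod_med_phy-tem_anfc_4.2km_P1D-m",
    "202411",
    "2023/08",
)
```

For this example, `bucket` is `mdl-native-12` and `key` starts with `prefix`, so the two ways of addressing a file (full path with `-e`, or product, dataset, version and subdir) point at the same location.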
| 274 | +
|
| 275 | +--- |
| 276 | +
|
| 277 | +## Authors |
| 278 | +
|
| 279 | +* Antonio Mariani - [email protected] |