A collection of Python scripts to automatically download large datasets from Dataverse
The download_client.py script downloads a dataset from a Dataverse repository.
The client takes three command line arguments:
- --server_url is the URL of the Dataverse repository (e.g. https://dataverse.harvard.edu)
- --doi is the persistent ID (DOI) of the dataset (e.g. doi:10.7910/DVN/CUFVKE)
- --pattern is a Unix filename matching pattern used to download selected files from a repository (e.g. * will download all files in the dataset)
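To illustrate how these three arguments fit together, here is a minimal sketch of a parser plus a Unix-style filename filter. It is not taken from download_client.py: the function names build_parser and select_files, the default values, and every sample filename other than read_data.m are assumptions made for illustration.

```python
import argparse
from fnmatch import fnmatchcase


def build_parser():
    """Hypothetical parser mirroring the three documented options."""
    parser = argparse.ArgumentParser(
        description="Download files from a Dataverse dataset")
    # Assumed default: the real script's default server may differ.
    parser.add_argument("--server_url", default="https://dataverse.harvard.edu",
                        help="URL of the Dataverse repository")
    parser.add_argument("--doi", required=True,
                        help="persistent ID of the dataset, e.g. doi:10.7910/DVN/CUFVKE")
    # Assumed default: "*" matches every filename.
    parser.add_argument("--pattern", default="*",
                        help="Unix filename matching pattern")
    return parser


def select_files(filenames, pattern):
    """Keep only the names that match the Unix-style pattern (case-sensitive)."""
    return [name for name in filenames if fnmatchcase(name, pattern)]


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere.
args = build_parser().parse_args(
    ["--doi", "doi:10.7910/DVN/CUFVKE", "--pattern", "*.mat"])

# Made-up file listing, for illustration only.
names = ["read_data.m", "phantom_001.mat", "README.txt"]
print(select_files(names, args.pattern))  # ['phantom_001.mat']
```

On the command line, a pattern containing * should be quoted (for example '*.mat') so that the shell passes it to the script unexpanded.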
This script has been tested on Linux and macOS.
As an example, we consider the 2D Acoustic Numerical Breast Phantoms and USCT Measurement Data dataset (doi:10.7910/DVN/CUFVKE). Please note that this is a very large dataset (~1 TB), so download with caution!
- Download the full dataset (not recommended!):
  python -u download_client.py --doi doi:10.7910/DVN/CUFVKE
- Download all files with a given extension (e.g. all .mat files):
  python -u download_client.py --doi doi:10.7910/DVN/CUFVKE --pattern '*.mat'
  The pattern is quoted so that the shell passes it to the script unexpanded.
- Download a specific file (e.g. read_data.m):
  python -u download_client.py --doi doi:10.7910/DVN/CUFVKE --pattern read_data.m
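For readers curious about what such a download involves, the sketch below shows one way to list and fetch matching files through the Dataverse native API. It is an assumption-laden illustration, not the implementation of download_client.py: all function names are hypothetical, the sample metadata is made up, and the code assumes the dataset's files are public and that the metadata response has the usual data / latestVersion / files layout.

```python
import json
import shutil
import urllib.parse
import urllib.request
from fnmatch import fnmatchcase


def dataset_metadata_url(server_url, doi):
    """URL of the native API endpoint that describes a dataset by persistent ID."""
    query = urllib.parse.urlencode({"persistentId": doi})
    return f"{server_url.rstrip('/')}/api/datasets/:persistentId/?{query}"


def datafile_url(server_url, file_id):
    """URL of the access API endpoint that serves a single file by numeric ID."""
    return f"{server_url.rstrip('/')}/api/access/datafile/{file_id}"


def matching_files(metadata, pattern):
    """Return (filename, file_id) pairs whose filename matches the pattern."""
    files = metadata["data"]["latestVersion"]["files"]
    return [
        (entry["dataFile"]["filename"], entry["dataFile"]["id"])
        for entry in files
        if fnmatchcase(entry["dataFile"]["filename"], pattern)
    ]


def download_matching(server_url, doi, pattern):
    """Fetch the dataset metadata, then stream each matching file to disk.

    Performs network requests; it is defined here but not called.
    """
    with urllib.request.urlopen(dataset_metadata_url(server_url, doi)) as resp:
        metadata = json.load(resp)
    for filename, file_id in matching_files(metadata, pattern):
        url = datafile_url(server_url, file_id)
        with urllib.request.urlopen(url) as resp, open(filename, "wb") as out:
            shutil.copyfileobj(resp, out)  # stream in chunks, not all in memory


# Made-up metadata in the assumed response layout, for illustration only.
sample = {"data": {"latestVersion": {"files": [
    {"dataFile": {"id": 1, "filename": "read_data.m"}},
    {"dataFile": {"id": 2, "filename": "phantom_001.mat"}},
]}}}
print(matching_files(sample, "*.mat"))  # [('phantom_001.mat', 2)]
```

Streaming each response to disk with shutil.copyfileobj avoids holding a whole file in memory, which matters for a dataset of this size. Restricted files would additionally need an API token, which this sketch does not handle.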