Google Drive

Ian Weaver edited this page Apr 15, 2021 · 5 revisions

Data organization

We store all of the data collected by IMACS in a Team Drive, organized in the following way:

ACCESS
└── IMACS
    ├── <TARGET NAME>
    │   └── <UT DATE>
    ├── <...>
    │   └── <...>
    └── <...>
        └── <...>

For example, a dataset for HAT-P-26b collected on the night of 23 Mar 2021 would be stored as: ACCESS/IMACS/HATP26/ut210323.
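The naming convention can be sketched as a small shell helper. The `drive_path` name and the rule of dropping hyphens and the trailing planet letter are our own illustration, inferred from the HAT-P-26b example above:

```shell
# Hypothetical helper illustrating the path convention above.
# Assumes folder names drop hyphens and the trailing planet letter
# (HAT-P-26b -> HATP26), and UT dates are formatted as utYYMMDD.
drive_path() {
  local target=$1 date=$2      # e.g. "HAT-P-26b" "210323"
  local folder=${target//-/}   # strip hyphens -> HATP26b
  folder=${folder%b}           # drop the planet letter -> HATP26
  printf 'ACCESS/IMACS/%s/ut%s\n' "$folder" "$date"
}

drive_path "HAT-P-26b" "210323"   # -> ACCESS/IMACS/HATP26/ut210323
```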

Data access

The folder can be accessed as a regular cloud directory after access is granted by one of the admins. There are also handy tools for interacting with the data outside of the web browser if desired. Three that have worked well for us in the past are Backup and Sync, Google Drive File Stream (now called Google Drive for desktop, comparison here), and rclone. Rclone is especially useful when working directly from the terminal or on Linux, so we will outline basic usage instructions here.

Create a new remote

The following commands in the terminal will allow rclone to communicate with our Team Drive (more information can be found here):

> rclone config
> n (Select new remote)
> (Enter a name to call the remote)
> (Select cloud storage provider from list. This can be selected by name or number)
> (The client id and secret will be asked for next. If you are comfortable with using
   Google Cloud Platform (https://rclone.org/drive/#making-your-own-client-id),
   go ahead and enter them here. Otherwise, just press enter to skip these prompts)
> (Select the option for full access to the drive)
> (Press enter to skip the root_folder_id and service_account_file prompts)
> (Select n to skip the advanced configuration prompt)
> (Select y to complete the authentication in your browser)
> (Return back to the terminal)
> (Select y to configure as a team drive)
> (Select ACCESS from the list)
> (Select y to complete setup and q to exit the config wizard)
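Assuming the remote was named ACCESS in the wizard (substitute whatever name you chose), the setup can be sanity-checked from the terminal:

```shell
# List configured remotes; the new one should appear here.
rclone listremotes

# List the top-level directories of the Team Drive.
rclone lsd ACCESS:
```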

That's it! Next we will see how to use this new remote.

Data transfer

As of this writing, the current LCO policy is to upload data collected by the remote observer at Magellan to a secured Google Drive folder. From there, we can transfer the data using whichever method we prefer. Here, we will step through two examples using rclone, which has the capability to transfer, download, and mount files/directories between cloud storage providers in a parallel and efficient manner.

LCO to ACCESS

LCO uploads the data to a "Shared" folder, which operates differently from a Team Drive. To connect rclone to this directory, repeat the steps under Create a new remote, this time selecting "no" when asked whether to configure the remote as a Team Drive. We recommend naming this remote "shared" to make it easier to tell the two remotes apart. After setup is complete, open ~/.config/rclone/rclone.conf and add the line shared_with_me = true directly under the newly created remote. This also prevents rclone from unnecessarily duplicating any directories we choose to sync. A known rclone bug has an open issue that, once resolved, will turn this entire step into a command-line argument, so users will no longer need to edit the config file manually (or create a separate remote dedicated to shared folders if they already have a regular remote configured for their own Google Drive account). For now, this is the recommended workaround.
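After the edit, the relevant section of ~/.config/rclone/rclone.conf should look roughly like this (the field order and the redacted token are illustrative; your token will be a longer JSON blob written by the wizard):

```
[shared]
type = drive
scope = drive
token = {"access_token":"...","expiry":"..."}
shared_with_me = true
```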

With the new shared remote now configured, we can sync the LCO data with ACCESS by entering the following on the command line:

rclone sync -P --drive-server-side-across-configs=true shared:<FOLDER NAME> ACCESS:IMACS/<TARGET>/<UT DATE>

The --dry-run flag can also be appended to preview what data will be transferred first. More information about the flags used above and other possible flags can be found here, or with man rclone and then hitting / to start a search. Briefly: -P displays real-time transfer statistics and progress, and --drive-server-side-across-configs=true tells rclone to perform the transfer directly in the cloud rather than downloading and re-uploading the data through the local machine. This flag defaults to false because server-side copying is not guaranteed to work between any two cloud services, but it is not an issue when transferring between two Google Drive directories.

Finally, if remote observations are being done and we would like to periodically sync the data uploaded to LCO with our ACCESS directory, watch -n X can be prepended to the above command to re-run it every X seconds.
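For example, to re-sync every 10 minutes during a remote run (the 600-second interval is arbitrary; note that -P's live progress display is of limited use under watch, which only redraws the output each interval):

```shell
watch -n 600 rclone sync -P --drive-server-side-across-configs=true \
    shared:<FOLDER NAME> ACCESS:IMACS/<TARGET>/<UT DATE>
```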

UPDATE: The new syntax for directly syncing from a shared folder is now up!

rclone sync -P <google drive remote here>,shared_with_me:<path in shared folder> <destination>

ACCESS to local computer

Downloading the data to a local computer is much more straightforward. The following command:

rclone copy -P ACCESS:IMACS/<TARGET>/<UT DATE> <UT DATE>

will download the data to a local folder called <UT DATE>, creating it automatically if it does not already exist.

Note that here we are using copy instead of sync because copy will not remove any files at the destination if they are missing from the source, while sync will. This matters because LCO uploads the data in compressed form, while our working directory usually also holds the uncompressed versions, which sync would delete. The following workflow can then be used to preserve all data in general, including when periodically pulling new data down:

  1. Copy the compressed data using the above command
  2. Uncompress the files, but keep the originals, with yes n | gunzip -k <UT DATE>/*.fits.gz (yes n passes "no" automatically each time gunzip asks if we would like to overwrite the uncompressed fits file if it already exists)
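The two steps can be sketched as follows. The rclone step is shown as a comment since it needs credentials, so the demo runs only the decompression step on a synthetic file; the decompress_keep helper and the file name are our own illustration:

```shell
# Step 1 (needs credentials; shown for context):
# rclone copy -P ACCESS:IMACS/<TARGET>/<UT DATE> <UT DATE>

decompress_keep() {
  # Step 2: decompress every .fits.gz in a directory, keeping the
  # compressed originals (-k) and answering "no" to any overwrite prompt.
  yes n | gunzip -k "$1"/*.fits.gz
}

# Demo on synthetic data:
demo=$(mktemp -d)
printf 'SIMPLE  = T' > "$demo/ift0001.fits"
gzip "$demo/ift0001.fits"   # leaves only ift0001.fits.gz
decompress_keep "$demo"
ls "$demo"                  # both ift0001.fits and ift0001.fits.gz remain
```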

In the future, we will compress all of our remote data to improve transfer efficiency.
