Google Drive
We store all of the data collected by IMACS in a Team Drive, organized in the following way:
ACCESS
└── IMACS
    ├── <TARGET NAME>
    │   └── <UT DATE>
    ├── <...>
    │   └── <...>
    └── <...>
        └── <...>
For example, a dataset for HAT-P-26b collected on the night of 23 Mar 2021 would be stored as: ACCESS/IMACS/HATP26/ut210323.
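As a small sketch of this naming convention, the <UT DATE> folder name is simply "ut" followed by the two-digit year, month, and day of the UT date. For example (this assumes GNU coreutils `date`, as found on most Linux systems):

```shell
# Build the folder path for the HAT-P-26b example above.
# <UT DATE> is "ut" + YYMMDD of the observation night.
utdate="ut$(date -u -d '2021-03-23' +%y%m%d)"
echo "ACCESS/IMACS/HATP26/${utdate}"   # → ACCESS/IMACS/HATP26/ut210323
```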
The folder can be accessed as a regular cloud directory after access is granted by one of the admins. There are also handy tools for interacting with the data outside of the web browser if desired. Three that have worked well for us in the past are Backup and Sync, Google Drive File Stream (now called Google Drive for desktop, comparison here), and rclone. Rclone is especially useful when working directly from the terminal or on Linux, so we will outline basic usage instructions here.
The following commands in the terminal will allow rclone to communicate with our Team Drive (more information can be found here):
> rclone config
> n (Select new remote)
> (Enter a name to call the remote)
> (Select cloud storage provider from list. This can be selected by name or number)
> (The client id and secret will be asked for next. If you are comfortable with using
Google Cloud Platform (https://rclone.org/drive/#making-your-own-client-id),
go ahead and enter them here. Otherwise, just press enter to skip these prompts)
> (Select the option for full access to the drive)
> (Press enter to skip the root_folder_id and service_account_file prompts)
> (Select n to skip the advanced configuration prompt)
> (Select y to complete the authentication in your browser)
> (Return to the terminal)
> (Select y to configure as a team drive)
> (Select ACCESS from the list)
> (Select y to complete setup and q to exit the config wizard)
That's it! Next we will see how to use this new remote.
As of this writing, the current LCO policy is to upload data collected by the remote observer at Magellan to a secured Google Drive folder. From there, we can transfer the data using whichever method we prefer. Here, we will step through two examples using rclone, which has the capability to transfer, download, and mount files/directories between cloud storage providers in a parallel and efficient manner.
LCO uploads the data to a "Shared" folder, which operates differently from a Team Drive.
To connect rclone to this new directory, repeat the same config steps under Create a new
remote, only this time selecting "no" when asked if we would like
to configure the remote as a Team Drive. We also recommend naming this remote "shared" to
make differentiating between remotes easier. After setup is complete, open
~/.config/rclone/rclone.conf
and add the line shared_with_me = true
directly under the
newly created remote. This also prevents rclone from unnecessarily duplicating any
directories we choose to sync, which is a known rclone bug with an open issue that will
eventually convert this entire step into a command line argument. Once that lands, users
will no longer need to manually edit the config file (or create a separate remote
dedicated to shared folders if they already have a regular remote configured for their own
Google Drive account), but for now this is the recommended work-around.
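After this edit, the relevant section of ~/.config/rclone/rclone.conf should look something like the following (the remote name will match whatever you chose during setup, and the token value is a placeholder for the credentials rclone stores there):

```
[shared]
type = drive
scope = drive
token = <REDACTED>
shared_with_me = true
```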
With the new shared
remote now configured, we can sync the LCO data with ACCESS by
entering the following on the command line:
rclone sync -P --drive-server-side-across-configs=true shared:<FOLDER NAME> ACCESS:IMACS/<TARGET>/<UT DATE>
The --dry-run flag can also be appended to preview what data will be transferred first.
More information about the flags used above and other possible flags can be found here,
or with man rclone and then hitting / to start a search. Briefly: -P displays real-time
transfer statistics and progress, and --drive-server-side-across-configs=true allows the
transfer to happen directly in the cloud rather than routing the data through the local
machine. This flag is set to false by default because there is no guarantee that copying
server side between any two cloud services will work in general. In the case of
transferring between two Google Drive directories, this is not an issue.
Finally, if remote observations are being done and we would like to periodically sync the
data uploaded by LCO with our ACCESS directory, watch -n X can be prepended to the above
command to re-run it every X seconds.
UPDATE: The new syntax for directly syncing from a shared folder is now up!
rclone sync -P <google drive remote here>,shared_with_me:<path in shared folder> <destination>
Downloading the data to a local computer is much more straightforward. The following command:
rclone copy -P ACCESS:IMACS/<TARGET>/<UT DATE> <UT DATE>
will download the data to a local folder called <UT DATE>, which it will automatically create if it does not already exist.
Note that here we are using copy instead of sync because copy will not remove any files
at the destination if they are missing from the source, while sync will. Using copy is
handy because the data uploaded by LCO is in compressed format, while our use case
usually keeps the files in their original form in a working directory. The following
workflow can then be used to preserve all data in general, including when periodically
pulling new data down:
- Copy the compressed data using the above command
- Uncompress the files, but keep the originals, with yes n | gunzip -k <UT DATE>/*.fits.gz
  (yes n automatically answers "no" each time gunzip asks whether to overwrite an
  uncompressed fits file that already exists)
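The decompression step can be sketched end-to-end on a throwaway directory (the directory and file names here are illustrative stand-ins, not real IMACS data):

```shell
# Create a stand-in for a downloaded <UT DATE> folder with one compressed file.
mkdir -p ut210323
printf 'demo' | gzip > ut210323/demo.fits.gz

# -k keeps the compressed originals next to the unzipped .fits files;
# "yes n" answers "no" whenever gunzip asks to overwrite an existing file.
yes n | gunzip -k ut210323/*.fits.gz

ls ut210323   # demo.fits  demo.fits.gz
```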
In the future, we will compress all of our remote data to improve transfer efficiency.