Add an option to delete the cache when download completes. #394
base: master
Codecov Report
@@ Coverage Diff @@
## master #394 +/- ##
=========================================
Coverage ? 85.36%
=========================================
Files ? 39
Lines ? 1866
Branches ? 0
=========================================
Hits ? 1593
Misses ? 273
Partials ? 0
Continue to review full report at Codecov.
I will probably oppose ever making this default to True. My review comments add language that highlights the benefit of the file store and the implications of removing it.
I suggest that we decide if we want to use the term "cache" or "file store". I actually prefer "cache" but you would have to eliminate all mention of "file store" from both the code and the documentation. I think sticking with "file store" is easier. Whatever we do, we cannot afford to use inconsistent language.
@hannes-ucsc All comments should be addressed. I switched |
@DailyDreaming are you still making changes or is this good to review?
I know there have been mixed opinions about this, but I'm starting to feel more strongly that we should use the name filestore instead of cache. Here's my reasoning:
For a cache, you typically expect that it can be cleared / deleted and the only impact will be upon performance in the future. In the case of our filestore, deleting it can have dire consequences for some kinds of downloads.
Specifically, consider what would happen if the download_manifest command is run with the delete_cache option set and layout=none. The files would be downloaded to the filestore, the manifest would be updated with paths to the files, then all of the files would be deleted and the paths would be broken. After hours of downloading, the user ends up with a bunch of broken paths. This is definitely not ever desirable behavior. Using the name cache gives users the false impression that it can be safely deleted, or that running with these options could even make sense.
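To make the failure mode concrete, here is a minimal sketch in Python. It does not use the real CLI code: `simulate_download`, the `filestore` directory name, and the `delete_filestore` parameter are all hypothetical stand-ins, assumed only for illustration of the layout=none case described above.

```python
import os
import shutil
import tempfile


def simulate_download(workdir, file_names, delete_filestore):
    """Hypothetical stand-in for a manifest download with layout=none.

    Returns the list of paths that the rewritten manifest would contain.
    """
    filestore = os.path.join(workdir, "filestore")  # assumed directory name
    os.makedirs(filestore)
    manifest_paths = []
    for name in file_names:
        path = os.path.join(filestore, name)
        with open(path, "w") as f:
            f.write("data")
        # With no layout, the manifest points straight into the filestore.
        manifest_paths.append(path)
    if delete_filestore:
        # The option under discussion: remove the filestore at the end.
        shutil.rmtree(filestore)
    return manifest_paths


with tempfile.TemporaryDirectory() as workdir:
    paths = simulate_download(workdir, ["a.fastq", "b.fastq"], delete_filestore=True)
    broken = [p for p in paths if not os.path.exists(p)]
    print(len(broken), "of", len(paths), "manifest paths are broken")
```

Every path recorded in the manifest points at a file that no longer exists once the filestore is removed.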
The example above has also made me question adding the delete_cache feature. I think (at the very least) the name should be changed to delete_filestore, and if the feature is maintained, it should turn off the writing of the updated manifest or even be removed completely from the manifest download. Also, if it is kept, it should come with a stronger warning, explaining that bundles downloaded with this option will use double the disk space for redundant files, etc.
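One way the suggestion above could look is sketched below. This is only an illustration under stated assumptions: `plan_manifest_download`, its parameters, and the returned dictionary are hypothetical names, not part of the existing code, and rejecting the layout=none combination outright is one possible reading of "removed completely from the manifest download".

```python
def plan_manifest_download(layout, delete_filestore):
    """Hypothetical option check for a manifest download.

    layout: name of the selected layout, with "none" meaning no layout (assumed).
    delete_filestore: whether the filestore is removed after the download.
    """
    if delete_filestore and layout == "none":
        # Nothing but the filestore would hold the files, so deleting it
        # would leave the user with no data at all.
        raise ValueError(
            "delete_filestore cannot be combined with layout=none: "
            "all downloaded files would be deleted"
        )
    # Skip rewriting the manifest when the filestore will be removed,
    # because its paths would point at files that no longer exist.
    return {"write_updated_manifest": not delete_filestore}


print(plan_manifest_download("bundle", delete_filestore=True))
try:
    plan_manifest_download("none", delete_filestore=True)
except ValueError as error:
    print("rejected:", error)
```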
Maybe we should all talk about this together in person with @hannes-ucsc.
I hadn't thought about the case with We could of course revert the renaming, ensure that I have a third option: we enforce the specification of a download directory instead of polluting the current working directory. If the user is forced to choose the name of a directory for the download, the user will then be more likely to regard that entire directory as an artifact of the download and therefore delete that directory instead of just the files in it. Deleting the directory would remove the hardlinks created for the selected layout, the rewritten manifest and the filestore.
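The hardlink behavior behind this third option can be sketched in Python as follows. The directory names (`.filestore`, `bundle`, `my_download`) and the helper `build_download_dir` are assumptions made up for the example, not the actual layout used by the tool. The point is that a hardlink keeps the data alive after the filestore copy is removed, while deleting the whole download directory cleans up everything.

```python
import os
import shutil
import tempfile


def build_download_dir(download_dir):
    """Hypothetical download directory: a filestore plus one hardlinked layout file."""
    filestore = os.path.join(download_dir, ".filestore")  # assumed name
    layout_dir = os.path.join(download_dir, "bundle")     # assumed name
    os.makedirs(filestore)
    os.makedirs(layout_dir)
    stored = os.path.join(filestore, "abc123")
    with open(stored, "w") as f:
        f.write("data")
    linked = os.path.join(layout_dir, "sample.fastq")
    os.link(stored, linked)  # second directory entry for the same file data
    return filestore, linked


parent = tempfile.mkdtemp()
download_dir = os.path.join(parent, "my_download")
filestore, linked = build_download_dir(download_dir)

shutil.rmtree(filestore)
# The hardlink is its own directory entry, so the data is still reachable.
print("link survives filestore removal:", os.path.exists(linked))

shutil.rmtree(download_dir)
# Removing the download directory removes layout files and filestore alike.
print("download dir still present:", os.path.exists(download_dir))

shutil.rmtree(parent)
```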
@hannes-ucsc I think your third option is a good idea, and perhaps we should make a separate ticket about making the download directory a required argument. It's worth pointing out, however, that the use case @DailyDreaming was originally trying to solve is not addressed by that change.
Adds functionality to address: HumanCellAtlas/data-store#2064
In the future, we might consider setting the default to True for this, but in the meantime, this should be a useful switch for the needs of the dss.
Will add a test if it makes sense to add this.