Add an option to delete the cache when download completes. #394

Open
wants to merge 9 commits into
base: master

Conversation

DailyDreaming
Contributor

@DailyDreaming DailyDreaming commented Jul 17, 2019

Adds functionality to address: HumanCellAtlas/data-store#2064

In the future, we might consider setting the default to True for this, but in the meantime, this should be a useful switch for the needs of the dss.

Will add a test if it makes sense to add this.

@codecov-io

codecov-io commented Jul 17, 2019

Codecov Report

❗ No coverage uploaded for pull request base (master@a496e08).
The diff coverage is 88.23%.

@@            Coverage Diff            @@
##             master     #394   +/-   ##
=========================================
  Coverage          ?   85.36%           
=========================================
  Files             ?       39           
  Lines             ?     1866           
  Branches          ?        0           
=========================================
  Hits              ?     1593           
  Misses            ?      273           
  Partials          ?        0

Impacted Files         Coverage Δ
hca/dss/__init__.py    89.86% <88.23%> (ø)

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update a496e08...bd24130.

Contributor

@hannes-ucsc hannes-ucsc left a comment

I will probably oppose ever making this default to True. My review comments add language that highlights the benefit of the file store and the implications of removing it.

I suggest that we decide if we want to use the term "cache" or "file store". I actually prefer "cache" but you would have to eliminate all mention of "file store" from both the code and the documentation. I think sticking with "file store" is easier. Whatever we do, we cannot afford to use inconsistent language.

@DailyDreaming
Contributor Author

DailyDreaming commented Jul 29, 2019

@hannes-ucsc All comments should be addressed. I also switched "filestore" to "cache", since that is my preference too.

@kozbo kozbo added this to the Q3 2019 Milestone 2 milestone Jul 31, 2019
@jessebrennan
Collaborator

@DailyDreaming are you still making changes or is this good to review?

Collaborator

@jessebrennan jessebrennan left a comment

I know there have been mixed opinions about this, but I'm starting to feel more strongly that we should use the name filestore instead of cache. Here's my reasoning:

For a cache, you typically expect that it can be cleared / deleted and the only impact will be upon performance in the future. In the case of our filestore, deleting it can have dire consequences for some kinds of downloads.

Specifically, consider what would happen if the download_manifest command is run with the delete_cache option set and layout=none. The files would be downloaded to the filestore, the manifest would be updated with paths to the files, then all of the files would be deleted and the paths would be broken. After hours of downloading, the user ends up with a bunch of broken paths. This is never desirable behavior. Using the name cache gives users the false impression that it can be safely deleted, or that running with these options could even make sense.

The example above has also made me question adding the delete_cache feature. I think (at the very least) the name should be changed to delete_filestore, and if the feature is maintained, it should turn off the writing of the updated manifest or even be removed completely from the manifest download. Also, if it is kept, it should come with a stronger warning, explaining that bundles downloaded with this option will use double the disk space for redundant files, etc.
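
To make the layout=none failure mode concrete, here is a minimal, self-contained sketch. It does not use the real hca code; the function name, directory names, file names, and manifest columns are all hypothetical and only simulate the sequence of steps described above (download to the filestore, rewrite the manifest, delete the filestore).

```python
import csv
import os
import shutil
import tempfile


def simulate_manifest_download(work_dir, delete_filestore):
    """Simulate a layout=none manifest download (all names are hypothetical)."""
    filestore = os.path.join(work_dir, "filestore")
    os.makedirs(filestore)

    # 1. "Download" a file into the filestore.
    file_path = os.path.join(filestore, "abc123")
    with open(file_path, "w") as f:
        f.write("payload")

    # 2. Rewrite the manifest so that it points at the downloaded file.
    manifest_path = os.path.join(work_dir, "manifest.tsv")
    with open(manifest_path, "w", newline="") as f:
        writer = csv.writer(f, delimiter="\t")
        writer.writerow(["file_name", "file_path"])
        writer.writerow(["sample.fastq", file_path])

    # 3. Optionally delete the filestore once the download completes.
    if delete_filestore:
        shutil.rmtree(filestore)

    # Return the manifest paths that no longer exist on disk.
    with open(manifest_path, newline="") as f:
        rows = list(csv.DictReader(f, delimiter="\t"))
    return [row["file_path"] for row in rows if not os.path.exists(row["file_path"])]


with tempfile.TemporaryDirectory() as kept, tempfile.TemporaryDirectory() as deleted:
    broken_when_kept = simulate_manifest_download(kept, delete_filestore=False)
    broken_when_deleted = simulate_manifest_download(deleted, delete_filestore=True)

print(len(broken_when_kept), len(broken_when_deleted))
```

In this simulation the manifest is intact when the filestore is kept, and every rewritten path is broken once the filestore is deleted.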

Maybe we should all talk about this together with @hannes-ucsc in person.

@hannes-ucsc
Contributor

hannes-ucsc commented Aug 6, 2019

I hadn't thought about the case with --layout=none that @jessebrennan brought up. --layout=none directly contradicts --delete-cache because using both would produce a rewritten manifest with paths to non-existent files.

We could of course revert the renaming and ensure that --delete-filestore can't be used in conjunction with --layout=none by making them mutually exclusive.
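
As a rough sketch of how that mutual exclusion might be enforced at argument-parsing time: the parser below is hypothetical and is not the actual hca CLI. The flag names follow this discussion, and the `bundle` layout value is made up for illustration. Because argparse's mutually exclusive groups apply to whole options rather than to a specific option value, the conflict is checked by hand with `parser.error`.

```python
import argparse


def build_parser():
    # Hypothetical parser; the "bundle" layout choice is invented for this sketch.
    parser = argparse.ArgumentParser(prog="download-manifest-sketch")
    parser.add_argument("--layout", choices=["none", "bundle"], default="none")
    parser.add_argument("--delete-filestore", action="store_true")
    return parser


def parse_args(argv):
    parser = build_parser()
    args = parser.parse_args(argv)
    # Reject the one combination that would leave the manifest with dead paths.
    if args.delete_filestore and args.layout == "none":
        parser.error("--delete-filestore cannot be combined with --layout=none")
    return args


# Accepted: the filestore is deleted, but a layout other than "none" was requested.
ok = parse_args(["--layout", "bundle", "--delete-filestore"])

# Rejected: parser.error() prints a usage message and raises SystemExit.
try:
    parse_args(["--layout=none", "--delete-filestore"])
    rejected = False
except SystemExit:
    rejected = True
```

Failing early like this would stop the user before the download starts rather than after it completes.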

I have a third option: we enforce the specification of a download directory instead of polluting the current working directory. If the user is forced to choose the name of a directory for the download, the user will then be more likely to regard that entire directory as an artifact of the download and therefore delete that directory instead of just the files in it. Deleting the directory would remove the hardlinks created for the selected layout, the rewritten manifest and the filestore.
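
A small sketch of the hard link behavior this relies on, using only the standard library. The directory and file names are hypothetical and the layout is simplified to a single linked file; it is an illustration of the idea, not the hca implementation.

```python
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as parent:
    download_dir = os.path.join(parent, "my-download")  # hypothetical user-chosen directory
    filestore = os.path.join(download_dir, "filestore")
    layout_dir = os.path.join(download_dir, "bundle-1")
    os.makedirs(filestore)
    os.makedirs(layout_dir)

    stored = os.path.join(filestore, "abc123")
    with open(stored, "w") as f:
        f.write("payload")

    # The layout entry is a hard link: a second name for the same data.
    linked = os.path.join(layout_dir, "sample.fastq")
    os.link(stored, linked)
    link_count = os.stat(linked).st_nlink

    # Removing only the filestore leaves the linked file readable.
    shutil.rmtree(filestore)
    with open(linked) as f:
        survived = f.read()

    # Removing the download directory removes every artifact at once.
    shutil.rmtree(download_dir)
    remaining = os.listdir(parent)
```

With a hard-linked layout the data survives deletion of the filestore alone, while deleting the one user-chosen directory cleans up the links and the filestore together.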

@jessebrennan
Collaborator

@hannes-ucsc I think your third option is a good idea, and perhaps we should make a separate ticket about making the download directory a required argument. It's worth pointing out, however, that the use case @DailyDreaming was originally trying to solve is not addressed by that change.

6 participants