
Add short term storage expiration indicator to history items #20332


Draft: wants to merge 11 commits into base: dev

Conversation

davelopez
Contributor

@davelopez davelopez commented May 22, 2025

xref #20169

This simple approach should not be too expensive and can help the user identify when a dataset might be gone because it is stored in a short-term object store.

image

image

This works just by annotating the object store config with a new property object_expires_after_days:

    - id: scratch
      type: disk
      device: device2
      weight: 0
      allow_selection: true
      private: true
      name: Scratch Storage
      description: >
          This object store is connected to institutional scratch storage. This disk space is not backed up and private to
          your user, and datasets belonging to this storage will be automatically deleted after one month.
      quota:
          source: second_tier
      files_dir: /home/dlopez/sandbox/data-gx/dev/objects/temp
      badges:
          - type: faster
          - type: less_stable
          - type: not_backed_up
          - type: short_term
            message: The data stored here is purged after a month.
      object_expires_after_days: 30
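
As a rough illustration of how a value like `object_expires_after_days` could be turned into an indicator, here is a minimal sketch. It assumes the expiration is counted from the dataset's creation time, which the PR does not state explicitly; the function names are hypothetical and not taken from the Galaxy codebase.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def expiration_time(create_time: datetime, expires_after_days: Optional[int]) -> Optional[datetime]:
    """Return when an object is expected to be purged, or None if the store has no expiration."""
    if expires_after_days is None:
        return None
    # Assumption: the countdown starts at the dataset's creation time.
    return create_time + timedelta(days=expires_after_days)


def days_until_expiration(
    create_time: datetime,
    expires_after_days: Optional[int],
    now: Optional[datetime] = None,
) -> Optional[int]:
    """Return the whole days left before expiration (0 if already expired), or None if never."""
    expires_at = expiration_time(create_time, expires_after_days)
    if expires_at is None:
        return None
    now = now or datetime.now(timezone.utc)
    # Clamp at zero so an already-expired dataset does not report negative days.
    return max((expires_at - now).days, 0)
```

With the config above (`object_expires_after_days: 30`), a dataset created on May 1 would be flagged as expiring on May 31 under this assumption.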

There are still some drawbacks to consider/resolve:

  • Synchronize the object store config property object_expires_after_days with the actual expiration time of the object store. The cleanup of the object store seems to be handled by external processes, so this value must be kept in sync with whatever those processes actually enforce.
  • Collections do not have an object_store_id property. I wonder if we could "estimate" or "assume" the object store ID of a collection by looking at the object store ID of the first dataset in the collection. This is not ideal, but maybe it could be a good enough workaround? I'm not sure how often collection elements are stored in mixed object stores, but I guess it could happen.

How to test the changes?

(Select all options that apply)

  • I've included appropriate automated tests.
  • This is a refactoring of components with existing test coverage.
  • Instructions for manual testing are as follows:
    1. [add testing steps and prerequisites here if you didn't write automated tests covering all your changes]

License

  • I agree to license these and all my past contributions to the core galaxy codebase under the MIT license.

@jmchilton
Member

I'm anxious about this idea for a few reasons but Anton is the boss 🤷‍♀️.

If you descend into collections in the history panel - do you get the icon on individual datasets there? The query that summarizes states across the whole collection could accumulate the object store IDs at the same time - it would be a wild query, but it would probably be the easiest and most correct way to summarize the dataset collection. I guess we couldn't show a countdown in that case, but we could add a temporary storage icon that shows more information, like a per-dataset breakdown, when clicked.

@davelopez
Contributor Author

I understand and share your concerns, especially with collections. I also think the other proposed solutions, like sending emails, are even more concerning. It would be really hard to do it right and not turn it into a massive spam generator, so likely not worth it 😅

If you descend into collections in the history panel - do you get the icon on individual datasets there?

I would say no. My idea was to do something less accurate, but enough to "inform" the user that the datasets or collections used are temporary. I thought displaying something at the top level would be enough: if you drilled down into that collection, you must have already seen the "indication", and we could still display it at the top.

I know it is technically possible to mix elements from different object stores in the same collection, but will this be a common case? I was hoping we could assume a single common object store for the HDCA by peeking into just one of its datasets. But yeah, in the worst case, we could do what you suggest, aggregating the object store IDs in the summarize query, and if there is at least one object store ID known to be short-term, just display some warning at the top. This would probably already be a huge improvement in raising awareness of the temporary nature of the selected storage without needing many more features.
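
The "at least one short-term object store" check described above could look roughly like the sketch below. The mapping from object store ID to expiration days is a hypothetical stand-in for whatever the object store config exposes, and the function name is illustrative only.

```python
from typing import Dict, Iterable, Optional


def collection_has_short_term_storage(
    object_store_ids: Iterable[Optional[str]],
    expires_after_days_by_store: Dict[str, Optional[int]],
) -> bool:
    """Return True if any element of the collection lives in a store that expires objects."""
    return any(
        # A store counts as short-term when it declares an expiration in days.
        expires_after_days_by_store.get(store_id) is not None
        for store_id in object_store_ids
        if store_id is not None  # elements without a known store are ignored
    )
```

This only answers "show a warning or not" and deliberately avoids computing a countdown for the collection.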

@jmchilton
Member

And we're certain we cannot just take scratch away from people who complain? We "promote" them to a "higher tier" of user where all their data is in permanent storage and advanced options are disabled. Not going to fly, huh?

I know it is technically possible to mix elements from different object stores in the same collection, but will this be a common case?

It is probably uncommon, but such mixed collections are pretty easy to create, and my guess is that they would be more common and have more obvious use cases than, say, mixing dbkeys or file extensions, and we deal with a mix of those in the UI in a mostly "correct" fashion.

@nsoranzo
Member

The other option is to not show the indicator at all for collections, but only when the user drills down to the dataset level.

@davelopez
Contributor Author

I made an attempt to include the set of object_store_ids as we do with dbKeys and extensions in 048e2d6, and then find the shortest time to expiration in any of them. I assume that as soon as one of the elements of a collection expires, the whole collection can be considered expired, as it can no longer be used completely.
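
A minimal sketch of the "shortest time to expiration" selection, assuming the set of object store IDs for the collection is already available and that each store's expiration (in days) can be looked up from config. The names below are hypothetical and do not reflect the actual code in 048e2d6.

```python
from typing import Dict, Iterable, Optional


def shortest_expiration_days(
    object_store_ids: Iterable[str],
    expires_after_days_by_store: Dict[str, Optional[int]],
) -> Optional[int]:
    """Return the smallest expiration (in days) among the collection's stores, or None."""
    candidates = [
        expires_after_days_by_store[store_id]
        for store_id in object_store_ids
        # Skip unknown stores and stores that never expire their objects.
        if expires_after_days_by_store.get(store_id) is not None
    ]
    # The collection is treated as expired as soon as its first element expires.
    return min(candidates) if candidates else None
```

Stores without an expiration are skipped, so a collection stored entirely in permanent storage yields `None` and would show no indicator.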

image

image

Let me know if this is still a bad idea 😅

davelopez added 3 commits May 27, 2025 09:59
This is an optional property that can indicate the number of days that an object (file) will be stored in a short term storage.
To display expiration status of datasets stored in a short term object store.
davelopez added 7 commits May 27, 2025 12:07
Reusing the same query for dbKeys and extensions, we get a unique set of object_store_ids where the elements of the collection are stored.
In case of multiple object stores, we pick the one with the shortest expiration time as we can assume that as soon as the first element expires, the entire collection should be considered "expired" since we cannot access all elements anymore.
I don't remember exactly why this was set to optional, but it seems the default of the database field will always be datetime.now, so it makes more sense to make the value required.
@davelopez davelopez force-pushed the explore_short_term_storage_expiration_indicator branch from 86935aa to a0fd976 Compare May 27, 2025 10:08
…ired test

To handle mock datasets when serializing the collection during export
3 participants