Add Asset garbage collection design doc #2367

Open: wants to merge 2 commits into base: master

Conversation

mvandenburgh
Member

Outlines a design for Asset garbage collection. Implementing this will also unblock garbage collection from being run in general, so the idea is that we will begin running the collect_garbage script for assets/uploads/asset blobs after this feature lands.

@mvandenburgh
Member Author

@satra @yarikoptic PTAL and let me know if you have any questions or objections to this. I've also included the current counts of garbage-collectible assets and their total file sizes; it looks like we will be able to clean up a significant amount of stale data (almost 50%!).

- AssetBlobs: 88 (375,936,742,348 bytes / 350.12 GB)
- Uploads: 962
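
As a quick check on the size figure above, here is a minimal sketch of the byte conversion. The helper name `bytes_to_gib` is hypothetical and not part of the DANDI codebase. It suggests the quoted 350.12 figure uses binary units (2**30 bytes per unit) rather than decimal gigabytes.

```python
def bytes_to_gib(n_bytes: int) -> float:
    """Convert a byte count to binary gigabytes (GiB, 2**30 bytes each)."""
    return n_bytes / 2**30


# Byte total quoted for the orphaned AssetBlobs in the comment above.
orphaned_blob_bytes = 375_936_742_348

print(f"{bytes_to_gib(orphaned_blob_bytes):.2f} GiB")       # 350.12 GiB
print(f"{orphaned_blob_bytes / 10**9:.2f} GB (decimal)")    # 375.94 GB (decimal)
```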

**Note that around 50% of the existing data in DANDI (About 329 TB out of 787 TB) consists of orphaned Assets and will be garbage collected.**
Member

not sure i'm following this section since assets don't have bytes associated with them, just blobs do. an orphaned asset may be attached to a blob that is not orphaned (i.e. connected to another asset)

Member Author

In this case, "Assets" refers to the bytes taken up by the blobs that would become orphaned if Asset GC were run, whereas "AssetBlobs" refers to blobs that are already orphaned as-is.

I could have worded that slightly better; I'll update the doc to clarify.
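
To make the distinction concrete, here is a minimal sketch of the two categories. It is an illustration only: the function `classify_blobs`, the plain dict/set data model, and the sample IDs are all hypothetical and do not reflect the actual DANDI models or the `collect_garbage` script. It also covers the case raised above, where an orphaned asset points at a blob that another, live asset still uses.

```python
def classify_blobs(asset_to_blob, blob_ids, orphaned_asset_ids):
    """Split blobs into (already_orphaned, would_become_orphaned).

    asset_to_blob:      hypothetical mapping of asset ID -> blob ID
    blob_ids:           all known blob IDs
    orphaned_asset_ids: assets that Asset GC would delete
    """
    # Build the reverse index: blob ID -> set of assets referencing it.
    refs = {blob_id: set() for blob_id in blob_ids}
    for asset_id, blob_id in asset_to_blob.items():
        refs.setdefault(blob_id, set()).add(asset_id)

    # "AssetBlobs": blobs that no asset references today.
    already_orphaned = {b for b, assets in refs.items() if not assets}

    # "Assets": blobs referenced only by assets that GC would remove.
    would_become_orphaned = {
        b for b, assets in refs.items()
        if assets and assets <= orphaned_asset_ids
    }
    return already_orphaned, would_become_orphaned


already, would = classify_blobs(
    asset_to_blob={"a1": "b1", "a2": "b1", "a3": "b2"},
    blob_ids={"b1", "b2", "b3"},
    orphaned_asset_ids={"a1", "a3"},
)
print(already)  # {'b3'}
print(would)    # {'b2'}
```

Here blob `b1` stays out of both sets: the orphaned asset `a1` references it, but so does the live asset `a2`, so deleting `a1` frees no bytes.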

Member

but where are the numbers coming from? for example it says X of Y. i see X as the bytes associated, but where does Y come from?

@yarikoptic added the design-doc label on Jun 5, 2025
Labels
design-doc Involves creating or discussing a design document
3 participants