feat(server): deduplicate library and metadata jobs #15955

Open · wants to merge 1 commit into main
Conversation

@etnoy (Contributor) commented Feb 7, 2025

This adds job IDs to several jobs, which reduces duplicated work. For example, a large library refresh might run for so long that a cron job kicks in and starts refreshing the library again before the first refresh finishes. This PR prevents that from happening.

I've also added job IDs to metadata extraction and thumbnail generation, so that, for instance, we don't queue thumbnail generation for the same asset more than once.
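For context, the queues are backed by BullMQ, where adding a job with an explicit jobId is idempotent: a second add with the same id is ignored while the first job is still waiting, delayed, or active. A rough sketch of the idea (the queue and job names below are illustrative, not the exact ones in this PR):

```ts
import { Queue } from 'bullmq';

const queue = new Queue('library', { connection: { host: 'localhost', port: 6379 } });

// Deduplicate a library refresh: a second refresh request for the same library
// is a no-op while the first one is still waiting, delayed, or active.
export const queueLibraryRefresh = (libraryId: string) =>
  queue.add('library-refresh', { id: libraryId }, { jobId: `library-refresh-${libraryId}` });

// The same idea applied per asset, e.g. thumbnail generation, so the same
// asset is never queued twice, at the cost of one jobId per asset.
export const queueThumbnail = (assetId: string) =>
  queue.add('generate-thumbnail', { id: assetId }, { jobId: `thumbnail-${assetId}` });
```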

@danieldietzler (Member) left a comment

Code LGTM

@etnoy force-pushed the feat/job-ids branch 2 times, most recently from f8455c6 to 24e58be on February 7, 2025 at 22:41
@mertalev (Contributor) commented Feb 8, 2025

Can you set a job ID only for the queue job and not for the asset-level ones? It should be essentially the same result but much lower impact on queueing behavior.
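In other words, only the fan-out trigger would carry a jobId, while the per-asset jobs stay as they are today. A rough sketch of that variant (illustrative names, not the project's actual job repository API):

```ts
import { Queue } from 'bullmq';

const queue = new Queue('library', { connection: { host: 'localhost', port: 6379 } });

// Only the queue-level trigger is deduplicated via jobId...
export const queueThumbnailGenerationAll = () =>
  queue.add('queue-thumbnails', {}, { jobId: 'queue-thumbnails' });

// ...while the per-asset jobs are added in bulk without jobIds, as before.
export const handleQueueThumbnails = async (assetIds: string[]) => {
  await queue.addBulk(assetIds.map((id) => ({ name: 'generate-thumbnail', data: { id } })));
};
```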

@etnoy (Contributor, Author) commented Feb 9, 2025

> Can you set a job ID only for the queue job and not for the asset-level ones? It should be essentially the same result but much lower impact on queueing behavior.

The actual work-doing jobs, thumbnail generation and metadata extraction, are the ones that spend the vast majority of their time in the job queue for large libraries. The queueing jobs are negligible by comparison, and having them only check whether a queueing job already exists makes no difference compared to what we already have today.

After thinking about this some more, I'm inclined to think the performance penalty of individual job IDs is worth it. In my current round of testing I've imported 500k assets, and after two days I still have 350k thumbnails left to generate, and that's on a powerful Xeon server with 32 cores. Had I not disabled cron library scanning, it would have queued the thumbnail refresh several times over by now.

@mertalev (Contributor) commented Feb 10, 2025

I'm really not in favor of solving this by multiplying the number of requests by 1000x or even 10000x. Maybe you can check at the start of the queue job if there are any library jobs and abort? This doesn't necessarily need to be solved with job IDs.
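That alternative could look roughly like the sketch below, assuming direct access to the BullMQ queue (Immich wraps its queues behind its own job repository, so these names are illustrative):

```ts
import { Queue } from 'bullmq';

const libraryQueue = new Queue('library', { connection: { host: 'localhost', port: 6379 } });

// At the start of the cron-triggered queue job: if library work from a previous
// scan is still pending or running, abort instead of queueing duplicates.
export const handleQueueAllLibraryRefresh = async () => {
  const counts = await libraryQueue.getJobCounts('active', 'waiting', 'delayed');
  const outstanding = counts.active + counts.waiting + counts.delayed;

  // The queue job itself counts as one active job, so anything above 1 means
  // an earlier scan has not finished yet.
  if (outstanding > 1) {
    return;
  }

  // ...fan out the per-library refresh jobs as usual...
};
```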
