feat(server): deduplicate library and metadata jobs #15955
base: main
Code LGTM
f8455c6 to 24e58be
Can you set a job ID only for the queue job and not for the asset-level ones? It should give essentially the same result with a much lower impact on queueing behavior.
The actual work-doing jobs, thumbnail generation and metadata extraction, are the ones that spend the vast majority of the time in the job queue for large libraries. The queueing jobs pale in comparison, and having them only check whether a queueing job already exists makes no difference to what we have today. After thinking about this some more, I'm more inclined to think that the performance penalty of individual job IDs is worth it. In my current round of testing I've imported 500k assets, and after two days I still have 350k thumbnails left to generate. That's on a powerful Xeon server with 32 cores. Had I not disabled cron library scanning, it would have queued the thumbnail refresh several times over by now.
I'm really not in favor of solving this by multiplying the number of requests by 1000x or even 10000x. Maybe you can check at the start of the queue job whether there are any library jobs and abort if so? This doesn't necessarily need to be solved with job IDs.
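A rough sketch of that alternative, assuming the queueing job can read how many jobs of the same kind are still pending. The `PendingCounts` shape, `queueAssetJobs`, and the `enqueue` callback are hypothetical names for illustration, not the server's actual API:

```typescript
// Hypothetical summary of jobs still pending in a queue
// (excluding the queueing job that is currently running).
interface PendingCounts {
  active: number;
  waiting: number;
}

// Fans out one job per asset, but only if no earlier run is still in progress.
// Returns the number of jobs that were queued.
function queueAssetJobs(
  assetIds: string[],
  counts: PendingCounts,
  enqueue: (assetId: string) => void,
): number {
  // One check up front instead of one lookup per asset.
  if (counts.active + counts.waiting > 0) {
    return 0; // abort: a previous refresh has not finished yet
  }
  for (let i = 0; i < assetIds.length; i++) {
    enqueue(assetIds[i]);
  }
  return assetIds.length;
}
```

The trade-off is granularity: this costs a single check per queueing job, but it skips the whole run while any earlier work is pending and cannot tell whether a particular asset is already queued.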
This adds job IDs to several jobs, which reduces duplicated work. For example, a large library refresh might go on for such a long time that a cron job kicks in and starts refreshing it again before the first refresh finishes. This PR prevents that from happening.
I've also added job IDs to metadata extraction and thumbnail generation so that, for instance, we don't queue thumbnail generation for the same asset more than once.
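To illustrate the idea, here is a minimal in-memory sketch of deduplication by job ID. It is not the code in this PR: `DedupQueue`, `thumbnailJobId`, and the ID format are assumptions made for the example, and a real queue backend would do the equivalent check on its own side.

```typescript
// Hypothetical job shape for the example.
type Job = { name: string; data: { id: string } };

class DedupQueue {
  // Pending jobs keyed by job ID, plus the order in which they were added.
  private pending = new Map<string, Job>();
  private order: string[] = [];

  // Returns true if the job was queued, false if a job with the
  // same ID is already pending (the duplicate is dropped).
  add(job: Job, jobId: string): boolean {
    if (this.pending.has(jobId)) {
      return false;
    }
    this.pending.set(jobId, job);
    this.order.push(jobId);
    return true;
  }

  // Removes and returns the oldest pending job, freeing its ID for reuse.
  next(): Job | undefined {
    const jobId = this.order.shift();
    if (jobId === undefined) {
      return undefined;
    }
    const job = this.pending.get(jobId);
    this.pending.delete(jobId);
    return job;
  }

  get size(): number {
    return this.pending.size;
  }
}

// Assumed ID scheme: job name plus asset ID, so each asset has
// at most one pending thumbnail job.
function thumbnailJobId(assetId: string): string {
  return "generate-thumbnail:" + assetId;
}
```

With this scheme, a second refresh that tries to queue a thumbnail for an asset that is still waiting becomes a no-op, while the same asset can be queued again once its earlier job has been taken off the queue.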