Feature/task concurrency control#148
Conversation
chrisguidry
left a comment
Thanks @abrookins! First off, the docs and tests are 🤌. The implementation looks good, but I think there's a way to weave this into the worker that's a little more natural and may need a little less code.
Here you're changing start_task and process_completed_tasks, but there's a natural spot for handling per-task things like this: the Worker._execute method. That would have a couple of advantages, because we already pull out the task's dependencies there (see how the timeout is used) and we already have some natural exception handling with retry logic (and the ability to put in the silly log glyphs). I think if you stitch concurrency limiting in there, it would come out a little simpler.
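To make the shape of that suggestion concrete, here is a minimal, illustrative-only sketch — the class and method names (`MiniWorker`, `_can_start`, `_reschedule`, `_release_slot`) are hypothetical stand-ins, not docket's actual API. The point is that the concurrency check sits in the single per-task execute path, next to the existing timeout and cleanup handling:

```python
import asyncio

class MiniWorker:
    """Illustrative sketch only: the concurrency check lives in the one
    per-task execute path rather than in start_task /
    process_completed_tasks. None of these names are docket's real API."""

    def __init__(self, can_start: bool):
        self._can_start_flag = can_start
        self.ran = False
        self.rescheduled = False
        self.released = False

    async def _can_start(self, execution) -> bool:
        # Stand-in for the concurrency-limit check against Redis
        return self._can_start_flag

    async def _reschedule(self, execution):
        # Stand-in for pushing the task back onto the schedule
        self.rescheduled = True

    async def _release_slot(self, execution):
        # Stand-in for freeing the concurrency slot
        self.released = True

    async def _execute(self, execution):
        if not await self._can_start(execution):
            await self._reschedule(execution)  # blocked: back off, don't run
            return
        try:
            self.ran = True  # the real worker would await the task here
        finally:
            await self._release_slot(execution)  # always free the slot
```

The `try/finally` mirrors the existing exception/retry handling the comment mentions: whatever the task does, the slot is released.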
The only other question I had was about the inevitable orphaning of concurrency slots. Any thoughts there about how to expire them? What if it was a sorted set with timestamps and we used the workers redelivery_timeout as the expiration? I think you could do a quick atomic ZREMRANGE in the lua script before checking the ZCARD. What do you think?
Great points. I haven't thought much about expiring the slots. Let me explore some ideas -- I'm wondering if we can run an async task to refresh the timestamp while a task is running. It won't be perfect, but we can at least cover some bases for longer-running tasks. If I'm thinking about this correctly, that is.
Force-pushed from 2b10df4 to 9c0f13d
@chrisguidry Here's what I'm considering: https://github.com/chrisguidry/docket/pull/148/files#diff-faf9939804414c2603b6478851789a5f3ac874bbcb82a63a3af6d3ccfa780b0fR962-R986 So basically, each worker would start one coroutine that manages refreshing the timestamp on any active tasks. We don't attempt to spawn one coroutine per active task, which could be problematic, and we also don't try to solve the problem of tasks that are intentionally blocked on CPU (meh). I'm not attached to this idea, but what do you think?
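The single-refresher idea might look roughly like this — a minimal sketch with hypothetical names (`lease_refresher`, plain dicts standing in for the Redis sorted set; the real bump would be a ZADD so only live slots are refreshed):

```python
import asyncio
import time

async def lease_refresher(active: set, slots: dict, interval: float = 0.01):
    """One coroutine per worker that periodically bumps the slot timestamp
    of every currently active task. Sketch only: 'slots' stands in for the
    Redis sorted set, and the bump would really be a ZADD."""
    while True:
        now = time.monotonic()
        for key in list(active):
            slots[key] = now  # refresh the lease for this running task
        await asyncio.sleep(interval)

async def demo() -> float:
    active = {"task-1"}
    slots = {"task-1": 0.0}
    refresher = asyncio.create_task(lease_refresher(active, slots))
    await asyncio.sleep(0.05)  # pretend a task runs for a little while
    refresher.cancel()
    return slots["task-1"]
```

As the comment notes, this covers well-behaved long-running tasks but not ones that block the event loop on CPU, since the refresher coroutine would be starved along with everything else.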
My initial reaction: the redelivery timeout isn't currently paired with a corresponding hard-wired task timeout. Even if the user doesn't request a timeout, every task should probably be timed out at the redelivery timeout (or the min of the user's timeout and the redelivery timeout). I was only thinking of the redelivery timeout as ensuring tasks get processed when workers die ungracefully, but there's also a problem if a single task runs longer than the redelivery timeout and then gets redelivered to another worker, which starts working on it, then exceeds the redelivery timeout and gets redelivered to yet another worker, and so on. Does it seem reasonable to always time tasks out at the redelivery timeout (or sooner if they request it)? Then you wouldn't need your lease-extending mechanism. It definitely still has problems if tasks are hogging the CPU and starving the event loop, but that's something we probably can't help with.
Let me ponder this while I mine for the 100% test coverage gold.
Ok, yes, after thinking about this, I think your proposal is right. I'll try to implement it in this PR, but I'll be on vacation next week. We'll see where I get before then!
I'm back and will be looking at this during the week. 🫡
Codecov Report
✅ All modified and coverable lines are covered by tests.
@@ Coverage Diff @@
## main #148 +/- ##
==========================================
Coverage 100.00% 100.00%
==========================================
Files 28 31 +3
Lines 3675 4382 +707
Branches 205 246 +41
==========================================
+ Hits 3675 4382 +707
@chrisguidry Ok, other than the docs failure, I think this is good to review again. 🫡
chrisguidry
left a comment
Do you ever get tired of hearing how good your tests are?
```diff
-def start_task(message_id: RedisMessageID, message: RedisMessage) -> bool:
+async def start_task(message_id: RedisMessageID, message: RedisMessage) -> bool:
```
This guy won't need to be async anymore 💪
```python
if not await self._can_start_task(redis, execution):
    # Task cannot start due to concurrency limits - reschedule
    logger.debug(
        "🔒 Task %s blocked by concurrency limit, rescheduling",
        execution.key,
        extra=log_context,
    )
    # Reschedule for a few milliseconds in the future
```
Not for this PR, but @bunchesofdonald has cool algorithms for this we can stitch in later
I invited you as a contributor so you can merge and cut a release. All you would need to do is make a release in the GH UI or CLI, and everything else should be automated.
Oh snap, thanks my dude!
Adds a `ConcurrencyLimit` context manager. Closes #86.
NOTE: I haven't tested this!