chore(engine)!: Share worker threads across all scheduler connections #20229
This adds a new message, WorkerSubscribe, used by the scheduler to explicitly request a notification when a worker has at least one worker thread available. Workers that receive a WorkerSubscribe message *must* send a WorkerReady to the subscribed scheduler the next time a worker thread is available, or immediately if a worker thread is already available. Sending a WorkerReady message in response to a WorkerSubscribe should clear the subscription to reduce message noise. This commit only updates the scheduler to send the message; workers are not yet updated to act on it. Introducing this message is backwards-compatible, as schedulers can still receive WorkerReady messages without a subscription.
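The subscription rules described above can be sketched on the worker side roughly as follows. This is a minimal illustration, not the code from this PR: the `worker` type, its fields, and the `HandleSubscribe` and `ThreadAvailable` methods are hypothetical names, and "sending" a WorkerReady is modeled by appending to a slice.

```go
package main

import (
	"fmt"
	"sync"
)

// worker is a hypothetical stand-in for the real worker. It only tracks how
// many threads are idle and which schedulers hold an active subscription.
type worker struct {
	mu         sync.Mutex
	idle       int                 // threads currently available
	subscribed map[string]struct{} // schedulers waiting for a WorkerReady
	sent       []string            // WorkerReady messages "sent", for illustration
}

func newWorker() *worker {
	return &worker{subscribed: make(map[string]struct{})}
}

// sendReadyLocked models sending WorkerReady and clears the subscription.
// The caller must hold w.mu.
func (w *worker) sendReadyLocked(scheduler string) {
	w.sent = append(w.sent, scheduler)
	delete(w.subscribed, scheduler)
}

// HandleSubscribe models receiving a WorkerSubscribe from a scheduler.
func (w *worker) HandleSubscribe(scheduler string) {
	w.mu.Lock()
	defer w.mu.Unlock()
	if w.idle > 0 {
		// A thread is already available: reply right away.
		w.sendReadyLocked(scheduler)
		return
	}
	// Otherwise remember the subscription until a thread frees up.
	w.subscribed[scheduler] = struct{}{}
}

// ThreadAvailable models a worker thread becoming available.
func (w *worker) ThreadAvailable() {
	w.mu.Lock()
	defer w.mu.Unlock()
	w.idle++
	for s := range w.subscribed {
		w.sendReadyLocked(s) // deleting during range is safe in Go
	}
}

func main() {
	w := newWorker()
	w.HandleSubscribe("scheduler-a") // no idle thread yet: subscription is stored
	w.ThreadAvailable()              // sends WorkerReady and clears the subscription
	w.ThreadAvailable()              // no subscription left: nothing is sent
	fmt.Println(w.sent)              // prints [scheduler-a]
}
```

Clearing the subscription on each WorkerReady is what keeps message noise down: a scheduler hears from the worker once per subscription, not once per freed thread.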
Previously, worker threads would be assigned to a random connected scheduler, with that assignment sticking until:

* That thread completes a task from the assigned scheduler, or
* the scheduler disconnects.

This is fine if there's only one scheduler, but it causes issues when there are multiple schedulers:

* Idle schedulers may "hog" a thread that could be used by a busy scheduler.
* While the random selection is even over time, any given instant in time does not have an even distribution of threads. In the worst case some schedulers may have zero threads assigned to them.

This is a particular problem as the total compute capacity per scheduler decreases as the number of schedulers goes up, even if queries are distributed amongst the schedulers.

This commit fixes this issue by allowing any scheduler to give a task to any ready worker thread. This permits scaling schedulers independently of the number of workers without risking saturation.

Details
-------

A new message type, WorkerSubscribe, is introduced. This message is sent by a scheduler to a worker, asking the worker to send a WorkerReady when there is at least one ready thread.

WorkerSubscribe is sent by the scheduler when a worker connects, and after a worker sends an HTTP 429 after rejecting a task assignment.

A new mechanism, `jobManager`, is used by the worker to bridge the connection to schedulers and running worker threads. It is implemented as a cancellable condition variable.

BREAKING CHANGE: Workers now expect schedulers to send WorkerSubscribe before any WorkerReady message is sent.

Signed-off-by: Robert Fratto <[email protected]>
Adds basic worker metrics to track state:

* `loki_engine_worker_tasks_assigned_total` reports the total number of successfully assigned tasks.
* `loki_engine_worker_task_exec_seconds` reports a histogram of task execution time, making it possible to drill down to task time at the worker level to detect symptoms of CPU saturation.
* `loki_engine_worker_threads` reports the number of worker threads, by state (idle, ready, busy).

`loki_engine_worker_threads` can be used in combination with the scheduler load metric to compute scheduler saturation on the fly.

Signed-off-by: Robert Fratto <[email protected]>
d432232 to
31ee43c
trevorwhitney
left a comment
This looks pretty good to me. I have one question about naming, but other than that (and especially since we've already tested it) I'm happy to ✅. I'm commenting for now to give others who have been more involved with scheduling a chance to review.
// from workers once they have at least one worker thread available.
//
// The subscription is cleared once the next WorkerReadyMessage is sent.
message WorkerSubscribeMessage {}
Do we want to capture the flow of messages in their names? For example, WorkerReady and WorkerHello are sent from workers, but WorkerSubscribe is now sent to the workers. Do we want the name to capture that this is the scheduler asking the worker for a subscription, and not the worker asking to subscribe? I'm thinking something like SchedulerReadyForSubscription or SchedulerPing.
}

// Request to be notified when the worker is ready.
s.workerSubscribe(ctx, worker)
In which case this would become s.readyForSubscription?
Note
I split this up across multiple commits, so it's probably a bit easier to review commit-by-commit.
Previously, worker threads would be assigned to a random connected scheduler, with that assignment sticking until:

* That thread completes a task from the assigned scheduler, or
* the scheduler disconnects.

This is fine if there's only one scheduler, but it causes issues when there are multiple schedulers:

* Idle schedulers may "hog" a thread that could be used by a busy scheduler.
* While the random selection is even over time, any given instant in time does not have an even distribution of threads. In the worst case some schedulers may have zero threads assigned to them.

This is a particular problem as the total compute capacity per scheduler decreases as the number of schedulers goes up, even if queries are distributed amongst the schedulers.
This PR fixes this issue by allowing any scheduler to give a task to any ready worker thread. This permits scaling schedulers independently of the number of workers without risking saturation.
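To put a rough number on the uneven distribution under the old behavior, here is a small sketch. It assumes a simplified model that is not stated in the PR: each of `threads` worker threads independently picks one of `schedulers` schedulers uniformly at random. Under that assumption, the probability that one particular scheduler has zero threads at a given instant is (1 - 1/schedulers)^threads. The function name `probZeroThreads` is made up for illustration.

```go
package main

import (
	"fmt"
	"math"
)

// probZeroThreads returns the probability that one particular scheduler has
// no threads assigned, assuming every thread independently picks a scheduler
// uniformly at random (a simplifying assumption, not taken from the PR).
func probZeroThreads(schedulers, threads int) float64 {
	return math.Pow(1-1/float64(schedulers), float64(threads))
}

func main() {
	const threads = 8
	for _, s := range []int{1, 2, 4, 8} {
		fmt.Printf("schedulers=%d threads=%d P(zero threads)=%.3f\n",
			s, threads, probZeroThreads(s, threads))
	}
}
```

With 8 threads and 4 schedulers, this model gives roughly a 10% chance that a given scheduler is starved at any instant, which is the kind of imbalance that sharing threads across all schedulers avoids.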
Details
A new message type, WorkerSubscribe, is introduced. This message is sent by a scheduler to a worker, asking the worker to send a WorkerReady when there is at least one ready thread.
WorkerSubscribe is sent by the scheduler when a worker connects, and after a worker sends an HTTP 429 after rejecting a task assignment.
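The two trigger points for WorkerSubscribe can be sketched on the scheduler side like this. It is a simplified illustration under assumptions: the `scheduler` type, `onWorkerConnected`, and `onAssignStatus` are hypothetical, and `workerSubscribe` here only records the send instead of taking a context and a worker connection as the real method does.

```go
package main

import (
	"fmt"
	"net/http"
)

// scheduler is a hypothetical, heavily simplified stand-in for the real
// scheduler. It records which workers were sent a WorkerSubscribe.
type scheduler struct {
	subscribeSent []string
}

// workerSubscribe models sending WorkerSubscribe to a worker.
func (s *scheduler) workerSubscribe(workerID string) {
	s.subscribeSent = append(s.subscribeSent, workerID)
}

// onWorkerConnected subscribes as soon as a worker connects.
func (s *scheduler) onWorkerConnected(workerID string) {
	s.workerSubscribe(workerID)
}

// onAssignStatus handles the status returned for a task assignment and
// reports whether the worker rejected the task. A 429 means the worker had no
// free thread, so the scheduler subscribes again to learn when one frees up.
func (s *scheduler) onAssignStatus(workerID string, status int) (rejected bool) {
	if status != http.StatusTooManyRequests {
		return false
	}
	s.workerSubscribe(workerID)
	return true
}

func main() {
	s := &scheduler{}
	s.onWorkerConnected("worker-1")                          // first subscription
	s.onAssignStatus("worker-1", http.StatusOK)              // accepted: nothing to do
	s.onAssignStatus("worker-1", http.StatusTooManyRequests) // rejected: subscribe again
	fmt.Println(len(s.subscribeSent))                        // prints 2
}
```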
A new mechanism, `jobManager`, is used by the worker to bridge the connection to schedulers and running worker threads. It is implemented as a cancellable condition variable.

BREAKING CHANGE: Workers now expect schedulers to send WorkerSubscribe before any WorkerReady message is sent.