Skip to content

feat: Flotilla scheduler and dispatcher actors #4375

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 10 commits into from
May 29, 2025
Merged

Conversation

colin-ho
Copy link
Contributor

@colin-ho colin-ho commented May 19, 2025

Changes Made

Implement scheduler + dispatcher actors for flotilla. Including unit tests for both.

Related Issues

Checklist

  • Documented in API Docs (if applicable)
  • Documented in User Guide (if applicable)
  • If adding a new documentation page, doc is added to docs/mkdocs.yml navigation
  • Documentation builds and is formatted properly (tag @/ccmao1130 for docs review)

@github-actions github-actions bot added the feat label May 19, 2025
Copy link

codecov bot commented May 19, 2025

Codecov Report

Attention: Patch coverage is 88.52941% with 78 lines in your changes missing coverage. Please review.

Project coverage is 76.73%. Comparing base (de7f0ca) to head (be4c54f).
Report is 3 commits behind head on main.

Files with missing lines Patch % Lines
src/daft-distributed/src/python/ray/worker.rs 0.00% 35 Missing ⚠️
src/daft-distributed/src/python/ray/task.rs 0.00% 33 Missing ⚠️
.../daft-distributed/src/python/ray/worker_manager.rs 0.00% 6 Missing ⚠️
src/daft-distributed/src/scheduling/task.rs 96.42% 2 Missing ⚠️
src/daft-distributed/src/utils/joinset.rs 77.77% 2 Missing ⚠️
Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #4375      +/-   ##
==========================================
- Coverage   77.72%   76.73%   -0.99%     
==========================================
  Files         844      846       +2     
  Lines      113446   115597    +2151     
==========================================
+ Hits        88176    88704     +528     
- Misses      25270    26893    +1623     
Files with missing lines Coverage Δ
src/daft-distributed/src/scheduling/dispatcher.rs 100.00% <100.00%> (+100.00%) ⬆️
...c/daft-distributed/src/scheduling/scheduler/mod.rs 92.56% <100.00%> (+10.35%) ⬆️
...ibuted/src/scheduling/scheduler/scheduler_actor.rs 93.22% <100.00%> (+93.22%) ⬆️
src/daft-distributed/src/scheduling/worker.rs 68.31% <100.00%> (+48.51%) ⬆️
src/daft-distributed/src/scheduling/task.rs 74.19% <96.42%> (+28.79%) ⬆️
src/daft-distributed/src/utils/joinset.rs 88.66% <77.77%> (+4.99%) ⬆️
.../daft-distributed/src/python/ray/worker_manager.rs 0.00% <0.00%> (ø)
src/daft-distributed/src/python/ray/task.rs 2.35% <0.00%> (-0.15%) ⬇️
src/daft-distributed/src/python/ray/worker.rs 2.19% <0.00%> (-0.11%) ⬇️

... and 39 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@colin-ho colin-ho changed the title feat: Implement flotilla scheduler and dispatcher feat: Flotilla scheduler and dispatcher actors May 19, 2025
@colin-ho colin-ho changed the base branch from main to colin/flotilla-default-scheduler May 20, 2025 01:33
Base automatically changed from colin/flotilla-default-scheduler to main May 29, 2025 18:48
@colin-ho colin-ho requested a review from srilman May 29, 2025 20:58
@colin-ho
Copy link
Contributor Author

Todo, add trace::debugs in the scheduler + dispatcher actors

@colin-ho colin-ho requested a review from samster25 May 29, 2025 20:59
Copy link
Contributor

@srilman srilman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it looks good, this time had more questions rather than comments. Will approve though

}
}
}
println!("[Scheduler] Scheduler loop complete");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

eprintln for logs instead so they don't get buffered?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once this really tripped me up during debugging

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, will do

) -> DaftResult<()> {
todo!("FLOTILLA_MS1: Implement run scheduler loop");
let mut input_exhausted = false;
while !input_exhausted || scheduler.num_pending_tasks() > 0 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why have input_exhausted? Is it for tasks that get canceled?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Input exhausted means that there won't be any new tasks submitted to the scheduler.

I can probably simplify this logic a bit more, the loop condition i was going for was essentially "keep looping until you have no more tasks to schedule"

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To clarify, is it possible for input_exhausted = true and scheduler.num_pending_tasks() > 0, and in what state would that be?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah lets say we had 100 tasks in total for this query, but only 1 worker with 1 core.

All 100 tasks could have been submitted to the scheduler already, so input is exhausted, but scheduler could have 99 pending tasks because it only just scheduled 1

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

input exhausted means the scheduler will not receive any more tasks via the task receiver channel. but pending tasks is the internal priority queue in the scheduler, which could still have tasks

@colin-ho colin-ho merged commit 302fc17 into main May 29, 2025
47 checks passed
@colin-ho colin-ho deleted the colin/scheduler-impl branch May 29, 2025 23:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants