Skip to content

Conversation

@colinthomas-z80
Copy link
Contributor

@colinthomas-z80 colinthomas-z80 commented Nov 11, 2025

Proposed Changes

Addresses #4276

We cannot partially iterate the ready list and leave it out of order if we wish to maintain priority scheduling.

The skip list offers partial but eventually complete iteration while maintaining order in the ready task list.

Merge Checklist

The following items must be completed before PRs can be merged.
Check these off to verify you have completed all steps.

  • make test Run local tests prior to pushing.
  • make format Format source code to comply with lint policies. Note that some lint errors can only be resolved manually (e.g., Python)
  • make lint Run lint on source code prior to pushing.
  • Manual Update: Update the manual to reflect user-visible changes.
  • Type Labels: Select a github label for the type: bugfix, enhancement, etc.
  • Product Labels: Select a github label for the product: TaskVine, Makeflow, etc.
  • PR RTM: Mark your PR as ready to merge.

Copy link
Member

@btovar btovar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ready to merge?

@dthain
Copy link
Member

dthain commented Nov 11, 2025

Hold on a minute...

@dthain
Copy link
Member

dthain commented Nov 11, 2025

I don't think send_one_task has the desired effect. If no tasks are schedulable, it will rotate through the entire list, yes. But if it starts working through the list and finds one halfway, then it will return after rotating half the list, and the priority order is lost. Or am I misunderstanding something?

@colinthomas-z80
Copy link
Contributor Author

You're right I think. Maybe I can just switch back to list_iterate

@btovar
Copy link
Member

btovar commented Nov 11, 2025

list_rotate was added for performance advantages, right?
Would it be better to keep it and count how many rotations where done and then rotate back to the start if needed?

@colinthomas-z80
Copy link
Contributor Author

That could get complicated quickly if we are pushing new tasks with priorities while the list is out of order

@btovar
Copy link
Member

btovar commented Nov 11, 2025

We can avoid the code duplication if you use a function pointer for whether to use list_nextitem or list_rotate, and update q->attempt_schedule_depth to the length of the ready_queue every time when using priorities.

@colinthomas-z80
Copy link
Contributor Author

Is it necessary to call list_first_item before iterating the list normally?

@dthain
Copy link
Member

dthain commented Nov 11, 2025

Careful! You are iterating over the list but using list_pop_tail to remove a task. Those don't match up.

@colinthomas-z80
Copy link
Contributor Author

@dthain any thoughts? Maybe we create a new branch for Ben C to test first?

@benclifford
Copy link

I'm happy to test whatever branch. I'm also happy to build from source so you don't to do more than point me at a suitable git branch/commit.

@dthain
Copy link
Member

dthain commented Nov 17, 2025

Colin, if you think it's ready to go, then let's have BenC give it a test drive.

@colinthomas-z80
Copy link
Contributor Author

colinthomas-z80 commented Nov 17, 2025

@benclifford If you don't mind cloning my repo feel free to take the latest commit here and see if it helps your issue. I am uncertain about the throughput implications with your particular application.

@benclifford
Copy link

I tried this PR with the test run that I was originally complaining about in issue #4276 and it looks good. This plot is of a run which is intended to run longer tasks before shorter tasks, assuming the submitter knows the task duration, and it looks like that is true (apart from one startup task that I think is probably expected due to submission time/order). The context for that is this blog post https://parsl-project.org/2025/11/08/priority.html

image

@dthain
Copy link
Member

dthain commented Nov 19, 2025

Thank you for brining this up and then testing.
@colinthomas-z80 and @btovar can you coordinate a release for Ben?

@dthain
Copy link
Member

dthain commented Nov 20, 2025

The issue here is that list_pop only makes sense when using list_rotate. When iterating over the list, you want to remove the item at the cursor. list_remove works but runs in linear time. list_drop is the handy thing to use here: it drops the item at the location of a list_cursor.

@btovar
Copy link
Member

btovar commented Nov 21, 2025

I'll update this pr with list_item_remove, merge it and make a release.

@colinthomas-z80
Copy link
Contributor Author

It looks like list_drop requires a reference to the existing cursor which is not available outside list.c unless we re implement the iteration method using the more primitive functions

@colinthomas-z80
Copy link
Contributor Author

colinthomas-z80 commented Dec 9, 2025

@btovar I just realized this may stomp on #4286. This is currently my rather indiscriminate porting of the skip list. Do you have an opinion on which way to proceed?

I am having some issues with multiple tasks. I am guessing I made a mistake and it is not safe to remove items from the list with a cursor other than the ready_list_cursor, and that is why you merged expire_waiting_tasks into the send_one_task iteration in taskvine.

@btovar
Copy link
Member

btovar commented Dec 9, 2025

Yes, that's why I merge them. I suggest porting the skip_list over the most recent changes as that already uses cursors.

@colinthomas-z80 colinthomas-z80 changed the title WQ: Iterate whole ready list if task priority has been set WQ: port skip_list to maintain priority order in ready list. Dec 10, 2025
while( (t = list_rotate(q->ready_list)) ) {
if(tasks_considered++ > q->attempt_schedule_depth) {
return 0;
skip_list_seek(cr, 0); // start at the beginning of the skip list
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you only want to reset this when a task is retrieved, or when a new worker connects. Otherwise some tasks may be starved.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this differ somehow from the taskvine implementation? It appears to be the same there

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah you are right, I forgot to make a push to that change. So yes, let's make the seek only on retrieves and new workers. I'll update taskvine accordingly.

@btovar btovar self-requested a review December 11, 2025 18:07
Copy link
Member

@btovar btovar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ready to merge?

@colinthomas-z80
Copy link
Contributor Author

If you do not have any objection, I would like to first ask @benclifford if he might test this with the priority-important workflow to make sure we retained the functionality.

I also recently discussed adding a throughput benchmark for Work Queue to our daily tests to alert us to any drastic changes. We already have one for TaskVine and it passed with the skip list so it is likely not an issue. But sometime soon I intend to get that going. If all checks out with priority I think we're good to go.

@benclifford
Copy link

ok i'll try it out

@benclifford
Copy link

I tried out my priotasks.py reproducer script with 1ac3732 and I see the right behaviour of short tasks running at the end.

@benclifford
Copy link

benclifford commented Dec 16, 2025

sorry I used the wrong commit!

I just tried again with f9ac64c and that works too.

@colinthomas-z80
Copy link
Contributor Author

Great thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants