-
Notifications
You must be signed in to change notification settings - Fork 124
Description
I'm seeing behaviour where WQ tasks aren't executed in priority order, for a Parsl user.
I hacked around in the codebase and I can see the priorities delivered all the way into to the operations that work on q->ready_list.
But then it looks like the list is getting out of order: I added this check at the start of send_one_task on a hypothesis I assumed, that q->ready_list should be in priority order. (although I'm unclear if that is expected to be true).
I end up with a q->ready_list that is quite out of order. For example:
BENC: checking list order
BENC: entry priority -18.730013
BENC: entry priority -19.377128
BENC: entry priority -20.990366
BENC: entry priority -21.471784
BENC: entry priority -22.673790
BENC: entry priority -22.888660
BENC: entry priority -25.742648
BENC: entry priority -26.607197
BENC: entry priority -28.126937
BENC: entry priority -29.548732
BENC: entry priority -29.638328
BENC: entry priority -30.239305
BENC: entry priority -33.161287
BENC: entry priority -34.062007
BENC: entry priority -37.991897
BENC: entry priority -40.609988
BENC: entry priority -40.727513
BENC: entry priority -41.533317
BENC: entry priority -44.636339
BENC: entry priority -45.619628
BENC: entry priority -45.753476
BENC: entry priority -45.829790
BENC: entry priority -47.320308
BENC: entry priority -47.816183
BENC: entry priority -48.364868
BENC: entry priority -48.679864
BENC: entry priority -53.935601
BENC: entry priority -56.517787
BENC: entry priority -61.032222
BENC: entry priority -4.027813
BENC: list is out of order - it should be descending
BENC: entry priority -4.034614
BENC: entry priority -4.096619
BENC: entry priority -4.129197
BENC: entry priority -4.259326
BENC: entry priority -4.277225
BENC: entry priority -4.307081
BENC: entry priority -4.629009
BENC: entry priority -4.847665
BENC: entry priority -4.895281
BENC: entry priority -5.912347
BENC: entry priority -6.119405
BENC: entry priority -6.989896
BENC: entry priority -7.565397
BENC: entry priority -7.774574
BENC: entry priority -7.992148
BENC: entry priority -8.028452
BENC: entry priority -8.735769
BENC: entry priority -8.971470
BENC: entry priority -9.086354
BENC: entry priority -9.511160
BENC: entry priority -10.241796
BENC: entry priority -10.438019
BENC: entry priority -10.801479
BENC: entry priority -11.053111
BENC: entry priority -11.512188
BENC: entry priority -11.749560
BENC: entry priority -12.064031
BENC: entry priority -12.781551
BENC: entry priority -13.288103
BENC: entry priority -13.462147
BENC: entry priority -14.395489
BENC: entry priority -14.497951
BENC: entry priority -15.642678
BENC: entry priority -17.450457
BENC: checked list order
Over time that list seems to get permuted rotated around a bit - staying in-order-except-for-one-patch.
Here's the code for the above output I added at the start of send_one_task in work_queue.c
@@ -4589,18 +4598,42 @@ static int send_one_task( struct work_queue *q )
int tasks_considered = 0;
timestamp_t now = timestamp_get();
+ // the list should be in order, larger values later
+ // because thats how priorities work.
+
+ printf("BENC: checking list order\n");
+ void *i = NULL;
+ void *last = NULL;
+ struct list_cursor *cur = list_cursor_create(q->ready_list);
+ for (list_seek(cur, 0); list_get(cur, &i); list_next(cur)) {
+ printf("BENC: entry priority %f\n", work_queue_task_priority(i));
+ if(last == NULL) {
+ last = i;
+ } else {
+ if(!(work_queue_task_priority(i) < work_queue_task_priority(last))) {
+ printf("BENC: list is out of order - it should be descending\n");
+ }
+ last = i;
+ }
+ // if (p(i) <= p(item)) {
+ i = NULL;
+ }
+ list_cursor_destroy(cur);
+ printf("BENC: checked list order\n");
I think that is the superficial explanation of why I'm not seeing tasks run in priority order.
I'm not clear from the code what the invariants are meant to be - what's happening with that list_rotate call right after that as far as I understand it deliberately violates the above invariant.
Let me know if you want a more focused reproducer or any other investigation.
Attached is the parsl code I'm running that generates a prioritised workload, which is what I have been using to investigate. You should be able to run it as python3 priotasks.py. It doesn't actually give any machine-tested validation that tasks are run in priority order, but it does enough to drive the above patch output.
This is added onto 2c8f80d