scheduler: reschedule tracker dropped if follow-up fails placement
In #12319 we attempted to fix a bug where the reschedule tracker could be
dropped if the scheduler couldn't place the replacement allocation. This would
result in the eventual replacement being rescheduled more times than the job
author's policy intended.
In #5602 we introduced plan normalization, where we reduced the size of the
required Raft log entry for plan apply by dropping the job spec from the
plan. This required backwards compatibility shims that we intended to remove in
Nomad 0.11. It has long been impossible to upgrade to any currently supported
version of Nomad from that far back, so I attempted to remove these backwards
compatibility shims. Doing so revealed that the #12319 fix was incorrect: the
scheduler test harness used the old code paths, which did not normalize plans,
and with normalized plans we end up dropping the reschedule tracker.
This changeset fixes the bug by ensuring that a rescheduled allocation that
cannot be placed is not marked with `DesiredStatus: stop`, to match the behavior
we see when an evaluation fires before the `reschedule.delay` window
expires. Because the allocation is no longer terminal, the plan applier doesn't
clear its reschedule tracker.
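The interaction between plan normalization and terminal allocations can be illustrated with a toy model. This sketch assumes that a normalized entry for a stopped allocation carries only the allocation ID and desired status, while a non-terminal allocation is sent in full; the real Nomad structs and plan applier are considerably more involved. `Allocation`, `stopDiff`, `Plan`, `normalize`, `apply`, and `persistedAttempts` are hypothetical names, and `RescheduleAttempts` is a stand-in for the reschedule tracker.

```go
package main

import "fmt"

// Allocation is a hypothetical, simplified allocation; RescheduleAttempts
// stands in for the reschedule tracker's history.
type Allocation struct {
	ID                 string
	DesiredStatus      string // "run" or "stop"
	RescheduleAttempts int
}

// stopDiff models a normalized plan entry for a stopped allocation: only the
// ID and desired status are written (assumption for illustration).
type stopDiff struct {
	ID            string
	DesiredStatus string
}

// Plan separates allocations sent in full from stops sent as diffs.
type Plan struct {
	Full  []Allocation
	Stops []stopDiff
}

// normalize builds a plan, reducing stopped allocations to diffs.
func normalize(updates []Allocation) Plan {
	var p Plan
	for _, a := range updates {
		if a.DesiredStatus == "stop" {
			p.Stops = append(p.Stops, stopDiff{ID: a.ID, DesiredStatus: a.DesiredStatus})
		} else {
			p.Full = append(p.Full, a)
		}
	}
	return p
}

// apply models the plan applier: full allocations overwrite stored state,
// while stop diffs only change the desired status of the stored copy.
func apply(state map[string]Allocation, p Plan) {
	for _, a := range p.Full {
		state[a.ID] = a
	}
	for _, d := range p.Stops {
		if a, ok := state[d.ID]; ok {
			a.DesiredStatus = d.DesiredStatus
			state[d.ID] = a
		}
	}
}

// persistedAttempts records a reschedule attempt on an allocation whose
// replacement could not be placed, applies the normalized plan, and returns
// the attempt count that survives in the stored state.
func persistedAttempts(markStop bool) int {
	state := map[string]Allocation{"a1": {ID: "a1", DesiredStatus: "run"}}
	updated := state["a1"]
	updated.RescheduleAttempts++ // scheduler records the attempt in memory
	if markStop {
		updated.DesiredStatus = "stop" // old behavior: allocation becomes terminal
	}
	apply(state, normalize([]Allocation{updated}))
	return state["a1"].RescheduleAttempts
}

func main() {
	fmt.Println(persistedAttempts(true))  // 0: tracker update lost
	fmt.Println(persistedAttempts(false)) // 1: tracker update kept
}
```

In this model, marking the unplaced allocation `stop` routes it through the reduced diff, so the in-memory tracker update never reaches stored state; leaving it non-terminal sends the whole allocation and the history survives.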
I've also removed the backwards compatibility shims and version checks for plan
normalization, and fixed a few incorrect test assertions revealed by the fix.
Ref: #12319
Ref: #5602
scheduler: Fixed a bug, previously patched incorrectly, where rescheduled allocations that could not be placed would later ignore their reschedule policy limits