@tgross (Member) commented Nov 19, 2025

In #12319 we attempted to fix a bug where the reschedule tracker could be dropped if the scheduler couldn't place the replacement allocation. Dropping the tracker meant the eventual replacement could be rescheduled more times than the job author's reschedule policy intended.

In #5602 we introduced plan normalization, which reduced the size of the Raft log entry required for plan apply by dropping the job spec from the plan. This required backwards compatibility shims that we intended to remove in Nomad 0.11. It has long been impossible to upgrade to any currently supported version of Nomad from that far back, so I attempted to remove these shims. Doing so uncovered that the #12319 fix was incorrect: the scheduler test harness used the old code paths, which did not normalize the plans. With normalized plans, we end up dropping the reschedule tracker.

This changeset fixes the bug by ensuring that a rescheduled allocation that cannot be placed is not marked with DesiredStatus: stop, which matches the behavior we see when an evaluation fires before the reschedule.delay window expires. This keeps the plan applier from clearing the reschedule tracker, which it would otherwise do because the allocation is terminal.

I've also removed the backwards compatibility shims and version checks for plan normalization, and fixed a few incorrect test assertions revealed by the fix.

Ref: #12319
Ref: #5602


Testing

To set up a test, run a single node with the node meta role=foo:

nomad node meta apply -node-id $nodeid role=foo

Run the following job:

jobspec
job "example" {

  group "group" {

    constraint {
      attribute = "${meta.role}"
      value     = "foo"
    }

    restart {
      attempts = 0
      mode     = "fail"
    }

    reschedule {
      delay          = "60s"
      delay_function = "constant"
      max_delay      = "1h"
      unlimited      = true
    }

    task "task" {

      driver = "docker"
      user   = "www-data"

      config {
        image   = "busybox:1"
        command = "httpd"
        args    = ["-vv", "-f", "-p", "8001", "-h", "/local"]
      }

      resources {
        cpu    = 100
        memory = 100
      }

    }
  }
}

Test steps

  • kill the docker container to force the job to fail (note the allocation is left on desired=run)
  • while waiting for the 60s delay, run nomad job eval example and get no new allocation, as expected
  • wait for the replacement allocation
  • kill the docker container again to force the job to fail
  • while waiting for the 60s delay, run nomad node meta apply -node-id $nodeid role=bar to make the node infeasible
  • while still inside the delay, run nomad job eval example and expect no new alloc
  • wait for the delay to expire, and expect no new alloc
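
The node-side steps above can be wrapped in a couple of small helpers so the run is easier to repeat. This is only a sketch: set_role and force_eval are hypothetical names, $nodeid is assumed to hold the ID of the single test node, and the sleep duration is a guess based on the 60s delay in the jobspec.

```shell
# Hypothetical helpers for the manual test; not part of this PR.

# Flip the node's meta.role so the job's constraint matches ("foo") or not ("bar").
set_role() {
  nomad node meta apply -node-id "$1" "role=$2"
}

# Force a scheduler evaluation for the example job.
force_eval() {
  nomad job eval example
}

# After killing the container for the second time:
#   set_role "$nodeid" bar   # node becomes infeasible
#   force_eval               # still inside the 60s delay: expect no new alloc
#   sleep 60                 # let the delay expire: still expect no new alloc
#   set_role "$nodeid" foo   # later, unblock the eval
```

The calls are left commented out so that sourcing the file only defines the helpers and each step can be run by hand while watching the allocation status.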

At this point, on main the failed alloc shows desired=stop even though it hasn't been replaced. With this patch it still shows desired=run, and the reschedule tracker's LastReschedule field shows "no placement":

$ nomad alloc status -json c1e83109 | jq .RescheduleTracker
{
  "Events": [
    {
      "PrevAllocID": "2cf5d53c-89f2-e7b5-0f2d-8791dc278e8a",
      "PrevNodeID": "ab28641e-8e53-5803-33ac-93afa57ebc12",
      "RescheduleTime": 1763671478560326633
    }
  ],
  "LastReschedule": "no placement"
}

Next unblock the eval via nomad node meta apply -node-id $nodeid role=foo. If you run nomad alloc status -json $allocid | jq .RescheduleTracker you'll see whether the replacement has a reschedule tracker. On main it will be null, whereas with this patch you'll see something like:

$ nomad alloc status -json d565 | jq .RescheduleTracker
{
  "Events": [
    {
      "PrevAllocID": "2cf5d53c-89f2-e7b5-0f2d-8791dc278e8a",
      "PrevNodeID": "ab28641e-8e53-5803-33ac-93afa57ebc12",
      "RescheduleTime": 1763671478560326633
    },
    {
      "PrevAllocID": "c1e83109-6363-be52-b030-ac91ef5a2781",
      "PrevNodeID": "ab28641e-8e53-5803-33ac-93afa57ebc12",
      "RescheduleTime": 1763671601538670025
    }
  ],
  "LastReschedule": "ok"
}
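
As a sanity check on the output above, the gap between the two events can be worked out from their RescheduleTime values. This is a sketch that assumes those values are Unix timestamps in nanoseconds; reschedule_gap_seconds is a hypothetical helper, not something Nomad ships.

```shell
# Hypothetical helper: whole seconds between two RescheduleTime values,
# assuming both are Unix timestamps in nanoseconds.
reschedule_gap_seconds() {
  echo $(( ($2 - $1) / 1000000000 ))
}

# The two RescheduleTime values from the tracker shown above.
reschedule_gap_seconds 1763671478560326633 1763671601538670025
```

This prints 122, which is plausible for a 60s constant delay plus however long the second alloc ran and the node stayed infeasible during the test.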

@tgross force-pushed the tech-debt-plan-normalization branch from c0c1ec4 to 389c765 on November 20, 2025 21:34
@tgross changed the title from "remove backcompat shims for plan normalization" to "scheduler: reschedule tracker dropped if follow-up fails placement" on Nov 20, 2025
@tgross assigned tgross and unassigned tgross on Nov 20, 2025
@tgross added the type/bug, backport/ent/1.8.x+ent, backport/ent/1.10.x+ent, and backport/1.11.x labels and removed the type/tech-debt label on Nov 20, 2025
@tgross force-pushed the tech-debt-plan-normalization branch from 389c765 to 79e48b6 on November 20, 2025 21:46
@tgross marked this pull request as ready for review on November 20, 2025 21:57
@tgross requested review from a team as code owners on November 20, 2025 21:57
Labels

backport/ent/1.8.x+ent, backport/ent/1.10.x+ent, backport/1.11.x, theme/plan, theme/scheduling, type/bug
