### Nomad version

1.5.5, but also reproducible at the tip of `main`.
### Issue

Due to the garbage collection logic applied to periodic sysbatch jobs (and sysbatch jobs in general), sysbatch jobs run much more frequently than the job spec expresses. In particular, consider the following:
- Job GC period of X
- Eval GC period of Y
If X < Y, then every periodic run of the sysbatch job will run on every node multiple times, as long as the allocations for a given sysbatch job do not all end at exactly the same time. This can lead to unbounded accumulation of periodic job instances and an unbounded number of allocations for each of them on every node. Please see the repro for details.
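To see why GC'ing terminal allocations makes a sysbatch job run again, recall that the system/sysbatch scheduler places the job on every node that has no allocation for it. A minimal sketch of that idea follows; this is illustrative only, not Nomad's actual scheduler code:

```go
package main

import "fmt"

// nodesNeedingPlacement is an illustrative stand-in for the scheduler's
// node/alloc diff: any node without a recorded allocation for the job
// gets a fresh placement.
func nodesNeedingPlacement(nodes []string, hasAlloc map[string]bool) []string {
	var place []string
	for _, node := range nodes {
		// Once eval/alloc GC deletes a completed allocation from state,
		// the node looks unscheduled again and is re-placed.
		if !hasAlloc[node] {
			place = append(place, node)
		}
	}
	return place
}

func main() {
	nodes := []string{"node-1", "node-2"}

	// Before GC: both nodes have a terminal (complete) allocation recorded.
	fmt.Println(nodesNeedingPlacement(nodes, map[string]bool{"node-1": true, "node-2": true})) // []

	// After GC removes node-1's terminal allocation, node-1 runs the job again.
	fmt.Println(nodesNeedingPlacement(nodes, map[string]bool{"node-2": true})) // [node-1]
}
```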
### Reproduction steps

Start a server and two client nodes.
```
# Local server
$ cat server_config.hcl
data_dir  = "/tmp/nomad/server"
log_level = "TRACE"

advertise {
  http = "127.0.0.1"
  rpc  = "127.0.0.1"
  serf = "127.0.0.1"
}

server {
  enabled           = true
  bootstrap_expect  = 1
  job_gc_interval   = "1m"
  job_gc_threshold  = "24h"
  eval_gc_threshold = "1m"
}

$ ./nomad agent -config server_config.hcl
```
```
# Local client no. 1
$ cat client_config.hcl
data_dir  = "/tmp/nomad/client-1"
log_level = "debug"

advertise {
  http = "127.0.0.1"
  rpc  = "127.0.0.1"
  serf = "127.0.0.1"
}

ports {
  http = "9876"
  rpc  = "9875"
  serf = "9874"
}

client {
  enabled       = true
  servers       = ["127.0.0.1"]
  gc_max_allocs = 1
}

plugin "raw_exec" {
  config {
    enabled = true
  }
}

$ ./nomad agent -config client_config.hcl
```
```
# Local client no. 2
$ cat client_config_2.hcl
data_dir  = "/tmp/nomad/client-2"
log_level = "debug"

advertise {
  http = "127.0.0.1"
  rpc  = "127.0.0.1"
  serf = "127.0.0.1"
}

ports {
  http = "8876"
  rpc  = "8875"
  serf = "8874"
}

client {
  enabled       = true
  servers       = ["127.0.0.1"]
  gc_max_allocs = 1
}

plugin "raw_exec" {
  config {
    enabled = true
  }
}
```
Please note that we start client node no. 2 in a way that naturally keeps all of its allocations alive forever. This relies on #16381, and we use it only because it is convenient. In production, the same effect is most easily reproduced by sysbatch jobs whose runtimes are non-uniform and simply splay across a large-ish period (say, 10 minutes). All we need from this node is that it retains its allocations and never lets them be GCed.

```
$ while true; do timeout 45 ./nomad agent -config client_config_2.hcl; done
```
Now, let us start a periodic job:
```
# Job
$ cat job.hcl
job "example" {
  datacenters = ["dc1"]
  type        = "sysbatch"

  periodic {
    cron = "*/10 * * * * *"
  }

  group "test-group" {
    task "test-task" {
      driver = "raw_exec"

      config {
        command = "/usr/bin/echo"
        args    = ["I ran!"]
      }
    }
  }
}

$ ./nomad job run job.hcl
Job registration successful
Approximate next launch time: 2023-06-01T21:20:00Z (9m25s from now)
```
Now, all we need to do is wait. Evaluations are only ever GCed every 5 minutes, and the GC cutoff is approximate, based on the Raft index:

```
[DEBUG] core.sched: eval GC found eligibile objects: evals=6 allocs=3
```
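The "approximate" part comes from translating the time-based threshold into a Raft-index cutoff. A simplified sketch of the idea, with illustrative names (the real mechanism in this version is Nomad's TimeTable, which records index/timestamp pairs):

```go
package main

import (
	"fmt"
	"time"
)

// entry pairs a Raft index with the wall-clock time it was observed.
type entry struct {
	index uint64
	when  time.Time
}

// nearestIndex returns the largest recorded index whose timestamp is at or
// before t. This is how a time threshold (e.g. eval_gc_threshold) becomes
// an approximate Raft-index cutoff.
func nearestIndex(table []entry, t time.Time) uint64 {
	var best uint64
	for _, e := range table {
		if !e.when.After(t) && e.index > best {
			best = e.index
		}
	}
	return best
}

func main() {
	now := time.Now()
	table := []entry{
		{index: 100, when: now.Add(-10 * time.Minute)},
		{index: 200, when: now.Add(-5 * time.Minute)},
		{index: 300, when: now.Add(-1 * time.Minute)},
	}

	cutoff := now.Add(-2 * time.Minute) // "eval_gc_threshold ago"
	// Objects whose modify index is <= this index count as old enough to GC.
	fmt.Println(nearestIndex(table, cutoff)) // 200
}
```

This is also why the index needs to keep moving forward for the GC to make progress, which motivates the next step.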
I left this running overnight, but realistically one could also just add artificial activity to the cluster so that the Raft index moves forward. If we now look at some of the periodic job runs, they have a lot of complete allocations (many more than there are nodes; in fact, we can make them have an arbitrary number!):
```
$ ./nomad job status example/periodic-1685712600
ID            = example/periodic-1685712600
Name          = example/periodic-1685712600
Submit Date   = 2023-06-02T09:30:00-04:00
Type          = sysbatch
Priority      = 50
Datacenters   = dc1
Namespace     = default
Status        = dead
Periodic      = false
Parameterized = false

Summary
Task Group  Queued  Starting  Running  Failed  Complete  Lost  Unknown
test-group  0       0         0        0       8         0     0

Allocations
ID        Node ID   Task Group  Version  Desired  Status    Created     Modified
7f95450b  274e6daa  test-group  0        run      complete  41s ago     41s ago
4b36a17c  75560bfc  test-group  0        run      complete  42m21s ago  42s ago
```
The record-holding job had >1000 completed allocs in a cluster with just 2 nodes, and on average one allocation was running every second for a sysbatch job configured to run every 10 minutes.

I expect a periodic sysbatch run to only ever reach `number_of_nodes` completed allocations. Since this happens for every periodic run, on top of new runs being created, we end up with an unbounded number of sysbatch periodic runs on every node (these jobs are never garbage collected, so their number grows without bound).
### Expected Result

Each periodic sysbatch job instance runs on every node in the system only once.
### Actual Result

Each periodic sysbatch job instance runs a large number of times on every node.
### Root cause

The root cause is a combination of the issue in #17395 and the fact that garbage collection for sysbatch jobs differs from that for batch jobs.

Batch jobs retain at least one allocation per task group that has run, so that task groups that exited with code 0 are not rescheduled (that is, they are expected not to run again):

https://github.com/hashicorp/nomad/blob/v1.5.5/nomad/core_sched.go#L306-L313

However, this logic does not exist for sysbatch jobs, which causes the behavior above.
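For illustration, here is a simplified sketch of that decision and how extending the batch-only guard to sysbatch would address the issue. This is not the actual upstream function; the types and names are invented for the sketch:

```go
package main

import "fmt"

// Job is a stripped-down stand-in for Nomad's job struct, with only the
// fields this sketch needs.
type Job struct {
	Type    string // "batch", "sysbatch", "service", ...
	Stopped bool
	Dead    bool
}

// keepTerminalAllocs sketches the guard from the linked core_sched.go code.
// Upstream v1.5.5 only applies it to "batch"; also applying it to
// "sysbatch" (the hypothetical fix) would prevent the re-runs in this issue.
func keepTerminalAllocs(jobType string) bool {
	return jobType == "batch" || jobType == "sysbatch"
}

// canGC returns whether an eval and its allocations may be collected.
func canGC(job *Job) bool {
	if job == nil {
		return true // job no longer exists
	}
	if keepTerminalAllocs(job.Type) {
		// Keep terminal allocs of live (sys)batch jobs so completed task
		// groups are not rescheduled; collect only once the job itself is
		// stopped and dead.
		return job.Stopped && job.Dead
	}
	return true
}

func main() {
	running := &Job{Type: "sysbatch"}
	fmt.Println(canGC(running)) // false with the fix; upstream v1.5.5 answers true
}
```

Either way, sysbatch jobs need the same re-run protection that batch jobs already have.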