Description
Nomad version
Nomad v1.7.0
Operating system and Environment details
macOS
Issue
Setting stop_after_client_disconnect to 0 prevents the job from running: the allocation is marked as complete and the task never starts.
Reproduction steps
Run a job with stop_after_client_disconnect set to 0 ("0s" in the job file below).
Expected Result
The job starts running, and if the client becomes unable to communicate with a server, the process and the allocation are killed.
Actual Result
The allocation is placed but never set to healthy:
nomad % nomad job run bug/service.nomad.hcl
==> 2023-12-07T16:43:37+01:00: Monitoring evaluation "1094c86d"
2023-12-07T16:43:37+01:00: Evaluation triggered by job "service-example"
2023-12-07T16:43:38+01:00: Evaluation within deployment: "7c1f436d"
2023-12-07T16:43:38+01:00: Allocation "619b62ed" created: node "6b10497a", group "service-example"
2023-12-07T16:43:38+01:00: Evaluation status changed: "pending" -> "complete"
==> 2023-12-07T16:43:38+01:00: Evaluation "1094c86d" finished with status "complete"
==> 2023-12-07T16:43:38+01:00: Monitoring deployment "7c1f436d"
⠹ Deployment "7c1f436d" in progress...
2023-12-07T16:43:38+01:00
ID = 7c1f436d
Job ID = service-example
Job Version = 0
Status = running
Description = Deployment is running
Deployed
Task Group Desired Placed Healthy Unhealthy Progress Deadline
service-example 1 1 0 0 2023-12-07T16:53:37+01:00
And the process, in this case sleep, is never started:
nomad % sudo ps -aef |grep sleep
501 72289 48840 0 4:44PM ttys051 0:00.00 grep sleep
The job status:
nomad % nomad job status service-example
ID = service-example
Name = service-example
Submit Date = 2023-12-07T16:43:37+01:00
Type = service
Priority = 50
Datacenters = dc1
Namespace = default
Node Pool = default
Status = dead
Periodic = false
Parameterized = false
Summary
Task Group Queued Starting Running Failed Complete Lost Unknown
service-example 0 0 0 0 1 0 0
Latest Deployment
ID = 7c1f436d
Status = running
Description = Deployment is running
Deployed
Task Group Desired Placed Healthy Unhealthy Progress Deadline
service-example 1 1 0 0 2023-12-07T16:53:37+01:00
Allocations
ID Node ID Task Group Version Desired Status Created Modified
619b62ed 6b10497a service-example 0 run complete 3m11s ago 3m11s ago
Job file (if appropriate)
job "service-example" {
  datacenters = ["dc1"]

  group "service-example" {
    stop_after_client_disconnect = "0s"

    task "lost-no-reschedule" {
      driver = "raw_exec"

      config {
        command = "/bin/sleep"
        args    = ["2147483647"]
      }
    }
  }
}
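To check whether the zero value itself is the trigger, the same job could be run with a non-zero duration for comparison. This is only a sketch: the job name "service-example-nonzero" and the "1m" value are arbitrary choices for illustration, and this variant was not run as part of this report.

```hcl
# Comparison job (hypothetical): identical to the job above except for
# the job name and a non-zero stop_after_client_disconnect value.
job "service-example-nonzero" {
  datacenters = ["dc1"]

  group "service-example" {
    # Assumption: any non-zero duration will do; "1m" is arbitrary.
    stop_after_client_disconnect = "1m"

    task "lost-no-reschedule" {
      driver = "raw_exec"

      config {
        command = "/bin/sleep"
        args    = ["2147483647"]
      }
    }
  }
}
```

If this variant starts the sleep process and the allocation becomes healthy, that would point at the "0s" value rather than at the raw_exec driver or the rest of the job definition.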