Setting stop_after_client_disconnect to 0 will block the job #19344

Open
@Juanadelacuesta

Nomad version

Nomad v1.7.0

Operating system and Environment details

macOS

Issue

Setting stop_after_client_disconnect to 0 blocks the job: the allocation is marked as complete and the task never starts.

Reproduction steps

Run a job with stop_after_client_disconnect set to "0s" in the group block, as in the job file included in this report.

Expected Result

The job starts running, and if the client later becomes unable to communicate with a server, the process and allocation are killed.

Actual Result

The allocation is placed but never set to healthy:

nomad % nomad job run bug/service.nomad.hcl
==> 2023-12-07T16:43:37+01:00: Monitoring evaluation "1094c86d"
    2023-12-07T16:43:37+01:00: Evaluation triggered by job "service-example"
    2023-12-07T16:43:38+01:00: Evaluation within deployment: "7c1f436d"
    2023-12-07T16:43:38+01:00: Allocation "619b62ed" created: node "6b10497a", group "service-example"
    2023-12-07T16:43:38+01:00: Evaluation status changed: "pending" -> "complete"
==> 2023-12-07T16:43:38+01:00: Evaluation "1094c86d" finished with status "complete"
==> 2023-12-07T16:43:38+01:00: Monitoring deployment "7c1f436d"
  ⠹ Deployment "7c1f436d" in progress...
    
    2023-12-07T16:43:38+01:00
    ID          = 7c1f436d
    Job ID      = service-example
    Job Version = 0
    Status      = running
    Description = Deployment is running
    
    Deployed
    Task Group       Desired  Placed  Healthy  Unhealthy  Progress Deadline
    service-example  1        1       0        0          2023-12-07T16:53:37+01:00

And the process, in this case sleep, is never started:

nomad % sudo ps -aef |grep sleep
  501 72289 48840   0  4:44PM ttys051    0:00.00 grep sleep

The job status:

nomad % nomad job status service-example
ID            = service-example
Name          = service-example
Submit Date   = 2023-12-07T16:43:37+01:00
Type          = service
Priority      = 50
Datacenters   = dc1
Namespace     = default
Node Pool     = default
Status        = dead
Periodic      = false
Parameterized = false

Summary
Task Group       Queued  Starting  Running  Failed  Complete  Lost  Unknown
service-example  0       0         0        0       1         0     0

Latest Deployment
ID          = 7c1f436d
Status      = running
Description = Deployment is running

Deployed
Task Group       Desired  Placed  Healthy  Unhealthy  Progress Deadline
service-example  1        1       0        0          2023-12-07T16:53:37+01:00

Allocations
ID        Node ID   Task Group       Version  Desired  Status    Created    Modified
619b62ed  6b10497a  service-example  0        run      complete  3m11s ago  3m11s ago

Job file (if appropriate)

job "service-example" {
  datacenters = ["dc1"]

  group "service-example" {
    stop_after_client_disconnect = "0s"
    task "lost-no-reschedule" {
      driver = "raw_exec"

      config {
        command = "/bin/sleep"
        args    = ["2147483647"]
      }
    }
  }
}
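
To help isolate the zero value as the trigger, the same job can be run with a non-zero duration for comparison. The following is only a sketch: the job name "service-example-nonzero" and the "1m" value are assumptions chosen for illustration and do not come from this report, and the behavior with a non-zero value has not been verified here.

```hcl
# Comparison job (hypothetical): identical to the job above except for
# the job name and a non-zero stop_after_client_disconnect value.
job "service-example-nonzero" {
  datacenters = ["dc1"]

  group "service-example" {
    # Assumed non-zero duration; the report only covers "0s".
    stop_after_client_disconnect = "1m"

    task "lost-no-reschedule" {
      driver = "raw_exec"

      config {
        command = "/bin/sleep"
        args    = ["2147483647"]
      }
    }
  }
}
```

If this variant reaches a running allocation while the "0s" job does not, that would suggest the problem is specific to the zero value rather than to the parameter in general.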

Nomad Server logs (if appropriate)

Nomad Client logs (if appropriate)

Labels: stage/accepted, theme/core, type/bug
