Cannot stop an allocation on client in case of network partition #18077

Open
@akamensky

Nomad version

Nomad v1.6.1
BuildDate 2023-07-21T13:49:42Z
Revision 515895c7690cdc72278018dc5dc58aca41204ccc

Operating system and Environment details

Fedora 37

Issue

Clients running on remote nodes may lose connectivity to the servers. It is great that this does not interrupt the allocations running on those clients. However, there is no CLI or API call to make a disconnected client shut down an allocation.

For simple allocations, the process can be killed (or, when Docker is the runtime, stopped using the Docker CLI). In many cases, however, a shutdown requires additional steps to reach a "clean" state, e.g. calling a remote service or stopping all processes in the allocation in a specific order.
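
To illustrate what "stopping all processes in a specific order" means when done by hand today, here is a minimal Python sketch. It is not a Nomad feature and uses no Nomad API. The function name `stop_in_order`, the process names, and the grace period are all hypothetical, and it assumes you already hold handles to the processes you want to stop.

```python
import subprocess
import sys


def stop_in_order(procs, grace_seconds=5.0):
    """Stop (name, Popen) pairs one at a time, in the order given.

    Each process is asked to terminate and gets grace_seconds to exit
    before it is force-killed. The next process is only touched once
    the previous one has exited.
    """
    results = []
    for name, proc in procs:
        proc.terminate()  # polite request (SIGTERM on POSIX)
        try:
            proc.wait(timeout=grace_seconds)
            results.append((name, "terminated"))
        except subprocess.TimeoutExpired:
            proc.kill()  # grace period expired, force the exit
            proc.wait()
            results.append((name, "killed"))
    return results


if __name__ == "__main__":
    # Hypothetical stand-ins for the processes of one allocation.
    sleeper = [sys.executable, "-c", "import time; time.sleep(60)"]
    procs = [
        ("worker", subprocess.Popen(sleeper)),
        ("database", subprocess.Popen(sleeper)),
    ]
    print(stop_in_order(procs))
```

A script like this has to be written and maintained outside the job spec, which is the gap this issue is about.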

It would be great to be able to stop running allocations from the CLI, in the way defined in the job spec, when the agent/client has lost connectivity to the servers.

Reproduction steps

  1. Start separate processes for the server and the client (side note: the terminology is confusing; some places say "client", others say "agent". Please pick one)
  2. Deploy a job to run on the client (e.g. the hello-world example template)
  3. Kill the server process, or in some other way ensure that the client can no longer communicate with the server
  4. Try to stop the allocation on the client from the CLI or API
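
The steps above can be sketched as a command sequence. This is a rough outline rather than a tested script: the config file names (`server.hcl`, `client.hcl`), the job file name, and the allocation ID placeholder are assumptions that do not come from the report. The helper `reproduction_commands` is hypothetical and only builds the `nomad` command lines for printing.

```python
import shlex


def reproduction_commands(server_config, client_config, job_file, alloc_id):
    """Return the nomad command lines for the reproduction steps, in order."""
    return [
        # Step 1: server and client as separate agent processes.
        ["nomad", "agent", f"-config={server_config}"],
        ["nomad", "agent", f"-config={client_config}"],
        # Step 2: generate the example job and run it.
        ["nomad", "job", "init", "-short", job_file],
        ["nomad", "job", "run", job_file],
        # Step 3 (kill the server process) happens here, outside nomad.
        # Step 4: attempt to stop the allocation on the disconnected client.
        ["nomad", "alloc", "stop", alloc_id],
    ]


if __name__ == "__main__":
    for cmd in reproduction_commands(
        "server.hcl", "client.hcl", "example.nomad.hcl", "<alloc-id>"
    ):
        print(shlex.join(cmd))
```

The allocation ID for the last command can be read from `nomad job status` while the server is still reachable.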

Expected Result

There is a documented and working way to shut down any allocation running on a disconnected client via the CLI or API.

Actual Result

There is no such method.

Job file (if appropriate)

N/A

Nomad Server logs (if appropriate)

N/A

Nomad Client logs (if appropriate)

N/A

Project status: Needs Roadmapping