Group and queue nodes for termination #576

Open
@stevehipwell


Describe the feature
I'd like NTH to be able to group nodes (similar to the Cluster Autoscaler's --balance-similar-node-groups) and to process at most n nodes per group at a time (still constrained by the workers configuration).

I assume that v2 would be designed around this kind of concept, but I think it'd be worth doing in v1 assuming it wouldn't take too much effort.
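
To make the request concrete, here is a rough sketch of the selection logic I have in mind. It is not based on the NTH code: the Node type, the nextBatch function, and the idea of keying groups by ASG name are all hypothetical, and a real implementation would need to derive groups and track in-flight drains in whatever way fits the existing worker model.

```go
package main

// Node is a hypothetical stand-in for a node that is queued for termination.
type Node struct {
	Name  string
	Group string // assumed grouping key, e.g. the ASG name
}

// nextBatch walks the queue in order and picks the nodes that may start
// draining now. It allows at most perGroup concurrent drains in any one
// group and at most workers concurrent drains overall. inFlight maps a
// group to the number of drains already running in that group.
// Nodes that cannot start yet are returned in remaining, in queue order.
func nextBatch(queue []Node, inFlight map[string]int, perGroup, workers int) (batch, remaining []Node) {
	// Copy the in-flight counts so the caller's map is not modified.
	counts := make(map[string]int, len(inFlight))
	running := 0
	for group, n := range inFlight {
		counts[group] = n
		running += n
	}

	for _, node := range queue {
		if running < workers && counts[node.Group] < perGroup {
			counts[node.Group]++
			running++
			batch = append(batch, node)
		} else {
			remaining = append(remaining, node)
		}
	}
	return batch, remaining
}
```

With perGroup set to 1 and workers set to 2, a queue containing two nodes from one group followed by one node from a second group would start one node from each group and leave the other queued, so pods evicted from a group always have somewhere to land before the next node in that group is drained.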

Is the feature request related to a problem?
When using NTH to manage ASG instance refresh events, it is very easy to get a cluster into a blocking race condition: pods are evicted from several nodes at once, and PDBs then prevent any of those nodes from fully shutting down. This results in hard terminations and general cluster instability.

Describe alternatives you've considered
Using a single worker would work, but it would be too slow to respond to time-critical events, and even for instance refresh it could be too slow for good usability.

Labels: Type: Enhancement (New feature or request), stalebot-ignore (To NOT let the stalebot update or close the Issue / PR)
