Skip to content

Refactor filterByTainted#27584

Open
Juanadelacuesta wants to merge 7 commits intomainfrom
NOJIRA-filters
Open

Refactor filterByTainted#27584
Juanadelacuesta wants to merge 7 commits intomainfrom
NOJIRA-filters

Conversation

@Juanadelacuesta
Copy link
Copy Markdown
Member

@Juanadelacuesta Juanadelacuesta commented Feb 24, 2026

Description

This PR attempts to refactor de function filterByTainted to make it clearer and easier to expand. It maintains all its functionality and adds testing.
The approach is to have a priority organised classification rules of the shape condition func(allocContext) bool that will determine in with bucket should each allocation end up.
The allocContext contains all the information necessary to classify an alloc:

type allocContext struct {
	alloc           *structs.Allocation
	shouldReconnect bool
	taintedNode     *structs.Node
	nodeIsTainted   bool
	now             time.Time
}

Testing & Reproduction steps

Links

Contributor Checklist

  • Changelog Entry If this PR changes user-facing behavior, please generate and add a
    changelog entry using the make cl command.
  • Testing Please add tests to cover any new functionality or to demonstrate bug fixes and
    ensure regressions will be caught.
  • Documentation If the change impacts user-facing functionality such as the CLI, API, UI,
    and job configuration, please update the Nomad product documentation, which is stored in the
    web-unified-docs repo. Refer to the web-unified-docs contributor guide for docs guidelines.
    Please also consider whether the change requires notes within the upgrade
    guide
    . If you would like help with the docs, tag the nomad-docs team in this PR.

Reviewer Checklist

  • Backport Labels Please add the correct backport labels as described by the internal
    backporting document.
  • Commit Type Ensure the correct merge method is selected which should be "squash and merge"
    in the majority of situations. The main exceptions are long-lived feature branches or merges where
    history should be preserved.
  • Enterprise PRs If this is an enterprise only PR, please add any required changelog entry
    within the public repository.
  • If a change needs to be reverted, we will roll out an update to the code within 7 days.

Changes to Security Controls

Are there any changes to security controls (access controls, encryption, logging) in this pull request? If so, explain.

@mismithhisler
Copy link
Copy Markdown
Member

Everytime I see filterByTainted I always have to think about what it's actually doing. I think this PR does a good job of "categorizing" them, and so I wonder if it's also worth renaming the function categorizeNodes or something similar?

@Juanadelacuesta
Copy link
Copy Markdown
Member Author

why Nodes?

@Juanadelacuesta Juanadelacuesta changed the title style: refactor filterByTainted Refactor filterByTainted May 1, 2026
@Juanadelacuesta Juanadelacuesta added backport/ent/1.11.x+ent backport to 1.11.x+ent release line backport/2.0.x backport to 2.0.x release line backport/ent/2.0.x+ent backport to 2.0.x+ent release line labels May 4, 2026
Copy link
Copy Markdown
Member

@gulducat gulducat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is fabulous. The code itself looks solid to me. My comments are all about adding even more clarity based on the new structure.

Comment thread scheduler/reconciler/filters.go Outdated
Comment thread scheduler/reconciler/filters.go Outdated
Comment thread scheduler/reconciler/filters.go Outdated
Comment thread scheduler/reconciler/filters.go Outdated
Comment thread scheduler/reconciler/filters.go Outdated
Comment thread scheduler/reconciler/filters_test.go
Comment thread scheduler/reconciler/filters_test.go
Comment thread scheduler/reconciler/filters_test.go Outdated
Comment thread scheduler/reconciler/filters_test.go Outdated
Comment thread scheduler/reconciler/filters_test.go Outdated
all := allocSet{tc.alloc.ID: tc.alloc}
state := ClusterState{Now: now, TaintedNodes: tc.nodeMap}

untainted, migrate, lost, disconnecting, reconnecting, ignore, expiring := all.filterByTainted(state)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll put this comment here so github makes it a thread to discuss, only tangentially related to this line.

I agree with @mismithhisler that we could rename this func, because filterByTainted is not very intuitive, but you're right that it's not Nodes that are being categorized. How about categorizeAllocs?

I also think the doc comment could be a little clearer, while we're at it, something like

// categorizeAllocs takes a set of tainted nodes and filters the allocation set
// into the following groups:
// 1. untainted: allocs on untainted nodes (AND clarify what exactly this means...)
// 2. migrate: allocs on nodes that are draining and need to be moved
// 3. lost: allocs on lost nodes or have expired and need to be replaced
// 4. disconnect: allocs on nodes that are disconnected, but have not had their ClientState set to unknown
// 5. reconecting: allocs on a node that has reconnected, and need ......
// 6. ignore: allocs in a state that results in a noop that only need to be counted
// 7. expiring: allocs on disconnected nodes and need to be marked lost (and possibly replaced)
func (set allocSet) categorizeAllocs(state ClusterState)
(untainted, migrate, lost, disconnecting, reconnecting, ignore, expiring allocSet) {

?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, and what do we think about backports? The prospect of merge conflicts always sets me on edge, but previous refactors (e.g. #26042 and #26324) were not backported... I assume that's why you targeted 1.11.x+ent and 2.0.x? (PS shouldn't need 2.0.x+ent, because CE->Ent merge will catch that).

Co-authored-by: Daniel Bennett <8812489+gulducat@users.noreply.github.com>
@Juanadelacuesta
Copy link
Copy Markdown
Member Author

new(bool) is not valid still: cannot use (true) (untyped bool constant true) as *bool value in struct literal true and false are constants.

@Juanadelacuesta Juanadelacuesta removed the backport/ent/2.0.x+ent backport to 2.0.x+ent release line label May 7, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backport/ent/1.11.x+ent backport to 1.11.x+ent release line backport/2.0.x backport to 2.0.x release line

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants