fix: etcd client leak in the (legacy) Upgrade API #13508

Merged
talos-bot merged 1 commit into siderolabs:main from smira:fix/etcd-client-leak
Jun 3, 2026

Conversation

@smira
Member

@smira smira commented Jun 3, 2026

This doesn't affect the new upgrade API.

There were two leaks in the validate path. They are usually harmless: the Upgrade triggers a reboot, so a couple of leaked clients don't matter. But if the API is called repeatedly (and the upgrade fails each time), the leaks accumulate until the etcd container runs out of file descriptors, eventually preventing new connections to etcd from being established.
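A minimal sketch of how this kind of leak accumulates and how `defer` avoids it. This is not the Talos code: `fakeClient`, `validateLeaky`, and `validateFixed` are hypothetical names, and the open-client counter only stands in for the file descriptors a real etcd client would hold.

```go
package main

import (
	"errors"
	"fmt"
)

// fakeClient is a hypothetical stand-in for an etcd client. It only tracks
// how many clients are currently open via a shared counter.
type fakeClient struct{ open *int }

// newFakeClient simulates opening a connection.
func newFakeClient(open *int) *fakeClient {
	*open++

	return &fakeClient{open: open}
}

// Close simulates releasing the connection.
func (c *fakeClient) Close() error {
	*c.open--

	return nil
}

var errValidation = errors.New("validation failed")

// validateLeaky returns on the error path without closing the client.
func validateLeaky(open *int) error {
	c := newFakeClient(open)
	_ = c // the client is never closed

	return errValidation
}

// validateFixed closes the client on every return path.
func validateFixed(open *int) error {
	c := newFakeClient(open)
	defer c.Close() //nolint:errcheck

	return errValidation
}

func main() {
	leaky, fixed := 0, 0

	// Simulate an API that is called repeatedly and fails each time.
	for i := 0; i < 100; i++ {
		_ = validateLeaky(&leaky)
		_ = validateFixed(&fixed)
	}

	fmt.Println("open after leaky calls:", leaky)
	fmt.Println("open after fixed calls:", fixed)
}
```

With a real client each unclosed instance would keep its connections open, which is why repeated failing calls matter while a single call followed by a reboot does not.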

Also put the etcd pre-checks under the !preserve condition (meaning the API request asks for a wipe). The wipe behavior has been disabled by default for a long time, and the etcd pre-checks only make sense when a wipe is requested; otherwise the upgrade doesn't affect etcd membership in any way.
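The gating can be sketched as follows. This is an illustration only: `upgradeRequest`, `validate`, and the `etcdPreCheck` callback are hypothetical names, not the actual types or functions in the Talos server code.

```go
package main

import "fmt"

// upgradeRequest is a hypothetical stand-in for the legacy upgrade request.
type upgradeRequest struct {
	Preserve bool
}

// validate runs the etcd pre-check only when the request asks for a wipe
// (Preserve is false). With Preserve set, etcd membership is not affected,
// so the check is skipped.
func validate(req upgradeRequest, etcdPreCheck func() error) error {
	if !req.Preserve {
		if err := etcdPreCheck(); err != nil {
			return fmt.Errorf("etcd pre-check failed: %w", err)
		}
	}

	return nil
}

func main() {
	calls := 0
	check := func() error {
		calls++

		return nil
	}

	_ = validate(upgradeRequest{Preserve: true}, check)
	fmt.Println("pre-check calls with preserve=true:", calls)

	_ = validate(upgradeRequest{Preserve: false}, check)
	fmt.Println("pre-check calls after preserve=false:", calls)
}
```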

Copilot AI review requested due to automatic review settings June 3, 2026 16:06
@github-project-automation github-project-automation Bot moved this to To Do in Planning Jun 3, 2026
@talos-bot talos-bot moved this from To Do to In Review in Planning Jun 3, 2026

Copilot AI left a comment


Pull request overview

This PR addresses resource leaks in the legacy Upgrade API’s etcd validation path by ensuring temporary etcd clients are properly closed, and it narrows when etcd upgrade pre-checks run to only cases where preserve=false (i.e., when the request implies wiping behavior), aligning the checks with the legacy API’s intended semantics.

Changes:

  • Close per-member etcd clients created during member health validation to prevent file descriptor leaks.
  • Close the etcd client created during legacy Upgrade request validation to prevent accumulation across repeated calls.
  • Gate etcd pre-checks (mutex + ValidateForUpgrade) under !preserve in the legacy Upgrade API path.
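For the per-member clients in the first item, one common Go pattern is to move the loop body into a helper so that `defer` runs once per member rather than only when the whole loop returns. The sketch below assumes that shape; `memberClient`, `dial`, `checkMember`, and `checkMembers` are hypothetical names and not the functions changed in this PR.

```go
package main

import "fmt"

// memberClient is a hypothetical stand-in for a per-member etcd client.
type memberClient struct{ open *int }

// dial simulates opening a client for a single member endpoint.
func dial(endpoint string, open *int) (*memberClient, error) {
	*open++

	return &memberClient{open: open}, nil
}

// Close simulates releasing the client's connection.
func (c *memberClient) Close() error {
	*c.open--

	return nil
}

// healthy simulates a member health probe.
func (c *memberClient) healthy() bool { return true }

// checkMember opens a client for one member and closes it before returning,
// so the deferred Close runs once per loop iteration in checkMembers.
func checkMember(endpoint string, open *int) (bool, error) {
	c, err := dial(endpoint, open)
	if err != nil {
		return false, err
	}
	defer c.Close() //nolint:errcheck

	return c.healthy(), nil
}

// checkMembers returns the number of healthy members.
func checkMembers(endpoints []string, open *int) (int, error) {
	healthy := 0

	for _, ep := range endpoints {
		ok, err := checkMember(ep, open)
		if err != nil {
			return healthy, fmt.Errorf("member %s: %w", ep, err)
		}

		if ok {
			healthy++
		}
	}

	return healthy, nil
}

func main() {
	open := 0

	healthy, err := checkMembers([]string{"a:2379", "b:2379", "c:2379"}, &open)
	fmt.Println("healthy:", healthy, "err:", err, "still open:", open)
}
```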

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| internal/pkg/etcd/etcd.go | Ensures temporary etcd clients created for member health/quorum checks are closed. |
| internal/app/machined/internal/server/v1alpha1/v1alpha1_server.go | Ensures the Upgrade validation etcd client is closed and runs etcd pre-checks only when preserve=false. |

@github-project-automation github-project-automation Bot moved this from In Review to Approved in Planning Jun 3, 2026
Member

@frezbo frezbo left a comment


I'm surprised no linters caught this. I guess we need better linters that catch these; I guess Copilot could have caught this.

@smira
Member Author

smira commented Jun 3, 2026

> I'm surprised no linters caught this. I guess we need better linters that catch these; I guess Copilot could have caught this.

This is also a super old/legacy code path that hasn't been touched for ages.

This doesn't affect the new upgrade API.

There were two leaks in the validate path. They are usually harmless:
the Upgrade triggers a reboot, so a couple of leaked clients don't
matter. But if the API is called repeatedly (and the upgrade fails each
time), the leaks accumulate until the `etcd` container runs out of file
descriptors, eventually preventing new connections to `etcd` from being
established.

Also put the etcd pre-checks under the `!preserve` condition (meaning
the API request asks for a wipe). The wipe behavior has been disabled by
default for a long time, and the etcd pre-checks only make sense when a
wipe is requested; otherwise the upgrade doesn't affect etcd membership
in any way.

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
@smira smira force-pushed the fix/etcd-client-leak branch from 6644fca to 89e307e Compare June 3, 2026 19:04
@smira
Member Author

smira commented Jun 3, 2026

/m

@talos-bot talos-bot merged commit 89e307e into siderolabs:main Jun 3, 2026
118 of 119 checks passed
@github-project-automation github-project-automation Bot moved this from Approved to Done in Planning Jun 3, 2026