Skip to content

Fix client hangs on stale SSH connections#28910

Merged
baude merged 2 commits into
podman-container-tools:mainfrom
Honny1:fix-hang
Jun 15, 2026
Merged

Fix client hangs on stale SSH connections#28910
baude merged 2 commits into
podman-container-tools:mainfrom
Honny1:fix-hang

Conversation

@Honny1

@Honny1 Honny1 commented Jun 11, 2026

Copy link
Copy Markdown
Contributor

Wrap ssh.DialNet in a goroutine so it respects context cancellation, skip retries on context/timeout errors, and invalidate the cached connection on failure.

Fixes: #28453

Checklist

Ensure you have completed the following checklist for your pull request to be reviewed:

  • Certify you wrote the patch or otherwise have the right to pass it on as an open-source patch by signing all
    commits. (git commit -s). (If needed, use git commit -s --amend). The author email must match
    the sign-off email address. See CONTRIBUTING.md
    for more information.
  • Referenced issues using Fixes: #00000 in commit message (if applicable)
  • Tests have been added/updated (or no tests are needed)
  • Documentation has been updated (or no documentation changes are needed)
  • All commits pass make validatepr (format/lint checks)
  • Release note entered in the section below (or None if no user-facing changes)

Does this PR introduce a user-facing change?

None

@packit-as-a-service

Copy link
Copy Markdown

tmt tests failed for commit 08b7d56. @lsm5, @psss, @thrix please check.

@Luap99 Luap99 left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure we can do this. In general this does not nothing to root cause the problem so this is as best a workaround and not a fix for the issue at all.

But the main problems is that long stale connections are to be expected under normal operations.

Think of podman-remote events, if there are no events we just wait forever. Some for things like podman logs --follow, the attach logic and likely more I am forgetting.

For normal commands like ps if there are a lot of containers it needs to take all locks and that can take a while.
I have containers on my sever where the lock during startup is taking for many minutes due the expensive selinux relabel on startup.
That means ps will block for minutes at a time, that does not mean podman-remote ps should just give up all of the sudden and error, because that would break scripts even harder.

@Honny1 Honny1 added the No New Tests Allow PR to proceed without adding regression tests label Jun 11, 2026
@Honny1 Honny1 changed the title Fix remote client hang on stale connections Fix client hangs on stale SSH connections Jun 11, 2026
@Honny1

Honny1 commented Jun 11, 2026

Copy link
Copy Markdown
Contributor Author

@Luap99 Good catch, I didn't realize that. I've removed those timeouts. I think the other changes are still useful but won't help with the locking issue.

@Honny1 Honny1 requested a review from Luap99 June 11, 2026 12:16
@Honny1 Honny1 marked this pull request as ready for review June 11, 2026 12:16
Comment thread pkg/bindings/connection.go Outdated
@packit-as-a-service

Copy link
Copy Markdown

tmt tests failed for commit 374e63f. @lsm5, @psss, @thrix please check.

Honny1 added 2 commits June 11, 2026 16:10
Signed-off-by: Jan Rodák <hony.com@seznam.cz>
Wrap ssh.DialNet in a goroutine so it respects context cancellation,
skip retries on context/timeout errors, and invalidate the cached
connection on failure.

Fixes: podman-container-tools#28453

Signed-off-by: Jan Rodák <hony.com@seznam.cz>
@packit-as-a-service

Copy link
Copy Markdown

tmt tests failed for commit ae825f6. @lsm5, @psss, @thrix please check.

@Honny1 Honny1 added the bloat_approved Approve a PR in which binary file size grows by over 50k label Jun 11, 2026
@Honny1

Honny1 commented Jun 15, 2026

Copy link
Copy Markdown
Contributor Author

/packit retest-failed

@Luap99 Luap99 left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks

@baude

baude commented Jun 15, 2026

Copy link
Copy Markdown
Contributor

LGTM

@baude baude merged commit 0cba15e into podman-container-tools:main Jun 15, 2026
142 of 146 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bloat_approved Approve a PR in which binary file size grows by over 50k No New Tests Allow PR to proceed without adding regression tests

Projects

None yet

Development

Successfully merging this pull request may close these issues.

podman ps occasionally hangs

3 participants