Fix client hangs on stale SSH connections #28910
Luap99 left a comment
I am not sure we can do this. This does nothing to address the root cause, so it is at best a workaround and not a fix for the issue.
But the main problem is that long-idle connections are to be expected under normal operation.
Think of podman-remote events: if there are no events, we just wait forever. The same goes for podman logs --follow, the attach logic, and likely more I am forgetting.
For normal commands like ps, if there are a lot of containers, it needs to take all the locks, and that can take a while.
I have containers on my server where the lock is held for many minutes during startup due to the expensive SELinux relabel.
That means ps will block for minutes at a time. That does not mean podman-remote ps should suddenly give up and error out, because that would break scripts even harder.
@Luap99 Good catch, I didn't realize that. I've removed those timeouts. I think the other changes are still useful, but they won't help with the locking issue.
Signed-off-by: Jan Rodák <hony.com@seznam.cz>
Wrap ssh.DialNet in a goroutine so it respects context cancellation, skip retries on context/timeout errors, and invalidate the cached connection on failure.
Fixes: podman-container-tools#28453
Signed-off-by: Jan Rodák <hony.com@seznam.cz>
/packit retest-failed
LGTM
Wrap ssh.DialNet in a goroutine so it respects context cancellation, skip retries on context/timeout errors, and invalidate the cached connection on failure.
Fixes: #28453
Checklist
Ensure you have completed the following checklist for your pull request to be reviewed:
commits. (git commit -s). (If needed, use git commit -s --amend). The author email must match the sign-off email address. See CONTRIBUTING.md for more information.
Fixes: #00000 in commit message (if applicable)
make validatepr (format/lint checks)
None if no user-facing changes)
Does this PR introduce a user-facing change?