Issue Description
Describe your issue
- Repro steps available: connecting to sockets from a container with a custom network attached throws intermittent timeouts.
- Likely cause: TCP keepalive parameters are not handled correctly.
I'm ready to provide any details.
Steps to reproduce the issue
Timeouts
- A Valkey instance runs on the host (the connectivity issues show up against Valkey, but Valkey itself has been ruled out as the cause)
- Run the following (replace 11.11.11.5:6379 in the script with your Valkey address):
Some code
ubuntu@host> podman network create --ignore --driver bridge --subnet 173.26.0.0/16 --gateway 173.26.0.1 test-net
ubuntu@host> podman run --network=test-net -it --rm docker.io/python:3.11 bash
root@container> pip install gevent==24.11.1
root@container> cat > check_conn.py <<'EOF'
from gevent import monkey
monkey.patch_all()  # Must be first, before all other imports
import socket
from datetime import datetime
import time
import gevent
from gevent.pool import Pool

REDIS_SOCKET_KEEPALIVE = True
FAILED = 0
ITERSLEEP = 1

def ping_job(job_id: int, iteration: int):
    """Single ping attempt, run inside a greenlet."""
    s = None
    t0 = time.time()
    try:
        s = socket.create_connection(("11.11.11.5", 6379), timeout=2)
        s.sendall(b"*1\r\n$4\r\nPING\r\n")
        s.recv(1024)
        print(datetime.fromtimestamp(t0), f"job={job_id} iter={iteration}", "ok", round(time.time() - t0, 3))
    except Exception as e:
        print(datetime.fromtimestamp(t0), f"job={job_id} iter={iteration}", "fail", round(time.time() - t0, 3), repr(e))
        global FAILED
        FAILED += 1
    finally:
        if s:
            s.close()

def worker(job_id: int, total_iterations: int = 200):
    """One long-running worker: loops, pings, sleeps."""
    for i in range(total_iterations):
        ping_job(job_id, i)
        gevent.sleep(ITERSLEEP)  # yield to other greenlets; never use time.sleep() here

def main(num_workers: int = 5, total_iterations: int = 200):
    pool = Pool(num_workers)

    def _spawn(i):
        time.sleep(ITERSLEEP / num_workers)
        return pool.spawn(worker, job_id=i, total_iterations=total_iterations)

    greenlets = [_spawn(i) for i in range(num_workers)]
    gevent.joinall(greenlets)
    print('FAILED JOBS:', FAILED)

if __name__ == "__main__":
    main(num_workers=10)
EOF
root@container> python check_conn.py
- You will see:
Logs
2026-04-09 18:35:39.844833 job=0 iter=0 ok 0.007
2026-04-09 18:35:39.945939 job=1 iter=0 ok 0.002
...
2026-04-09 18:36:19.329535 job=4 iter=39 ok 0.002
2026-04-09 18:36:19.436411 job=5 iter=39 ok 0.001
2026-04-09 18:36:19.536003 job=6 iter=39 ok 0.001
2026-04-09 18:36:19.635141 job=7 iter=39 ok 0.001
2026-04-09 18:36:19.741808 job=8 iter=39 ok 0.001
2026-04-09 18:36:19.834467 job=9 iter=39 ok 0.001
2026-04-09 18:36:17.934258 job=0 iter=38 fail 2.002 TimeoutError('timed out')
2026-04-09 18:36:20.033132 job=1 iter=40 ok 0.001
...
2026-04-09 18:36:21.539970 job=6 iter=41 ok 0.002
...
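The script above pings the server with a hand-rolled RESP frame (`b"*1\r\n$4\r\nPING\r\n"`). As a sanity check on that framing, RESP command encoding can be sketched like this (illustrative helper, not part of the repro):

```python
def encode_resp(*parts: bytes) -> bytes:
    """Encode a command as a RESP array of bulk strings."""
    out = [b"*%d\r\n" % len(parts)]
    for p in parts:
        # Each argument is a bulk string: $<len>\r\n<data>\r\n
        out.append(b"$%d\r\n%s\r\n" % (len(p), p))
    return b"".join(out)

# The exact frame hard-coded in check_conn.py:
assert encode_resp(b"PING") == b"*1\r\n$4\r\nPING\r\n"
```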
I tried:
- Running on the VM host itself works fine – no timeouts across 2 runs of 2000 iterations each
- Running in a podman container without the network attached – also fine across 2 runs
- Running in a podman container with the network attached (16 other network-active containers on it) – fails with a timeout at least once per 2000-iteration run, and usually much more often
- Running in a podman container on a fresh network with only this container attached – same: fails at least once per run, usually much more often
- Running on the VM and in a container with the network simultaneously – the container run fails 10 times, the VM run 0 times
- Running a container without the network and a container with the network simultaneously – the with-network run fails 6 times, the without-network run 0 times
tcpdump does not seem to capture those failures (at least I couldn't find them in the capture).
TCP_KEEPALIVE parameters are not treated correctly
The code below fails with error:
Error
root@7b45ba3c3aa7:/# python check_broken_socket.py
True
b'123'
Traceback (most recent call last):
File "/usr/local/lib/python3.11/site-packages/redis/connection.py", line 534, in send_packed_command
self._sock.sendall(item)
BrokenPipeError: [Errno 32] Broken pipe
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "//check_broken_socket.py", line 38, in <module>
print(r.get("LOL"))
^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/redis/commands/core.py", line 1822, in get
return self.execute_command("GET", name, keys=[name])
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/redis/client.py", line 559, in execute_command
return self._execute_command(*args, **options)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/redis/client.py", line 567, in _execute_command
return conn.retry.call_with_retry(
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/redis/retry.py", line 65, in call_with_retry
fail(error)
File "/usr/local/lib/python3.11/site-packages/redis/client.py", line 571, in <lambda>
lambda error: self._disconnect_raise(conn, error),
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/redis/client.py", line 555, in _disconnect_raise
raise error
File "/usr/local/lib/python3.11/site-packages/redis/retry.py", line 62, in call_with_retry
return do()
^^^^
File "/usr/local/lib/python3.11/site-packages/redis/client.py", line 568, in <lambda>
lambda: self._send_command_parse_response(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/redis/client.py", line 541, in _send_command_parse_response
conn.send_command(*args, **options)
File "/usr/local/lib/python3.11/site-packages/redis/connection.py", line 556, in send_command
self.send_packed_command(
File "/usr/local/lib/python3.11/site-packages/redis/connection.py", line 545, in send_packed_command
raise ConnectionError(f"Error {errno} while writing to socket. {errmsg}.")
redis.exceptions.ConnectionError: Error 32 while writing to socket. Broken pipe
Removing the TCP_KEEPALIVE options, or relaxing them to 300/30/3, fixes the short-term symptom, but in the long term it still fails randomly (hard to prove – too long to wait :D – there was 1 error over 3 hours of running the code).
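One way to check whether the kernel actually accepted the keepalive values (as opposed to podman/pasta mishandling the resulting probes) is to set them on a plain socket and read them back. A minimal sketch, assuming the Linux constant names guarded with hasattr:

```python
import socket

# Sketch: apply the same keepalive values and read them back via getsockopt
# to confirm the kernel stored them (Linux-only constants, hence the guard).
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
if hasattr(socket, "TCP_KEEPIDLE"):
    s.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, 10)
    s.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, 3)
    s.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, 5)
    print(s.getsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE))   # expect 10
    print(s.getsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL))  # expect 3
    print(s.getsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT))    # expect 5
s.close()
```

If the values read back correctly inside the container, the options themselves are fine and the problem would lie in how the network backend forwards (or drops) the keepalive probes.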
Code
Prerequisites: pip install redis==5.2.1
import socket
from redis import Redis
import time

REDIS_SOCKET_KEEPALIVE = True
REDIS_SOCKET_KEEPALIVE_OPTS = dict()
# Linux
if hasattr(socket, "TCP_KEEPIDLE"):
    # Start probing after 10s idle
    REDIS_SOCKET_KEEPALIVE_OPTS[socket.TCP_KEEPIDLE] = 10  # pyright: ignore[reportAttributeAccessIssue]
# macOS equivalent of KEEPIDLE
if hasattr(socket, "TCP_KEEPALIVE"):
    REDIS_SOCKET_KEEPALIVE_OPTS[socket.TCP_KEEPALIVE] = 10
# Both Linux and macOS support these
if hasattr(socket, "TCP_KEEPINTVL"):
    # Probe every 3s
    REDIS_SOCKET_KEEPALIVE_OPTS[socket.TCP_KEEPINTVL] = 3
if hasattr(socket, "TCP_KEEPCNT"):
    # Drop after 5 failed probes (~15s total)
    REDIS_SOCKET_KEEPALIVE_OPTS[socket.TCP_KEEPCNT] = 5

r = Redis.from_url(
    "redis://11.11.11.6/0",
    socket_connect_timeout=2,
    socket_timeout=3,
    health_check_interval=0,
    socket_keepalive=True,
    socket_keepalive_options=REDIS_SOCKET_KEEPALIVE_OPTS,
)

print(r.set("LOL", "123"))
print(r.get("LOL"))
time.sleep(150)
print(r.get("LOL"))
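For context, the BrokenPipeError (errno 32, EPIPE) in the traceback means the kernel already considered the connection dead by the time sendall() ran – e.g. after a reset following failed keepalive probes. The same error class can be reproduced locally with a Unix socketpair instead of Valkey (illustrative sketch only):

```python
import socket

# Sketch: writing to a socket whose peer has closed raises BrokenPipeError,
# the same underlying error that redis-py wraps as ConnectionError above.
a, b = socket.socketpair()
b.close()
try:
    a.sendall(b"PING")
except BrokenPipeError as e:
    print(repr(e))  # e.g. BrokenPipeError(32, 'Broken pipe')
finally:
    a.close()
```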
Describe the results you received
Timeouts connecting to socket and broken sockets mid-connection.
Describe the results you expected
No timeouts and no broken sockets :)
podman info output
ubuntu@host:~/builds$ podman version
Client: Podman Engine
Version: 5.6.2
API Version: 5.6.2
Go Version: go1.23.3
Git Commit: 9dd5e1ed33830612bc200d7a13db00af6ab865a4
Built: Sun Mar 1 13:52:35 2026
OS/Arch: linux/amd64
ubuntu@simulations:~/builds$ podman info
host:
arch: amd64
buildahVersion: 1.41.5
cgroupControllers:
- cpu
- memory
- pids
cgroupManager: systemd
cgroupVersion: v2
conmon:
package: conmon_2.1.10+ds1-1build2_amd64
path: /usr/bin/conmon
version: 'conmon version 2.1.10, commit: unknown'
cpuUtilization:
idlePercent: 99.06
systemPercent: 0.19
userPercent: 0.75
cpus: 16
databaseBackend: sqlite
distribution:
codename: noble
distribution: ubuntu
version: "24.04"
eventLogger: journald
freeLocks: 2024
hostname: simulations
idMappings:
gidmap:
- container_id: 0
host_id: 1000
size: 1
- container_id: 1
host_id: 100000
size: 65536
uidmap:
- container_id: 0
host_id: 1000
size: 1
- container_id: 1
host_id: 100000
size: 65536
kernel: 6.8.0-106-generic
linkmode: dynamic
logDriver: journald
memFree: 1806901248
memTotal: 67424378880
networkBackend: netavark
networkBackendInfo:
backend: netavark
dns:
package: aardvark-dns_1.4.0-5_amd64
path: /usr/lib/podman/aardvark-dns
version: aardvark-dns 1.4.0
package: netavark_1.4.0-4_amd64
path: /usr/lib/podman/netavark
version: netavark 1.4.0
ociRuntime:
name: crun
package: Unknown
path: /usr/local/bin/crun
version: |-
crun version 1.24
commit: 54693209039e5e04cbe3c8b1cd5fe2301219f0a1
rundir: /run/user/1000/crun
spec: 1.0.0
+SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +YAJL
os: linux
pasta:
executable: /usr/bin/pasta
package: passt_0.0~git20240220.1e6f92b-1_amd64
version: |
pasta unknown version
Copyright Red Hat
GNU General Public License, version 2 or later
<https://www.gnu.org/licenses/old-licenses/gpl-2.0.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
remoteSocket:
exists: true
path: /run/user/1000/podman/podman.sock
rootlessNetworkCmd: pasta
security:
apparmorEnabled: false
capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
rootless: true
seccompEnabled: true
seccompProfilePath: ""
selinuxEnabled: false
serviceIsRemote: false
slirp4netns:
executable: ""
package: ""
version: ""
swapFree: 0
swapTotal: 0
uptime: 311h 18m 37.00s (Approximately 12.96 days)
variant: ""
plugins:
authorization: null
log:
- k8s-file
- none
- passthrough
- journald
network:
- bridge
- macvlan
- ipvlan
volume:
- local
registries: {}
store:
configFile: /home/ubuntu/.config/containers/storage.conf
containerStore:
number: 23
paused: 0
running: 23
stopped: 0
graphDriverName: overlay
graphOptions: {}
graphRoot: /home/ubuntu/.local/share/containers/storage
graphRootAllocated: 155414249472
graphRootUsed: 122991693824
graphStatus:
Backing Filesystem: extfs
Native Overlay Diff: "true"
Supports d_type: "true"
Supports shifting: "false"
Supports volatile: "true"
Using metacopy: "false"
imageCopyTmpDir: /var/tmp
imageStore:
number: 881
runRoot: /run/user/1000/containers
transientStore: false
volumePath: /home/ubuntu/.local/share/containers/storage/volumes
version:
APIVersion: 5.6.2
Built: 1772373155
BuiltTime: Sun Mar 1 13:52:35 2026
GitCommit: 9dd5e1ed33830612bc200d7a13db00af6ab865a4
GoVersion: go1.23.3
Os: linux
OsArch: linux/amd64
Version: 5.6.2
Podman in a container
No
Privileged Or Rootless
Rootless
Upstream Latest Release
Yes
Additional environment details
OS: Ubuntu 24.04.4 LTS
VM inside Proxmox VE.
Additional information
Additional information like issue happens only occasionally or issue happens with a particular architecture or on a particular setting
Running in podman container with a custom network attached.