Description
Hi!
I'm running a postgres-flex 16.4 instance and every once in a while it stops responding to some queries (but only some of them).
flyio/postgres-flex:16.4@sha256:f107dbfaa732063b31ee94aa728c4f5648a672259fd62bfaa245f9b7a53b5479
The queries are listed as active, and lock monitoring (as per the Postgres wiki) shows that nothing is waiting on a lock. There is no load on the server, and the queries are light (the one I've got right now is `SHOW EXTENSIONS`). The queries cannot be cancelled with `pg_cancel_backend` or `pg_terminate_backend`. The only way to get rid of them is to restart the server.
I've logged on to the container and was able to confirm that the backend processes are indeed there and are indeed stuck doing nothing. `gdb` shows they are stuck trying to emit an error message:
# gdb -p 23173
<snip>
(gdb) backtrace
#0 0x00007f1af43b8574 in __GI___libc_write (fd=2, buf=0x55a80b88bfa0, nbytes=158) at ../sysdeps/unix/sysv/linux/write.c:26
#1 0x000055a7f4e76e65 in EmitErrorReport ()
#2 0x000055a7f4d0da0c in PostgresMain ()
#3 0x000055a7f4eea8ca in ?? ()
#4 0x000055a7f4c71c79 in PostmasterMain ()
#5 0x000055a7f493ede9 in main ()
`strace` shows that the process is trying to write a log message to stderr:
# strace -p 23173
strace: Process 23173 attached
write(2, "2025-01-31 16:13:37.315 UTC [231"..., 158
# ls -l /proc/23173/fd/{0,1,2}
lrwx------ 1 postgres postgres 64 Jan 31 16:14 /proc/23173/fd/0 -> /dev/pts/0
lrwx------ 1 postgres postgres 64 Jan 31 16:14 /proc/23173/fd/1 -> /dev/pts/0
lrwx------ 1 postgres postgres 64 Jan 31 16:14 /proc/23173/fd/2 -> /dev/pts/0
I double-checked that I cannot write to that pts from my terminal either: `# echo 123 > /dev/pts/0` gets stuck, too. So my current hypothesis as to why the process is stuck is that the internal buffer of `/dev/pts/0` is full and the process that is supposed to read from it on the other side isn't doing its job for some reason.
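To illustrate the hypothesis, here is a minimal Python sketch (not taken from postgres-flex) of what happens when nobody reads the other end. It uses an ordinary pipe rather than a pty, on the assumption that the same principle applies to both: the kernel buffer is finite, and once it is full a writer has nowhere to put more data. The function name `bytes_until_pipe_is_full` is made up for this example.

```python
import os


def bytes_until_pipe_is_full(chunk_size: int = 4096) -> int:
    """Write to a pipe that nobody reads; return the bytes accepted before a write would block."""
    read_fd, write_fd = os.pipe()
    # Non-blocking mode turns "would hang forever" into a BlockingIOError,
    # so the demo can terminate on its own.
    os.set_blocking(write_fd, False)
    chunk = b"x" * chunk_size
    total = 0
    try:
        while True:
            try:
                total += os.write(write_fd, chunk)
            except BlockingIOError:
                # The kernel buffer is full. On a blocking descriptor the
                # writer would simply sleep inside write() at this point.
                return total
    finally:
        os.close(read_fd)
        os.close(write_fd)


if __name__ == "__main__":
    print(f"pipe accepted {bytes_until_pipe_is_full()} bytes before blocking")
```

With a blocking descriptor, which is what the backend's stderr appears to be, the last write would not fail but would sleep in `write()` until a reader frees some space. That would match the backtrace and the `strace` output above.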
`pstree` shows that `postgres` is started by the `start` command, and some further digging in this repo got me to the place where I think the pipe is created for `postgres`:
https://github.com/fly-apps/postgres-flex/blob/master/internal/supervisor/output.go#L54
But I'm not sure where to look for what else is happening on the other side of the pipe, as Go is not my native tongue. Please advise.
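For what it's worth, the parent/child relationship I suspect can be reproduced in a few lines of Python. This is only a sketch of the general mechanism, not of what the supervisor in this repo actually does: the "supervisor" here is hypothetical, the child is a throwaway script that writes about 1 MiB to stderr, and the names `CHILD_SOURCE` and `stalled_then_drained` are invented for the example. It assumes a POSIX system.

```python
import subprocess
import sys
import time

# Child program: writes 4096 lines of 256 bytes each (1 MiB) to stderr,
# which is far more than a default pipe buffer can hold.
CHILD_SOURCE = (
    "import sys\n"
    "for _ in range(4096):\n"
    "    sys.stderr.write('x' * 255 + '\\n')\n"
)


def stalled_then_drained(stall_seconds: float = 1.0):
    """Start a chatty child, ignore its stderr for a while, then drain it.

    Returns (stuck_while_unread, bytes_drained, exit_code).
    """
    with subprocess.Popen(
        [sys.executable, "-c", CHILD_SOURCE],
        stderr=subprocess.PIPE,
    ) as child:
        # Phase 1: the parent does not read. The child fills the pipe
        # buffer and then blocks inside write(), so it cannot exit.
        time.sleep(stall_seconds)
        stuck_while_unread = child.poll() is None

        # Phase 2: the parent drains the pipe until EOF. The child's
        # blocked write completes and the child exits normally.
        bytes_drained = len(child.stderr.read())
        exit_code = child.wait()
    return stuck_while_unread, bytes_drained, exit_code


if __name__ == "__main__":
    stuck, drained, code = stalled_then_drained()
    print(f"stuck while unread: {stuck}, drained {drained} bytes, exit code {code}")
```

If the reader on the supervisor side ever stops looping (for example because it is itself blocked on writing somewhere), every backend that logs to stderr would end up in phase 1 indefinitely, which is what I seem to be observing.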
Thanks!