
Postgres background worker processes get stuck trying to write to stderr #272

Closed
@immerrr

Description

Hi!

I'm running a postgres-flex 16.4 instance and every once in a while it stops responding to some queries (but only some of them).

flyio/postgres-flex:16.4@sha256:f107dbfaa732063b31ee94aa728c4f5648a672259fd62bfaa245f9b7a53b5479

The queries are listed as active, and lock monitoring (as per the Postgres wiki) shows that nobody is waiting on anything. There is no load on the server, and the queries are light (the one I've got right now is SHOW EXTENSIONS). The queries cannot be cancelled with pg_cancel_backend or pg_terminate_backend. The only way to get rid of them is to restart the server.

I logged on to the container and confirmed that the background processes are indeed there and are indeed stuck doing nothing. gdb shows they are stuck trying to emit an error message:

# gdb -p 23173
<snip>
(gdb) backtrace
#0  0x00007f1af43b8574 in __GI___libc_write (fd=2, buf=0x55a80b88bfa0, nbytes=158) at ../sysdeps/unix/sysv/linux/write.c:26
#1  0x000055a7f4e76e65 in EmitErrorReport ()
#2  0x000055a7f4d0da0c in PostgresMain ()
#3  0x000055a7f4eea8ca in ?? ()
#4  0x000055a7f4c71c79 in PostmasterMain ()
#5  0x000055a7f493ede9 in main ()

strace shows that the process is trying to write a log message to stderr:

# strace -p 23173
strace: Process 23173 attached
write(2, "2025-01-31 16:13:37.315 UTC [231"..., 158

That worker's stderr is connected to a PTS:

# ls -l /proc/23173/fd/{0,1,2}
lrwx------ 1 postgres postgres 64 Jan 31 16:14 /proc/23173/fd/0 -> /dev/pts/0
lrwx------ 1 postgres postgres 64 Jan 31 16:14 /proc/23173/fd/1 -> /dev/pts/0
lrwx------ 1 postgres postgres 64 Jan 31 16:14 /proc/23173/fd/2 -> /dev/pts/0

I double-checked that I cannot write to that PTS from my terminal either: running echo 123 > /dev/pts/0 as root gets stuck, too. So my current hypothesis is that /dev/pts/0's internal buffer is full, and the process that is supposed to read from the other side is not draining it for some reason.

pstree shows that postgres is started by the start command, and some further digging in this repo led me to the place where I think the pipe for postgres is created:

https://github.com/fly-apps/postgres-flex/blob/master/internal/supervisor/output.go#L54

But I'm not sure where to look to find out what is happening on the other side of the pipe, as Go is not my native tongue. Please advise.

Thanks!
