Description
Version Info
Hangfire.NetCore version: 1.8.21
Hangfire.Postgresql version: 1.20.12
Issue Details
Earlier this year we switched to using PostgreSqlStorageOptions.EnableLongPolling = true for our Hangfire configuration, and that has been an awesome performance improvement. However, we have noticed in our non-prod environments that if the underlying connection that PostgreSqlJobQueue opens via ListenForNotificationsAsync is disrupted, subsequently enqueued jobs become stuck in the Enqueued state until their invisibility timeout passes and they are re-processed.
Steps to repro:
- Configure Hangfire.Postgresql with PostgreSqlStorageOptions.EnableLongPolling = true;
- Configure a server and job queue for processing (e.g. email)
- Enqueue a job and note that it processes immediately (thanks to the polling/listener notification mechanism)
- Terminate the listener connection(s) from pgAdmin with:
SELECT pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE pid <> pg_backend_pid() AND query like 'LISTEN new_job'
- Note that the listener connection(s) do not automatically reconnect after this disconnect:
SELECT *
FROM pg_stat_activity
WHERE pid <> pg_backend_pid() AND query like 'LISTEN new_job'
- Enqueue another job (either via code or via Hangfire.Console) and note that it stays stuck in the Enqueued state.
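For reference, the configuration used in the first three steps could look roughly like the sketch below. It assumes a .NET generic host with the Hangfire.NetCore and Hangfire.PostgreSql packages; the connection string name "Hangfire", the EmailJob class, and the recipient address are placeholders invented for illustration, not taken from our actual code.

```csharp
using Hangfire;
using Hangfire.PostgreSql;
using Microsoft.Extensions.Configuration;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Hosting;

var builder = Host.CreateApplicationBuilder(args);

// Step 1: enable long polling (LISTEN/NOTIFY) on the PostgreSQL storage.
// "Hangfire" is an assumed connection string name.
builder.Services.AddHangfire(config => config.UsePostgreSqlStorage(
    c => c.UseNpgsqlConnection(builder.Configuration.GetConnectionString("Hangfire")),
    new PostgreSqlStorageOptions { EnableLongPolling = true }));

// Step 2: a server that processes the "email" queue.
builder.Services.AddHangfireServer(options => options.Queues = new[] { "email" });

var host = builder.Build();
await host.StartAsync();

// Step 3: enqueue a job; with a healthy listener it should be picked up right away.
var jobs = host.Services.GetRequiredService<IBackgroundJobClient>();
jobs.Enqueue<EmailJob>(job => job.Send("user@example.com"));

await host.WaitForShutdownAsync();

// Hypothetical job class, used only for this repro sketch.
public class EmailJob
{
    [Queue("email")]
    public void Send(string recipient) =>
        Console.WriteLine($"Sending email to {recipient}");
}
```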
Question
Would it be possible to add graceful reconnection to the job queue listener connections in this scenario, to avoid the stuck jobs? I can't quite tell why, but over a longer period of time the listener connections do appear to reconnect, seemingly as a result of subsequent job Enqueue activity. That doesn't seem like a reliable enough recovery mechanism to avoid the stuck jobs, so we are looking for anything that could smooth out this experience.
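To illustrate the kind of reconnection being asked for, here is a minimal sketch of a retry loop around a listener. It is not how Hangfire.PostgreSql is implemented; ListenerReconnect, RunAsync, listenOnce, and retryDelay are hypothetical names, and the sketch assumes the listener delegate throws when its connection is lost.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper, not part of Hangfire.PostgreSql.
public static class ListenerReconnect
{
    // listenOnce is expected to open a connection, issue LISTEN, and wait for
    // notifications; the assumption is that it throws when the connection drops.
    public static async Task RunAsync(
        Func<CancellationToken, Task> listenOnce,
        TimeSpan retryDelay,
        CancellationToken cancellationToken,
        Action<Exception>? onFailure = null)
    {
        while (!cancellationToken.IsCancellationRequested)
        {
            try
            {
                await listenOnce(cancellationToken).ConfigureAwait(false);
            }
            catch (OperationCanceledException) when (cancellationToken.IsCancellationRequested)
            {
                return; // shutdown was requested
            }
            catch (Exception ex)
            {
                // Connection lost or could not be opened: report, pause, then retry.
                onFailure?.Invoke(ex);
                try
                {
                    await Task.Delay(retryDelay, cancellationToken).ConfigureAwait(false);
                }
                catch (OperationCanceledException)
                {
                    return; // shutdown was requested during the pause
                }
            }
        }
    }
}
```

With Npgsql, listenOnce could open an NpgsqlConnection, execute LISTEN new_job, and then loop on NpgsqlConnection.WaitAsync(cancellationToken), so that a terminated backend surfaces as an exception and triggers a fresh connection after retryDelay.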