From 35deb3b1e93b611428b6fc49a2cf2647b94b7ad7 Mon Sep 17 00:00:00 2001
From: Marc Scholten
Date: Thu, 19 Feb 2026 10:34:18 +0100
Subject: [PATCH] Fix job worker silently exiting on transient fetchNextJob error

After the job worker redesign (820cf00e), runJobLoop exits without
retrying when fetchNextJob throws a transient error (pool exhaustion,
connection timeout). Since the NOTIFY signal was already consumed from
the TBQueue, nothing triggers a new worker spawn, so the job sits
orphaned until the 60-second poller picks it up.

The old MVar-based workers were persistent and always looped back to
takeMVar after any outcome. The new on-demand workers are ephemeral,
so exiting means the job is lost until the poller runs.

Add runJobLoop call to the error branch so the worker retries after
the 1-second backoff, matching how the poller handles errors.

Fixes amitaibu/ihp-sensors#18

Co-Authored-By: Claude Opus 4.6
---
 ihp/IHP/Job/Runner.hs | 1 +
 1 file changed, 1 insertion(+)

diff --git a/ihp/IHP/Job/Runner.hs b/ihp/IHP/Job/Runner.hs
index 5d8038bb5..091d6e651 100644
--- a/ihp/IHP/Job/Runner.hs
+++ b/ihp/IHP/Job/Runner.hs
@@ -171,6 +171,7 @@ jobWorkerFetchAndRunLoop JobWorkerArgs { .. } = do
                 Left exception -> do
                     Log.error ("Job worker: Failed to fetch next job: " <> tshow exception)
                     Concurrent.threadDelay 1000000 -- 1s backoff to avoid tight error loops
+                    runJobLoop -- retry after transient error
                 Right (Just job) -> do
                     Log.info ("Starting job: " <> tshow job)