From 35deb3b1e93b611428b6fc49a2cf2647b94b7ad7 Mon Sep 17 00:00:00 2001
From: Marc Scholten <marc@digitallyinduced.com>
Date: Thu, 19 Feb 2026 10:34:18 +0100
Subject: [PATCH] Fix job worker silently exiting on transient fetchNextJob
 error

After the job worker redesign (820cf00e), runJobLoop exits without
retrying when fetchNextJob throws a transient error (pool exhaustion,
connection timeout). Since the NOTIFY signal was already consumed from
the TBQueue, nothing triggers a new worker spawn, so the job sits
orphaned until the 60-second poller picks it up.

The old MVar-based workers were persistent and always looped back to
takeMVar after any outcome. The new on-demand workers are ephemeral,
so an early exit leaves the job unprocessed until the poller runs.

Add a runJobLoop call to the error branch so the worker retries after
the 1-second backoff, matching how the poller handles errors.

Fixes amitaibu/ihp-sensors#18

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---
 ihp/IHP/Job/Runner.hs | 1 +
 1 file changed, 1 insertion(+)
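
Note for reviewers: the retry behaviour can be modelled outside IHP with
a small standalone sketch. This is not the code in Runner.hs. fetchStub,
runLoop and the String job type are hypothetical stand-ins, and the real
loop does more after a job is fetched. The sketch only shows why the
recursive call in the error branch matters: without it, the first
failed fetch would end the loop.

```haskell
import Control.Concurrent (threadDelay)
import Control.Exception (SomeException, throwIO, try)
import Data.IORef (IORef, atomicModifyIORef', newIORef)

-- Hypothetical stand-in for fetchNextJob: throws on the first two calls
-- (simulating a transient error), then returns a job.
fetchStub :: IORef Int -> IO (Maybe String)
fetchStub calls = do
    n <- atomicModifyIORef' calls (\c -> (c + 1, c + 1))
    if n <= 2
        then throwIO (userError "pool exhausted")
        else pure (Just "job-1")

-- Simplified model of the worker loop (assumption: the real loop also
-- runs the job and keeps fetching; here we just return what was fetched).
runLoop :: Int -> IO (Maybe String) -> IO (Maybe String)
runLoop backoffMicros fetch = do
    result <- try fetch :: IO (Either SomeException (Maybe String))
    case result of
        Left exception -> do
            putStrLn ("fetch failed: " <> show exception)
            threadDelay backoffMicros       -- back off before retrying
            runLoop backoffMicros fetch     -- the retry this patch adds
        Right (Just job) -> pure (Just job)
        Right Nothing -> pure Nothing       -- queue empty, worker exits

main :: IO ()
main = do
    calls <- newIORef 0
    -- Short backoff so the sketch finishes quickly; the patch uses 1s.
    outcome <- runLoop 1000 (fetchStub calls)
    print outcome
```

With the recursive call in place, the stub's two failures are logged and
the third attempt returns the job. Removing that line makes runLoop fall
through after the first failure, which is the behaviour this patch fixes.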

diff --git a/ihp/IHP/Job/Runner.hs b/ihp/IHP/Job/Runner.hs
index 5d8038bb5..091d6e651 100644
--- a/ihp/IHP/Job/Runner.hs
+++ b/ihp/IHP/Job/Runner.hs
@@ -171,6 +171,7 @@ jobWorkerFetchAndRunLoop JobWorkerArgs { .. } = do
                 Left exception -> do
                     Log.error ("Job worker: Failed to fetch next job: " <> tshow exception)
                     Concurrent.threadDelay 1000000  -- 1s backoff to avoid tight error loops
+                    runJobLoop -- retry after transient error
                 Right (Just job) -> do
                     Log.info ("Starting job: " <> tshow job)