fix(hub): start-local-worker evicts stale worker.lock and verifies readiness
Two root causes of the reported "I created a worker, list says none"
bug:
1. When a local worker loses heartbeats long enough to fall out of the
"active" set — typically because the hub bounced or the network
hiccupped — its bun process usually stays alive and keeps holding
`~/.hapi-dev/worker.lock`. A fresh start-local-worker call would
enroll a new worker, hit the lock, and immediately exit with
"Another worker is already running"; the endpoint still returned
`started: true` because it only checked that spawn() itself
succeeded. Net effect: the UI shows an empty workers list forever.
2. The endpoint never waited to see if the child actually stayed up,
so it reported success even on hard failures (e.g. a crash during
enrollment or a missing bun interpreter).
Fix:
- evictStaleWorkerLock() runs before each spawn. Reads
<dataDir>/worker.lock, SIGTERMs the holder if still alive, unlinks
the lock. Safe because the lock always belongs to a worker the hub
itself spawned (same HAPI_HOME).
- awaitWorkerReady() polls the child's logs for up to 4s after spawn,
looking for either the "Worker starting..." milestone (runner loop
entered) or a terminal exit. On failure it returns a structured 500
with reason=worker_lock_conflict|worker_exited plus the tail of the
child's logs so the UI can surface something concrete.
- Successful responses now carry optional evictedPid so the caller
can see that a cleanup happened.
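
Illustrative sketch of the two steps above, not the code in this
commit. Assumptions: `worker.lock` holds the holder's PID as plain
text, and the child's log contains the literal strings quoted in this
message. `classifyWorkerLog` and the `"pending"` state are hypothetical
names; a polling loop would call the classifier until it returns a
non-pending state or the 4s deadline passes.

```typescript
import * as fs from "node:fs";
import { join } from "node:path";

// Remove <dataDir>/worker.lock, sending SIGTERM to its holder first.
// Returns the PID that was signalled, or undefined if nothing was signalled.
// Assumption: the lock file stores the holder's PID as plain text.
export function evictStaleWorkerLock(dataDir: string): number | undefined {
  const lockPath = join(dataDir, "worker.lock");
  if (!fs.existsSync(lockPath)) return undefined;

  const pid = Number.parseInt(fs.readFileSync(lockPath, "utf8").trim(), 10);
  let evictedPid: number | undefined;
  if (Number.isInteger(pid) && pid > 0 && pid !== process.pid) {
    try {
      process.kill(pid, "SIGTERM"); // throws if the process no longer exists
      evictedPid = pid;
    } catch {
      // Holder is already gone (or cannot be signalled); just drop the lock.
    }
  }
  fs.rmSync(lockPath, { force: true });
  return evictedPid;
}

// Hypothetical state names; the two failure reasons mirror the 500 response.
export type WorkerState =
  | "ready"
  | "pending"
  | "worker_lock_conflict"
  | "worker_exited";

// Classify one snapshot of the child's log output.
// `exited` says whether the child process has already terminated.
export function classifyWorkerLog(log: string, exited: boolean): WorkerState {
  if (log.includes("Another worker is already running")) {
    return "worker_lock_conflict";
  }
  if (log.includes("Worker starting...")) return "ready";
  return exited ? "worker_exited" : "pending";
}
```

Keeping the classifier a pure function of the log text makes the
readiness decision testable without spawning a real worker.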
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>