
Commit 5624419

yonromai and claude committed
Log lock-release lifecycle in executor_step_status and distributed_lock
Adds two INFO-level log lines so the next occurrence of the self-race described in issue #5026 can be triaged from logs alone. StatusFile.write_status now announces the terminal-status release with path, worker, and reason; and DistributedLease.release logs at INFO when it actually deletes the lock object (promoted from the prior DEBUG).

Together these disambiguate a self-release from an external delete or a stale-lease takeover, which the existing LeaseLostError message at distributed_lock.py:152 cannot distinguish.

Refs #5026

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent 787e034 commit 5624419

2 files changed

Lines changed: 7 additions & 1 deletion


lib/marin/src/marin/execution/executor_step_status.py

Lines changed: 6 additions & 0 deletions
@@ -120,6 +120,12 @@ def write_status(self, status: str) -> None:
             f.write(status)
 
         if status != STATUS_RUNNING:
+            logger.info(
+                "Releasing lock path=%s worker=%s reason=terminal_status:%s",
+                self._lock_path,
+                self.worker_id,
+                status,
+            )
             self.release_lock()
         logger.debug("[%s] Wrote status %s to %s", self.worker_id, status, self.path)
 
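For reviewers who want to see what the new announcement looks like once rendered, here is a minimal sketch that applies the same format string outside of `StatusFile`. The helper name `render_release_line`, the lock path, the worker id, and the concrete status values (`"RUNNING"`, `"SUCCESS"`) are assumptions made for illustration; only the format string and the `status != STATUS_RUNNING` guard come from the diff.

```python
from typing import Optional

# Assumed value; the real constant lives in executor_step_status.py.
STATUS_RUNNING = "RUNNING"

# Format string copied from the added logger.info call.
RELEASE_FMT = "Releasing lock path=%s worker=%s reason=terminal_status:%s"


def render_release_line(lock_path: str, worker_id: str, status: str) -> Optional[str]:
    """Return the message write_status would log, or None for a non-terminal status.

    Hypothetical helper: it mirrors the guard in the diff and applies the
    same %-style interpolation that the logging module performs lazily.
    """
    if status == STATUS_RUNNING:
        return None
    return RELEASE_FMT % (lock_path, worker_id, status)


if __name__ == "__main__":
    # Illustrative values only.
    print(render_release_line("gs://bucket/step/.lock", "worker-1", "SUCCESS"))
```

Because the fields are emitted as `key=value` pairs separated by spaces, the line can be filtered with a plain substring or regular-expression search, provided the path and worker id themselves contain no spaces.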
lib/rigging/src/rigging/distributed_lock.py

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -161,7 +161,7 @@ def release(self) -> None:
             _, lock_data = self._read_with_generation()
             if lock_data and lock_data.worker_id == self.worker_id:
                 self._delete()
-                logger.debug("[%s] Released lock %s", self.worker_id, self.lock_path)
+                logger.info("Released lock path=%s worker=%s", self.lock_path, self.worker_id)
         except FileNotFoundError:
             pass
 
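The triage the commit message describes could be approximated with a small log scan like the one below. This is a sketch under stated assumptions, not part of the commit: the function name `find_self_release` is hypothetical, the log lines are assumed to contain the two messages verbatim, and the path printed by `StatusFile` is assumed to be the same string as the one printed by `DistributedLease`.

```python
import re
from typing import Iterable, Optional

# Patterns derived from the two format strings in this commit.
ANNOUNCE_RE = re.compile(
    r"Releasing lock path=(?P<path>\S+) worker=(?P<worker>\S+) reason=(?P<reason>\S+)"
)
RELEASED_RE = re.compile(r"Released lock path=(?P<path>\S+) worker=(?P<worker>\S+)")


def find_self_release(
    lines: Iterable[str], lock_path: str, worker_id: str
) -> Optional[str]:
    """Return the announced reason if worker_id both announced and completed
    a release of lock_path; return None otherwise.

    Hypothetical triage helper: a None result means the logs show no
    self-release, so an external delete or stale-lease takeover remains
    a candidate explanation.
    """
    reason = None
    for line in lines:
        m = ANNOUNCE_RE.search(line)
        if m and m["path"] == lock_path and m["worker"] == worker_id:
            reason = m["reason"]
            continue
        m = RELEASED_RE.search(line)
        if (
            m
            and m["path"] == lock_path
            and m["worker"] == worker_id
            and reason is not None
        ):
            return reason
    return None


if __name__ == "__main__":
    # Illustrative log lines only.
    sample = [
        "INFO Releasing lock path=/tmp/a.lock worker=w1 reason=terminal_status:SUCCESS",
        "INFO Released lock path=/tmp/a.lock worker=w1",
    ]
    print(find_self_release(sample, "/tmp/a.lock", "w1"))
```

Requiring both lines is a deliberate choice in this sketch: the announcement alone does not show that the delete happened, since `release` only logs when the lock object still belongs to the calling worker.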