[jobsets] Fix bug in jobset atexit on local scheduler (#2312)
- The local scheduler was deleting successfully completed jobsets. We now avoid this by ensuring that jobsets are killed or deleted only when they are running or waiting (in suspended state).
valayDave authored Feb 28, 2025
1 parent 7617add commit 69c1204
Showing 1 changed file with 2 additions and 0 deletions.
2 changes: 2 additions & 0 deletions metaflow/plugins/kubernetes/kubernetes_jobsets.py
@@ -319,6 +319,8 @@ def _fetch_pod(self):
     def kill(self):
         plural = "jobsets"
         client = self._client.get()
+        if not (self.is_running or self.is_waiting):
+            return
         try:
             # Killing the control pod will trigger the jobset to mark everything as failed.
             # Since jobsets have a successPolicy set to `All` which ensures that everything has
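The effect of the new guard can be illustrated with a minimal, self-contained sketch. Everything here is hypothetical: `SketchJobSet`, its `state` labels, and `kill_all` are stand-ins invented for illustration and are not Metaflow's classes or API. Only the `is_running` / `is_waiting` check mirrors the diff above, and the real Kubernetes cleanup is replaced by a counter.

```python
class SketchJobSet:
    """Hypothetical stand-in for a jobset handle; not Metaflow's class."""

    def __init__(self, state):
        # Assumed state labels: "running", "waiting", "succeeded", "failed".
        self.state = state
        self.kill_requests = 0

    @property
    def is_running(self):
        return self.state == "running"

    @property
    def is_waiting(self):
        return self.state == "waiting"

    def kill(self):
        # Same guard as the diff: do nothing unless the jobset is still active.
        if not (self.is_running or self.is_waiting):
            return False
        # Placeholder for the real cleanup (Kubernetes API calls omitted).
        self.kill_requests += 1
        return True


def kill_all(jobsets):
    """Mimic an atexit-style cleanup pass; return the states that were killed."""
    return [js.state for js in jobsets if js.kill()]


jobsets = [SketchJobSet(s) for s in ("running", "waiting", "succeeded", "failed")]
killed = kill_all(jobsets)
print(killed)  # ['running', 'waiting']
```

In this sketch, jobsets that have already finished are left untouched by the cleanup pass, which is the behavior the commit message describes for the local scheduler.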