Description
Would it be possible to add `backoffLimit` to DaskJobs? Kubernetes Jobs have this field so that the Job is reported as failed only if the pod fails a certain number of times (see the example below). Could we add it to DaskJobs as well? I have been using this field in regular Jobs because Dask sometimes hangs or crashes in very long jobs, and restarting the job fixes that.
apiVersion: batch/v1
kind: Job
metadata:
  name: pi
spec:
  template:
    spec:
      containers:
      - name: pi
        image: perl:5.34.0
        command: ["perl", "-Mbignum=bpi", "-wle", "print bpi(2000)"]
      restartPolicy: Never
  backoffLimit: 4
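
For illustration, here is roughly what I have in mind on a DaskJob. This is only a sketch of the request: `backoffLimit` does not exist on DaskJob today as far as I know, and its placement directly under `spec` is my assumption. The job name, image, and command are placeholders, and the `cluster` section is left out for brevity.

```yaml
apiVersion: kubernetes.dask.org/v1
kind: DaskJob
metadata:
  name: long-running-job        # placeholder name
spec:
  # HYPOTHETICAL field (the feature being requested): retry the job pod
  # up to 4 times before reporting the DaskJob as failed.
  backoffLimit: 4
  job:
    spec:
      restartPolicy: Never
      containers:
        - name: job
          image: ghcr.io/dask/dask:latest   # placeholder image
          args: ["python", "-c", "from dask.distributed import Client; client = Client()"]
  # cluster: (scheduler and worker specs omitted here)
```

Putting the field next to `job` would mirror where `backoffLimit` sits relative to `template` in a `batch/v1` Job, but any placement that gives the same retry behavior would work for me.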