Timed-out tasks do not show correct error messages #1469
Open
Description
Steps that led to this issue:
- Create a task with a timeout decorator. In this case, the timeout was 60 seconds.
- Run the task on Kubernetes. This sets the Kubernetes job's activeDeadlineSeconds to 60 seconds.
- If the task does not complete within the timeout, the error shown to the user is:
[KILLED BY ORCHESTRATOR]
Kubernetes error:
Completed. This could be a transient error. Use @retry to retry.
Only after inspecting the pod in Kubernetes did it become clear that the pod had timed out. The pod status showed:
...
containerStatuses:
- containerID: containerd://99e036b3ba08df84f126ea41d05bb0842ea06540de49359d8c7d83e8aecc790b
  ...
  state:
    terminated:
      containerID: containerd://99e036b3ba08df84f126ea41d05bb0842ea06540de49359d8c7d83e8aecc790b
      exitCode: 137
      reason: Error
      ...
message: Pod was active on the node longer than the specified deadline
phase: Failed
...
qosClass: Burstable
reason: DeadlineExceeded
It would be good if the actual failure reason (DeadlineExceeded, with its message) were surfaced on the user's console, so that users can see the task timed out instead of a generic "transient error" hint.
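As a rough illustration of the kind of mapping being asked for (a sketch only, not the project's actual code), the pod status could be translated into a console message along these lines. The function name `describe_pod_failure` and the message wording are hypothetical; the only fields it relies on are the ones visible in the status above (`phase`, `reason`, `message`, and `containerStatuses[].state.terminated`), passed in as a plain dict.

```python
from typing import Any, Mapping, Optional


def describe_pod_failure(status: Mapping[str, Any]) -> Optional[str]:
    """Return a user-facing message for a failed pod, or None if it has not failed.

    `status` is assumed to be the pod's `status` object as a plain dict.
    The function name and message wording are hypothetical.
    """
    if status.get("phase") != "Failed":
        return None

    reason = status.get("reason")
    message = status.get("message")

    # A pod killed for exceeding activeDeadlineSeconds reports this reason.
    if reason == "DeadlineExceeded":
        detail = message or "the pod exceeded its activeDeadlineSeconds"
        return f"Task timed out ({reason}): {detail}"

    # Any other pod-level reason or message is still more useful than nothing.
    if reason or message:
        return f"Task failed ({reason or 'unknown reason'}): {message or 'no message'}"

    # Fall back to the first terminated container's exit code and reason.
    for container in status.get("containerStatuses") or []:
        terminated = (container.get("state") or {}).get("terminated")
        if terminated:
            return (
                f"Task failed: container exited with code {terminated.get('exitCode')}"
                f" ({terminated.get('reason', 'unknown reason')})"
            )
    return "Task failed: no further details in the pod status"


# The status reported in this issue, reduced to the fields shown above.
example_status = {
    "containerStatuses": [
        {"state": {"terminated": {"exitCode": 137, "reason": "Error"}}}
    ],
    "message": "Pod was active on the node longer than the specified deadline",
    "phase": "Failed",
    "qosClass": "Burstable",
    "reason": "DeadlineExceeded",
}

print(describe_pod_failure(example_status))
```

With the status from this issue, the pod-level `reason` takes precedence over the container's generic `Error` / exit code 137, which is what makes the timeout visible to the user.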