Fix Running condition being re-emitted when pod and job informers are out-of-sync #787
Conversation
… out-of-sync
Signed-off-by: Gonzalo Saez <11050889+GonzaloSaez@users.noreply.github.com>
cc: @tenzen-y
Sorry for the delayed review. I only found this notification just now...
tenzen-y left a comment
Thank you for fixing that!
I left some comments.
if isMPIJobSuspended(mpiJob) {
    msg := fmt.Sprintf("MPIJob %s/%s is suspended.", mpiJob.Namespace, mpiJob.Name)
    updateMPIJobConditions(mpiJob, kubeflow.JobRunning, corev1.ConditionFalse, mpiJobSuspendedReason, msg)
} else if isFinished(mpiJob.Status) {
Suggested change:
-   } else if isFinished(mpiJob.Status) {
+   } else if isFinished(mpiJob.Status) && getCondition(mpiJob.Status, kubeflow.JobRunning) == nil {
It seems that we can simplify the if-else structure by doing this.
if getCondition(mpiJob.Status, kubeflow.JobRunning) == nil {
    msg := fmt.Sprintf("MPIJob %s/%s is finished but Running condition was never set.", mpiJob.Namespace, mpiJob.Name)
    cond := kubeflow.JobCondition{
        Type:    kubeflow.JobRunning,
        Status:  corev1.ConditionFalse,
        Reason:  mpiJobRunningReason,
        Message: msg,
    }
    if mpiJob.Status.CompletionTime != nil {
        cond.LastTransitionTime = *mpiJob.Status.CompletionTime
        cond.LastUpdateTime = *mpiJob.Status.CompletionTime
    } else {
        now := metav1.Now()
        cond.LastTransitionTime = now
        cond.LastUpdateTime = now
    }
    mpiJob.Status.Conditions = append(mpiJob.Status.Conditions, cond)
}
Suggested change:
-   if getCondition(mpiJob.Status, kubeflow.JobRunning) == nil {
-       msg := fmt.Sprintf("MPIJob %s/%s is finished but Running condition was never set.", mpiJob.Namespace, mpiJob.Name)
-       cond := kubeflow.JobCondition{
-           Type:    kubeflow.JobRunning,
-           Status:  corev1.ConditionFalse,
-           Reason:  mpiJobRunningReason,
-           Message: msg,
-       }
-       if mpiJob.Status.CompletionTime != nil {
-           cond.LastTransitionTime = *mpiJob.Status.CompletionTime
-           cond.LastUpdateTime = *mpiJob.Status.CompletionTime
-       } else {
-           now := metav1.Now()
-           cond.LastTransitionTime = now
-           cond.LastUpdateTime = now
-       }
-       mpiJob.Status.Conditions = append(mpiJob.Status.Conditions, cond)
-   }
+   msg := fmt.Sprintf("MPIJob %s/%s is finished but Running condition was never set.", mpiJob.Namespace, mpiJob.Name)
+   cond := kubeflow.JobCondition{
+       Type:    kubeflow.JobRunning,
+       Status:  corev1.ConditionFalse,
+       Reason:  mpiJobRunningReason,
+       Message: msg,
+   }
+   updateTime := ptr.Deref(mpiJob.Status.CompletionTime, metav1.NewTime(c.clock.Now()))
+   cond.LastTransitionTime = updateTime
+   cond.LastUpdateTime = updateTime
+   mpiJob.Status.Conditions = append(mpiJob.Status.Conditions, cond)
- Unnest the condition
- Use the clock
- Simplify pointer deref operation
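For reference, a small self-contained sketch of the ptr.Deref behavior this suggestion relies on; time.Now() stands in for the controller's clock here, and since CompletionTime is a *metav1.Time the fallback would need to be a metav1.Time (e.g. metav1.NewTime(c.clock.Now())):

package main

import (
    "fmt"
    "time"

    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/utils/ptr"
)

func main() {
    // Stand-in for the clock-provided fallback value.
    now := metav1.NewTime(time.Now())

    // When CompletionTime is non-nil, Deref returns the value it points to...
    completionTime := ptr.To(metav1.NewTime(time.Date(2024, 1, 1, 0, 0, 0, 0, time.UTC)))
    fmt.Println(ptr.Deref(completionTime, now)) // prints the 2024-01-01 completion time

    // ...and when it is nil, Deref falls back to the default ("now").
    completionTime = nil
    fmt.Println(ptr.Deref(completionTime, now)) // prints the fallback time
}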
startTime := metav1.Now()
completionTime := metav1.Now()
// TestLauncherSucceededWithRunningPod tests the case where a launcher Job has succeeded but its pod is still
// observed as Running due to informer lag, and the workers have already been cleaned up. The Running condition
// should be set to False rather than being re-emitted as True alongside Succeeded.
func TestLauncherSucceededWithRunningPod(t *testing.T) {
Can you explicitly check that the Running condition's LastTransitionTime/LastUpdateTime are later than or equal to completionTime?
The test utilities ignore such time comparisons: https://github.com/GonzaloSaez/mpi-operator/blob/8c953844bc5a5deb7ba98d5e1e85a324b6aa2645/pkg/controller/mpi_job_controller_test.go#L350
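As a sketch only (the helper name and signature are made up; getCondition and the kubeflow/metav1 types are the ones already used in this package): the PR description states that the Running condition's last transition time should be at or before the completion time, so this hypothetical helper asserts that direction. Adjust the comparison if the test should pin down a different relation, e.g. strict equality with completionTime.

func assertRunningNotAfterCompletion(t *testing.T, status kubeflow.JobStatus, completionTime metav1.Time) {
    t.Helper()
    // Explicit timestamp check; the generic status comparison in these tests
    // ignores condition times, so this has to be asserted separately.
    cond := getCondition(status, kubeflow.JobRunning)
    if cond == nil {
        t.Fatal("expected a Running condition to be present")
    }
    if cond.LastTransitionTime.Time.After(completionTime.Time) {
        t.Errorf("Running LastTransitionTime %v is after completionTime %v", cond.LastTransitionTime, completionTime)
    }
    if cond.LastUpdateTime.Time.After(completionTime.Time) {
        t.Errorf("Running LastUpdateTime %v is after completionTime %v", cond.LastUpdateTime, completionTime)
    }
}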
When the pod and job informers are out-of-sync, it is possible for the launcher job to be finished while its pod is still reported as running. In that case, the MPIJob may be considered completed (when using runLauncherAsWorker and the workers have finished). In this scenario, the Running condition may be re-emitted with a last transition time later than the point at which the MPIJob was deemed completed. As a result, other controllers watching the MPIJob cannot reliably derive start and end times from the last transition time of the Running condition.
To fix this, we can avoid re-emitting the Running condition. Moreover, we can ensure that the Running condition is always emitted and that its last transition time is <= the completion time.
Another solution would be to re-queue the MPIJob when the job and pod informers are detected to be out-of-sync (a rough sketch follows), but I'm not sure whether that would be harder to implement.
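Purely for illustration, the re-queue alternative could look something like the sketch below in the controller's sync path; the helper names (launcherJobFinished, launcherPodStillRunning) and the queue wiring are assumptions, not code from this PR:

// Hypothetical sketch (not part of this PR): if the job informer reports the
// launcher Job as finished while the pod informer still reports its pod as
// Running, skip the status update and retry shortly instead of recording a
// Running condition with a timestamp later than the completion time.
if launcherJobFinished(launcherJob) && launcherPodStillRunning(launcherPod) {
    c.queue.AddAfter(key, time.Second) // assumes a delaying work queue is available here
    return nil
}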