Enhance pod.containerstatuses metric to include lastTerminationState of failed pods #4128
LGTM, but I think the PR would benefit from an additional approval from someone more knowledgeable about the monitor before merging.
Withdrawing my approval to wait for canary test results.
Looks good to me, @yadneshk.
Capture and include LastTerminationState in pod container status metrics. This includes pods in the Waiting state (not ready), indicating that their last termination state must have contributed to their current condition.
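For readers skimming the thread, here is a minimal sketch of the idea described above, not the code in this PR. The struct types are trimmed-down, hypothetical mirrors of the Kubernetes `corev1.ContainerStatus` types so the example runs without external dependencies, and the function name and dimension keys (`containername`, `reason`, `lastTerminationState`) are assumptions chosen for illustration.

```go
package main

import "fmt"

// Hypothetical, trimmed-down mirrors of the Kubernetes corev1 types.
// The real types live in k8s.io/api/core/v1 and carry many more fields.
type ContainerStateWaiting struct{ Reason string }

type ContainerStateTerminated struct {
	Reason   string
	ExitCode int32
}

type ContainerState struct {
	Waiting    *ContainerStateWaiting
	Terminated *ContainerStateTerminated
}

type ContainerStatus struct {
	Name                 string
	Ready                bool
	State                ContainerState
	LastTerminationState ContainerState
}

// containerStatusDimensions (hypothetical helper) returns metric dimensions
// for a container that is not ready and currently Waiting. It returns nil
// for any other container. The dimension keys are illustrative assumptions.
func containerStatusDimensions(cs ContainerStatus) map[string]string {
	if cs.Ready || cs.State.Waiting == nil {
		return nil
	}

	dims := map[string]string{
		"containername":        cs.Name,
		"reason":               cs.State.Waiting.Reason,
		"lastTerminationState": "", // stays empty if the container never terminated
	}

	// Record why the previous run of the container ended, e.g. OOMKilled.
	if t := cs.LastTerminationState.Terminated; t != nil {
		dims["lastTerminationState"] = t.Reason
	}
	return dims
}

func main() {
	cs := ContainerStatus{
		Name:  "app",
		State: ContainerState{Waiting: &ContainerStateWaiting{Reason: "CrashLoopBackOff"}},
		LastTerminationState: ContainerState{
			Terminated: &ContainerStateTerminated{Reason: "OOMKilled", ExitCode: 137},
		},
	}
	fmt.Println(containerStatusDimensions(cs)["lastTerminationState"]) // prints "OOMKilled"
}
```

With a dimension like this, a container reported as `CrashLoopBackOff` can be told apart by whether its previous run ended with `OOMKilled`, `Error`, or another reason.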
Looks good, but I suggest including a link to the TSG/SOP (e.g. Azure RedHat OpenShift Team Doc) for this monitor.
Which issue this PR addresses:
Fixes ARO-7793
What this PR does / why we need it:
Adds the reason from the last termination state of pods stuck in the Waiting state as a dimension of the pod.containerstatuses metric. The value of this dimension can then be used to distinguish pod failure causes such as OOMKilled.
Test plan for issue:
Is there any documentation that needs to be updated for this PR?
How do you know this will function as expected in production?