fix: report correct reason in kube_pod_status_reason metric #2644

Open: wants to merge 2 commits into main

30 changes: 22 additions & 8 deletions internal/store/pod.go
```diff
@@ -1541,15 +1541,12 @@ func createPodStatusReasonFamilyGenerator() generator.FamilyGenerator {
 			ms := []*metric.Metric{}
 
 			for _, reason := range podStatusReasons {
-				metric := &metric.Metric{}
-				metric.LabelKeys = []string{"reason"}
-				metric.LabelValues = []string{reason}
-				if p.Status.Reason == reason {
-					metric.Value = boolFloat64(true)
-				} else {
-					metric.Value = boolFloat64(false)
+				m := &metric.Metric{
+					LabelKeys:   []string{"reason"},
+					LabelValues: []string{reason},
+					Value:       getPodStatusReasonValue(p, reason),
 				}
-				ms = append(ms, metric)
+				ms = append(ms, m)
 			}
 
 			return &metric.Family{
```
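The refactored loop emits exactly one series per entry in `podStatusReasons` and delegates the 0/1 value to `getPodStatusReasonValue`. The sketch below reproduces that shape outside kube-state-metrics so the output can be inspected directly. `Metric`, `buildReasonMetrics`, and the `match` callback are hypothetical stand-ins for `metric.Metric` and the real per-pod lookup, and the reason list is illustrative rather than the actual contents of `podStatusReasons`.

```go
package main

import "fmt"

// Metric is a hypothetical, trimmed-down stand-in for metric.Metric.
type Metric struct {
	LabelKeys   []string
	LabelValues []string
	Value       float64
}

// buildReasonMetrics mirrors the loop in the diff: every known reason gets
// exactly one series, and its value comes from the (hypothetical) match callback.
func buildReasonMetrics(reasons []string, match func(reason string) float64) []*Metric {
	ms := make([]*Metric, 0, len(reasons))
	for _, reason := range reasons {
		m := &Metric{
			LabelKeys:   []string{"reason"},
			LabelValues: []string{reason},
			Value:       match(reason),
		}
		ms = append(ms, m)
	}
	return ms
}

func main() {
	// Illustrative reason list; not necessarily what podStatusReasons contains.
	reasons := []string{"Evicted", "NodeLost", "Shutdown"}
	current := "Evicted"

	ms := buildReasonMetrics(reasons, func(r string) float64 {
		if r == current {
			return 1
		}
		return 0
	})

	// Print in a Prometheus-exposition-like form, one line per series.
	for _, m := range ms {
		fmt.Printf("kube_pod_status_reason{reason=%q} %v\n", m.LabelValues[0], m.Value)
	}
}
```

Running it prints one line per reason, with `1` only for `Evicted`. Building the struct with a composite literal, as the diff does, also drops the old local variable named `metric`, which shadowed the imported `metric` package inside the loop body.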

```diff
@@ -1559,6 +1556,23 @@ func createPodStatusReasonFamilyGenerator() generator.FamilyGenerator {
 	)
 }
 
+func getPodStatusReasonValue(p *v1.Pod, reason string) float64 {
+	if p.Status.Reason == reason {
+		return 1
+	}
+	for _, cond := range p.Status.Conditions {
+		if cond.Reason == reason {
+			return 1
+		}
+	}
+	for _, cs := range p.Status.ContainerStatuses {
+		if cs.State.Terminated != nil && cs.State.Terminated.Reason == reason {
+			return 1
+		}
+	}
+	return 0
+}
+
 func createPodStatusScheduledFamilyGenerator() generator.FamilyGenerator {
 	return *generator.NewFamilyGeneratorWithStability(
 		"kube_pod_status_scheduled",
```

Review thread on lines +1563 to +1567 (the loop over p.Status.Conditions):

Contributor: Should we only care about the last condition? If so, do we need to remove this part?

Author: No, it's necessary to iterate through all the conditions because the reason may be in any of them.

Contributor: Could it be a stale condition?

Reply: It will not be a stale condition. Kubernetes regularly updates Pod conditions, so if a condition with the corresponding reason is found, it is assumed to be current. If a stale condition were detected, that would indicate an issue in Kubernetes, not in this logic.

Contributor: Will a pod have multiple different reasons?

Reply: Yes, a Pod can have different “Reasons” throughout its lifecycle. Each event or change in the Pod’s state (for example, container creation, image pulling, runtime errors, or restarts) can trigger a different reason. Kubernetes records these “Reasons” at different points in the Pod’s lifecycle, so a single Pod can go through several different “Reasons” as it transitions between states.

Contributor: Yes, I was thinking of the case where the pod first fails to pull its image, then hits runtime errors, then restarts. Will the above metric have all three of these statuses?

Author: Yes, if your Pod transitions through those states (e.g., failed to pull image, runtime errors, then restarts), the metric can capture each corresponding reason at the time it occurs. However, you won’t necessarily see all reasons simultaneously; rather, you’ll see them reflected as changes in the metric over the Pod’s lifecycle.

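`getPodStatusReasonValue` checks three independent sources, so the answer to the thread's last question depends on where each reason is recorded. The sketch below copies the lookup order from the diff onto minimal stand-in types: hypothetical mirrors of the `k8s.io/api/core/v1` fields it reads, plus `State.Waiting` for contrast. The reason strings are illustrative, chosen to show the mechanics rather than to claim where Kubernetes sets each one.

```go
package main

import "fmt"

// Hypothetical stand-ins for the k8s.io/api/core/v1 fields used by
// getPodStatusReasonValue. Only the fields needed here are modeled.
type ContainerStateTerminated struct{ Reason string }
type ContainerStateWaiting struct{ Reason string }

type ContainerState struct {
	Waiting    *ContainerStateWaiting
	Terminated *ContainerStateTerminated
}

type ContainerStatus struct{ State ContainerState }
type PodCondition struct{ Reason string }

type PodStatus struct {
	Reason            string
	Conditions        []PodCondition
	ContainerStatuses []ContainerStatus
}

type Pod struct{ Status PodStatus }

// reasonValue follows the diff's lookup order: pod-level reason, then every
// condition, then terminated container states. Any hit yields 1; Waiting is ignored.
func reasonValue(p *Pod, reason string) float64 {
	if p.Status.Reason == reason {
		return 1
	}
	for _, cond := range p.Status.Conditions {
		if cond.Reason == reason {
			return 1
		}
	}
	for _, cs := range p.Status.ContainerStatuses {
		if cs.State.Terminated != nil && cs.State.Terminated.Reason == reason {
			return 1
		}
	}
	return 0
}

func main() {
	// An artificial pod whose reasons come from different sources at once.
	p := &Pod{Status: PodStatus{
		Reason:     "Evicted",
		Conditions: []PodCondition{{Reason: "NodeLost"}},
		ContainerStatuses: []ContainerStatus{
			{State: ContainerState{Waiting: &ContainerStateWaiting{Reason: "ImagePullBackOff"}}},
		},
	}}
	for _, r := range []string{"Evicted", "NodeLost", "ImagePullBackOff"} {
		fmt.Printf("%s=%v\n", r, reasonValue(p, r))
	}
}
```

It prints `Evicted=1`, `NodeLost=1`, and `ImagePullBackOff=0`: two reason series can read 1 at the same time when they come from different sources, while a reason held only in a container's `Waiting` state is never matched, because the diff inspects `State.Terminated` alone. Whether any waiting-state reason appears in `podStatusReasons` is not visible in this diff.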
82 changes: 82 additions & 0 deletions internal/store/pod_test.go
```diff
@@ -2290,3 +2290,85 @@ func BenchmarkPodStore(b *testing.B) {
 		}
 	}
 }
+
+func TestGetPodStatusReasonValue(t *testing.T) {
+	reason := "TestReason"
+
+	tests := []struct {
+		name string
+		pod  *v1.Pod
+		want float64
+	}{
+		{
+			name: "matches Status.Reason",
+			pod: &v1.Pod{
+				Status: v1.PodStatus{
+					Reason: "TestReason",
+				},
+			},
+			want: 1,
+		},
+		{
+			name: "matches condition Reason",
+			pod: &v1.Pod{
+				Status: v1.PodStatus{
+					Conditions: []v1.PodCondition{
+						{
+							Reason: "TestReason",
+						},
+					},
+				},
+			},
+			want: 1,
+		},
+		{
+			name: "matches container terminated Reason",
+			pod: &v1.Pod{
+				Status: v1.PodStatus{
+					ContainerStatuses: []v1.ContainerStatus{
+						{
+							State: v1.ContainerState{
+								Terminated: &v1.ContainerStateTerminated{
+									Reason: "TestReason",
+								},
+							},
+						},
+					},
+				},
+			},
+			want: 1,
+		},
+		{
+			name: "no match returns 0",
+			pod: &v1.Pod{
+				Status: v1.PodStatus{
+					Reason: "OtherReason",
+					Conditions: []v1.PodCondition{
+						{
+							Reason: "NotTestReason",
+						},
+					},
+					ContainerStatuses: []v1.ContainerStatus{
+						{
+							State: v1.ContainerState{
+								Terminated: &v1.ContainerStateTerminated{
+									Reason: "AnotherReason",
+								},
+							},
+						},
+					},
+				},
+			},
+			want: 0,
+		},
+	}
+
+	for _, tt := range tests {
+		t.Run(tt.name, func(t *testing.T) {
+			got := getPodStatusReasonValue(tt.pod, reason)
+			if got != tt.want {
+				t.Errorf("getPodStatusReasonValue() = %v, want %v", got, tt.want)
+			}
+		})
+	}
+}
```
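Returning to the first review question on lines +1563 to +1567 (only the last condition versus all of them): each entry in `Status.Conditions` describes a different condition `Type`, so the slice is not a history and its last element is not necessarily the most recent change. The sketch below contrasts the PR's any-condition check with a last-condition-only variant. `PodCondition` here is a hypothetical two-field stand-in, and both helper names are illustrative, not part of kube-state-metrics.

```go
package main

import "fmt"

// PodCondition is a hypothetical stand-in carrying only Type and Reason.
type PodCondition struct {
	Type   string
	Reason string
}

// anyConditionMatches mirrors the PR: every condition is checked.
func anyConditionMatches(conds []PodCondition, reason string) bool {
	for _, c := range conds {
		if c.Reason == reason {
			return true
		}
	}
	return false
}

// lastConditionMatches is the alternative raised in review: only the final
// slice element is checked, regardless of its Type.
func lastConditionMatches(conds []PodCondition, reason string) bool {
	if len(conds) == 0 {
		return false
	}
	return conds[len(conds)-1].Reason == reason
}

func main() {
	conds := []PodCondition{
		{Type: "PodScheduled", Reason: "Unschedulable"},
		{Type: "Ready", Reason: ""},
	}
	fmt.Println(anyConditionMatches(conds, "Unschedulable"))  // true
	fmt.Println(lastConditionMatches(conds, "Unschedulable")) // false
}
```

With a `PodScheduled` reason of `Unschedulable` ahead of a `Ready` entry, the any-condition check reports a match and the last-condition variant does not, so restricting the loop would change which series read 1.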