
fix: health should ignore max_unavailable [NR-365513]#1051

Merged
paologallinaharbur merged 4 commits into main from fix/health
Feb 10, 2025

Conversation

@paologallinaharbur
Member

Before, we took too many variables into account and relied on wrong assumptions:

  • we do not know whether a rolling update is ongoing
  • we do not have a timeout

Therefore, we always took max_unavailable into account, making it very easy to report false negatives.

The implementation has been simplified to report unhealthy whenever not all the expected pods are ready.

This implies that during a rollout an agent could appear as unhealthy while not all the expected pods are ready yet.

Moreover, please note that, following the APM case:

We also report Healthy if a replica is running an old version of the agent, since we can safely assume it is in the process of being upgraded.
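The simplified rule can be sketched roughly as follows. This is a hedged illustration, not the crate's actual code: the `Health` enum and the `daemon_set_health` function here are hypothetical stand-ins, and only the comparison of ready pods against desired pods reflects the behavior described above (note that max_unavailable does not appear anywhere).

```rust
// Hypothetical sketch: unhealthy if and only if fewer pods are ready
// than desired; max_unavailable is ignored entirely.
#[derive(Debug, PartialEq)]
enum Health {
    Healthy,
    Unhealthy(String),
}

fn daemon_set_health(desired_number_scheduled: i32, number_ready: i32) -> Health {
    if number_ready >= desired_number_scheduled {
        // Pods running an old agent version still count as ready,
        // so an in-progress upgrade can still report Healthy.
        Health::Healthy
    } else {
        Health::Unhealthy(format!(
            "expected {desired_number_scheduled} ready pods, got {number_ready}"
        ))
    }
}

fn main() {
    // All expected pods ready: healthy, even mid-upgrade.
    assert_eq!(daemon_set_health(3, 3), Health::Healthy);
    // One pod short (e.g. during a rollout): unhealthy.
    assert!(matches!(daemon_set_health(3, 2), Health::Unhealthy(_)));
    println!("ok");
}
```

The trade-off, as noted above, is that a rollout temporarily surfaces as unhealthy, in exchange for never masking genuinely missing pods behind max_unavailable.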

@paologallinaharbur paologallinaharbur changed the title fix: health should ignore max_unhealthy [NR-365513] fix: health should ignore max_unavailable [NR-365513] Feb 6, 2025
vjripoll
vjripoll previously approved these changes Feb 6, 2025
Contributor

@vjripoll vjripoll left a comment


🚀

sigilioso
sigilioso previously approved these changes Feb 6, 2025
Contributor

@sigilioso sigilioso left a comment


🚀

// I.e. we also report healthy whenever an instance is running an old version.
pub fn check_health_single_daemon_set(ds: &DaemonSet) -> Result<Health, HealthCheckerError> {
let name = client_utils::get_metadata_name(ds)?;
let status = Self::get_daemon_set_status(name.as_str(), ds)?;
Contributor


Working on the deployment I see this same pattern; should we fail because of this?
Not having a status is an expected situation, right?
I fear we will pollute the logs because of this; perhaps we should report unhealthy but not fail the health checker.

Contributor


I thought the status was supposed to be there 🤔 (probably there are scenarios I'm not aware of). If it is expected, I agree we should not fail (even though failing also reports unhealthy in the end).

Member Author


Have you seen that log often?
By the way, it does fail, but in the end it returns unhealthy without breaking anything:

let health = health_checker.check_health().unwrap_or_else(|err| {
    debug!(agent_id = %agent_id_clone, last_error = %err, "the configured health check failed");
    HealthWithStartTime::from_unhealthy(Unhealthy::from(err), sub_agent_start_time)
});

Contributor

@sigilioso sigilioso left a comment


Two nits 🙂

Contributor

@sigilioso sigilioso left a comment


LGTM

@paologallinaharbur paologallinaharbur merged commit efaa257 into main Feb 10, 2025
26 checks passed
@paologallinaharbur paologallinaharbur deleted the fix/health branch February 10, 2025 11:03