OCPBUGS-48479: Adding MHC exception to Pausing MHC cluster update … #91781

Open
wants to merge 1 commit into
base: main
from

Conversation

obrown1205
Contributor

@obrown1205 obrown1205 commented Apr 7, 2025

…step

Updating to add an exception note for one of the MachineHealthCheck resources that does not need to be paused during this step.

Version(s):
4.14+

Issue:
OCPBUGS-48479

Link to docs preview:
https://91781--ocpdocs-pr.netlify.app/openshift-enterprise/latest/disconnected/updating/disconnected-update.html
https://91781--ocpdocs-pr.netlify.app/openshift-enterprise/latest/updating/updating_a_cluster/updating-cluster-cli.html

QE review:

  • QE has approved this change.

Additional information:

@openshift-ci-robot added the jira/valid-reference (indicates that this PR references a valid Jira ticket of any type) and jira/invalid-bug (indicates that a referenced Jira bug is invalid for the branch this PR is targeting) labels Apr 7, 2025
@openshift-ci-robot

@obrown1205: This pull request references Jira Issue OCPBUGS-48479, which is invalid:

  • expected the bug to target the "4.19.0" version, but no target version was set

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@obrown1205 obrown1205 changed the title OCPBUGS-48479: Adding MHC excption to the Pausing MHC cluster update … OCPBUGS-48479: Adding MHC exception to the Pausing MHC cluster update … Apr 7, 2025
@obrown1205 obrown1205 changed the title OCPBUGS-48479: Adding MHC exception to the Pausing MHC cluster update … OCPBUGS-48479: Adding MHC exception to Pausing MHC cluster update … Apr 7, 2025
@openshift-ci bot added the size/XS label (denotes a PR that changes 0-9 lines, ignoring generated files) Apr 7, 2025

[NOTE]
====
As an exception, you do not have to pause the `machine-api-termination-handler` resource, as it does not deploy a new node if the node has been flagged as `notReady`.
Member

Instead of naming a particular MachineHealthCheck, maybe call out the property of the MHC that means you don't need to pause it? In this case, I expect it uses the fatal `Terminating` condition in its `unhealthyConditions`, so something like:

As an exception, you do not have to pause MachineHealthChecks where the unhealthyConditions is a fatal condition like Terminating being True. Those Machines are toast anyway, so no use waiting before reaping.

or whatever the public-docs analog of that is?
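
To make the property concrete, here is a minimal Python sketch of the check being described. It is only an illustration: `needs_pause`, `UNRECOVERABLE_CONDITION_TYPES`, and the `example-worker-check` resource are hypothetical names, and the `machine-api-termination-handler` spec shown is an assumed shape based on the expectation above that it keys off `Terminating` being `True`. Treating `Terminating` as the only unrecoverable condition type is likewise an assumption.

```python
# Hypothetical helper for illustration; not part of OpenShift or this PR.
# ASSUMPTION: "Terminating" is the only condition type treated as unrecoverable.
UNRECOVERABLE_CONDITION_TYPES = {"Terminating"}


def needs_pause(mhc: dict) -> bool:
    """Return True if this MachineHealthCheck should be paused before an update."""
    conditions = mhc.get("spec", {}).get("unhealthyConditions", [])
    if not conditions:
        # Nothing to judge by, so take the conservative path and pause.
        return True
    # Pausing is unnecessary only if every condition is an unrecoverable one
    # that triggers on status "True".
    only_unrecoverable = all(
        c.get("type") in UNRECOVERABLE_CONDITION_TYPES and c.get("status") == "True"
        for c in conditions
    )
    return not only_unrecoverable


# ASSUMPTION: spec shape for the termination handler, not copied from a cluster.
termination_handler = {
    "metadata": {"name": "machine-api-termination-handler"},
    "spec": {
        "unhealthyConditions": [
            {"type": "Terminating", "status": "True", "timeout": "0s"},
        ]
    },
}

# Hypothetical MHC that reacts to a node going not ready.
worker_check = {
    "metadata": {"name": "example-worker-check"},
    "spec": {
        "unhealthyConditions": [
            {"type": "Ready", "status": "False", "timeout": "300s"},
            {"type": "Ready", "status": "Unknown", "timeout": "300s"},
        ]
    },
}

for mhc in (termination_handler, worker_check):
    action = "pause" if needs_pause(mhc) else "no pause needed"
    print(f"{mhc['metadata']['name']}: {action}")
```

The sketch defaults to pausing whenever it cannot prove every condition is unrecoverable, which matches the cautious intent of the update step.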

Basically LGTM, but I'd like to see how to address @wking 's question.

Contributor Author

Makes sense :) TY!

Contributor Author

@obrown1205 obrown1205 Apr 8, 2025

@wking how about:

"Some MachineHealthChecks might not need to be paused. If your MachineHealthCheck (MHC) resource has a fatal condition met, new nodes cannot be deployed, and pausing that MHC is unnecessary."


[NOTE]
====
Some MachineHealthChecks might not need to be paused. If your MachineHealthCheck (MHC) resource has a fatal condition met, new nodes cannot be deployed, and pausing that MHC is unnecessary.
Collaborator

🤖 [error] RedHat.TermsErrors: Use 'unrecoverable' rather than 'fatal'. For more information, see RedHat.TermsErrors.

Member

Nit: maybe just drop "new nodes cannot be deployed", because this isn't about new Nodes watched by the MHC; it's about the Node the MHC is sad about already being terminal. Maybe something like:

If your MachineHealthCheck (MHC) resource relies on unrecoverable conditions, pausing that MHC is unnecessary.

openshift-ci bot commented Apr 8, 2025

@obrown1205: all tests passed!

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Labels
  • jira/invalid-bug: Indicates that a referenced Jira bug is invalid for the branch this PR is targeting.
  • jira/valid-reference: Indicates that this PR references a valid Jira ticket of any type.
  • size/XS: Denotes a PR that changes 0-9 lines, ignoring generated files.
6 participants