CNF-17181: Extend Core reference configuration with NHC/SNR operators#659
CNF-17181: Extend Core reference configuration with NHC/SNR operators#659rdiscala wants to merge 1 commit intoopenshift-kni:mainfrom
Conversation
|
@rdiscala: This pull request references CNF-17181 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the epic to target the "4.22.0" version, but no target version was set. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: rdiscala The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
| unhealthyConditions: | ||
| - duration: $duration # eg 60s | ||
| status: 'False' | ||
| type: Ready | ||
| - duration: $duration # eg 60s | ||
| status: 'Unknown' | ||
| type: Ready |
There was a problem hiding this comment.
Are these logically "AND" or "OR"? (would be helpful to add this to the note)
There was a problem hiding this comment.
The operator is OR. See this for loop in conditions.go. I'll add this information in the comment.
| # If the node cannot reach the API server within this time, it is | ||
| # considered to have lost connectivity. | ||
| # Tested value: 15s | ||
| apiServerTimeout: $apiServerTimeout # eg 15s |
There was a problem hiding this comment.
There are a lot of configuration values here to set the various timeouts. Between the testing we have done (plus test lanes we put in place) and recommendations from the NHC/SNR team are we sufficiently confident with these values that we can set the tested values here as defaults and then provide guidance in the comments and RDS doc for any values which they may need to udpate? Without defaults here there is a significant amount of per-partner/customer tuning which is required.
(same here and in NodeHealthCheck config)
There was a problem hiding this comment.
Good point. The values provided here as comments are the same ones that were provided in CNF-17181. These were are also used successfully in the TCFE, using this playbook (the only difference is minHealthy set to 75% as the test was conducted on a cluster with 4 workers).
I've asked in the #forum-dragonfly for an additional review.
I can amend this file and define the setting values based on the comments.
@jnunyez: thoughts?
There was a problem hiding this comment.
The values for CNF-17181 were used in a specific setup but they could be different in another environment with for instance a different number of nodes. It would be beneficial to hear feedback from #forum-dragonfly to understand the relevant trade-offs for these parameters and give proper recommendation in RDS documentation.
| - path: reference-crs/required/node-health-check/SelfNodeRemediationTemplate.yaml | ||
| - path: reference-crs/required/node-health-check/SelfNodeRemediationConfig.yaml | ||
| - path: reference-crs/required/node-health-check/NodeHealthCheck.yaml |
There was a problem hiding this comment.
Any required patches should be included here as well. Ideally we will have good default values in the reference-crs and won't need any (or at least relatively few) patches here that the user would update.
There was a problem hiding this comment.
Should I add this statement as a comment?
There was a problem hiding this comment.
No, my intent was only that anything in the reference-crs above which requires user input (noted with field: $value) should be included here as a patch with the reference value. Ideally the defaults are considered correct for most use cases and we don't need anything here (the values are all fully defaulted in the base CRs). If there is a common update made here an example patch can be included but commented out.
There was a problem hiding this comment.
I am not aware of any additional patches we need to reference here.
b678aac to
3ba131b
Compare
|
Thanks for your pull request. Before we can look at it, you'll need to add a 'DCO signoff' to your commits. 📝 Please follow instructions in the contributing guide to update your commits with the DCO Full details of the Developer Certificate of Origin can be found at developercertificate.org. The list of commits missing DCO signoff: DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
Signed-off-by: Rigel Di Scala <rdiscala@redhat.com>
3ba131b to
0e850e2
Compare
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Enterprise Run ID: 📒 Files selected for processing (16)
📝 WalkthroughWalkthroughAdds Node Health Check and Self Node Remediation operator deployment and configuration manifests across telco-core policies and reference resources. Two sets of manifests are provided: templated versions for kube-compare with dynamic spec rendering, and fully configured versions for reference-crs with explicit specifications for health monitoring and node remediation. Changes
Estimated code review effort🎯 2 (Simple) | ⏱️ ~15 minutes 🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Related tests: