Control Plane latency recommendations for reliable clusters #76677

Closed
wants to merge 339 commits into from

Conversation

@shahsahil264 shahsahil264 commented May 29, 2024

Add Documentation on Control Plane latency recommendations for reliable clusters

Version(s):

4.15+

Issue:

https://issues.redhat.com/browse/CHAOS-832

Link to docs preview:

QE review:

  • QE has approved this change.

Additional information:

@shahsahil264
Author

@openshift/team-documentation

openshift-ci bot added the size/S label (denotes a PR that changes 10-29 lines, ignoring generated files) on May 29, 2024
@shahsahil264
Author

@chaitanyaenr

@ocpdocs-previewbot

ocpdocs-previewbot commented May 29, 2024

🤖 Thu Jun 27 13:15:00 - Prow CI generated the docs preview:

https://76677--ocpdocs-pr.netlify.app/

openshift-ci bot added the size/M label (denotes a PR that changes 30-99 lines, ignoring generated files) and removed the size/S label (denotes a PR that changes 10-29 lines, ignoring generated files) on Jun 12, 2024
Member

@chaitanyaenr chaitanyaenr left a comment

LGTM

@chaitanyaenr
Member

@mffiedler @ahardin-rh PTAL. Thanks.

@ahardin-rh
Contributor

@chaitanyaenr Thank you!
@shahsahil264 What are the applicable OCP versions for this doc update and is there an engineering JIRA that we can associate with this work?
@GroceryBoyJr @sheriff-rh Can you please take a look? Thanks!

Contributor

@GroceryBoyJr GroceryBoyJr left a comment

Great work, @shahsahil264! I have a few small items:

  • Squash your commits and push to origin.
  • Tell us which versions of OpenShift this applies to (typically 4.12+). Add that to comment 0, the first comment in the PR, so we can see it easily.
  • When you re-push as I have suggested above, Prow will generate a new preview link. Please put that in comment 0 if you can. I could not find your updates in the preview that Prow had already generated.
  • If you are working from a JIRA, please add that link to comment 0.

I am excited, @shahsahil264; your work is a wonderful addition to the docs!

[id="control-plane-latency_{context}"]
= Recommended Control Plane latency for reliable clusters

It's recommended to ensure that the latency between each of the control plane nodes is within 15ms to ensure performant and reliable cluster. Some of the metrics to keep track of includes etcd gRPC requests latency, fysnc latency and any critical alerts. Here are the PromQL queries:
Contributor

Suggested change
It's recommended to ensure that the latency between each of the control plane nodes is within 15ms to ensure performant and reliable cluster. Some of the metrics to keep track of includes etcd gRPC requests latency, fysnc latency and any critical alerts. Here are the PromQL queries:
Latency between each of the control plane nodes must be less than 15 ms to ensure a well-performing and reliable cluster. Some of the metrics to keep track of include etcd gRPC request latency, fsync latency, and any critical alerts. Here are the PromQL queries:
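
The PromQL queries that the paragraph introduces are not shown in this conversation. As a rough illustration only, the Python sketch below assembles 99th-percentile queries of the kind the text describes. It assumes the standard etcd and gRPC Prometheus histogram metrics (`etcd_network_peer_round_trip_time_seconds`, `etcd_disk_wal_fsync_duration_seconds`, `grpc_server_handling_seconds`) and the built-in `ALERTS` series. The helper names `p99_query` and `within_limit`, the 5-minute window, and the choice of the 99th percentile are hypothetical and are not taken from the PR.

```python
# Sketch only: helper names, the 5m window, and the p99 choice are assumptions.
# The 15 ms figure is the control plane node-to-node limit stated in the text.
LATENCY_LIMIT_SECONDS = 0.015


def p99_query(metric: str, window: str = "5m") -> str:
    """Build a PromQL 99th-percentile query over a histogram metric's buckets."""
    return (
        "histogram_quantile(0.99, "
        f"sum(rate({metric}_bucket[{window}])) by (instance, le))"
    )


# Metrics named in the text, mapped to assumed etcd/gRPC metric names.
QUERIES = {
    "peer_round_trip": p99_query("etcd_network_peer_round_trip_time_seconds"),
    "wal_fsync": p99_query("etcd_disk_wal_fsync_duration_seconds"),
    "grpc_requests": p99_query("grpc_server_handling_seconds"),
    # Prometheus exposes firing alerts through the synthetic ALERTS series.
    "critical_alerts": 'ALERTS{severity="critical", alertstate="firing"}',
}


def within_limit(seconds: float, limit: float = LATENCY_LIMIT_SECONDS) -> bool:
    """Return True when a measured node-to-node latency is under the limit."""
    return seconds < limit


if __name__ == "__main__":
    for name, query in QUERIES.items():
        print(f"{name}: {query}")
```

Only the peer round-trip value is meant to be compared against the 15 ms limit here; the text does not state thresholds for the fsync or gRPC metrics, so the sketch does not apply one.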

openshift-merge-robot added the needs-rebase label (indicates a PR cannot be merged because it has merge conflicts with HEAD) on Jun 27, 2024
@openshift-merge-robot

PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

openshift-ci bot added the size/XXL label (denotes a PR that changes 1000+ lines, ignoring generated files) and removed the size/M label (denotes a PR that changes 30-99 lines, ignoring generated files) on Jun 27, 2024

openshift-ci bot commented Jun 27, 2024

@shahsahil264: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/validate-portal | a107102 | link | true | /test validate-portal |
| ci/prow/deploy-preview | a107102 | link | true | /test deploy-preview |
| ci/prow/validate-asciidoc | a107102 | link | true | /test validate-asciidoc |

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@shahsahil264
Author

Because of the issue on this branch, I am closing this PR and have opened up a new one: #78769
