
CORENET-5882: network: gather routeadvertisements.k8s.ovn.org #485

Open: wants to merge 1 commit into base: master

Conversation

jcaamano

No description provided.

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Apr 10, 2025
@openshift-ci-robot

openshift-ci-robot commented Apr 10, 2025

@jcaamano: This pull request references CORENET-5882 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.19.0" version, but no target version was set.

In response to this:

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

Contributor

openshift-ci bot commented Apr 10, 2025

@jcaamano: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name: ci/prow/e2e-aws
Commit: 9ccf072
Details: link
Required: true
Rerun command: /test e2e-aws

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@@ -153,6 +153,11 @@ if [[ "${CNCC_DEPLOYMENT}" == "cloud-network-config-controller" ]]; then
oc adm inspect ${log_collection_args} --dest-dir must-gather cloudprivateipconfigs.cloud.network.openshift.io
fi

ROUTE_ADVERTISEMENTS=$(oc get network.operator.openshift.io cluster -o=jsonpath='{.spec.defaultNetwork.ovnKubernetesConfig.routeAdvertisements}')
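The hunk above is truncated in this view. A minimal sketch of how the full conditional gather might look, mirroring the cloudprivateipconfigs pattern earlier in the script (the function name, the "Enabled" literal, and the omission of `${log_collection_args}` are assumptions for illustration, not taken from the actual PR):

```shell
# Hypothetical sketch: gather RouteAdvertisements CRs only when the feature
# is enabled in the cluster network operator config. The "Enabled" literal
# and the surrounding structure are assumptions, not from the actual PR.
gather_route_advertisements() {
  local ra
  ra=$(oc get network.operator.openshift.io cluster \
    -o=jsonpath='{.spec.defaultNetwork.ovnKubernetesConfig.routeAdvertisements}' 2>/dev/null)
  if [[ "${ra}" == "Enabled" ]]; then
    # Collect the cluster-scoped RouteAdvertisements resources into must-gather.
    oc adm inspect --dest-dir must-gather routeadvertisements.k8s.ovn.org
  fi
}
```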
Contributor


Shouldn't this go inside the check for whether the network plugin is OVN-Kubernetes? I think we still support third-party plugins, and in that case .ovnKubernetesConfig wouldn't be set, so the value would just be blank, wouldn't it?
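A sketch of the gating suggested here, assuming the plugin type can be read from .spec.defaultNetwork.type (the helper name and the exact jsonpath are assumptions for illustration):

```shell
# Hypothetical sketch of the reviewer's suggestion: query the
# OVN-Kubernetes-specific config only when OVN-Kubernetes is the
# active network plugin, so third-party plugin clusters skip it.
get_route_advertisements_config() {
  local network_type
  network_type=$(oc get network.operator.openshift.io cluster \
    -o=jsonpath='{.spec.defaultNetwork.type}' 2>/dev/null)
  # ovnKubernetesConfig is only populated for OVN-Kubernetes clusters;
  # on third-party plugins the jsonpath would simply return an empty string.
  if [[ "${network_type}" == "OVNKubernetes" ]]; then
    oc get network.operator.openshift.io cluster \
      -o=jsonpath='{.spec.defaultNetwork.ovnKubernetesConfig.routeAdvertisements}' 2>/dev/null
  fi
}
```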

Contributor

@tssurya tssurya left a comment


/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Apr 11, 2025
Contributor

openshift-ci bot commented Apr 11, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: jcaamano, tssurya
Once this PR has been reviewed and has the lgtm label, please assign sferich888 for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Contributor

@tssurya tssurya left a comment


Don't know if you have time to also add CUDNs... not related to this specific card, but it would be nice to know the details of the networks that will show up as selected on these RAs.

@ingvagabund
Member

ingvagabund commented Apr 11, 2025

What's the expected extra size of the collected data? What's the expected percentage of clusters that will have the resource installed and properly populated?

@jcaamano
Author

Don't know if you have time to also add CUDNs... not related to this specific card, but it would be nice to know the details of the networks that will show up as selected on these RAs.

Should we be collecting UDNs/CUDNs as part of the corresponding epic already?

@jcaamano
Author

What's the expected extra size of the collected data? What's the expected percentage of clusters that will have the resource installed and properly populated?

It's just a small CRD that is not expected to scale much. It is an optional CRD not expected to be used by a lot of clusters.

@jcaamano
Author

Don't know if you have time to also add CUDNs... not related to this specific card, but it would be nice to know the details of the networks that will show up as selected on these RAs.

Should we be collecting UDNs/CUDNs as part of the corresponding epic already?

In any case, I'd prefer that be handled in a separate PR.

@ingvagabund
Member

Can you be more specific in numbers? Few KB? Is it going to be available in more than 10% of clusters or less?

@jcaamano
Author

Can you be more specific in numbers? Few KB? Is it going to be available in more than 10% of clusters or less?

The CRD is small:

meta
spec
  string
  selector
  selector
  list of max 5 selectors
status
  list of max 1 condition

At the current feature scale, in the worst-case scenario of users misusing the CRD, there would be at most 200 instances of it. Each instance is at most about 1 KB in the worst case?

It's an optional feature only available on bare-metal clusters, and only a small subset of those will use it.

I can't come up with specific numbers. We need PMs for that.

I don't think it has any significance relative to the normal size of a must-gather.

@jcaamano
Author

Don't know if you have time to also add CUDNs... not related to this specific card, but it would be nice to know the details of the networks that will show up as selected on these RAs.

Should we be collecting UDNs/CUDNs as part of the corresponding epic already?

In any case, I'd prefer that be handled in a separate PR.

The reason is that we urgently need this to troubleshoot downstream CI.

@ingvagabund
Member

Thank you for the more detailed description. The current rule of thumb is to avoid adding extra collections into the default must-gather in cases where the number of clusters that actually will benefit from it is low. Instead, it's recommended to ask a customer to run additional commands to collect this extra data.

Historically, we have had many attempts to add "small" extra collections. Yet, these small additions tend to accumulate over time, and the overall extra "baggage" can become significant. We don't keep track of the total sizes across various clusters and OCP releases since it's quite individual. So we'd rather not add anything extra unless there's a visible benefit.

@jcaamano
Author

Thank you for the more detailed description. The current rule of thumb is to avoid adding extra collections into the default must-gather in cases where the number of clusters that actually will benefit from it is low. Instead, it's recommended to ask a customer to run additional commands to collect this extra data.

Historically, we have had many attempts to add "small" extra collections. Yet, these small additions tend to accumulate over time, and the overall extra "baggage" can become significant. We don't keep track of the total sizes across various clusters and OCP releases since it's quite individual. So we'd rather not add anything extra unless there's a visible benefit.

I can't disagree more, but I guess it is the way it is.

Would it be fine to add it to gather_network_logs? That is optional.

@tssurya
Contributor

tssurya commented Apr 11, 2025

Thank you for the more detailed description. The current rule of thumb is to avoid adding extra collections into the default must-gather in cases where the number of clusters that actually will benefit from it is low. Instead, it's
recommended to ask a customer to run additional commands to collect this extra data.

Is there a must-gather docs page that outlines these rules? Asking the customer for extra data always annoys them. cc @palonsoro, can you please weigh in from the support side on the implications? The problem is that if we don't get the core must-gather data along with the CR data, data collected separately will be hard to correlate.
Adding this to gather_network_logs as part of an extra gather means support would always need to run it when working networking bugs, which is not what always happens.
I think, given that the OCP product keeps growing and new features keep being added, the end-user experience of collecting data for troubleshooting needs to be enhanced. The burden shouldn't be on the end user to remember everything to collect; instead, the tooling should be robust enough to account for plugins (rather than relying on optional flags, which users will always forget to set).

Historically, we have had many attempts to add "small" extra collections. Yet, these small additions tends to increment over time and the overall extra "baggage" can turn significant. We don't keep track of the total sizes across various clusters and OCP releases since it's quite individual.

I thought we had a ballpark size, because in the past we have worked with the tooling team: https://docs.google.com/document/d/1C4dIRlMASWY8efrgfmGFn_sRfQl9UbFaCEs1bF-j4rw/edit?tab=t.0#bookmark=id.jsijvb10pq18 which used to track the sizes, and there was a rule about how much load a PR adds to must-gather, time- and size-wise, before we made that determination.

So we rather prefer not to add anything extra unless there's a visible benefit.

I understand where you are coming from, but in this case there is a visible benefit: it helps with debugging/troubleshooting if we get any bugs. In the past we used to be able to collect admin-level CRs by default. Did this change recently?

So if you could reconsider this specific PR, keeping in mind the data Jaime provided above, that would really help us.
