
CORENET-5882: network: gather routeadvertisements.k8s.ovn.org #485

Open: wants to merge 1 commit into base: master

Conversation

jcaamano

No description provided.

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Apr 10, 2025
@openshift-ci-robot

openshift-ci-robot commented Apr 10, 2025

@jcaamano: This pull request references CORENET-5882 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.19.0" version, but no target version was set.

In response to this:

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

Contributor

openshift-ci bot commented Apr 10, 2025

@jcaamano: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name: ci/prow/e2e-aws
Commit: 9ccf072
Details: link
Required: true
Rerun command: /test e2e-aws

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@@ -153,6 +153,11 @@ if [[ "${CNCC_DEPLOYMENT}" == "cloud-network-config-controller" ]]; then
oc adm inspect ${log_collection_args} --dest-dir must-gather cloudprivateipconfigs.cloud.network.openshift.io
fi

ROUTE_ADVERTISEMENTS=$(oc get network.operator.openshift.io cluster -o=jsonpath='{.spec.defaultNetwork.ovnKubernetesConfig.routeAdvertisements}')
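The hunk above is truncated in this view. A minimal sketch of how the full conditional gather might look, mirroring the cloudprivateipconfigs pattern earlier in the script (the function name, the "Enabled" literal, and the omission of `${log_collection_args}` are assumptions for illustration, not taken from the actual PR):

```shell
# Hypothetical sketch: gather RouteAdvertisements CRs only when the feature
# is enabled in the cluster network operator config. The "Enabled" literal
# and the surrounding structure are assumptions, not from the actual PR.
gather_route_advertisements() {
  local ra
  ra=$(oc get network.operator.openshift.io cluster \
    -o=jsonpath='{.spec.defaultNetwork.ovnKubernetesConfig.routeAdvertisements}' 2>/dev/null)
  if [[ "${ra}" == "Enabled" ]]; then
    # Collect the cluster-scoped RouteAdvertisements resources into must-gather.
    oc adm inspect --dest-dir must-gather routeadvertisements.k8s.ovn.org
  fi
}
```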
Contributor


Shouldn't this go inside the check for whether the network plugin is OVN-Kubernetes? I think we still support third-party plugins, and in that case .ovnKubernetesConfig wouldn't be set, so the value would just be blank, wouldn't it?
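A sketch of the gating suggested here, assuming the plugin type can be read from .spec.defaultNetwork.type (the helper name and the exact jsonpath are assumptions for illustration):

```shell
# Hypothetical sketch of the reviewer's suggestion: query the
# OVN-Kubernetes-specific config only when OVN-Kubernetes is the
# active network plugin, so third-party plugin clusters skip it.
get_route_advertisements_config() {
  local network_type
  network_type=$(oc get network.operator.openshift.io cluster \
    -o=jsonpath='{.spec.defaultNetwork.type}' 2>/dev/null)
  # ovnKubernetesConfig is only populated for OVN-Kubernetes clusters;
  # on third-party plugins the jsonpath would simply return an empty string.
  if [[ "${network_type}" == "OVNKubernetes" ]]; then
    oc get network.operator.openshift.io cluster \
      -o=jsonpath='{.spec.defaultNetwork.ovnKubernetesConfig.routeAdvertisements}' 2>/dev/null
  fi
}
```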

Contributor

@tssurya tssurya left a comment


/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Apr 11, 2025
Contributor

openshift-ci bot commented Apr 11, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: jcaamano, tssurya
Once this PR has been reviewed and has the lgtm label, please assign sferich888 for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Contributor

@tssurya tssurya left a comment


Don't know if you have time to also add CUDNs... not related to this specific card, but it would be nice to know the details of the networks that will show up as selected on these RAs.

@ingvagabund
Member

ingvagabund commented Apr 11, 2025

What's the expected extra size of the collected data? What's the expected percentage of clusters that will have the resource installed and properly populated?

@jcaamano
Author

Don't know if you have time to also add CUDNs... not related to this specific card, but it would be nice to know the details of the networks that will show up as selected on these RAs.

Should we be collecting UDNs/CUDNs as part of the corresponding epic already?

@jcaamano
Author

What's the expected extra size of the collected data? What's the expected percentage of clusters that will have the resource installed and properly populated?

It's just a small CRD that is not expected to scale much. It is an optional CRD not expected to be used by a lot of clusters.

@jcaamano
Author

Don't know if you have time to also add CUDNs... not related to this specific card, but it would be nice to know the details of the networks that will show up as selected on these RAs.

Should we be collecting UDNs/CUDNs as part of the corresponding epic already?

In any case, I'd prefer that be handled in a separate PR.

@ingvagabund
Member

Can you be more specific in numbers? Few KB? Is it going to be available in more than 10% of clusters or less?

@jcaamano
Author

Can you be more specific in numbers? Few KB? Is it going to be available in more than 10% of clusters or less?

The CRD is small:

meta
spec
  string
  selector
  selector
  list of max 5 selectors
status
  list of max 1 condition

At the current feature scale, in the worst-case scenario of users misusing the CRD, there would be at most 200 instances of it. Each instance is at most about 1 KB in the worst case?

It's an optional feature only available on bare-metal clusters, and only a small subset of those will use it.

I can't come up with specific numbers. We need PMs for that.

I don't think it has any significance relative to the normal size of a must-gather.

@jcaamano
Author

Don't know if you have time to also add CUDNs... not related to this specific card, but it would be nice to know the details of the networks that will show up as selected on these RAs.

Should we be collecting UDNs/CUDNs as part of the corresponding epic already?

In any case, I'd prefer that be handled in a separate PR.

The reason is that we urgently need this to troubleshoot downstream CI.

@ingvagabund
Member

Thank you for the more detailed description. The current rule of thumb is to avoid adding extra collections into the default must-gather in cases where the number of clusters that actually will benefit from it is low. Instead, it's recommended to ask a customer to run additional commands to collect this extra data.

Historically, we have had many attempts to add "small" extra collections. Yet, these small additions tend to accumulate over time, and the overall extra "baggage" can become significant. We don't keep track of the total sizes across various clusters and OCP releases since it's quite individual. So we'd rather not add anything extra unless there's a visible benefit.

@jcaamano
Author

Thank you for the more detailed description. The current rule of thumb is to avoid adding extra collections into the default must-gather in cases where the number of clusters that actually will benefit from it is low. Instead, it's recommended to ask a customer to run additional commands to collect this extra data.

Historically, we have had many attempts to add "small" extra collections. Yet, these small additions tend to accumulate over time, and the overall extra "baggage" can become significant. We don't keep track of the total sizes across various clusters and OCP releases since it's quite individual. So we'd rather not add anything extra unless there's a visible benefit.

I can't disagree more, but I guess it is the way it is.

Would it be fine to add it to gather_network_logs? That is optional.

@tssurya
Contributor

tssurya commented Apr 11, 2025

Thank you for the more detailed description. The current rule of thumb is to avoid adding extra collections into the default must-gather in cases where the number of clusters that actually will benefit from it is low. Instead, it's
recommended to ask a customer to run additional commands to collect this extra data.

Is there a must-gather docs page that outlines these rules? Asking the customer for extra data always annoys them. cc @palonsoro, can you please weigh in from the support side on the implications? The problem is that if we don't get the core must-gather data along with the CR data, data collected separately will be hard to correlate.
Adding this to gather_network_logs as part of an extra gather means support would always need to run it when working networking bugs, which is not what always happens.
I think, given that the OCP product keeps growing and new features keep being added, the end-user experience of collecting data for troubleshooting needs to be enhanced. The burden shouldn't be on the end user to remember everything to collect; instead, the tooling should be robust enough to account for plugins (rather than relying on optional flags, which users will always forget to set).

Historically, we have had many attempts to add "small" extra collections. Yet, these small additions tends to increment over time and the overall extra "baggage" can turn significant. We don't keep track of the total sizes across various clusters and OCP releases since it's quite individual.

I thought we had a ballpark size, because in the past we have worked with the tooling team: https://docs.google.com/document/d/1C4dIRlMASWY8efrgfmGFn_sRfQl9UbFaCEs1bF-j4rw/edit?tab=t.0#bookmark=id.jsijvb10pq18 which used to track the sizes, and there was a rule about how much load a PR adds to must-gather, time- and size-wise, before we made that determination.

So we rather prefer not to add anything extra unless there's a visible benefit.

I understand where you are coming from, but in this case there is a visible benefit: it helps with debugging/troubleshooting if we get any bugs. In the past we used to be able to collect admin-level CRs by default. Did this change recently?

So if you could reconsider this specific PR, keeping in mind the data Jaime provided above, that would really help us.
