-
Notifications
You must be signed in to change notification settings - Fork 1.6k
KEP-5237: watch-based route controller reconciliation using informers #5289
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
KEP-5237: watch-based route controller reconciliation using informers #5289
Conversation
lukasmetzner
commented
May 8, 2025
- One-line PR description: Introduce a feature gate to enable informer-based reconciliation in the routes' controller of cloud-controller-manager, reducing API calls and improving efficiency.
- Issue link: CCM: watch-based route controller reconciliation using informers #5237
- Other comments:
|
Welcome @lukasmetzner! |
|
Hi @lukasmetzner. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
/cc @elmiko @JoelSpeed |
|
/ok-to-test |
elmiko
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the core concepts make sense to me, i think we should clean up the language around the term "cloud provider". in some places we use it to mean the controllers (ie the ccm), and other places we use it to mean the infrastructure provider (eg aws, azure, gcp), and we also refer to the framework as well.
we also need to make some decisions about the open questions. i wonder if we should go over these questions at the next sig meeting?
keps/sig-cloud-provider/5237-watch-based-route-controller-reconciliation/README.md
Outdated
Show resolved
Hide resolved
keps/sig-cloud-provider/5237-watch-based-route-controller-reconciliation/README.md
Outdated
Show resolved
Hide resolved
keps/sig-cloud-provider/5237-watch-based-route-controller-reconciliation/README.md
Outdated
Show resolved
Hide resolved
keps/sig-cloud-provider/5237-watch-based-route-controller-reconciliation/README.md
Outdated
Show resolved
Hide resolved
keps/sig-cloud-provider/5237-watch-based-route-controller-reconciliation/README.md
Outdated
Show resolved
Hide resolved
keps/sig-cloud-provider/5237-watch-based-route-controller-reconciliation/README.md
Outdated
Show resolved
Hide resolved
keps/sig-cloud-provider/5237-watch-based-route-controller-reconciliation/README.md
Outdated
Show resolved
Hide resolved
keps/sig-cloud-provider/5237-watch-based-route-controller-reconciliation/README.md
Outdated
Show resolved
Hide resolved
Co-authored-by: Michael McCune <[email protected]>
|
@elmiko If possible, I’d love to work through the open questions here in the PR so we can keep things moving, and then have a discussion in the next SIG meeting. Otherwise, we might end up losing a two of weeks of time. What do you think? Open Questions:
|
|
i'm fine to continue the discussions here.
i think 12h sounds fine to me.
i'll have to think about this a little more, i have a feeling that those are good to start with.
i like the idea of adjusting the default for the my only issue with using |
To my understanding, both the service and node controllers are already watch-based and should utilize the |
|
ack, thank you @lukasmetzner that makes sense to me. it seems we should focus on |
|
/lgtm |
|
@elmiko @JoelSpeed As I have got lgtms from both of you with a small discussion remaining, shall I fill out the production readiness questionnaire? The next PRR freeze for v1.35 is on the 9th of October. If possible, I would like to target v1.35. |
|
yeah, that sounds good to me @lukasmetzner |
|
I saw that we already had the questionnaire filled out, so I moved it towards prod-readiness. IMO we could also add a metric for A/B testing in beta (cc @JoelSpeed)? I would like to avoid missing the PRR freeze because of that :D I already got @elmiko Do you know if the Since I’m quite new here, I hope I’ve followed the process correctly. Apologies in advance if I’ve missed anything. |
|
@lukasmetzner: Can not set label lead-opted-in: Must be member in one of these teams: [release-team-enhancements release-team-leads sig-api-machinery-leads sig-apps-leads sig-architecture-leads sig-auth-leads sig-autoscaling-leads sig-cli-leads sig-cloud-provider-leads sig-cluster-lifecycle-leads sig-contributor-experience-leads sig-docs-leads sig-instrumentation-leads sig-k8s-infra-leads sig-multicluster-leads sig-network-leads sig-node-leads sig-release-leads sig-scalability-leads sig-scheduling-leads sig-security-leads sig-storage-leads sig-testing-leads sig-windows-leads] In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
good question, i will investigate early next week. |
### Attach Load Balancer to a Subnet
If your CCM is configured for a Private Network, Load Balancers can now
join one of its subnets. To place a Load Balancer in a specific subnet,
use the new `load-balancer.hetzner.cloud/private-subnet-ip-range`
annotation. Learn more about this feature
[here](./docs/guides/load-balancer/private-networks.md).
### Watch-Based Route Reconciliation (Experimental)
Currently, route reconciliation is performed at a fixed interval of 30s.
This leads to unnecessary API requests, as a `GET /v1/networks/{id}`
call is triggered every 30s, even when no changes have occurred.
Upstream we have proposed an event-driven approach, similar to the
mechanism used by other controllers such as the Load Balancer
Controller. With this new approach, route reconciliation is triggered on
node additions, node deletions, or when the `PodCIDRs` or `Addresses` of
nodes change. Additionally, to ensure consistency, reconciliation will
still occur periodically at a randomized interval between 12 and 24
hours.
We are close to merging a [Kubernetes Enhancement Proposal
(KEP)](kubernetes/enhancements#5289).
Furthermore, a pull request containing the implementation is already
open in the Kubernetes repository.
#### Forked Upstream Libraries
In this release, we replaced the upstream `controller-manager` and
`cloud-provider` libraries with our own forks. These forks are based on
the upstream `v0.34.1` release (aligned with Kubernetes v1.34.1) and
include our patches on top.
#### Enabling the Feature
This feature is **disabled by default** and will not affect existing
deployments unless explicitly enabled. We recommend testing it in a
non-production environment before considering use in production.
As the KEP has not yet been reviewed for production readiness, the
feature gate name may change in an upcoming release. Since this feature
is marked as experimental, such changes will not be considered breaking.
To enable the feature, set the following Helm value:
`args.feature-gates=CloudControllerManagerWatchBasedRoutesReconciliation=true`
I'm happy with that as long as we can leverage the metric in the beta stage with the feature both enabled and disabled to make a comparison between the two implementations to determine the performance impact |
…10-08-2025 output
bd052f2 to
3270c52
Compare
|
PRR lgtm. Thank you for addressing metrics early, that always makes things easier for making alpha to beta decisions. /approve |
|
Do you have a sig-cloud-provider owner to look at this? |
|
/approve |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: deads2k, elmiko, lukasmetzner The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
/lgtm |
Just for posterity - this proposal LGTM too. |