-
Notifications
You must be signed in to change notification settings - Fork 318
Add configurable connection draining timeout for L4 LoadBalancers #3022
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Add configurable connection draining timeout for L4 LoadBalancers #3022
Conversation
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: mattrobenolt The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
|
Welcome @mattrobenolt! |
|
Hi @mattrobenolt. Thanks for your PR. I'm waiting for a github.com member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
TortillaZHawaii
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi, thanks for the PR.
I would like to understand what happens, if the user has modified the timeout on the created Backend Service resource through Cloud Console or CLI. Unfortunately we did somewhat allow this to happen (even if it wasn't supported) and this PR would break the expectations for those users.
It's also not clear what the implications are for the rest of the stack, I'm not sure if there won't be any issues somewhere unexpected.
/hold
/ok-to-test
pkg/l4annotations/service.go
Outdated
| L4LoggingConfigMapKey = "networking.gke.io/l4-logging-config-map" | ||
|
|
||
| // Service annotation key for specifying connection draining timeout in seconds for L4 backend services | ||
| ConnectionDrainingTimeoutKey = "networking.gke.io/connection-draining-timeout-sec" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could use "networking.gke.io/connection-draining-timeout" and use time.Duration to allow specifying time with unit, like in https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#configurable-container-restart-delay
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not entirely sure that makes sense in this context when GCP only accepts seconds as input in the API.
So what would it mean for a spec to say networking.gke.io/connection-draining-timeout = 100ms?
We could error on anything that is sub-second, and that'd enable us to use 5m or something else?
Is that more what you were thinking?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not entirely sure that makes sense in this context when GCP only accepts seconds as input in the API.
So what would it mean for a spec to say networking.gke.io/connection-draining-timeout = 100ms?
We need validation regardless of using int or duration.
and that'd enable us to use 5m or something else?
yes exactly
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, I swapped this to just connection-draining-timeout with some validation for the acceptable range.
The expected behavior with this PR is that value is preserved, and that's exactly our setup. We create the LoadBalancer, then need to modify it after with |
Adds `networking.gke.io/connection-draining-timeout-sec` annotation to configure connection draining timeout for L4 LoadBalancers with RBS enabled. Currently, L4 LoadBalancers use a hardcoded 30 second timeout. For workloads with long-lived connections, this is quite insufficient. GCE supports up to 3600 seconds, but there is no way to configure this in the Service definition. Note: for disclosure, I used the assistance of Claude Code to help implement this PR, but I have reviewed everything personally to the best of my knowledge.
97ddad6 to
f4ea8ff
Compare
|
@TortillaZHawaii if you'd like to take another look, I think everything is addressed. Also added more testing around pre-existing values not being clobbered. |
Adds
networking.gke.io/connection-draining-timeout-secannotation to configure connection draining timeout for L4 LoadBalancers with RBS enabled.Currently, L4 LoadBalancers use a hardcoded 30 second timeout. For workloads with long-lived connections, this is quite insufficient. GCE supports up to 3600 seconds, but there is no way to configure this in the Service definition.
Note: for disclosure, I used the assistance of Claude Code to help implement this PR, but I have reviewed everything personally to the best of my knowledge.