Add configurable connection draining timeout for L4 LoadBalancers#3022
Add configurable connection draining timeout for L4 LoadBalancers#3022mattrobenolt wants to merge 2 commits intokubernetes:masterfrom
Conversation
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: mattrobenolt The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
|
Welcome @mattrobenolt! |
|
Hi @mattrobenolt. Thanks for your PR. I'm waiting for a github.com member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
TortillaZHawaii
left a comment
There was a problem hiding this comment.
Hi, thanks for the PR.
I would like to understand what happens, if the user has modified the timeout on the created Backend Service resource through Cloud Console or CLI. Unfortunately we did somewhat allow this to happen (even if it wasn't supported) and this PR would break the expectations for those users.
It's also not clear what the implications are for the rest of the stack, I'm not sure if there won't be any issues somewhere unexpected.
/hold
/ok-to-test
pkg/l4annotations/service.go
Outdated
| L4LoggingConfigMapKey = "networking.gke.io/l4-logging-config-map" | ||
|
|
||
| // Service annotation key for specifying connection draining timeout in seconds for L4 backend services | ||
| ConnectionDrainingTimeoutKey = "networking.gke.io/connection-draining-timeout-sec" |
There was a problem hiding this comment.
We could use "networking.gke.io/connection-draining-timeout" and use time.Duration to allow specifying time with unit, like in https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#configurable-container-restart-delay
There was a problem hiding this comment.
I'm not entirely sure that makes sense in this context when GCP only accepts seconds as input in the API.
So what would it mean for a spec to say networking.gke.io/connection-draining-timeout = 100ms?
We could error on anything that is sub-second, and that'd enable us to use 5m or something else?
Is that more what you were thinking?
There was a problem hiding this comment.
I'm not entirely sure that makes sense in this context when GCP only accepts seconds as input in the API.
So what would it mean for a spec to say networking.gke.io/connection-draining-timeout = 100ms?
We need validation regardless of using int or duration.
and that'd enable us to use 5m or something else?
yes exactly
There was a problem hiding this comment.
Done, I swapped this to just connection-draining-timeout with some validation for the acceptable range.
The expected behavior with this PR is that value is preserved, and that's exactly our setup. We create the LoadBalancer, then need to modify it after with |
Adds `networking.gke.io/connection-draining-timeout-sec` annotation to configure connection draining timeout for L4 LoadBalancers with RBS enabled. Currently, L4 LoadBalancers use a hardcoded 30 second timeout. For workloads with long-lived connections, this is quite insufficient. GCE supports up to 3600 seconds, but there is no way to configure this in the Service definition. Note: for disclosure, I used the assistance of Claude Code to help implement this PR, but I have reviewed everything personally to the best of my knowledge.
97ddad6 to
f4ea8ff
Compare
|
@TortillaZHawaii if you'd like to take another look, I think everything is addressed. Also added more testing around pre-existing values not being clobbered. |
|
PR needs rebase. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
Adds
networking.gke.io/connection-draining-timeout-secannotation to configure connection draining timeout for L4 LoadBalancers with RBS enabled.Currently, L4 LoadBalancers use a hardcoded 30 second timeout. For workloads with long-lived connections, this is quite insufficient. GCE supports up to 3600 seconds, but there is no way to configure this in the Service definition.
Note: for disclosure, I used the assistance of Claude Code to help implement this PR, but I have reviewed everything personally to the best of my knowledge.