
KEP-4631: LoadBalancer Service Status Improvements, initial proposal #4632

Open
wants to merge 1 commit into
base: master

Conversation

danwinship
Contributor

  • One-line PR description: Initial proposal for KEP-4631
  • Other comments:
While updating the e2e load balancer tests after the final removal of
in-tree cloud providers, we have run into three problems:

  1. The tests have hard-coded timeouts (that sometimes differ per
     cloud provider) for deciding how long to wait for the cloud
     provider to update the service. It would make much more sense for
     the cloud provider to just provide information about its status
     on the Service object, so the tests could just monitor that.

  2. The tests recognize that not all cloud providers can implement
     all load balancer features, but in the past this was handled by
     hard-coding the information into the individual tests. (e.g.,
     `e2eskipper.SkipUnlessProviderIs("gce", "gke", "aws")`) These
     skip rules no longer work in the providerless tree, and this
     approach doesn't scale anyway. OTOH, we don't want to have to
     provide a separate `Feature:` tag for each load balancer
     subfeature, or have each cloud provider have to maintain their
     own set of `-ginkgo.skip` rules. It would be better if the e2e
     tests themselves could just figure out, somehow, whether they
     were running under a cloud provider that intends to implement the
     feature they are testing, or a cloud provider that doesn't.

  3. In some cases, because the existing tests were only run on
     certain clouds, it is not clear what the expected semantics are
     on other clouds. For example, since `IPMode: Proxy` load
     balancers can't preserve the client source IP in the way that
     `ExternalTrafficPolicy: Local` expects, should they refuse to
     provision a load balancer at all, or should they provision a load
     balancer that fails to preserve the source IP?

This KEP proposes new additions to `service.Status.LoadBalancer` and
`service.Status.Conditions` to allow cloud providers to better
communicate the status of load balancer support and provisioning, and
new guidelines on how cloud providers should handle load balancers for
services that they cannot fully support.

/assign @aojea @thockin

@k8s-ci-robot
Contributor

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label May 13, 2024
@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory labels May 13, 2024
@k8s-ci-robot k8s-ci-robot requested review from aojea and MikeZappa87 May 13, 2024 12:25
@k8s-ci-robot k8s-ci-robot added sig/network Categorizes an issue or PR as relevant to SIG Network. approved Indicates a PR has been approved by an approver from all required OWNERS files. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels May 13, 2024
@danwinship
Contributor Author

/sig cloud-provider
/sig testing

@k8s-ci-robot k8s-ci-robot added sig/cloud-provider Categorizes an issue or PR as relevant to SIG Cloud Provider. sig/testing Categorizes an issue or PR as relevant to SIG Testing. labels May 13, 2024
@danwinship danwinship changed the title KEP-4631: LoadBalancer Service Static Improvements, initial proposal KEP-4631: LoadBalancer Service Status Improvements, initial proposal May 13, 2024
approach doesn't scale anyway. OTOH, we don't want to have to
provide a separate `Feature:` tag for each load balancer
subfeature, or have each cloud provider have to maintain their
own set of `-ginkgo.skip` rules. It would be better if the e2e
Member

I don't like "smart tests" that have different execution flows depending on what... The current e2e tests for load balancers are not good tests, as they try to test a lot of different things in one execution... the test should assert on a feature and a behavior; we may need to break the existing tests down to see if we still need to do this


3. In some cases, because the existing tests were only run on
certain clouds, it is not clear what the expected semantics are
on other clouds. For example, since `IPMode: Proxy` load
Member

@thockin this was an interesting finding during the implementation of this mode in cloud-provider-kind, we need to try to flesh out more details before going to GA

Contributor Author

The problem isn't with the IPMode KEP; the distinction between "proxy mode" and "VIP mode" already existed before that; it's just that before IPMode made it explicit, it was controlled implicitly by whether the LB set Hostname or IP.

(Which is to say, even if we dropped IPMode, the problem would still exist. For example, with the default cloud-provider-aws load balancers.)

Member

To this specific feature: IMO, eTP=Local means what it says. It's not an awesomely designed feature because it presumes too much, but it says "if you are forwarding external traffic, only choose a local endpoint". Proxy-ish LB impls can't really retain the client IP, regardless of eTP, that doesn't change the meaning.

Looking at API docs for it, I think we can clarify:

     // externalTrafficPolicy describes how nodes distribute service traffic they
     // receive on one of the Service's "externally-facing" addresses (NodePorts,
     // ExternalIPs, and LoadBalancer IPs). If set to "Local", the proxy will configure
     // the service in a way that assumes that external load balancers will take care
     // of balancing the service traffic between nodes, and so each node will deliver
     // traffic only to the node-local endpoints of the service, without masquerading                                                                                                
-    // the client source IP. (Traffic mistakenly sent to a node with no endpoints will
+    // the source IP. (Traffic mistakenly sent to a node with no endpoints will
     // be dropped.) The default value, "Cluster", uses the standard behavior of
     // routing to all endpoints evenly (possibly modified by topology and other
     // features). Note that traffic sent to an External IP or LoadBalancer IP from
     // within the cluster will always get "Cluster" semantics, but clients sending to
     // a NodePort from within the cluster may need to take traffic policy into account
     // when picking a node.

For a Proxy-ish LB the source IP is the proxy itself. eTP=Local should still preserve that. Whether it is useful or not is a question for end users.

Comment on lines +139 to +140
- Allow cloud providers to indicate that they are working on
provisioning load balancer infrastructure, so that
users/operators/tests can distinguish the case of "it is taking a
while for the cloud to provision the load balancer" from "the cloud
has failed to provision a load balancer" and "there is no cloud
provider so load balancers don't work".
Member

❤️

provision a particular `LoadBalancer` service, and why.

- Allow cloud providers to indicate when they have provided an
"imperfect" load balancer that the user may or may not consider to
Member

kind of degraded mode?

cc: @bowei

so this feels "un-Kubernetes-like", though at the same time, it's not
like the current Kubernetes networking configuration situation is
really great, and there has been some discussion of trying to provide
more explicit and well-defined cluster networking configuration in the
Member

yeah, this is in the "train has left the station" section :)

Would it be better to add a `Conditions` field to
`v1.LoadBalancerIngress` so that we can specify conditions per element
of `.Status.LoadBalancer.Ingress`? IOW, should it be possible for a
load balancer to express that it has multiple IPs in different states?
Member

I think we should only consider final states, and not partial ones



- The Service has `AllocateLoadBalancerNodePorts: false`, but the
cloud only supports NodePort-based load balancing.

- The Service is `ExternalTrafficPolicy: Local` but the cloud cannot
Member

I think this is not really about IP preservation but has everything to do with not implementing hCNP or some equivalent mechanism. Minor wording change requested

Contributor Author

eTP=Local means what it says

lol, I thought that in the whole "what things should be traffic policy vs what things should be topology" debate (around PreferLocal) we had agreed that eTP=Local doesn't mean what it says, because the actual intent of the feature is "preserve client source IP", not "route traffic in a particular way"; the routing is purely a side effect of making it possible to implement "preserve client source IP". (IOW an implementation of eTP:Local that preserves client IP while doing routing in an unexpected way would be compliant, but an implementation that routes "correctly" but loses client IP is not.)

I think this is not really about IP preservation but has everything to do with not implementing hCNP or some equivalent mechanism.

What I was saying in that example was, if client IP preservation is considered a mandatory-to-implement aspect of eTP:Local, then there's no point in proxy-ish LBs implementing eTP:Local at all, and thus no reason for them to implement HCNP. (Whereas for VIP-ish LBs, there is really no good argument for not implementing HCNP.)

(If we don't think that client IP preservation is required for eTP:Local, then we should just assume all LBs will implement HCNP, and they're just buggy if they don't.)

supports single-stack load balancers, so it would only be able to
serve clients of one IP family.

- The Service is `ExternalTrafficPolicy: Local` but the cloud cannot
Member

I think this one is not a warning - IPMode already indicates this, right?

Contributor Author

I think actually no. Azure has this trick where the entire cloud network is aware of the load balancer NAT state, so the LB can DNAT packets to a NodePort without masquerading them, and then the reply packet will get un-DNAT-ed correctly even though logically speaking it doesn't pass through the LB.

update, even if the value of `LoadBalancerProvisioning` remains
`False`.

#### Terminating Condition
Member

I'm not convinced this is useful - what does a user do with this information?

@danwinship danwinship force-pushed the loadbalancerstatus branch from d5813da to a02030a Compare May 23, 2024 14:18
@k8s-ci-robot k8s-ci-robot added cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. and removed cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels May 23, 2024
@danwinship danwinship force-pushed the loadbalancerstatus branch from a02030a to 795cdd5 Compare May 24, 2024 14:14
@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. and removed cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. labels May 24, 2024
@danwinship
Contributor Author

Updates:

  • Updated for review comments, started filling in the rest of the template ("Test Plan", "Graduation", "Version Skew Strategy", PRR, etc)
  • Removed UNRESOLVED auto-skipping: Antonio agrees it's reasonable to have "tri-state" tests (pass/fail/skip) as long as the semantics are explicit. In some cases this will require rewriting the tests to put the "objectionable" bits at the start, so we can always skip immediately after the initial LB provisioning if the cloud doesn't support the feature.
  • The version skew section made me think about the behavior when a cloud provider is updated and finds itself in a cluster with pre-existing "bad" load balancers, and whether we should maybe add an explicit "unsupported" or "broken" condition.
  • Added a summary section to the top of the "Expected Behavior When the Cloud Provider Doesn't Know That It Can't Implement a Load Balancer" section, which I think is the big open question before this becomes implementable.
  • Added detailed notes to the e2e Test Plan section clarifying what skips will be needed for which tests. Also, while writing out the Test Plan section, I realized that to avoid regressing coverage, we're basically going to have to fork the LB tests, so we can have one hacky GCE-and-kind-only set, and one KEP-4631 set, which will initially be Alpha, but which will eventually replace the hacky ones.

@danwinship danwinship marked this pull request as ready for review June 6, 2024 19:33
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 6, 2024
@k8s-ci-robot k8s-ci-robot requested a review from thockin June 6, 2024 19:33
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Mark this PR as fresh with /remove-lifecycle stale
  • Close this PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 4, 2024
@thockin thockin removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 4, 2024
@soltysh
Contributor

soltysh commented Oct 1, 2024

Based on this comment from Tim, I'm assuming the work will happen for 1.32. Please don't forget to fill in the PRR, and its file (see https://github.com/kubernetes/enhancements/blob/master/keps/prod-readiness/template/nnnn.yaml) for this KEP. Feel free to assign me for that approval.

@danwinship
Contributor Author

OK, big update, mostly resolving the "when the cloud provider can't implement a feature" question. I went with service.spec.requiredLoadBalancerFeatures, which I think works best overall.

@danwinship
Contributor Author

resolving the "when the cloud provider can't implement a feature" question

(I considered dropping that and trying to push through just the LoadBalancerProvisioning and LoadBalancerServing conditions, and saving the "feature discovery" part for later, but it's convenient for clients to be able to assume that if the cloud provider sets the provisioning/serving conditions, then that means it also supports RequiredLoadBalancerFeatures. If we did just the conditions now, then we'd have to add another condition (or something) later to allow the cloud provider to signal its support for feature discovery. Though we could still do that...)


- Helpers for the new API implemented in `k8s.io/cloud-provider`

- cloud-provider-kind and cloud-provider-gcp set conditions
Member

I don't think there are people today to work on cloud-provider-gcp and review the changes, so this is not likely going to happen, and cloud-provider-kind cannot be the only thing implementing it

Contributor Author

oh... hm, that's sort of annoying if true since it means we can't really test this fully within k8s CI. We can change this to say AWS or Azure instead, but who will actually end up testing it in that case?

Member

this is the problem with Service LoadBalancer: the last changes added to status, like IPMode and the relaxed validation on multiple TCP and UDP ports (it seems only Azure), were never implemented and are not likely going to be implemented, so I don't know if this is worth the effort... in theory Gateway API is the thing that should be replacing this Service type

@soltysh
Contributor

soltysh commented Oct 8, 2024

A reminder that the PRR file is required for this PR
/hold
to make sure we don't merge that w/o it

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 8, 2024
@danwinship
Contributor Author

Yeah, I haven't done the PRR part of the KEP yet because we hadn't actually decided on the scope of the work yet. (The enhancements freeze snuck up on me, and I'm assuming I've missed it for 1.32.)

@soltysh
Contributor

soltysh commented Oct 9, 2024

Yeah, I haven't done the PRR part of the KEP yet because we hadn't actually decided on the scope of the work yet. (The enhancements freeze snuck up on me, and I'm assuming I've missed it for 1.32.)

There's still time, today and tomorrow. The freeze is 19:00 PDT Thursday 10th October 2024 😉

Contributor

@soltysh soltysh left a comment

Mostly the alpha PRR looks good, there are a few minor bits in tests (units + enablement/disablement) and that alpha criteria comment from Antonio.

- [ ] (R) Production readiness review approved
- [ ] "Implementation History" section is up-to-date for milestone
- [ ] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io]
- [ ] Supporting documentation—e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes
Contributor

Nit: make sure to check the appropriate boxes.

provisioned, because of a problem with the underlying cloud
infrastructure (explained further in `Message`).

#### The `LoadBalancerProvisioning` Condition
Contributor

Does it make sense to have both LoadBalancerProvisioning and LoadBalancerServing with Status=False, Reason=Provisioning; why isn't one of them sufficient? It seems this specific condition is more expressive, so I'd suggest dropping the latter. Having both might be confusing to users. Or, asking differently, what's the difference between the two?

Contributor Author

@danwinship danwinship Oct 9, 2024

The difference between the two is that Provisioning tells you "is the cloud provider currently trying to bring the cloud infrastructure in line with the Service definition?" and Serving tells you "is there a working cloud load balancer?"

Note that all four combinations of values are meaningful.

Also, having a separate Provisioning condition lets you kubectl wait --for condition=LoadBalancerProvisioning=False (ie, wait until the cloud provider has either succeeded in provisioning a load balancer or else declared failure), which you couldn't do if the provisioning was a "step" of the Serving condition. (Well, you'd have to use --for jsonpath='...' with some expression to find a LoadBalancerServing condition (of any Value) whose Reason isn't Provisioning.)

All that said, Tim had the same objection so either I'm explaining this badly, or else no one else thinks the distinction is useful...
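The four combinations referred to above can be written out explicitly. This is a sketch: the condition names are the KEP's proposed ones, but the helper function and the wording of each state are an illustrative paraphrase of this thread, not KEP text.

```go
package main

import "fmt"

// describe maps the two condition values to the state a client could
// infer. All four combinations are meaningful; the wording is an
// illustrative paraphrase, not the KEP's own.
func describe(provisioning, serving bool) string {
	switch {
	case provisioning && !serving:
		return "cloud is working on it; no working LB yet"
	case provisioning && serving:
		return "LB is working; an update is being applied"
	case !provisioning && serving:
		return "LB is working and up to date"
	default:
		return "no working LB, and the cloud is not working on one"
	}
}

func main() {
	// The state that `kubectl wait --for condition=LoadBalancerProvisioning=False`
	// waits to leave:
	fmt.Println(describe(true, false))
}
```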

Contributor

I've seen that text, and now it makes more sense, thank you. What confused me was that in this section you've used Provisioning and Serving which I assumed are the reason, not the whole condition. What would make it explicitly clear is to use the full name of the condition in that text.

Member

I am fine with them, after reading that response, but I still want clarity on Serving (see above)


There will be a few unit tests for validation/defaulting of the new
API, but this feature can really only be tested via e2e tests, since
it depends on external components.
Contributor

Can you list the packages with their coverage, following the template.


###### Are there any tests for feature enablement/disablement?

TODO (we should make sure the cloud provider updates pre-existing
Contributor

This answer is required for alpha.

Pick one more of these and delete the rest.
-->

- [ ] Metrics
Contributor

Not required for alpha, but it's nice to think through those.

###### Will enabling / using this feature result in increasing size or count of the existing API objects?

It will add 2 or 3 new conditions to Services of `Type: LoadBalancer`, and
one new optional spec field.
Contributor

Nit: can you estimate the size?

- the first Kubernetes release where an initial version of the KEP was available
- the version of Kubernetes where the KEP graduated to general availability
- when the KEP was retired or superseded
-->
Contributor

This should be filled in.

Why should this KEP _not_ be implemented?
-->

## Alternatives
Contributor

Would be nice to have this.

Member

@thockin thockin left a comment

I note that this does not have a PRR file.

Sending first-half notes now, will finish reading in a bit

balancer is handling connections, and `False` if it is not. (There are
no defined semantics for an `Unknown` value of this condition.)

(This means that at construction time, the `LoadBalancerServing`
Member

Pedantically, the .Status.LoadBalancer field means "I have an IP allocated". It doesn't mean that traffic is flowing or that the IP is reachable yet.

For example, it may take seconds or even minutes for infrastructure to actually start receiving and routing traffic.

Does "LoadBalancerServing" mean that we have provisioned the LB or that someone has confirmed that the traffic is flowing (from where? what about LB source ranges?)

Does this line up with Gateway's conditions?

Contributor Author

Somewhere else I claim that the e2e tests will expect to be able to connect immediately upon LoadBalancerServing becoming True (though we could change that).

I think my intuition was "Serving means that the cloud provider believes that the LB is accepting/redirecting connections". It definitely doesn't imply that the cloud provider has actually tried making a connection. (In addition to source ranges, there's the problem that maybe the cluster end of the service is currently not-ready, which shouldn't affect the LB condition.)

If there is internal status information that tells the cloud provider that the LB has been allocated but is not yet actually serving, then the cloud provider should not mark the service as LoadBalancerServing until the internal status information says it's serving.

If there's an allocated-but-not-yet-serving state, but the cloud provider doesn't have any way of figuring that out other than by trying to connect to the LB, then I guess it needs to call that state "serving"...

Contributor Author

Does this line up with Gateway's conditions?

No... Hm... I must have been looking at an old version of the Gateway API before because I couldn't find any relevant conditions. But it looks like Gateway currently has:

  • Accepted: "The resource is valid enough to produce some configuration in the underlying data plane", which includes "Features being used by the resource are supported". But also "sufficient capacity exists on underlying infrastructure for the Gateway to be provisioned"
  • Ready: (Extended conformance) "traffic is ready to flow through the data plane immediately, not at some eventual point in the future"
  • Programmed: "configuration has been sent to the data plane, and should be ready soon."

The fact that Accepted implies that you won't get an error from the cloud while provisioning the LB implies to me that you can't set Accepted to True until the cloud has returned at least a provisional success status?

In which case it's not clear to me what the difference between Accepted and Programmed is...

way, the cloud provider must update at least the `LastTransitionTime`
of the `LoadBalancerServing` condition. (For example, if the user
changes the service's `LoadBalancerSourceRanges`, and the cloud
provider updates the load balancer for this, it must update the
Member

This is not generally a requirement of Conditions - I get why that might be valuable information but it seems wrong to overload "LastTransitionTime" to mean anything but "the last time this changed state"

Contributor Author

@danwinship danwinship Oct 11, 2024

ok. (I was thinking this could potentially help with the e2e tests but I don't actually have any specific use for it.)


object, and `False` if it is not. (There are no defined semantics for an
`Unknown` value of this condition.)

Any time any load-balancer-related field (or provider-specific
Member

other fields are not really timestamped, so unless you know what time you made a change, this semantic is meaningless. If you neeeeed to know this, we should set something atomically in the REST handler and force providers to clear it (same as we are doing for in-place pod updates)


When a load balancer Service is first created (or a non-load-balancer
Service is converted to `Type: LoadBalancer`), one of three things
should immediately happen:
Member

how immediately? If it takes 10 seconds for the controller to act, is that OK?

Contributor Author

Why would the controller take 10 seconds to act?

The intent is "the controller should just do this before it starts doing anything potentially-time-consuming".

In the e2e tests we currently assume that if you make a change to a pod or service, the endpointslice controller will act on that change "immediately" and the test will be able to observe a change to the corresponding EndpointSlice object within 30 seconds. I was planning to use the same requirement for cloud providers setting the initial conditions.

Member

The loadbalancer controller interacts with the cloud API and infra; these calls to the cloud APIs used to be synchronous and block the workers ... on very large clusters this can take ~1 hour per LoadBalancer.
In our scale tests for load balancers we have to define at least as many workers in the loadbalancer controller as the number of load balancers that will be created in parallel in the tests; otherwise, you are not able to process any event until some worker is available.

Contributor Author

This KEP would (necessarily) make that architecture "illegal". One of the major points of the conditions is to be able to distinguish between "no cloud provider is running" and "the cloud provider is taking an hour to allocate your load balancer" (and "the cloud provider already determined that it's not going to be able to allocate your load balancer for some reason"). So the cloud provider needs an informer event loop that doesn't block on the cloud API.

Member

there is no need to change any architecture at all; it's just that the conditions need to be updated before the event gets to the worker ...

because not all cloud providers will support the new conditions.
However, since supporting these conditions will be a prereq for
running the `[Feature:LoadBalancer]` e2e tests, we assume that
eventually all cloud providers will support them.)
Member

This sounds suspiciously like a breaking change

Contributor Author

none of the tests are conformance and we don't have any stability guarantees around e2e tests.

But that was written when I had hoped to get this into 1.31, and given that that obviously didn't happen, it's probably better to leave the [Feature:LoadBalancer] tag with its current meaning ("run LB tests under the assumption that the cloud is cloud-provider-gcp; you can manually skip things if you aren't"), and add a new tag for the new ones.

Member

@thockin thockin left a comment

I let this slip, and now it seems too late, but I am OK with the vast majority of this proposal.

/approve

running the `[Feature:LoadBalancer]` e2e tests, we assume that
eventually all cloud providers will support them.)

When the user updates load-balancer-relevant fields in a Service that
Member

I assume LB impls should also remove these if the service is updated in a way that removes the LB?

requested session affinity but the cloud provider has provisioned
a load balancer that does not support it.

- `Status=True, Reason=ExternalTrafficPolicyNotSupported`: The
Member

I'm not sure on this one. IPMode=Proxy already covers this, no?

Contributor Author

IPMode=VIP is necessary for implementing ExternalTrafficPolicy=Local but strictly speaking it's not sufficient. (The source IP could be lost for some other reason.)

Also, IPMode was intended as cloud-provider-to-service-proxy communication, while the conditions are cloud-provider-to-end-user communication. So even if they're technically redundant, they exist for different reasons?

I agree that it's not clearly necessary though. I could drop it.

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: danwinship, thockin

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@aojea
Member

aojea commented Oct 11, 2024

I let this slip, and now it seems too late, but I am OK with the vast majority of this proposal.

/approve

/hold

The higher risk here is adding fields that nobody implements, @danwinship and I talked offline, and Dan has a good idea to simplify this proposal that can achieve the goals and solve the issues detailed here

Comment on lines +222 to +223
- `LoadBalancerServing`, to indicate explicitly that the load
balancer is (or isn't) serving traffic.
Member

I kind of prefer LoadBalancerReady here, no strong opinion

@danwinship
Contributor Author

danwinship commented Dec 10, 2024

I need to get back to this but Antonio and I had sort of decided

  1. We should redesign it with an eye toward being able to implement it provider-agnostically in k8s.io/cloud-provider.
    • It should be possible to at least implement the "provisioning" status there, and probably the "serving" status.
    • Having good Reason values in the case of provisioning failure will probably require adding some new well-known error values.
  2. Don't worry about the feature-detection aspects of it for now.

@aojea
Member

aojea commented Dec 10, 2024

agree; with the reduced scope it could also be a good project for a new contributor who wants to drive it through the KEP process ...

Contributor

@soltysh soltysh left a comment

Comments from last PRR review still hold, don't forget the PRR 4631.yaml file.


# The milestone at which this feature was, or is targeted to be, at each stage.
milestone:
alpha: "v1.32"
Contributor

Update the versions

Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory sig/cloud-provider Categorizes an issue or PR as relevant to SIG Cloud Provider. sig/network Categorizes an issue or PR as relevant to SIG Network. sig/testing Categorizes an issue or PR as relevant to SIG Testing. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.