Open
Description
Describe the bug
The control plane load balancer sends requests to unhealthy machines. I have seen this cause kubeadm join
to fail on machines during bootstrapping.
The cause is that CAPVCD creates a load balancer whose "Graceful Disable Timeout" is set to the default, 1 minute. The timeout should be disabled. However, vmware/cloud-provider-for-cloud-director#292 does not allow the timeout to be disabled.
Reproduction steps
- Create a CAPVCD cluster.
- Confirm that the timeout is 1 minute in the control plane load balancer configuration:
- Remove the
kube-apiserver
static pod from one of the control plane nodes. - Send Kubernetes API requests to the load balancer. Some request should fail.
Expected behavior
The timeout should be disabled, and the load balancer should not send requests to an unhealthy machine.
Additional context
No response