Kubernetes cluster goes down if Capsule is down #1135

# Bug description

I deployed Capsule with a single replica and noticed that if that replica goes down[1], it brings the Kubernetes cluster down with it.

After reviewing existing issues, it seems the `nodes.capsule.clastix.io` webhook causes this when Capsule is unreachable. Following [this comment](https://github.com/projectcapsule/capsule/issues/597#issuecomment-1164404524), I set its `failurePolicy` to `Ignore`. The worker nodes then recovered[2], but the master nodes moved to `Ready,SchedulingDisabled` status. The logs[3] showed that requests were still failing because the `owner.namespace.capsule.clastix.io` webhook could not be reached while Capsule was down. To fix this, I also had to set `failurePolicy` to `Ignore` for the `owner.namespace.capsule.clastix.io` mutating webhook and uncordon the master nodes[4].
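
To see which webhooks can block API requests like this, one option is to list every webhook that fails closed. The snippet below is only a sketch: the function name `fail_closed_webhooks` and the sample data are hypothetical, and it assumes the input is the parsed JSON of a single `ValidatingWebhookConfiguration` or `MutatingWebhookConfiguration`.

```python
def fail_closed_webhooks(config):
    """Return the names of webhooks in one webhook configuration that fail closed.

    In admissionregistration.k8s.io/v1 an unset failurePolicy defaults to
    "Fail", so a missing field is counted as fail-closed as well.
    """
    return [
        webhook["name"]
        for webhook in config.get("webhooks", [])
        if webhook.get("failurePolicy", "Fail") == "Fail"
    ]


# Hypothetical, trimmed-down configuration used only for illustration.
sample = {
    "kind": "MutatingWebhookConfiguration",
    "webhooks": [
        {"name": "owner.namespace.capsule.clastix.io", "failurePolicy": "Fail"},
        {"name": "example.ignored.webhook", "failurePolicy": "Ignore"},
    ],
}

print(fail_closed_webhooks(sample))
```

The input could come from `kubectl get mutatingwebhookconfiguration <name> -o json`, parsed with `json.loads`. Any webhook this reports will reject matching requests while its backing service is unreachable.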

Is this behavior expected when Capsule goes down? If so, how can it be avoided? Also, which Capsule functionality is affected by setting `failurePolicy` to `Ignore` for the `owner.namespace.capsule.clastix.io` mutating webhook?

Thanks in advance.


# Steps to reproduce:

- Deploy Capsule with a single replica.
- Scale it to 0 replicas, or trigger an OOM kill by reducing the pod's resources.
- Eventually, the pods are evicted and the nodes go `NotReady`, because the kubelet can no longer update the node status through the kube-apiserver.

# Workaround:

- Set the `failurePolicy` of the `nodes.capsule.clastix.io` and `owner.namespace.capsule.clastix.io` webhooks to `Ignore`.
- Uncordon the nodes if required.
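
The first step can be scripted by generating a JSON Patch (RFC 6902) for the affected webhooks. This is a minimal sketch, not the project's recommended procedure: `failure_policy_patch` and the sample configuration are hypothetical, and the webhook's position in the `webhooks` list has to be read from the live object, because JSON Patch addresses list entries by index.

```python
import json


def failure_policy_patch(config, webhook_names, policy="Ignore"):
    """Build JSON Patch operations that set failurePolicy on the named webhooks.

    `config` is the parsed JSON of one webhook configuration. The "add"
    operation is used because, per RFC 6902, it replaces an existing
    object member and also works if the member is absent.
    """
    ops = []
    for index, webhook in enumerate(config.get("webhooks", [])):
        if webhook.get("name") in webhook_names:
            ops.append({
                "op": "add",
                "path": f"/webhooks/{index}/failurePolicy",
                "value": policy,
            })
    return ops


# Hypothetical, trimmed-down configuration used only for illustration.
live_config = {
    "webhooks": [
        {"name": "example.other.webhook", "failurePolicy": "Fail"},
        {"name": "nodes.capsule.clastix.io", "failurePolicy": "Fail"},
    ],
}

patch = failure_policy_patch(live_config, {"nodes.capsule.clastix.io"})
print(json.dumps(patch))
```

The printed JSON can then be passed to `kubectl patch <kind> <name> --type=json -p '<patch>'`, where `<kind>` and `<name>` identify the webhook configuration that contains the webhook. Note that a later Helm upgrade may restore the original `failurePolicy`.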

# Expected behavior
- The Kubernetes cluster should not be impacted if Capsule goes down.

# Additional context
- Capsule version: 0.3.3
- Helm Chart version: capsule-0.4.5
- Kubernetes version: v1.25.15

[1]
```
{"L":"ERROR","T":"2024-07-16T05:55:47.125Z","C":"kubeutils/kube_utils.go:330","M":"failed to update node with newly added labels [failed try 1] [retrying in 20 seconds] : Internal error occurred: failed calling webhook \"nodes.capsule.clastix.io\": failed to call webhook: Post \"https://capsule-webhook-service.capsule-system.svc:443/nodes?timeout=30s\": dial tcp 10.11.113.114:443: connect: connection refused"}

# kubectl get nodes
NAME           STATUS                     ROLES    AGE   VERSION
10.239.0.121   Ready                      master   9d    v1.25.15
10.239.0.122   Ready                      master   9d    v1.25.15
10.239.0.123   Ready,SchedulingDisabled   master   9d    v1.25.15
10.239.0.124   NotReady                   worker   9d    v1.25.15
10.239.0.125   NotReady                   worker   9d    v1.25.15
```

[2]
```
# kubectl get nodes -w
NAME           STATUS                     ROLES    AGE   VERSION
10.239.0.121   Ready,SchedulingDisabled   master   9d    v1.25.15
10.239.0.122   Ready,SchedulingDisabled   master   9d    v1.25.15
10.239.0.123   Ready,SchedulingDisabled   master   9d    v1.25.15
10.239.0.124   Ready                      worker   9d    v1.25.15 <==
10.239.0.125   Ready                      worker   9d    v1.25.15 <==
```

[3]
```
024-07-17 16:22:53] Name: \"kubernetes-dashboard\", Namespace: \"\" [2024-07-17 16:22:53] for: \"STDIN\": error when patching \"STDIN\": Internal error occurred: failed calling webhook \"owner.namespace.capsule.clastix.io\": failed to call webhook: Post \"https://capsule-webhook-service.capsule-system.svc:443/namespace-owner-reference?timeout=30s\": dial tcp 10.11.179.188:443: connect: connection refused],}"}
```

[4]
```
# kubectl  get nodes -w
NAME           STATUS   ROLES    AGE   VERSION
10.239.0.121   Ready    master   9d    v1.25.15
10.239.0.122   Ready    master   9d    v1.25.15
10.239.0.123   Ready    master   9d    v1.25.15
10.239.0.124   Ready    worker   9d    v1.25.15
10.239.0.125   Ready    worker   9d    v1.25.15
```


