Commit f24b10b

Optional removal of node taint on successful IP assignment (#146)
* feat: Remove taint key from node if provided
* refactor: Remove logger dependency from Tainter
* feat: Unit tests for Tainter implementation
* chore: Add section to README about Node Taints feature
* fix: Log warning when taint key not found on node
* fix: Add missing operator property to toleration in readme
* feat: Update Helm chart to support TAINT_KEY feature
* fix: Suppress linter
1 parent a5d4618 commit f24b10b

8 files changed (+423 -1 lines changed)

README.md (+40 -1)

````diff
@@ -48,7 +48,7 @@ To enable IPv6 support, set the `ipv6` flag (or set `IPV6` environment variable)
 
 ### Kubernetes Service Account
 
-KubeIP requires a Kubernetes service account with the following permissions:
+KubeIP requires a Kubernetes service account with at least the following permissions:
 
 ```yaml
 apiVersion: v1
@@ -129,6 +129,44 @@ spec:
 value: "true"
 ```
 
+### Node Taints
+
+KubeIP can be configured to attempt removal of a Taint Key from its node once the static IP has been successfully assigned, preventing workloads from being scheduled on the node until it has successfully received a static IP address. This can be useful, for example, in cases where the workload must call resources with IP-whitelisting, to prevent race conditions between KubeIP and the workload on newly provisioned nodes.
+
+To enable this feature, set the `taint-key` configuration parameter (See [How to run KubeIP](#how-to-run-kubeip)) to the taint key that should be removed. Then add a toleration to the KubeIP DaemonSet, so that it itself can be scheduled on the tainted nodes. For example, given that new nodes are created with a taint key of `kubeip.com/not-ready`:
+
+```diff
+ kind: DaemonSet
+ spec:
+   template:
+     spec:
+       serviceAccountName: kubeip-service-account
++      tolerations:
++        - key: kubeip.com/not-ready
++          operator: Exists
++          effect: NoSchedule
+       containers:
+         - name: kubeip
+           image: doitintl/kubeip-agent
+           env:
++            - name: TAINT_KEY
++              value: kubeip.com/not-ready
+```
+
+The parameter has no default value, and if not set, KubeIP will not attempt to remove any taints. If the provided Taint Key is not present on the node, KubeIP will simply log this fact and continue normally without attempting to remove it. If the Taint Key is present, but removing it fails for some reason, KubeIP will release the IP address back into the pool before restarting and trying again.
+
+Using this feature requires KubeIP to have permission to patch nodes. To use this feature, the `ClusterRole` resource rules need to be updated. **Note that if this configuration option is not set, KubeIP will not attempt to patch any nodes, and the change to the rules is not necessary.**
+
+Please keep in mind that this will give KubeIP permission to make updates to any node in your cluster, so please make sure that this aligns with your security requirements before enabling this feature!
+
+```diff
+ rules:
+   - apiGroups: [ "" ]
+     resources: [ "nodes" ]
+-    verbs: [ "get" ]
++    verbs: [ "get", "patch" ]
+```
+
 ### AWS
 
 Make sure that KubeIP DaemonSet is deployed on nodes that have a public IP (node running in public subnet) and uses a Kubernetes service
@@ -231,6 +269,7 @@ OPTIONS:
 --project value name of the GCP project or the AWS account ID (not needed if running in node) [$PROJECT]
 --region value name of the GCP region or the AWS region (not needed if running in node) [$REGION]
 --release-on-exit release the static public IP address on exit (default: true) [$RELEASE_ON_EXIT]
+--taint-key value specify a taint key to remove from the node once the static public IP address is assigned [$TAINT_KEY]
 --retry-attempts value number of attempts to assign the static public IP address (default: 10) [$RETRY_ATTEMPTS]
 --retry-interval value when the agent fails to assign the static public IP address, it will retry after this interval (default: 5m0s) [$RETRY_INTERVAL]
 --lease-duration value duration of the kubernetes lease (default: 5) [$LEASE_DURATION]
````
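The toleration added to the README's DaemonSet example is what lets the KubeIP pod schedule onto nodes that still carry `kubeip.com/not-ready`. The sketch below approximates the matching rule Kubernetes applies (upstream this is roughly `(*Toleration).ToleratesTaint` in `k8s.io/api/core/v1`). The `taint` and `toleration` types and the `tolerates` function are hypothetical, simplified stand-ins so the example runs with the standard library only:

```go
package main

import "fmt"

// taint and toleration are hypothetical, simplified stand-ins for v1.Taint
// and v1.Toleration; only the fields that take part in matching are kept.
type taint struct {
	Key, Value, Effect string
}

type toleration struct {
	Key, Operator, Value, Effect string
}

// tolerates sketches the Kubernetes matching rule: an empty effect or key on
// the toleration acts as a wildcard, "Exists" ignores the taint value, and
// "Equal" (or an empty operator) requires the values to be identical.
func tolerates(tol toleration, t taint) bool {
	if tol.Effect != "" && tol.Effect != t.Effect {
		return false
	}
	if tol.Key != "" && tol.Key != t.Key {
		return false
	}
	switch tol.Operator {
	case "", "Equal":
		return tol.Value == t.Value
	case "Exists":
		return true
	default:
		return false
	}
}

func main() {
	// The toleration from the README example.
	kubeipTol := toleration{Key: "kubeip.com/not-ready", Operator: "Exists", Effect: "NoSchedule"}

	fmt.Println(tolerates(kubeipTol, taint{Key: "kubeip.com/not-ready", Effect: "NoSchedule"}))                // true
	fmt.Println(tolerates(kubeipTol, taint{Key: "kubeip.com/not-ready", Value: "true", Effect: "NoSchedule"})) // true
	fmt.Println(tolerates(kubeipTol, taint{Key: "kubeip.com/not-ready", Effect: "NoExecute"}))                 // false
}
```

Because the toleration uses `operator: Exists`, it matches the taint whatever value the node provisioner sets, as long as the effect is `NoSchedule`; a taint with a different effect would need its own toleration.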

chart/templates/clusterrole.yaml (+4)

```diff
@@ -8,7 +8,11 @@ metadata:
 rules:
   - apiGroups: [ "" ]
     resources: [ "nodes" ]
+{{- if .Values.rbac.allowNodesPatchPermission }}
+    verbs: [ "get", "patch" ]
+{{- else }}
     verbs: [ "get" ]
+{{- end }}
   - apiGroups: [ "coordination.k8s.io" ]
     resources: [ "leases" ]
     verbs: [ "create", "delete", "get" ]
```

chart/templates/daemonset.yaml (+2)

```diff
@@ -42,6 +42,8 @@ spec:
                   fieldPath: spec.nodeName
             - name: FILTER
               value: {{ .Values.daemonSet.env.FILTER | quote }}
+            - name: TAINT_KEY
+              value: {{ .Values.daemonSet.env.TAINT_KEY | quote }}
             - name: LOG_LEVEL
               value: {{ .Values.daemonSet.env.LOG_LEVEL | quote }}
             - name: LOG_JSON
```

chart/values.yaml (+2)

```diff
@@ -25,6 +25,7 @@ serviceAccount:
 # Role-Based Access Control (RBAC) configuration.
 rbac:
   create: true
+  allowNodesPatchPermission: false
 
 # DaemonSet configuration.
 daemonSet:
@@ -35,6 +36,7 @@ daemonSet:
     kubeip: use
   env:
     FILTER: labels.kubeip=reserved;labels.environment=demo
+    TAINT_KEY: ""
     LOG_LEVEL: debug
     LOG_JSON: true
   resources:
```

cmd/main.go (+26)

```diff
@@ -174,6 +174,26 @@ func run(c context.Context, log *logrus.Entry, cfg *config.Config) error {
 		return errors.Wrap(err, "assigning static public IP address")
 	}
 
+	if cfg.TaintKey != "" {
+		logger := log.WithField("taint-key", cfg.TaintKey)
+		tainter := nd.NewTainter(clientset)
+
+		didRemoveTaint, err := tainter.RemoveTaintKey(ctx, n, cfg.TaintKey)
+		if err != nil {
+			logger.Error("removing taint key failed, releasing static public IP address")
+			if releaseErr := releaseIP(assigner, n); releaseErr != nil { //nolint:contextcheck
+				log.WithError(releaseErr).Error("releasing static public IP address after taint key removal failed")
+			}
+			return errors.Wrap(err, "removing node taint key")
+		}
+
+		if didRemoveTaint {
+			logger.Info("taint key removed successfully")
+		} else {
+			logger.Warning("taint key not present on node, skipped removal")
+		}
+	}
+
 	// pause the agent to prevent it from exiting immediately after assigning the static public IP address
 	// wait for the context to be done: SIGTERM, SIGINT
 	<-ctx.Done()
@@ -303,6 +323,12 @@ func main() {
 			Category: "Configuration",
 			Value: true,
 		},
+		&cli.StringFlag{
+			Name: "taint-key",
+			Usage: "specify a taint key to remove from the node once the static public IP address is assigned",
+			EnvVars: []string{"TAINT_KEY"},
+			Category: "Configuration",
+		},
 		&cli.StringFlag{
 			Name: "log-level",
 			Usage: "set log level (debug, info(*), warning, error, fatal, panic)",
```

internal/config/config.go (+3)

```diff
@@ -33,6 +33,8 @@ type Config struct {
 	LeaseDuration int `json:"lease-duration"`
 	// LeaseNamespace is the namespace of the kubernetes lease
 	LeaseNamespace string `json:"lease-namespace"`
+	// TaintKey is the taint key to remove from the node once the IP address is assigned
+	TaintKey string `json:"taint-key"`
 }
 
 func NewConfig(c *cli.Context) *Config {
@@ -50,5 +52,6 @@ func NewConfig(c *cli.Context) *Config {
 	cfg.ReleaseOnExit = c.Bool("release-on-exit")
 	cfg.LeaseDuration = c.Int("lease-duration")
 	cfg.LeaseNamespace = c.String("lease-namespace")
+	cfg.TaintKey = c.String("taint-key")
 	return &cfg
 }
```

internal/node/tainter.go (+73, new file)

@@ -0,0 +1,73 @@

```go
package node

import (
	"context"
	"encoding/json"
	"fmt"

	"github.com/doitintl/kubeip/internal/types"
	"github.com/pkg/errors"
	v1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	typesv1 "k8s.io/apimachinery/pkg/types"
	"k8s.io/client-go/kubernetes"
)

type Tainter interface {
	RemoveTaintKey(ctx context.Context, node *types.Node, taintKey string) (bool, error)
}

type tainter struct {
	client kubernetes.Interface
}

func deleteTaintsByKey(taints []v1.Taint, taintKey string) ([]v1.Taint, bool) {
	newTaints := []v1.Taint{}
	didDelete := false

	for i := range taints {
		if taintKey == taints[i].Key {
			didDelete = true
			continue
		}
		newTaints = append(newTaints, taints[i])
	}

	return newTaints, didDelete
}

func NewTainter(client kubernetes.Interface) Tainter {
	return &tainter{
		client: client,
	}
}

func (t *tainter) RemoveTaintKey(ctx context.Context, node *types.Node, taintKey string) (bool, error) {
	// get node object from API server
	n, err := t.client.CoreV1().Nodes().Get(ctx, node.Name, metav1.GetOptions{})
	if err != nil {
		return false, errors.Wrap(err, "failed to get kubernetes node")
	}

	// Remove taint from the node representation
	newTaints, didDelete := deleteTaintsByKey(n.Spec.Taints, taintKey)
	if !didDelete {
		return false, nil
	}

	// Marshal the remaining taints of the node into json format for patching.
	// The remaining taints may be empty, and that will result in an empty json array "[]"
	newTaintsMarshaled, err := json.Marshal(newTaints)
	if err != nil {
		return false, errors.Wrap(err, "failed to marshal new taints")
	}

	// Patch the node with only the remaining taints
	patch := fmt.Sprintf(`{"spec":{"taints":%v}}`, string(newTaintsMarshaled))
	_, err = t.client.CoreV1().Nodes().Patch(ctx, node.Name, typesv1.MergePatchType, []byte(patch), metav1.PatchOptions{})
	if err != nil {
		return false, errors.Wrap(err, "failed to patch node taints")
	}

	return true, nil
}
```
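To see what `RemoveTaintKey` actually sends to the API server, the sketch below reproduces `deleteTaintsByKey` and the patch body with a hypothetical trimmed-down `taint` type (key, value and effect only, with the same JSON field names as `v1.Taint`), so it runs without client-go. `withoutKey` and `mergePatch` are illustrative names, not part of the package:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// taint is a hypothetical, trimmed-down stand-in for v1.Taint: it keeps the
// key, value and effect fields with the same JSON names the API uses.
type taint struct {
	Key    string `json:"key"`
	Value  string `json:"value,omitempty"`
	Effect string `json:"effect"`
}

// withoutKey applies the same rule as deleteTaintsByKey: every taint with a
// matching key is dropped, whatever its value or effect.
func withoutKey(taints []taint, key string) ([]taint, bool) {
	kept := []taint{} // non-nil, so an empty result marshals to [] rather than null
	removed := false
	for _, t := range taints {
		if t.Key == key {
			removed = true
			continue
		}
		kept = append(kept, t)
	}
	return kept, removed
}

// mergePatch builds the same body shape as RemoveTaintKey. A JSON merge patch
// (RFC 7396) replaces arrays wholesale, so the body lists every taint to keep.
func mergePatch(kept []taint) (string, error) {
	b, err := json.Marshal(kept)
	if err != nil {
		return "", err
	}
	return fmt.Sprintf(`{"spec":{"taints":%s}}`, b), nil
}

func main() {
	taints := []taint{
		{Key: "kubeip.com/not-ready", Effect: "NoSchedule"},
		{Key: "dedicated", Value: "gpu", Effect: "NoExecute"},
	}
	kept, removed := withoutKey(taints, "kubeip.com/not-ready")
	patch, err := mergePatch(kept)
	if err != nil {
		panic(err)
	}
	fmt.Println(removed, patch)
	// true {"spec":{"taints":[{"key":"dedicated","value":"gpu","effect":"NoExecute"}]}}
}
```

Sending the whole remaining list, rather than just naming the removed entry, is required by the merge-patch semantics: there is no way to delete a single array element, only to replace the array.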