Description
Describe the bug
The kubebuilder RBAC marker for nodes grants only list, not watch. When validateCellTopology in pkg/resolver/validation.go calls r.Client.List(ctx, &nodeList), controller-runtime's cache creates an informer for corev1.Node. Informers require both list and watch to function. Without the watch verb, the informer enters a retry loop that spams the operator logs and can block reconciliation.
The marker at api/v1alpha1/multigrescluster_types.go:44:
// +kubebuilder:rbac:groups="",resources=nodes,verbs=list
Should be:
// +kubebuilder:rbac:groups="",resources=nodes,verbs=list;watch
To Reproduce
- Deploy the operator (v0.4.0 or v0.4.1) to an EKS cluster using the generated config/rbac/role.yaml
- Create a MultigresCluster with cells that have zone topology (e.g., zone: us-east-1d)
- Trigger any reconciliation (scale-up, scale-down, or annotate to force reconcile)
- Check operator logs:
kubectl logs -n multigres-operator deployment/multigres-operator-controller-manager | grep "nodes is forbidden"
You'll see repeated errors:
nodes is forbidden: User "system:serviceaccount:multigres-operator:multigres-operator-controller-manager" cannot watch resource "nodes" in API group "" at the cluster scope
Scale-down operations may time out because the failed informer blocks the reconciliation loop.
Expected behaviour
The operator should be able to list and watch nodes without RBAC errors. The validateCellTopology function should work correctly, and reconciliation loops should not be blocked by informer failures.
System information
- Operator version: v0.4.0, v0.4.1
- Kubernetes: EKS 1.31
Additional context
The validateCellTopology function itself handles the failure gracefully: it returns nil and skips topology validation if it can't list nodes. The underlying informer retry loop is the problem, because it pollutes the logs and can block reconciliation.
Workaround: manually patch the ClusterRole:
kubectl patch clusterrole multigres-operator-manager-role --type=json \
-p '[{"op":"add","path":"/rules/2/verbs/-","value":"watch"}]'
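Note that the patch above hard-codes /rules/2, which only works if the nodes rule is the third entry in the generated ClusterRole. As a hedged alternative, here is a minimal Go sketch (standard library only) that locates the nodes rule in a ClusterRole JSON document and builds the matching JSON Patch. The names buildWatchPatch, policyRule, clusterRole, patchOp and the sample document are hypothetical helpers for illustration, not part of the operator.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
)

// policyRule mirrors only the ClusterRole rule fields this sketch needs.
type policyRule struct {
	APIGroups []string `json:"apiGroups"`
	Resources []string `json:"resources"`
	Verbs     []string `json:"verbs"`
}

// clusterRole mirrors only the top-level "rules" field of a ClusterRole.
type clusterRole struct {
	Rules []policyRule `json:"rules"`
}

// patchOp is a single JSON Patch (RFC 6902) operation.
type patchOp struct {
	Op    string `json:"op"`
	Path  string `json:"path"`
	Value string `json:"value"`
}

func contains(items []string, want string) bool {
	for _, item := range items {
		if item == want {
			return true
		}
	}
	return false
}

// buildWatchPatch returns a JSON Patch that appends "watch" to the first
// core-group ("") rule that covers "nodes". It returns "[]" if that rule
// already grants watch, and an error if no such rule exists.
// Caveat: if the rule lists other resources too, they also gain watch.
func buildWatchPatch(roleJSON []byte) (string, error) {
	var role clusterRole
	if err := json.Unmarshal(roleJSON, &role); err != nil {
		return "", fmt.Errorf("parse ClusterRole: %w", err)
	}
	for i, rule := range role.Rules {
		if !contains(rule.APIGroups, "") || !contains(rule.Resources, "nodes") {
			continue
		}
		ops := []patchOp{}
		if !contains(rule.Verbs, "watch") {
			path := fmt.Sprintf("/rules/%d/verbs/-", i)
			ops = append(ops, patchOp{Op: "add", Path: path, Value: "watch"})
		}
		out, err := json.Marshal(ops)
		if err != nil {
			return "", err
		}
		return string(out), nil
	}
	return "", fmt.Errorf("no core-group rule for nodes found")
}

// sample is a made-up ClusterRole used when no file argument is given.
const sample = `{"rules":[
  {"apiGroups":["apps"],"resources":["statefulsets"],"verbs":["get","list","watch"]},
  {"apiGroups":[""],"resources":["nodes"],"verbs":["list"]}
]}`

func main() {
	data := []byte(sample)
	if len(os.Args) > 1 {
		// Optional: path to a file holding `kubectl get clusterrole ... -o json` output.
		fileData, err := os.ReadFile(os.Args[1])
		if err != nil {
			fmt.Fprintln(os.Stderr, err)
			os.Exit(1)
		}
		data = fileData
	}
	patch, err := buildWatchPatch(data)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println(patch)
}
```

One way to use it, assuming the role name from the workaround: save the output of kubectl get clusterrole multigres-operator-manager-role -o json to a file, run the program with that file as its argument, and pass the printed patch to kubectl patch --type=json -p. An output of [] means the rule already includes watch and no patch is needed.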