Describe the bug
- If the user mistakes `override.statefulSet.spec.volumeClaimTemplates` for a `merge` operation rather than a `replace`, the cluster will permanently fail to reconcile.
- If the user omits `spec.resources.requests.storage`, the operator interprets it as `0`.
- The error logged by the operator is: `shrinking persistent volumes is not supported`
- The error doesn't aid in debugging this configuration error, and troubleshooting isn't straightforward if you only inspect the `StatefulSet` and `PVC`; you'd need to check the helm output and/or the Cluster CR.
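To make the failure mode concrete, here is a minimal sketch of the difference between the two semantics. It models templates as plain dictionaries and is not the operator's actual code; `DEFAULT_TEMPLATE`, `deep_merge`, `replace`, `requested_storage`, the `10Gi` size and the `managed-premium` storage class are all illustrative assumptions.

```python
import copy

# Hypothetical stand-in for the template the operator would generate by default.
DEFAULT_TEMPLATE = {
    "metadata": {"name": "persistence"},
    "spec": {
        "accessModes": ["ReadWriteOnce"],
        "resources": {"requests": {"storage": "10Gi"}},
    },
}

# A partial override, written as if it would be merged into the default.
# The storage class name is an example value only.
USER_OVERRIDE = {
    "metadata": {"name": "persistence"},
    "spec": {"storageClassName": "managed-premium"},
}


def deep_merge(base, patch):
    """Recursively overlay patch on base (what the user assumed would happen)."""
    merged = copy.deepcopy(base)
    for key, value in patch.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def replace(base, patch):
    """Discard base entirely and keep only patch (what an override does)."""
    return copy.deepcopy(patch)


def requested_storage(template):
    """Return the storage request, falling back to "0" when it is absent."""
    requests = template.get("spec", {}).get("resources", {}).get("requests", {})
    return requests.get("storage", "0")


merged_request = requested_storage(deep_merge(DEFAULT_TEMPLATE, USER_OVERRIDE))
replaced_request = requested_storage(replace(DEFAULT_TEMPLATE, USER_OVERRIDE))
```

Under these assumptions the merged template still requests `10Gi`, while the replaced one has no request at all and reads as `0`. Comparing `0` with the capacity of the existing PVC is what looks like a shrink.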
Symptoms:
- The operator reconciliation loop fails continuously (every ~15 minutes, based on the operator logs)
- Changes to the `RabbitmqCluster` CR won't be applied because the operator can't reconcile
- Scaling (adding/removing nodes) would likely fail or behave unexpectedly
- Helm upgrades might appear successful but some changes won't take effect
Fixes suggested:
- Implement validation at the CRD level to prevent incomplete `VolumeClaimTemplate` overrides
- Make the documentation explicit that `override` is a `replace` rather than a `merge` (yes, this is implied by the name, but LLMs are gonna LLM, and devs are going to use them 🙃)
- Add helpful error messages to the operator logs to aid in troubleshooting configuration errors.
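The first suggestion could look roughly like the check below. This is a sketch only: the operator is written in Go and real enforcement would live in the CRD schema or a webhook, so the function name `validate_volume_claim_templates` and the message wording are hypothetical.

```python
def validate_volume_claim_templates(templates):
    """Return one message per template that lacks a storage request.

    `templates` is the list found at override.statefulSet.spec.volumeClaimTemplates,
    modelled here as plain dictionaries.
    """
    errors = []
    for index, template in enumerate(templates):
        name = (template.get("metadata") or {}).get("name") or f"<index {index}>"
        spec = template.get("spec") or {}
        requests = (spec.get("resources") or {}).get("requests") or {}
        if not requests.get("storage"):
            errors.append(
                f"volumeClaimTemplates[{index}] ({name}): "
                "spec.resources.requests.storage is required; "
                "the override replaces the whole template, it is not merged with defaults"
            )
    return errors


# An incomplete override is rejected with a message that names the missing field.
incomplete = [{"metadata": {"name": "persistence"}, "spec": {"storageClassName": "example"}}]
complete = [
    {
        "metadata": {"name": "persistence"},
        "spec": {"resources": {"requests": {"storage": "10Gi"}}},
    }
]
incomplete_errors = validate_volume_claim_templates(incomplete)
complete_errors = validate_volume_claim_templates(complete)
```

Rejecting the spec this way would surface the problem when the CR is applied, instead of in the reconciliation loop.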
Fixes applied:
- ... `spec` and provide helpful error messages when they don't #2024
Logs
{
"container": "operator",
"controller": "rabbitmqcluster",
"controllerGroup": "rabbitmq.com",
"controllerKind": "RabbitmqCluster",
"error": "shrinking persistent volumes is not supported",
"level": "error",
"msg": "Reconciler error",
"name": "rabbitmq",
"namespace": "rabbitmq-system",
"pod": "rabbitmq-cluster-operator-5f8dc96c76-855k6",
"reconcileID": "aaa60dae-fb09-4ea9-a10a-9924c4e7da15",
"service_name": "rabbitmq-cluster-operator",
"stacktrace": "sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).reconcileHandler\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.21.0/pkg/internal/controller/controller.go:353\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.21.0/pkg/internal/controller/controller.go:300\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).Start.func2.1\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.21.0/pkg/internal/controller/controller.go:202",
"stream": "stderr",
"ts": "2025-12-09T21:08:26Z"
}
{
"container": "operator",
"controller": "rabbitmqcluster",
"controllerGroup": "rabbitmq.com",
"controllerKind": "RabbitmqCluster",
"error": "hit an error while scaling PVC capacity: shrinking persistent volumes is not supported",
"level": "error",
"msg": "Failed to scale PVCs: shrinking persistent volumes is not supported",
"name": "rabbitmq",
"namespace": "rabbitmq-system",
"pod": "rabbitmq-cluster-operator-5f8dc96c76-855k6",
"reconcileID": "aaa60dae-fb09-4ea9-a10a-9924c4e7da15",
"service_name": "rabbitmq-cluster-operator",
"stacktrace": "github.com/rabbitmq/cluster-operator/v2/controllers.(*RabbitmqClusterReconciler).reconcilePVC\n\t/workspace/controllers/reconcile_persistence.go:21\ngithub.com/rabbitmq/cluster-operator/v2/controllers.(*RabbitmqClusterReconciler).Reconcile\n\t/workspace/controllers/rabbitmqcluster_controller.go:225\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).Reconcile\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.21.0/pkg/internal/controller/controller.go:119\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).reconcileHandler\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.21.0/pkg/internal/controller/controller.go:340\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.21.0/pkg/internal/controller/controller.go:300\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).Start.func2.1\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.21.0/pkg/internal/controller/controller.go:202",
"stream": "stderr",
"ts": "2025-12-09T21:08:26Z"
}
{
"container": "operator",
"controller": "rabbitmqcluster",
"controllerGroup": "rabbitmq.com",
"controllerKind": "RabbitmqCluster",
"error": "unsupported operation",
"level": "error",
"msg": "shrinking persistent volumes is not supported",
"name": "rabbitmq",
"namespace": "rabbitmq-system",
"pod": "rabbitmq-cluster-operator-5f8dc96c76-855k6",
"reconcileID": "aaa60dae-fb09-4ea9-a10a-9924c4e7da15",
"service_name": "rabbitmq-cluster-operator",
"stacktrace": "github.com/rabbitmq/cluster-operator/v2/internal/scaling.PersistenceScaler.Scale\n\t/workspace/internal/scaling/scaling.go:52\ngithub.com/rabbitmq/cluster-operator/v2/controllers.(*RabbitmqClusterReconciler).reconcilePVC\n\t/workspace/controllers/reconcile_persistence.go:18\ngithub.com/rabbitmq/cluster-operator/v2/controllers.(*RabbitmqClusterReconciler).Reconcile\n\t/workspace/controllers/rabbitmqcluster_controller.go:225\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).Reconcile\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.21.0/pkg/internal/controller/controller.go:119\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).reconcileHandler\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.21.0/pkg/internal/controller/controller.go:340\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.21.0/pkg/internal/controller/controller.go:300\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).Start.func2.1\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.21.0/pkg/internal/controller/controller.go:202",
"stream": "stderr",
"ts": "2025-12-09T21:08:26Z"
}
Expected behavior
- Refuse invalid cluster specs at deploy time, rather than logging errors during reconciliation.
- Helpful error messages in the case of misconfiguration not caught by CRDs.
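For the second point, a message could distinguish a missing storage request from a genuine shrink. The sketch below is one possible shape under simplifying assumptions: `parse_quantity` handles only integers with an optional binary suffix (a small subset of Kubernetes quantity syntax), and `check_capacity_change` is a hypothetical helper, not the operator's `PersistenceScaler`.

```python
# Binary suffixes supported by this simplified parser.
_UNITS = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}


def parse_quantity(text):
    """Parse an integer quantity with an optional Ki/Mi/Gi/Ti suffix into bytes."""
    for suffix, factor in _UNITS.items():
        if text.endswith(suffix):
            return int(text[: -len(suffix)]) * factor
    return int(text)


def check_capacity_change(existing, desired):
    """Return None if the change is allowed, otherwise an explanatory message.

    `existing` is the current PVC capacity; `desired` is the requested storage,
    or None when the override omitted spec.resources.requests.storage.
    """
    if desired is None:
        return (
            "volumeClaimTemplates override has no spec.resources.requests.storage; "
            "the override replaces the whole template, so the request must be repeated "
            f"(existing PVC capacity is {existing})"
        )
    if parse_quantity(desired) < parse_quantity(existing):
        return (
            "shrinking persistent volumes is not supported "
            f"(existing {existing}, requested {desired})"
        )
    return None
```

With this split, an omitted request points the user at the override, while an actual attempt to shrink keeps the existing wording and adds both sizes.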
Version and environment information
- RabbitMQ: 4.1.3
- RabbitMQ Cluster Operator: 2.16.1
- Kubernetes: 1.33.5
- Cloud provider or hardware configuration: Azure AKS