[v/25.2] manage/k8s: document decommission timing (--decommission-wait-interval) (#1765)

david-yu · claude · web-flow · commit 433b3bbfa857 · 2026-06-25T08:21:20.000-05:00
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
diff --git a/modules/manage/pages/kubernetes/k-decommission-brokers.adoc b/modules/manage/pages/kubernetes/k-decommission-brokers.adoc
@@ -466,12 +466,14 @@ helm upgrade --install redpanda-controller redpanda/operator \
   --namespace <namespace> \
   --set image.tag={latest-operator-version} \
   --create-namespace \
-  --set additionalCmdFlags={--additional-controllers="decommission"} \
+  --set "additionalCmdFlags={--additional-controllers=decommission}" \
   --set rbac.createAdditionalControllerCRs=true
 ----
 +
-- `--additional-controllers="decommission"`: Enables the Decommission controller.
+- `--additional-controllers=decommission`: Enables the Decommission controller.
 - `rbac.createAdditionalControllerCRs=true`: Creates the required RBAC rules for the Redpanda Operator to monitor the StatefulSet and update PVCs and PVs.
++
+TIP: To change how often the Decommission controller re-checks the cluster for brokers that need decommissioning, pass the `--decommission-wait-interval` flag through `additionalCmdFlags`. See <<decommission-timing>>.
 
 .. Configure a Redpanda resource with seven Redpanda brokers:
 +
@@ -644,6 +646,61 @@ kubectl logs <pod-name> --namespace <namespace> -c sidecars
 
 You can repeat this procedure to continue to scale down.
 
+[[decommission-timing]]
+== Tune automatic decommission timing
+
+The <<Automated,automatic decommissioner>> re-checks the cluster on a regular interval for brokers that need to be decommissioned. Which setting controls this interval, and whether a debounce window applies before the decommissioner acts, depends on how the decommissioner is deployed: as the Decommission controller inside the Redpanda Operator, or as the broker decommissioner sidecar in a Helm-only deployment.
+
+[cols="2,1,4"]
+|===
+| Setting | Default | Description
+
+| `--decommission-wait-interval` (Operator; set through `additionalCmdFlags`)
+| `8s`
+| Requeue interval (`RequeueAfter`) for the Operator's Decommission controller. It sets how often the controller re-checks the cluster for brokers that need decommissioning, unless a reconcile has already scheduled an earlier re-check.
+
+| `decommissionRequeueTimeout` (Helm sidecar; under `statefulset.sideCars.brokerDecommissioner`)
+| `10s`
+| How often the sidecar re-checks a cluster that already has a broker flagged for decommissioning.
+
+| `decommissionAfter` (Helm sidecar; under `statefulset.sideCars.brokerDecommissioner`)
+| `60s`
+| How long a broker must continuously meet the decommission conditions before the sidecar acts. This debounce window prevents acting on transient conditions, such as a broker that is briefly unreachable during a restart.
+|===
+
+=== Set the interval for the Operator
+
+The Operator's Decommission controller does not expose its interval as a dedicated Helm value. Instead, pass the `--decommission-wait-interval` flag through `additionalCmdFlags` when you install or upgrade the Operator:
+
+[,bash,subs="attributes+"]
+----
+helm upgrade --install redpanda-controller redpanda/operator \
+  --namespace <namespace> \
+  --create-namespace \
+  --set image.tag={latest-operator-version} \
+  --set "additionalCmdFlags={--additional-controllers=decommission,--decommission-wait-interval=30s}" \
+  --set rbac.createAdditionalControllerCRs=true
+----
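+
+If you manage the Operator through a values file instead of `--set`, the same flags can be sketched as the following fragment. This is only an equivalent rendering of the `additionalCmdFlags` and `rbac` settings from the command above. The image tag is omitted here, and the file name you pass to `helm upgrade --values` is your own choice.
+
+[,yaml]
+----
+# Sketch of a values file for the redpanda/operator chart.
+# Mirrors the --set arguments in the command above; 30s is an example value.
+additionalCmdFlags:
+  - --additional-controllers=decommission
+  - --decommission-wait-interval=30s
+rbac:
+  createAdditionalControllerCRs: true
+----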
+
+The flag accepts any Go duration string, such as `8s`, `30s`, or `2m`. The default is `8s`. After each reconcile, the controller logs the next scheduled run, and the `next run in` value reflects the configured interval:
+
+[.no-copy]
+----
+{"level":"info","logger":"DecommissionReconciler.Reconcile","msg":"successful reconciliation finished in 1m0s, next run in 30s","controller":"statefulset", ...}
+----
+
+=== Set the intervals for Helm
+
+For a Helm-only deployment, set the sidecar values directly under `statefulset.sideCars.brokerDecommissioner`. For a full example, see <<Automated,Use the BrokerDecommissioner>>.
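+
+As a minimal sketch, the relevant fragment of a Helm values file might look like the following. The two durations are the defaults listed in the table above. The `enabled` key is an assumption that this section does not state, so confirm it against the full example before you use it.
+
+[,yaml]
+----
+statefulset:
+  sideCars:
+    brokerDecommissioner:
+      enabled: true                    # assumed key; see the full example
+      decommissionAfter: 60s           # debounce window before the sidecar acts
+      decommissionRequeueTimeout: 10s  # re-check interval; keep below decommissionAfter
+----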
+
+=== Guidance for adjusting the intervals
+
+* These settings control only how often the decommissioner *re-checks* for work and how long it waits before acting. They do not change how fast partition data is reallocated once a decommission begins. Reallocation throughput is governed by xref:reference:cluster-properties.adoc#raft_learner_recovery_rate[`raft_learner_recovery_rate`] and xref:reference:tunable-properties.adoc#partition_autobalancing_concurrent_moves[`partition_autobalancing_concurrent_moves`].
+* The re-check interval sets only the *periodic* cadence. A scale-in that you initiate by reducing `statefulset.replicas` is detected from a StatefulSet watch event and acted on promptly, so raising the interval does not delay a routine scale-in. The interval primarily determines how quickly the controller notices conditions that arise without a triggering event, such as a broker that becomes unreachable.
+* Increase the re-check interval to reduce reconcile frequency, and the associated log and Admin API traffic, on large or stable clusters. Decrease it for faster detection of brokers that need decommissioning.
+* For Helm (sidecar) deployments, keep `decommissionRequeueTimeout` smaller than `decommissionAfter`, ideally well below it, so the sidecar re-evaluates the cluster at least once within the debounce window. If the re-check interval is close to or larger than `decommissionAfter`, the decommissioner may wait up to one additional interval before acting. The Kubernetes controller-runtime work queue also adds a small amount of jitter.
+* A single Operator reconcile can take up to about a minute because the Decommission controller verifies that cluster health is stable before it commits to a decommission. This is expected, and is independent of the `--decommission-wait-interval` value.
+
 == Troubleshooting
 
 If the decommissioning process is not making progress, investigate the following potential issues: