[BUG] Cluster creation & OpsRequest Reconfiguring races when PVC provisioning delays first Pod start (MySQL) #9789

@elderapo

Describe the bug
Applying a MySQL Cluster and an OpsRequest (type: Reconfiguring, with at least one restart-required parameter) in a single apply for a new cluster leaves the cluster crash-looping or otherwise broken when PVC provisioning delays the first Pod start. The operator queues and processes the OpsRequest before the MySQL cluster has completed its first boot. When the volume is finally provisioned and the Pod starts, the already-processed OpsRequest immediately triggers the restart-required reconfigure (e.g., innodb_buffer_pool_instances), and the component does not reliably complete its initial bootstrap.

To Reproduce

  1. Apply the following at once (single kubectl apply -f), using a storage class that takes a few seconds to provision a PVC:

    ---
    kind: Namespace
    apiVersion: v1
    metadata:
      name: kubeblocks-test
    ---
    apiVersion: apps.kubeblocks.io/v1
    kind: Cluster
    metadata:
      name: cluster1
      namespace: kubeblocks-test
    spec:
      clusterDef: mysql
      topology: semisync
      terminationPolicy: Delete
      componentSpecs:
        - name: mysql
          componentDef: "mysql-8.0"
          serviceVersion: 8.0.33
          replicas: 1
          volumeClaimTemplates:
            - name: data
              spec:
                accessModes: ["ReadWriteOnce"]
                resources:
                  requests:
                    storage: 10Gi
    ---
    apiVersion: operations.kubeblocks.io/v1alpha1
    kind: OpsRequest
    metadata:
      name: mysql-reconfiguring
      namespace: kubeblocks-test
    spec:
      clusterName: cluster1
      force: false
      reconfigures:
        - componentName: mysql
          parameters:
            - key: innodb_buffer_pool_instances
              value: "5"
      preConditionDeadlineSeconds: 60
      type: Reconfiguring

  2. Observe: PVC provisioning keeps the Pod in Pending; the OpsRequest is processed and ready to execute before the Pod exists.

  3. When the Pod finally starts, the restart-required reconfigure is executed immediately (before first-boot completes), and the component fails to finish initialization / enters restart loops.
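
For anyone trying to confirm the ordering described in steps 2 and 3, here is a rough sketch of the checks that could be run while the PVC is still provisioning. The function name `show_race_state` is made up for illustration, the namespace and OpsRequest name are taken from the manifests above, and it assumes the OpsRequest exposes its state under `.status.phase`.

```shell
#!/usr/bin/env bash
# Diagnostic sketch (hypothetical helper, not part of KubeBlocks).
# Assumption: the OpsRequest reports its state in .status.phase.
show_race_state() {
  local namespace="${1:-kubeblocks-test}"
  local ops="${2:-mysql-reconfiguring}"

  # PVC should still be Pending/unbound and the Pod Pending during the race.
  kubectl get pvc,pods --namespace "$namespace" &&
  # Print the OpsRequest phase to compare against the Pod state above.
  kubectl get opsrequests.operations.kubeblocks.io "$ops" \
    --namespace "$namespace" -o jsonpath='{.status.phase}{"\n"}'
}

# Usage: show_race_state            (defaults from the manifests above)
#        show_race_state my-ns my-ops
```

If the OpsRequest phase has already moved on while the Pod is still Pending, that matches the behavior reported here.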

Expected behavior
The OpsRequest should not be processed until the MySQL Pod is running and all init containers have completed; applying Cluster + OpsRequest together for new clusters should be safe for GitOps workflows even when PVC provisioning is slow.

Additional context

  • Kubernetes: 1.33.5+k3s1
  • KubeBlocks: v1.0.1
  • MySQL add-on: 1.0.3
  • Storage class / CSI: hetzner-csi

Does not happen if

  • The OpsRequest is applied after the Cluster successfully bootstraps (all init containers successfully exit).
  • The Cluster has no volumeClaimTemplates (Pod starts quickly).
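
Based on the first point, a possible interim workaround is to split the apply into two steps and wait in between. This is only a sketch: `cluster.yaml` and `opsrequest.yaml` are hypothetical files holding the manifests from the reproduction, `apply_after_bootstrap` is a made-up helper name, and it assumes the Cluster reports `.status.phase` as `Running` only once the first boot has finished. It needs a kubectl version that supports `wait --for=jsonpath`.

```shell
#!/usr/bin/env bash
# Workaround sketch: apply the Cluster, wait for it to come up, then apply
# the OpsRequest. Assumptions: two hypothetical manifest files, and
# Cluster .status.phase == Running meaning bootstrap has completed.
apply_after_bootstrap() {
  local cluster_manifest="$1"
  local ops_manifest="$2"
  local namespace="${3:-kubeblocks-test}"
  local cluster="${4:-cluster1}"

  # Each step runs only if the previous one succeeded.
  kubectl apply -f "$cluster_manifest" &&
  # Blocks until the phase matches; exits non-zero if the timeout expires.
  kubectl wait --for=jsonpath='{.status.phase}'=Running \
    "cluster.apps.kubeblocks.io/${cluster}" \
    --namespace "$namespace" --timeout=600s &&
  kubectl apply -f "$ops_manifest"
}

# Usage: apply_after_bootstrap cluster.yaml opsrequest.yaml
```

This does not help GitOps setups that apply everything in one pass, which is why the ordering still needs to be enforced on the operator side.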

Labels

kind/bug