@hvan (Collaborator) commented Aug 28, 2025

Description

The PVC reconciler is working as expected for KRaft controllers. I've tested the following scenarios:

  • Add an additional disk: this fails, because a new disk must first be formatted with the 'kafka-storage.sh' script.
  • Increase disk size: this works as expected, the same as for brokers.

For controllers, the "add additional disk" scenario would only make sense as part of replacing an existing disk, for example to change the storage class or to decrease the disk size.

To change the storage class:

  • For each controller, starting with 0:
    • Delete the PVC, then the pod. This creates a new PVC, and the data is automatically resynced when the controller rejoins the controller quorum.
    • Wait until replication has completed for the new controller before moving on to the next one.
      • Validate with: kafka-metadata-quorum.sh --bootstrap-server $KAFKA_BOOTSTRAP describe --replication
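The per-controller rollout above can be sketched in Go as follows. This is a minimal illustration, not KOperator code: `controllerOps`, `rollControllers` and `lagForNode` are hypothetical names, the Kubernetes deletes and the `kafka-metadata-quorum.sh` call are injected as callbacks, and it assumes that the `describe --replication` output is a whitespace-separated table whose header contains `NodeId` and `Lag` columns, and that `Lag == 0` means the new controller has finished replicating.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// lagForNode parses the table printed by `describe --replication` (assumed
// format) and returns the Lag column for nodeID. Columns are located by
// header name, so extra or reordered columns do not shift the result.
func lagForNode(output string, nodeID int) (int64, error) {
	sc := bufio.NewScanner(strings.NewReader(output))
	nodeCol, lagCol := -1, -1
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) == 0 {
			continue
		}
		if nodeCol < 0 {
			// The first non-empty line is treated as the header row.
			for i, f := range fields {
				switch f {
				case "NodeId":
					nodeCol = i
				case "Lag":
					lagCol = i
				}
			}
			if nodeCol < 0 || lagCol < 0 {
				return 0, fmt.Errorf("header missing NodeId or Lag column")
			}
			continue
		}
		if len(fields) <= nodeCol || len(fields) <= lagCol {
			continue // skip short or malformed rows
		}
		if id, err := strconv.Atoi(fields[nodeCol]); err == nil && id == nodeID {
			return strconv.ParseInt(fields[lagCol], 10, 64)
		}
	}
	if err := sc.Err(); err != nil {
		return 0, err
	}
	return 0, fmt.Errorf("node %d not found in replication output", nodeID)
}

// controllerOps is a hypothetical set of callbacks standing in for the
// client-go/kubectl calls and the kafka-metadata-quorum.sh invocation.
type controllerOps struct {
	deletePVC   func(id int) error
	deletePod   func(id int) error
	describeRep func() (string, error) // output of describe --replication
}

// rollControllers handles one controller at a time in the given order:
// delete its PVC, then its pod, then poll until it reports zero lag.
func rollControllers(ids []int, ops controllerOps, maxPolls int) error {
	for _, id := range ids {
		if err := ops.deletePVC(id); err != nil {
			return fmt.Errorf("delete PVC for controller %d: %w", id, err)
		}
		if err := ops.deletePod(id); err != nil {
			return fmt.Errorf("delete pod for controller %d: %w", id, err)
		}
		caughtUp := false
		for i := 0; i < maxPolls && !caughtUp; i++ {
			out, err := ops.describeRep()
			if err != nil {
				return err
			}
			// A missing row or non-zero lag means "not yet". A real loop
			// would sleep between polls and also check pod readiness.
			lag, err := lagForNode(out, id)
			caughtUp = err == nil && lag == 0
		}
		if !caughtUp {
			return fmt.Errorf("controller %d did not catch up after %d polls", id, maxPolls)
		}
	}
	return nil
}
```

Moving on only after the current controller has caught up keeps at most one controller rebuilding at a time, so the quorum keeps a majority of replicas with the full metadata log. A production version would also confirm the recreated pod is Ready before trusting the lag value.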

To reduce the disk size:

  • The operator gets stuck reconciling, with the following error:
  • error":"failed to reconcile resource: one can not reduce the size of a PVC: could not modify pvc size","errorVerbose":"one can not reduce the size of a PVC: could not modify pvc size\nfailed to reconcile resource"
  • Brokers behave the same way, so we keep this behavior for controllers as well.

Kafka documents a procedure for replacing a disk: https://kafka.apache.org/documentation/#replace_disk. In my testing, it did not work well with KOperator, because the operator first tries to add a second disk, which leaves the controller in an unrecoverable state.

To avoid getting KRaft controllers into an unrecoverable state, we add safeguards that prevent adding a new disk to, or removing a disk from, a controller when doing so would leave it unrecoverable.
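One way such a safeguard could look, sketched in Go under stated assumptions: `validateControllerDisks` is a hypothetical helper, each controller disk is identified by its mount path (the `/kafka-logs` paths in the usage are made up), and an empty current set is treated as initial provisioning, where the disks get formatted anyway. It rejects any change to the set of disks but lets existing disks change size; shrinking is left to a separate size check.

```go
package main

import (
	"fmt"
	"sort"
)

// validateControllerDisks compares the current and desired disks of a KRaft
// controller (mount path -> size in bytes). Adding or removing a disk is
// rejected; size changes on existing disks are allowed through.
func validateControllerDisks(current, desired map[string]int64) error {
	if len(current) == 0 {
		return nil // assumed: initial provisioning, disks are formatted on first start
	}
	var added, removed []string
	for path := range desired {
		if _, ok := current[path]; !ok {
			added = append(added, path)
		}
	}
	for path := range current {
		if _, ok := desired[path]; !ok {
			removed = append(removed, path)
		}
	}
	// Sort for deterministic error messages (map iteration order is random).
	sort.Strings(added)
	sort.Strings(removed)
	if len(added) > 0 {
		return fmt.Errorf("refusing to add disk(s) %v to KRaft controller: new disks are not formatted with kafka-storage.sh", added)
	}
	if len(removed) > 0 {
		return fmt.Errorf("refusing to remove disk(s) %v from KRaft controller", removed)
	}
	return nil
}
```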

Type of Change

  • Bug Fix
  • New Feature
  • Breaking Change
  • Refactor
  • Documentation
  • Other (please describe)

Checklist

  • I have read the contributing guidelines
  • Existing issues have been referenced (where applicable)
  • I have verified this change is not present in other open pull requests
  • Functionality is documented
  • All code style checks pass
  • New code contribution is covered by automated tests
  • All new and existing tests pass

@hvan marked this pull request as ready for review August 28, 2025 18:13
@hvan marked this pull request as draft August 28, 2025 18:13
@hvan changed the title from "Add PVC safeguards for controllers" to "PVC safeguards for controllers" Aug 28, 2025
@hvan marked this pull request as ready for review August 28, 2025 19:02
return nil
}

func handleDiskRemoval(ctx context.Context, pvcList *corev1.PersistentVolumeClaimList, desiredPvcs []*corev1.PersistentVolumeClaim,
@hvan (Collaborator Author) commented Aug 28, 2025

Refactoring this to reduce code complexity (lint failure). No new logic here.

@hvan changed the title from "PVC safeguards for controllers" to "PVC safeguards for KRaft controllers" Aug 29, 2025
@hvan requested a review from dobrerazvan September 8, 2025 14:03
@hvan merged commit 73a458e into master Sep 9, 2025
9 of 10 checks passed
@hvan deleted the hvan-kraft-pvc branch September 9, 2025 20:24