@hvan hvan commented Aug 6, 2025

Description

This PR addresses an issue where a KRaft controller shut down but its pod was never restarted. It adds readiness and liveness probes to the kafka container (for KRaft controllers only); if the liveness probe fails, Kubernetes restarts the container automatically.

For the readiness probe, we check the controller listener port over TCP. When this port is open, the kafka container is ready to communicate with other controllers.

For the liveness probe, we check the current-state metric exposed by a JMX bean. The expected state is 'leader' or 'follower'; any other state is considered unhealthy.
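
The liveness decision itself can be sketched as a small Go function. This is only an illustration under stated assumptions: the name `isHealthyState` is hypothetical, the trimming and case-folding of the input is an assumption about how the value might arrive, and how the metric is actually read from JMX is outside this sketch.

```go
package main

import (
	"fmt"
	"strings"
)

// isHealthyState returns true only for the two states the description
// treats as healthy. Whitespace trimming and lowercasing are assumptions
// made for this example. (Hypothetical helper, not the PR's code.)
func isHealthyState(state string) bool {
	switch strings.ToLower(strings.TrimSpace(state)) {
	case "leader", "follower":
		return true
	default:
		// Any other value, including an empty one, is unhealthy.
		return false
	}
}

func main() {
	// Sample inputs; "candidate" stands in for "any other state".
	for _, s := range []string{"leader", "follower", "candidate", ""} {
		fmt.Printf("state=%q healthy=%v\n", s, isHealthyState(s))
	}
}
```

Treating every unrecognized value as unhealthy keeps the check fail-closed: a probe that cannot read a valid state fails in the same way as one that reads an unexpected state.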

Type of Change

  • Bug Fix
  • New Feature
  • Breaking Change
  • Refactor
  • Documentation
  • Other (please describe)

Checklist

  • I have read the contributing guidelines
  • Existing issues have been referenced (where applicable)
  • I have verified this change is not present in other open pull requests
  • Functionality is documented
  • All code style checks pass
  • New code contribution is covered by automated tests
  • All new and existing tests pass

@hvan hvan marked this pull request as ready for review August 12, 2025 19:31
b := broker.ReadOnlyConfig
trimmedConfig := strings.TrimSpace(b)

if strings.Contains(trimmedConfig, kafkautils.KafkaConfigSecurityInterBrokerProtocol+"=") {
@hvan hvan Aug 13, 2025

Change is unrelated to the health check. Just removing unnecessary logging.

@hvan hvan marked this pull request as draft August 27, 2025 12:56
@hvan hvan force-pushed the hvan-healthcheck branch from 145adf4 to 2516cd0 Compare August 27, 2025 13:04
@hvan hvan marked this pull request as ready for review August 27, 2025 13:07
@hvan hvan force-pushed the hvan-healthcheck branch from 2516cd0 to 57a210a Compare August 27, 2025 13:08
@hvan hvan changed the title from "Add liveness/readiness probe for Kraft controllers" to "liveness/readiness probe for Kraft controllers" Aug 28, 2025
@hvan hvan requested a review from dobrerazvan September 4, 2025 15:48
@hvan hvan merged commit c065174 into master Sep 9, 2025
8 checks passed
@hvan hvan deleted the hvan-healthcheck branch September 9, 2025 20:24