KeeperCluster StatefulSet infinite restart loop due to K8s API server default field drift #131

@brightsparc

Description

Company or project name

Introspection (introspection.dev) — AI observability platform using self-hosted ClickHouse via the operator.

Describe what's wrong

The KeeperCluster reconciler enters an infinite restart loop because the desired StatefulSet spec omits fields that the Kubernetes API server fills with defaults. On each reconcile cycle, the operator detects a diff between the desired state (nil/zero values) and the actual state (K8s-defaulted values), concludes config has changed, and force-restarts the keeper pod via the kubectl.kubernetes.io/restartedAt annotation. This annotation change itself creates a new diff on the next reconcile, making the loop self-reinforcing.

Does it reproduce on the most recent release?

Yes

How to reproduce

Deploy any KeeperCluster and watch its pods: the keeper pod is killed and recreated every ~10-30 seconds, indefinitely.

The operator logs show:

  INFO  keeper  forcing Pod restart, because of config changes
  INFO  keeper  updating replica StatefulSet

Expected behavior

Once the keeper is running and config hasn't actually changed, the pod should remain stable.

Error message and/or stacktrace

The operator's templateStatefulSet() and templatePodSpec() functions build a desired spec with several fields left as nil/zero:

  Field                                              Desired (operator)   Actual (K8s-defaulted)
  spec.template.spec.terminationGracePeriodSeconds   nil                  30
  spec.template.spec.schedulerName                   ""                   "default-scheduler"
  spec.template.spec.securityContext                 nil                  {}
  spec.updateStrategy.rollingUpdate.partition        nil                  0
  spec.updateStrategy.rollingUpdate.maxUnavailable   nil                  1
  spec.persistentVolumeClaimRetentionPolicy          nil                  {whenDeleted: Retain, whenScaled: Retain}
  liveness probe successThreshold                    0                    1

When DeepHashObject() hashes the desired spec vs the actual spec from K8s, the hashes never match. This triggers the config-change detection at resources.go:371, which sets restartedAt to time.Now(), which changes the pod template hash, which triggers another update on the next reconcile.

Additional context

Affected files

  • internal/controller/keeper/templates.go — templateStatefulSet() and templatePodSpec()
  • internal/controller/clickhouse/templates.go — same functions (same pattern)
  • internal/controller/constants.go — DefaultLivenessProbeSettings is missing SuccessThreshold: 1
  • internal/controller/resources.go — ReconcileReplicaResources(), where the diff is detected

Suggested fix

Explicitly set K8s-defaulted fields in the desired spec so they match what the API server returns:

  1. Set SuccessThreshold: 1 on DefaultLivenessProbeSettings
  2. In templatePodSpec(), default terminationGracePeriodSeconds to 30, schedulerName to "default-scheduler", and securityContext to &PodSecurityContext{}
  3. In templateStatefulSet(), set RollingUpdate.Partition to 0, RollingUpdate.MaxUnavailable to 1, and PersistentVolumeClaimRetentionPolicy to Retain/Retain

Labels: potential bug — To be reviewed by developers and confirmed/rejected.