Skip to content

chores: reduce amount of reconciles, cache CH connections between reconciles#245

Open
GrigoryPervakov wants to merge 1 commit into
mainfrom
perf/reconcile-efficiency
Open

chores: reduce amount of reconciles, cache CH connections between reconciles#245
GrigoryPervakov wants to merge 1 commit into
mainfrom
perf/reconcile-efficiency

Conversation

@GrigoryPervakov

Copy link
Copy Markdown
Member

Why

In transient states, the operator does a lot of reconciling, generating useless load on the apiserver and constantly reconnect all nodes.

What

  • Increase most reconcile requests 1s->5s
  • Add a ClickHouse connections pool

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR reduces unnecessary reconcile churn in transient states and avoids repeatedly reconnecting to ClickHouse nodes by (1) slowing down common poll/requeue loops and (2) introducing a per-cluster ClickHouse connection cache that survives across reconciles and is cleaned up on manager shutdown.

Changes:

  • Introduce a per-ClickHouseCluster connection pool (connCache) and wire it through the ClickHouse controller/commander, including shutdown cleanup.
  • Increase several requeue/poll intervals from 1s to 5s by adding RequeueProbePoll and using it in multiple “wait/poll” paths.
  • Update integration tests to use the new dialer + connection cache flow.

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
internal/controller/resourcemanager.go Switch replica-resource requeues to RequeueProbePoll to reduce apiserver churn.
internal/controller/keeper/sync.go Slow down requeueing when cluster is not stable / waiting on replica readiness.
internal/controller/constants.go Add RequeueProbePoll = 5s constant.
internal/controller/clickhouse/sync.go Wire connection cache into commander creation; use RequeueProbePoll for blocked/requeue steps.
internal/controller/clickhouse/controller.go Create/own the connection cache, evict on cluster deletion, and close on manager shutdown.
internal/controller/clickhouse/conncache.go New per-cluster connection cache implementation with credential-hash invalidation.
internal/controller/clickhouse/commands.go Commander now uses cached connections rather than per-reconcile connection maps.
internal/controller/clickhouse/commands_test.go Adapt integration test to use dialer routing and cached connections.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +62 to 69
func newCommander(log controllerutil.Logger, cluster *v1.ClickHouseCluster, secret *corev1.Secret, dialer controllerutil.DialContextFunc, cache *connCache) *commander {
log = log.Named("commander")
credHash, _ := controllerutil.DeepHashObject(secret.Data[SecretKeyManagementPassword])

return &commander{
log: log.Named("commander"),
conns: map[v1.ClickHouseReplicaID]clickhouse.Conn{},
log: log,
entry: cache.Get(cluster.NamespacedName(), credHash, log),
cluster: cluster,
Comment on lines 75 to 80
if errors.IsNotFound(err) {
cc.Logger.Info("clickhouse cluster not found")
cc.connCache.Evict(req.NamespacedName, cc.Logger)

return ctrl.Result{}, nil
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants