
[r382] blockbuilder: fix how kafka reader metrics are instantiated #14384

Open

mimir-github-bot[bot] wants to merge 1 commit into r382 from backport-14383-to-r382

Conversation

mimir-github-bot[bot] (Contributor) commented Feb 16, 2026

Backport 5cda4ce from #14383


Note

Medium Risk
Touches Kafka reader metrics wiring and lifecycle (start/stop) and adds concurrency protection around a shared metrics source, which could affect metric reporting or introduce subtle runtime issues if mishandled.

Overview
Fixes how Kafka reader metrics are instantiated and managed for the block-builder when concurrent fetching is enabled.

BlockBuilder now starts/stops the ingest.ReaderMetrics service with the component lifecycle, registers its metrics under a component="block-builder" label, and uses an explicit kprom metrics instance rather than implicit/nil defaults. The swappable metrics source used by concurrent fetchers is reworked to be thread-safe (RW mutex + nil-safe getters) instead of swapping an embedded interface.
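
For illustration, a minimal sketch of this thread-safe swappable source pattern; the interface shape and method names below are assumptions, not the exact Mimir API:

```go
package ingest

import "sync"

// Sketch only: the real swappableReaderMetricsSource lives in Mimir's ingest
// package; this shows the RW mutex + nil-safe getter pattern described above.

// readerMetricsSource stands in for whatever the ReaderMetrics service polls.
type readerMetricsSource interface {
	BufferedBytes() int64
	BufferedRecords() int64
}

// swappableReaderMetricsSource lets the concurrent fetcher be swapped in after
// the metrics service already exists. An RWMutex protects the swap, and the
// getters are nil-safe so the service can observe before a source is set.
type swappableReaderMetricsSource struct {
	mtx    sync.RWMutex
	source readerMetricsSource
}

func (s *swappableReaderMetricsSource) setSource(src readerMetricsSource) {
	s.mtx.Lock()
	defer s.mtx.Unlock()
	s.source = src
}

func (s *swappableReaderMetricsSource) BufferedBytes() int64 {
	s.mtx.RLock()
	defer s.mtx.RUnlock()
	if s.source == nil {
		return 0
	}
	return s.source.BufferedBytes()
}

func (s *swappableReaderMetricsSource) BufferedRecords() int64 {
	s.mtx.RLock()
	defer s.mtx.RUnlock()
	if s.source == nil {
		return 0
	}
	return s.source.BufferedRecords()
}
```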

Updates ingest tests and PartitionReader to always create and pass kprom metrics into NewReaderMetrics, and adjusts tests to initialize kprom directly via OnNewClient.
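
For context, a hedged sketch of creating an explicit kprom metrics instance and attaching it to a franz-go client so the same instance can also be passed to `NewReaderMetrics`; the function name, metric namespace, and broker address are illustrative:

```go
package main

import (
	"github.com/prometheus/client_golang/prometheus"
	"github.com/twmb/franz-go/pkg/kgo"
	"github.com/twmb/franz-go/plugin/kprom"
)

// newReaderClient builds the kprom metrics explicitly instead of relying on
// an implicit/nil default, then hands them to the Kafka client as hooks.
func newReaderClient(reg prometheus.Registerer) (*kgo.Client, *kprom.Metrics, error) {
	kpm := kprom.NewMetrics("example_kafka_client", kprom.Registerer(reg))

	cl, err := kgo.NewClient(
		kgo.SeedBrokers("localhost:9092"),
		// Attach the metrics as client hooks; per the PR, the updated tests
		// initialize kprom directly via its OnNewClient hook instead.
		kgo.WithHooks(kpm),
	)
	if err != nil {
		return nil, nil, err
	}
	return cl, kpm, nil
}
```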

Written by Cursor Bugbot for commit c8570a6. This will update automatically on new commits. Configure here.

#### What this PR does

This PR improves how the block-builder instantiates its copy of the Kafka reader metrics. `ingest.ReaderMetrics` is a service that internally pulls observations from its underlying metrics source. Because the block-builder never starts the service, some metrics (e.g. `cortex_ingest_storage_reader_estimated_bytes_per_record`) are never populated. The PR addresses that with the following changes:

1. make sure the source inside the `swappableReaderMetricsSource` is safe to observe concurrently
2. make the block-builder start / stop the `ReaderMetrics` service as part of its own lifecycle (see the sketch after this list)
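
A rough sketch of change 2, assuming the block-builder's dskit starting/stopping functions; the struct fields and surrounding code are illustrative, not the exact Mimir implementation:

```go
package blockbuilder

import (
	"context"
	"fmt"

	"github.com/grafana/dskit/services"

	"github.com/grafana/mimir/pkg/storage/ingest"
)

// Illustrative only: the real BlockBuilder has many more fields.
type BlockBuilder struct {
	readerMetrics *ingest.ReaderMetrics
}

func (b *BlockBuilder) starting(ctx context.Context) error {
	if b.readerMetrics != nil {
		// Start the ReaderMetrics service with the component, so it keeps
		// pulling observations from its source while the block-builder runs.
		if err := services.StartAndAwaitRunning(ctx, b.readerMetrics); err != nil {
			return fmt.Errorf("starting reader metrics: %w", err)
		}
	}
	// ... create the Kafka client, scheduler connection, etc.
	return nil
}

func (b *BlockBuilder) stopping(_ error) error {
	if b.readerMetrics != nil {
		if err := services.StopAndAwaitTerminated(context.Background(), b.readerMetrics); err != nil {
			return fmt.Errorf("stopping reader metrics: %w", err)
		}
	}
	// ... close the Kafka client and other resources.
	return nil
}
```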

It also fixes a small awkwardness in `NewReaderMetrics`, where the `kpromMetrics` argument could be `nil`.

#### Which issue(s) this PR fixes or relates to

Relates to https://github.com/grafana/mimir-squad/issues/3529

---------

Signed-off-by: Vladimir Varankin <vladimir.varankin@grafana.com>
(cherry picked from commit 5cda4ce)
@mimir-github-bot mimir-github-bot bot requested a review from a team as a code owner February 16, 2026 17:08
@mimir-github-bot mimir-github-bot bot requested a review from narqo February 16, 2026 17:08
cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 2 potential issues.

Bugbot Autofix is ON. A Cloud Agent has been kicked off to fix the reported issues.

```go
		readerMetricsSource,
		cfg.Kafka.Topic,
		kpm,
	)
```

Duplicate component label causes panic at registration

High Severity

The block builder wraps the registerer with prometheus.Labels{"component": "block-builder"}, but NewReaderMetrics internally calls NewStrongReadConsistencyInstrumentation with the hardcoded const component = "partition-reader", which applies ConstLabels: map[string]string{"component": component} to each metric. When registering these metrics through the wrapped registerer, wrapDesc detects the duplicate component label name and produces an error Desc, causing promauto's MustRegister to panic. This crashes the block builder at construction time whenever FetchConcurrencyMax > 0.
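
For illustration, a standalone snippet (not Mimir code) reproducing the failure mode described above: wrapping a registerer with a label that a metric also declares as a const label makes registration fail, and promauto's MustRegister panics.

```go
package main

import (
	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
)

func main() {
	reg := prometheus.NewRegistry()

	// The wrapped registerer adds component="block-builder" to every metric
	// registered through it.
	wrapped := prometheus.WrapRegistererWith(prometheus.Labels{"component": "block-builder"}, reg)

	// The metric itself already carries component="partition-reader" as a
	// const label, mirroring the hardcoded label described above. The
	// duplicate "component" label name yields an invalid descriptor, so
	// promauto's MustRegister panics at construction time.
	promauto.With(wrapped).NewCounter(prometheus.CounterOpts{
		Name:        "example_requests_total",
		Help:        "Example counter with a conflicting const label.",
		ConstLabels: prometheus.Labels{"component": "partition-reader"},
	})
}
```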

```go
	}

	if b.schedulerConn != nil {
		return b.schedulerConn.Close()
```

Leaked readerMetrics service if kafka client creation fails

Low Severity

If readerMetrics is successfully started (line 163–168) but NewKafkaReaderClient fails (line 170–177), starting returns an error. Per dskit's BasicService contract, stopping is not called when starting returns an error, so the running readerMetrics TimerService goroutine is never stopped. This is a resource/goroutine leak on startup failure.

cursor bot commented Feb 16, 2026

Bugbot Autofix prepared fixes for 2 of the 2 bugs found in the latest run.

  • ✅ Fixed: Duplicate component label causes panic at registration
    • Removed the WrapRegistererWith call that conflicted with ConstLabels in NewStrongReadConsistencyInstrumentation, which already sets the component label internally.
  • ✅ Fixed: Leaked readerMetrics service if kafka client creation fails
    • Added deferred cleanup in starting() to stop readerMetrics if an error occurs after it has been started, since dskit's BasicService does not call stopping when starting returns an error.

Or push these changes by commenting:

@cursor push 1cdb635d50
Preview (1cdb635d50)
```diff
diff --git a/pkg/blockbuilder/blockbuilder.go b/pkg/blockbuilder/blockbuilder.go
--- a/pkg/blockbuilder/blockbuilder.go
+++ b/pkg/blockbuilder/blockbuilder.go
@@ -79,12 +79,7 @@
 
 	var readerMetrics *ingest.ReaderMetrics
 	if cfg.Kafka.FetchConcurrencyMax > 0 {
-		m := ingest.NewReaderMetrics(
-			prometheus.WrapRegistererWith(prometheus.Labels{"component": "block-builder"}, reg),
-			readerMetricsSource,
-			cfg.Kafka.Topic,
-			kpm,
-		)
+		m := ingest.NewReaderMetrics(reg, readerMetricsSource, cfg.Kafka.Topic, kpm)
 		readerMetrics = &m
 	}
 
@@ -167,6 +162,14 @@
 		}
 	}
 
+	// Ensure readerMetrics is stopped if starting fails after this point.
+	// dskit's BasicService does not call stopping when starting returns an error.
+	defer func() {
+		if err != nil && b.readerMetrics != nil {
+			_ = services.StopAndAwaitTerminated(context.Background(), b.readerMetrics)
+		}
+	}()
+
 	b.kafkaClient, err = ingest.NewKafkaReaderClient(
 		b.cfg.Kafka,
 		b.kpromMetrics,
```

narqo (Contributor) commented Feb 16, 2026

Will have to add the changes from #14385. Will amend the backport after testing.
