Bucket store: Support GCS rate limiting #13703
Conversation
		ConstLabels: constLabels,
	}, []string{"allowed"})
	rl.currentQPSGauge.Set(float64(startQPS))
}
Bug: Rate limiter metrics lack component labels causing duplicate registration
The rate limiter metrics (cortex_gcs_rate_limited_seconds_total, cortex_gcs_current_qps, cortex_gcs_requests_total) are registered with only an operation label but no component/bucket name differentiation. In NewClient, the registerer is passed directly to gcs.NewBucketClient before being wrapped with prometheus.WrapRegistererWith(prometheus.Labels{"component": name}, reg) in bucketWithMetrics. This means if multiple GCS bucket clients with rate limiting enabled are created in the same process (e.g., compactor and store-gateway in monolithic mode), the duplicate metric registration will cause a panic at startup. The name parameter is passed to NewBucketClient but not used in the rate limiter metrics.
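A minimal sketch of the kind of fix the bot is pointing at, assuming the same wrapping approach used by bucketWithMetrics; the helper name and signature below are illustrative, not the PR's actual code:

```go
package gcs

import (
	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
)

// Illustrative sketch (not the PR's code): wrapping the registerer with a
// per-client "component" label means that two rate-limited GCS bucket clients
// in the same process (e.g. compactor and store-gateway in monolithic mode)
// register distinct series instead of panicking on duplicate registration.
func newRateLimitedSecondsCounter(reg prometheus.Registerer, component, operation string) prometheus.Counter {
	wrapped := prometheus.WrapRegistererWith(prometheus.Labels{"component": component}, reg)
	return promauto.With(wrapped).NewCounter(prometheus.CounterOpts{
		Name:        "cortex_gcs_rate_limited_seconds_total",
		Help:        "Total time spent waiting on the GCS rate limiter.",
		ConstLabels: prometheus.Labels{"operation": operation},
	})
}
```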
Thanks Cursor, I believe it's now resolved.
# (advanced) Initial queries per second limit for GCS uploads. The rate doubles
# every ramp period until it reaches the maximum.
# CLI flag: -<prefix>.gcs.upload-initial-qps
[upload_initial_qps: <int> | default = 1000]
In Google's best practices I'm reading:
If your request rate is expected to go over these thresholds, you should start with a request rate below or near the thresholds and then gradually increase the rate, no faster than doubling over a period of 20 minutes.
Here we have the default set to the maximum value for both uploads (1000) and reads (5000). Should we instead start below the recommended thresholds and ramp up, rather than starting at the maximum? I'm thinking we might otherwise run into rate limiting.
Aren't the recommended limits 1000 and 5000, respectively? I don't know of any others. Also, when I spoke with Google Cloud support, they were only concerned about going above those limits. Additionally, in practice we won't hit these limits exactly, since we have to approximate them by dividing by the expected number of replicas. All in all, I wouldn't worry, especially as the limiter automatically backs off if it receives a rate limiting error.
default:
	panic(fmt.Errorf("unrecognized rateLimiterMode %v", mode))
}
startQPS := min(initialQPS, maxQPS)
I wonder if we should disallow maxQps < initialQps instead of taking the min here. With the current config it's possible to set:
- initialQps: 10
- maxQps: 5
Is that something we should allow? Or should we at least document that maxQps should be higher?
In practice one shouldn't set max QPS lower than initial QPS; this is just a guard. It doesn't really matter to me whether we return a validation error instead.
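A minimal sketch of the validation-error alternative discussed above; the function name is an assumption, not part of the PR:

```go
package gcs

import "fmt"

// validateRateLimitQPS is an illustrative sketch: reject configurations where
// the maximum QPS is lower than the initial QPS instead of silently clamping
// the starting value with min().
func validateRateLimitQPS(initialQPS, maxQPS int) error {
	if maxQPS < initialQPS {
		return fmt.Errorf("GCS rate limit max QPS (%d) must not be lower than initial QPS (%d)", maxQPS, initialQPS)
	}
	return nil
}
```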
if reg != nil {
	constLabels := prometheus.Labels{"name": name, "operation": operation}
	rl.rateLimitedSeconds = promauto.With(reg).NewCounter(prometheus.CounterOpts{
		Name: "cortex_gcs_rate_limited_seconds_total",
Why do we use a counter for measuring seconds, and not a histogram? Just curious.
I haven't given it much thought yet. The PR is still in an early state. Do you have specific arguments for using a histogram instead?
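For comparison, a histogram variant would look roughly like the sketch below; the metric name and buckets are assumptions, and the PR currently uses the counter cortex_gcs_rate_limited_seconds_total. A counter only sums throttled time, while a histogram would also expose the per-request wait distribution at the cost of extra series.

```go
package gcs

import (
	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
)

// Illustrative sketch of the histogram alternative raised in review: it would
// record the distribution of per-request waits on the rate limiter instead of
// only their running total.
func newRateLimitedWaitHistogram(reg prometheus.Registerer, operation string) prometheus.Histogram {
	return promauto.With(reg).NewHistogram(prometheus.HistogramOpts{
		// Hypothetical metric name, not the one used by the PR.
		Name:        "cortex_gcs_rate_limited_wait_duration_seconds",
		Help:        "Time spent waiting on the GCS rate limiter per request.",
		Buckets:     prometheus.DefBuckets,
		ConstLabels: prometheus.Labels{"operation": operation},
	})
}
```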
if newQPS != rl.currentQPS {
	rl.currentQPS = newQPS
	rl.limiter.SetLimit(rate.Limit(newQPS))
	rl.limiter.SetBurst(newQPS * 2)
I think this can end up higher than maxQps 🤔, which could cause rate limiting, no?
Above we set newQPS = rl.maxQPS
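For context, a rough sketch of the ramp-up behavior described in the flag docs (the QPS doubles every ramp period but is clamped to the configured maximum before the limiter is updated); the type and field names are illustrative, not necessarily the PR's:

```go
package gcs

import "golang.org/x/time/rate"

// Illustrative sketch of a rate limiter that ramps up by doubling.
type rampingLimiter struct {
	currentQPS int
	maxQPS     int
	limiter    *rate.Limiter
}

// rampUp doubles the current QPS, clamps it to maxQPS, and only then updates
// the limiter, so the sustained rate never exceeds the configured maximum.
// The burst of twice the current QPS only bounds short spikes, not the
// sustained request rate enforced by SetLimit.
func (rl *rampingLimiter) rampUp() {
	newQPS := rl.currentQPS * 2
	if newQPS > rl.maxQPS {
		newQPS = rl.maxQPS
	}
	if newQPS != rl.currentQPS {
		rl.currentQPS = newQPS
		rl.limiter.SetLimit(rate.Limit(newQPS))
		rl.limiter.SetBurst(newQPS * 2)
	}
}
```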
tacole02 left a comment:
Docs look good! I left a few minor suggestions. Thank you!
tacole02 left a comment:
Docs look good! Thank you!
What this PR does
Augment the bucket store subsystem with support for GCS rate limiting, per Google's best practices. There are separate configurations for upload and read rate limiting, because the GCS guidelines differ for each.
A remaining question is whether to divide initial and max QPS by the expected number of replicas at deployment time.
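For illustration, dividing the bucket-wide limits across replicas could look like the sketch below; the helper is hypothetical and not part of the PR:

```go
package gcs

// perReplicaQPS is an illustrative sketch of the open question above: split a
// bucket-wide QPS limit evenly across the expected number of replicas, with a
// floor of 1 so no replica ends up with a zero limit.
func perReplicaQPS(bucketQPS, expectedReplicas int) int {
	if expectedReplicas < 1 {
		expectedReplicas = 1
	}
	qps := bucketQPS / expectedReplicas
	if qps < 1 {
		qps = 1
	}
	return qps
}
```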
Which issue(s) this PR fixes or relates to
Checklist
- CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]. If a changelog entry is not needed, please add the changelog-not-needed label to the PR.
- about-versioning.md updated with experimental features.

Note
Adds GCS upload/read rate limiting with exponential ramp-up, wiring it into the bucket client, plus new flags, docs, defaults, metrics, validation, and tests.
- Registers the rate limiter metrics via the provided prometheus.Registerer.
- Adds the flags *-gcs.upload-rate-limit-enabled, *-gcs.upload-initial-qps, *-gcs.upload-max-qps, *-gcs.upload-ramp-period, *-gcs.read-rate-limit-enabled, *-gcs.read-initial-qps, *-gcs.read-max-qps, *-gcs.read-ramp-period across blocks-storage, ruler-storage, alertmanager-storage, and common.storage.
- Updates flag defaults (operations/mimir/mimir-flags-defaults.json), help output (help-all.txt.tmpl), and docs (configuration-parameters/index.md).
- Changes gcs.NewBucketClient and its caller to pass a Registerer; adds the rate limiter implementation (rate_limiter.go).

Written by Cursor Bugbot for commit 48d19a3. This will update automatically on new commits.