1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -74,6 +74,7 @@
* [ENHANCEMENT] OTLP: Add experimental metric `cortex_distributor_otlp_array_lengths` to better understand the layout of OTLP packets in practice. #13525
* [ENHANCEMENT] Ruler: gRPC errors without details are classified as `operator` errors, and rule evaluation failures (such as duplicate labelsets) are classified as `user` errors. #13586
* [ENHANCEMENT] Server: The `/metrics` endpoint now supports metrics filtering by providing one or more `name[]` query parameters. #13746
* [ENHANCEMENT] Bucket storage: Add support for GCS rate limiting with exponential ramping following Google Cloud Storage best practices. Enable upload rate limiting with `-gcs.upload-rate-limit-enabled` and configure with `-gcs.upload-initial-qps`, `-gcs.upload-max-qps`, and `-gcs.upload-ramp-period`. Enable read rate limiting with `-gcs.read-rate-limit-enabled` and configure with `-gcs.read-initial-qps`, `-gcs.read-max-qps`, and `-gcs.read-ramp-period`. #13703
* [BUGFIX] Compactor: Fix potential concurrent map writes. #13053
* [BUGFIX] Query-frontend: Fix issue where queries sometimes fail with `failed to receive query result stream message: rpc error: code = Canceled desc = context canceled` if remote execution is enabled. #13084
* [BUGFIX] Query-frontend: Fix issue where query stats, such as series read, did not include the parameters to the `histogram_quantile` and `histogram_fraction` functions if remote execution was enabled. #13084
352 changes: 352 additions & 0 deletions cmd/mimir/config-descriptor.json

Large diffs are not rendered by default.

64 changes: 64 additions & 0 deletions cmd/mimir/help-all.txt.tmpl
@@ -73,10 +73,26 @@ Usage of ./cmd/mimir/mimir:
Maximum number of idle (keep-alive) connections to keep per-host. Set to 0 to use a built-in default value of 2. (default 100)
-alertmanager-storage.gcs.max-retries int
Maximum number of attempts for GCS operations (0 = unlimited, 1 = no retries). Applies to both regular and upload retry modes. (default 20)
-alertmanager-storage.gcs.read-initial-qps int
Initial queries per second limit for GCS reads. The rate doubles every ramp period until it reaches the maximum. (default 5000)
-alertmanager-storage.gcs.read-max-qps int
Maximum queries per second limit for GCS reads. (default 16000)
-alertmanager-storage.gcs.read-ramp-period duration
Time period over which the read rate doubles, following the Google recommendation. (default 20m0s)
-alertmanager-storage.gcs.read-rate-limit-enabled
Enable rate limiting for GCS reads. When enabled, reads gradually ramp up following Google Cloud Storage best practices.
-alertmanager-storage.gcs.service-account string
JSON either from a Google Developers Console client_credentials.json file, or a Google Developers service account key. Needs to be valid JSON, not a filesystem path.
-alertmanager-storage.gcs.tls-handshake-timeout duration
Maximum time to wait for a TLS handshake. Set to 0 for no limit. (default 10s)
-alertmanager-storage.gcs.upload-initial-qps int
Initial queries per second limit for GCS uploads. The rate doubles every ramp period until it reaches the maximum. (default 1000)
-alertmanager-storage.gcs.upload-max-qps int
Maximum queries per second limit for GCS uploads. (default 3200)
-alertmanager-storage.gcs.upload-ramp-period duration
Time period over which the upload rate doubles, following the Google recommendation. (default 20m0s)
-alertmanager-storage.gcs.upload-rate-limit-enabled
Enable rate limiting for GCS uploads. When enabled, uploads gradually ramp up following Google Cloud Storage best practices.
-alertmanager-storage.local.path string
Path at which alertmanager configurations are stored.
-alertmanager-storage.s3.access-key-id string
@@ -657,10 +673,26 @@ Usage of ./cmd/mimir/mimir:
Maximum number of idle (keep-alive) connections to keep per-host. Set to 0 to use a built-in default value of 2. (default 100)
-blocks-storage.gcs.max-retries int
Maximum number of attempts for GCS operations (0 = unlimited, 1 = no retries). Applies to both regular and upload retry modes. (default 20)
-blocks-storage.gcs.read-initial-qps int
Initial queries per second limit for GCS reads. The rate doubles every ramp period until it reaches the maximum. (default 5000)
-blocks-storage.gcs.read-max-qps int
Maximum queries per second limit for GCS reads. (default 16000)
-blocks-storage.gcs.read-ramp-period duration
Time period over which the read rate doubles, following the Google recommendation. (default 20m0s)
-blocks-storage.gcs.read-rate-limit-enabled
Enable rate limiting for GCS reads. When enabled, reads gradually ramp up following Google Cloud Storage best practices.
-blocks-storage.gcs.service-account string
JSON either from a Google Developers Console client_credentials.json file, or a Google Developers service account key. Needs to be valid JSON, not a filesystem path.
-blocks-storage.gcs.tls-handshake-timeout duration
Maximum time to wait for a TLS handshake. Set to 0 for no limit. (default 10s)
-blocks-storage.gcs.upload-initial-qps int
Initial queries per second limit for GCS uploads. The rate doubles every ramp period until it reaches the maximum. (default 1000)
-blocks-storage.gcs.upload-max-qps int
Maximum queries per second limit for GCS uploads. (default 3200)
-blocks-storage.gcs.upload-ramp-period duration
Time period over which the upload rate doubles, following the Google recommendation. (default 20m0s)
-blocks-storage.gcs.upload-rate-limit-enabled
Enable rate limiting for GCS uploads. When enabled, uploads gradually ramp up following Google Cloud Storage best practices.
-blocks-storage.s3.access-key-id string
S3 access key ID
-blocks-storage.s3.bucket-lookup-type value
@@ -927,10 +959,26 @@ Usage of ./cmd/mimir/mimir:
Maximum number of idle (keep-alive) connections to keep per-host. Set to 0 to use a built-in default value of 2. (default 100)
-common.storage.gcs.max-retries int
Maximum number of attempts for GCS operations (0 = unlimited, 1 = no retries). Applies to both regular and upload retry modes. (default 20)
-common.storage.gcs.read-initial-qps int
Initial queries per second limit for GCS reads. The rate doubles every ramp period until it reaches the maximum. (default 5000)
-common.storage.gcs.read-max-qps int
Maximum queries per second limit for GCS reads. (default 16000)
-common.storage.gcs.read-ramp-period duration
Time period over which the read rate doubles, following the Google recommendation. (default 20m0s)
-common.storage.gcs.read-rate-limit-enabled
Enable rate limiting for GCS reads. When enabled, reads gradually ramp up following Google Cloud Storage best practices.
-common.storage.gcs.service-account string
JSON either from a Google Developers Console client_credentials.json file, or a Google Developers service account key. Needs to be valid JSON, not a filesystem path.
-common.storage.gcs.tls-handshake-timeout duration
Maximum time to wait for a TLS handshake. Set to 0 for no limit. (default 10s)
-common.storage.gcs.upload-initial-qps int
Initial queries per second limit for GCS uploads. The rate doubles every ramp period until it reaches the maximum. (default 1000)
-common.storage.gcs.upload-max-qps int
Maximum queries per second limit for GCS uploads. (default 3200)
-common.storage.gcs.upload-ramp-period duration
Time period over which the upload rate doubles, following the Google recommendation. (default 20m0s)
-common.storage.gcs.upload-rate-limit-enabled
Enable rate limiting for GCS uploads. When enabled, uploads gradually ramp up following Google Cloud Storage best practices.
-common.storage.s3.access-key-id string
S3 access key ID
-common.storage.s3.bucket-lookup-type value
@@ -2753,10 +2801,26 @@ Usage of ./cmd/mimir/mimir:
Maximum number of idle (keep-alive) connections to keep per-host. Set to 0 to use a built-in default value of 2. (default 100)
-ruler-storage.gcs.max-retries int
Maximum number of attempts for GCS operations (0 = unlimited, 1 = no retries). Applies to both regular and upload retry modes. (default 20)
-ruler-storage.gcs.read-initial-qps int
Initial queries per second limit for GCS reads. The rate doubles every ramp period until it reaches the maximum. (default 5000)
-ruler-storage.gcs.read-max-qps int
Maximum queries per second limit for GCS reads. (default 16000)
-ruler-storage.gcs.read-ramp-period duration
Time period over which the read rate doubles, following the Google recommendation. (default 20m0s)
-ruler-storage.gcs.read-rate-limit-enabled
Enable rate limiting for GCS reads. When enabled, reads gradually ramp up following Google Cloud Storage best practices.
-ruler-storage.gcs.service-account string
JSON either from a Google Developers Console client_credentials.json file, or a Google Developers service account key. Needs to be valid JSON, not a filesystem path.
-ruler-storage.gcs.tls-handshake-timeout duration
Maximum time to wait for a TLS handshake. Set to 0 for no limit. (default 10s)
-ruler-storage.gcs.upload-initial-qps int
Initial queries per second limit for GCS uploads. The rate doubles every ramp period until it reaches the maximum. (default 1000)
-ruler-storage.gcs.upload-max-qps int
Maximum queries per second limit for GCS uploads. (default 3200)
-ruler-storage.gcs.upload-ramp-period duration
Time period over which the upload rate doubles, following the Google recommendation. (default 20m0s)
-ruler-storage.gcs.upload-rate-limit-enabled
Enable rate limiting for GCS uploads. When enabled, uploads gradually ramp up following Google Cloud Storage best practices.
-ruler-storage.local.directory string
Directory to scan for rules
-ruler-storage.s3.access-key-id string
38 changes: 38 additions & 0 deletions docs/sources/mimir/configure/configuration-parameters/index.md
@@ -6014,6 +6014,44 @@ The gcs_backend block configures the connection to Google Cloud Storage object s
# CLI flag: -<prefix>.gcs.max-retries
[max_retries: <int> | default = 20]
# (advanced) Enable rate limiting for GCS uploads. When enabled, uploads
# gradually ramp up following Google Cloud Storage best practices.
# CLI flag: -<prefix>.gcs.upload-rate-limit-enabled
[upload_rate_limit_enabled: <boolean> | default = false]
# (advanced) Initial queries per second limit for GCS uploads. The rate doubles
# every ramp period until it reaches the maximum.
# CLI flag: -<prefix>.gcs.upload-initial-qps
[upload_initial_qps: <int> | default = 1000]
Contributor

In the Google best practices, I'm reading:

If your request rate is expected to go over these thresholds, you should start with a request rate below or near the thresholds and then gradually increase the rate, no faster than doubling over a period of 20 minutes.

Here we have the defaults set to the maximum values for both uploads (1000) and reads (5000). Should we instead start with values closer to the recommended limits rather than using the maximum? I'm thinking we might run into rate limiting.

Contributor Author

Aren't the recommended limits 1000 and 5000, respectively? I don't know of any others. Also, when I spoke with Google Cloud support, they were only concerned about going above those limits. Additionally, in practice we won't hit these limits exactly, since we have to approximate them by dividing by the expected number of replicas. All in all, I wouldn't worry, especially as the limiter automatically backs off if it receives a rate-limiting error.

# (advanced) Maximum queries per second limit for GCS uploads.
# CLI flag: -<prefix>.gcs.upload-max-qps
[upload_max_qps: <int> | default = 3200]
# (advanced) Time period over which the upload rate doubles, following the
# Google recommendation.
# CLI flag: -<prefix>.gcs.upload-ramp-period
[upload_ramp_period: <duration> | default = 20m]
# (advanced) Enable rate limiting for GCS reads. When enabled, reads gradually
# ramp up following Google Cloud Storage best practices.
# CLI flag: -<prefix>.gcs.read-rate-limit-enabled
[read_rate_limit_enabled: <boolean> | default = false]
# (advanced) Initial queries per second limit for GCS reads. The rate doubles
# every ramp period until it reaches the maximum.
# CLI flag: -<prefix>.gcs.read-initial-qps
[read_initial_qps: <int> | default = 5000]
# (advanced) Maximum queries per second limit for GCS reads.
# CLI flag: -<prefix>.gcs.read-max-qps
[read_max_qps: <int> | default = 16000]
# (advanced) Time period over which the read rate doubles, following the Google
# recommendation.
# CLI flag: -<prefix>.gcs.read-ramp-period
[read_ramp_period: <duration> | default = 20m]
http:
# (advanced) The time an idle connection remains idle before closing.
# CLI flag: -<prefix>.gcs.http.idle-conn-timeout
24 changes: 24 additions & 0 deletions operations/mimir/mimir-flags-defaults.json
@@ -625,6 +625,12 @@
"blocks-storage.gcs.bucket-name": "",
"blocks-storage.gcs.service-account": "",
"blocks-storage.gcs.max-retries": 20,
"blocks-storage.gcs.upload-initial-qps": 1000,
"blocks-storage.gcs.upload-max-qps": 3200,
"blocks-storage.gcs.upload-ramp-period": 1200000000000,
"blocks-storage.gcs.read-initial-qps": 5000,
"blocks-storage.gcs.read-max-qps": 16000,
"blocks-storage.gcs.read-ramp-period": 1200000000000,
"blocks-storage.gcs.http.idle-conn-timeout": 90000000000,
"blocks-storage.gcs.http.response-header-timeout": 120000000000,
"blocks-storage.gcs.tls-handshake-timeout": 10000000000,
@@ -1037,6 +1043,12 @@
"ruler-storage.gcs.bucket-name": "",
"ruler-storage.gcs.service-account": "",
"ruler-storage.gcs.max-retries": 20,
"ruler-storage.gcs.upload-initial-qps": 1000,
"ruler-storage.gcs.upload-max-qps": 3200,
"ruler-storage.gcs.upload-ramp-period": 1200000000000,
"ruler-storage.gcs.read-initial-qps": 5000,
"ruler-storage.gcs.read-max-qps": 16000,
"ruler-storage.gcs.read-ramp-period": 1200000000000,
"ruler-storage.gcs.http.idle-conn-timeout": 90000000000,
"ruler-storage.gcs.http.response-header-timeout": 120000000000,
"ruler-storage.gcs.tls-handshake-timeout": 10000000000,
@@ -1213,6 +1225,12 @@
"alertmanager-storage.gcs.bucket-name": "",
"alertmanager-storage.gcs.service-account": "",
"alertmanager-storage.gcs.max-retries": 20,
"alertmanager-storage.gcs.upload-initial-qps": 1000,
"alertmanager-storage.gcs.upload-max-qps": 3200,
"alertmanager-storage.gcs.upload-ramp-period": 1200000000000,
"alertmanager-storage.gcs.read-initial-qps": 5000,
"alertmanager-storage.gcs.read-max-qps": 16000,
"alertmanager-storage.gcs.read-ramp-period": 1200000000000,
"alertmanager-storage.gcs.http.idle-conn-timeout": 90000000000,
"alertmanager-storage.gcs.http.response-header-timeout": 120000000000,
"alertmanager-storage.gcs.tls-handshake-timeout": 10000000000,
@@ -1423,6 +1441,12 @@
"common.storage.gcs.bucket-name": "",
"common.storage.gcs.service-account": "",
"common.storage.gcs.max-retries": 20,
"common.storage.gcs.upload-initial-qps": 1000,
"common.storage.gcs.upload-max-qps": 3200,
"common.storage.gcs.upload-ramp-period": 1200000000000,
"common.storage.gcs.read-initial-qps": 5000,
"common.storage.gcs.read-max-qps": 16000,
"common.storage.gcs.read-ramp-period": 1200000000000,
"common.storage.gcs.http.idle-conn-timeout": 90000000000,
"common.storage.gcs.http.response-header-timeout": 120000000000,
"common.storage.gcs.tls-handshake-timeout": 10000000000,
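
The duration defaults in this JSON file are raw integers. A quick way to read them, assuming they are nanosecond counts as in Go's `time.Duration` (which matches the `20m0s` shown in the help text for the ramp period), is the sketch below. `describe` is a hypothetical helper, not part of the PR.

```go
package main

import (
	"fmt"
	"time"
)

// describe renders a nanosecond count the way Go prints a time.Duration.
// Hypothetical helper for reading the JSON defaults.
func describe(ns int64) string {
	return time.Duration(ns).String()
}

func main() {
	fmt.Println(describe(1200000000000)) // ramp period: 20m0s
	fmt.Println(describe(90000000000))   // idle-conn-timeout: 1m30s
	fmt.Println(describe(10000000000))   // tls-handshake-timeout: 10s
}
```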
8 changes: 7 additions & 1 deletion pkg/storage/bucket/client.go
@@ -120,6 +120,12 @@ func (cfg *StorageBackendConfig) Validate() error {
}
}

if cfg.Backend == GCS {
if err := cfg.GCS.Validate(); err != nil {
return err
}
}

return nil
}

@@ -170,7 +176,7 @@ func NewClient(ctx context.Context, cfg Config, name string, logger log.Logger,
case S3:
backendClient, err = s3.NewBucketClient(cfg.S3, name, logger)
case GCS:
backendClient, err = gcs.NewBucketClient(ctx, cfg.GCS, name, logger)
backendClient, err = gcs.NewBucketClient(ctx, cfg.GCS, name, logger, reg)
case Azure:
backendClient, err = azure.NewBucketClient(cfg.Azure, name, logger)
case Swift: