Commit aa5a823

Update the default configuration to enable ingest storage concurrency (#15072)
#### What this PR does

Today, when talking to a community member, I realised ingest storage concurrency is still disabled in the default config. At Grafana Labs, we've been running it in production for a long time, and we should just make it the default.

#### Which issue(s) this PR fixes or relates to

N/A

#### Checklist

- [ ] Tests updated.
- [ ] Documentation added.
- [x] `CHANGELOG.md` updated - the order of entries should be `[CHANGE]`, `[FEATURE]`, `[ENHANCEMENT]`, `[BUGFIX]`. If changelog entry is not needed, please add the `changelog-not-needed` label to the PR.
- [ ] [`about-versioning.md`](https://github.com/grafana/mimir/blob/main/docs/sources/mimir/configure/about-versioning.md) updated with experimental features.

Signed-off-by: Marco Pracucci <marco@pracucci.com>
1 parent: 2a55552

27 files changed: +122 additions, −406 deletions

CHANGELOG.md

Lines changed: 6 additions & 0 deletions

```diff
@@ -186,6 +186,12 @@
 * [ENHANCEMENT] Distributor: OTLP endpoint now returns partial success (HTTP 200) instead of HTTP 429 when the usage tracker rejects some series due to the active series limit but other series are successfully ingested. The `RejectedDataPoints` field reports the count of distributor-side rejections (usage tracker filtering). #14789
 * [ENHANCEMENT] MQE: Account for memory consumption of labels returned by binary operations in query memory consumption estimate earlier. #15033
 * [ENHANCEMENT] Query-frontend: Log the number of series and samples returned for queries in `query stats` log lines. #15044
+* [ENHANCEMENT] Ingest storage: Update the default configuration to enable ingest storage concurrency: #15072
+  * `-ingest-storage.kafka.fetch-concurrency-max` from `0` to `12`
+  * `-ingest-storage.kafka.ingestion-concurrency-max` from `0` to `8`
+  * `-ingest-storage.kafka.ingestion-concurrency-queue-capacity` from `5` to `3`
+  * `-ingest-storage.kafka.ingestion-concurrency-target-flushes-per-shard` from `80` to `40`
+  * `-ingest-storage.kafka.max-buffered-bytes` from `100MB` to `1GB`
 * [BUGFIX] Ingester: enforce a minimum 10s delay between TSDB head compaction iterations when an iteration approaches or exceeds the configured `-blocks-storage.tsdb.head-compaction-interval`, so ingestion is not starved by back-to-back compactions. #15061
 * [BUGFIX] Update to Go v1.25.9. #15030
 * [BUGFIX] Distributor: OTLP partial success responses now correctly populate `RejectedDataPoints` with the actual count of rejected samples, instead of always reporting 0. In classical architecture, this includes rejected samples propagated from the ingester. #14789
```

cmd/mimir/config-descriptor.json

Lines changed: 5 additions & 5 deletions

```diff
@@ -9236,7 +9236,7 @@
       "required": false,
       "desc": "The maximum number of concurrent fetch requests that the ingester makes when reading data from Kafka during startup. Concurrent fetch requests are issued only when there is sufficient backlog of records to consume. Set to 0 to disable.",
       "fieldValue": null,
-      "fieldDefaultValue": 0,
+      "fieldDefaultValue": 12,
       "fieldFlag": "ingest-storage.kafka.fetch-concurrency-max",
       "fieldType": "int"
     },
@@ -9256,7 +9256,7 @@
       "required": false,
       "desc": "The maximum number of buffered records ready to be processed. This limit applies to the sum of all inflight requests. Set to 0 to disable the limit.",
       "fieldValue": null,
-      "fieldDefaultValue": 100000000,
+      "fieldDefaultValue": 1000000000,
       "fieldFlag": "ingest-storage.kafka.max-buffered-bytes",
       "fieldType": "int"
     },
@@ -9266,7 +9266,7 @@
       "required": false,
       "desc": "The maximum number of concurrent ingestion streams to the TSDB head. Every tenant has their own set of streams. 0 to disable.",
       "fieldValue": null,
-      "fieldDefaultValue": 0,
+      "fieldDefaultValue": 8,
       "fieldFlag": "ingest-storage.kafka.ingestion-concurrency-max",
       "fieldType": "int"
     },
@@ -9286,7 +9286,7 @@
       "required": false,
       "desc": "The number of batches to prepare and queue to ingest to the TSDB head. Only use this setting when -ingest-storage.kafka.ingestion-concurrency-max is greater than 0.",
       "fieldValue": null,
-      "fieldDefaultValue": 5,
+      "fieldDefaultValue": 3,
       "fieldFlag": "ingest-storage.kafka.ingestion-concurrency-queue-capacity",
       "fieldType": "int"
     },
@@ -9296,7 +9296,7 @@
       "required": false,
       "desc": "The expected number of times to ingest timeseries to the TSDB head after batching. With fewer flushes, the overhead of splitting up the work is higher than the benefit of parallelization. Only use this setting when -ingest-storage.kafka.ingestion-concurrency-max is greater than 0.",
       "fieldValue": null,
-      "fieldDefaultValue": 80,
+      "fieldDefaultValue": 40,
       "fieldFlag": "ingest-storage.kafka.ingestion-concurrency-target-flushes-per-shard",
       "fieldType": "int"
     },
```

cmd/mimir/help-all.txt.tmpl

Lines changed: 5 additions & 5 deletions

```diff
@@ -1480,25 +1480,25 @@ Usage of ./cmd/mimir/mimir:
   -ingest-storage.kafka.dial-timeout duration
 	The maximum time allowed to open a connection to a Kafka broker. (default 2s)
   -ingest-storage.kafka.fetch-concurrency-max int
-	The maximum number of concurrent fetch requests that the ingester makes when reading data from Kafka during startup. Concurrent fetch requests are issued only when there is sufficient backlog of records to consume. Set to 0 to disable.
+	The maximum number of concurrent fetch requests that the ingester makes when reading data from Kafka during startup. Concurrent fetch requests are issued only when there is sufficient backlog of records to consume. Set to 0 to disable. (default 12)
   -ingest-storage.kafka.fetch-max-wait duration
 	The maximum amount of time a Kafka broker waits for some records before a Fetch response is returned. (default 5s)
   -ingest-storage.kafka.ingestion-concurrency-batch-size int
 	The number of timeseries to batch together before ingesting to the TSDB head. Only use this setting when -ingest-storage.kafka.ingestion-concurrency-max is greater than 0. (default 150)
   -ingest-storage.kafka.ingestion-concurrency-estimated-bytes-per-sample int
 	The estimated number of bytes a sample has at time of ingestion. This value is used to estimate the timeseries without decompressing them. Only use this setting when -ingest-storage.kafka.ingestion-concurrency-max is greater than 0. (default 500)
   -ingest-storage.kafka.ingestion-concurrency-max int
-	The maximum number of concurrent ingestion streams to the TSDB head. Every tenant has their own set of streams. 0 to disable.
+	The maximum number of concurrent ingestion streams to the TSDB head. Every tenant has their own set of streams. 0 to disable. (default 8)
   -ingest-storage.kafka.ingestion-concurrency-queue-capacity int
-	The number of batches to prepare and queue to ingest to the TSDB head. Only use this setting when -ingest-storage.kafka.ingestion-concurrency-max is greater than 0. (default 5)
+	The number of batches to prepare and queue to ingest to the TSDB head. Only use this setting when -ingest-storage.kafka.ingestion-concurrency-max is greater than 0. (default 3)
   -ingest-storage.kafka.ingestion-concurrency-target-flushes-per-shard int
-	The expected number of times to ingest timeseries to the TSDB head after batching. With fewer flushes, the overhead of splitting up the work is higher than the benefit of parallelization. Only use this setting when -ingest-storage.kafka.ingestion-concurrency-max is greater than 0. (default 80)
+	The expected number of times to ingest timeseries to the TSDB head after batching. With fewer flushes, the overhead of splitting up the work is higher than the benefit of parallelization. Only use this setting when -ingest-storage.kafka.ingestion-concurrency-max is greater than 0. (default 40)
   -ingest-storage.kafka.last-produced-offset-poll-interval duration
 	How frequently to poll the last produced offset, used to enforce strong read consistency. (default 1s)
   -ingest-storage.kafka.last-produced-offset-retry-timeout duration
 	How long to retry a failed request to get the last produced offset. (default 10s)
   -ingest-storage.kafka.max-buffered-bytes int
-	The maximum number of buffered records ready to be processed. This limit applies to the sum of all inflight requests. Set to 0 to disable the limit. (default 100000000)
+	The maximum number of buffered records ready to be processed. This limit applies to the sum of all inflight requests. Set to 0 to disable the limit. (default 1000000000)
   -ingest-storage.kafka.max-consumer-lag-at-startup duration
 	The guaranteed maximum lag before a consumer is considered to have caught up reading from a partition at startup, becomes ACTIVE in the hash ring and passes the readiness check. Set both -ingest-storage.kafka.target-consumer-lag-at-startup and -ingest-storage.kafka.max-consumer-lag-at-startup to 0 to disable waiting for maximum consumer lag being honored at startup. (default 15s)
   -ingest-storage.kafka.producer-max-buffered-bytes int
```

cmd/mimir/help.txt.tmpl

Lines changed: 5 additions & 5 deletions

```diff
@@ -416,25 +416,25 @@ Usage of ./cmd/mimir/mimir:
   -ingest-storage.kafka.dial-timeout duration
 	The maximum time allowed to open a connection to a Kafka broker. (default 2s)
   -ingest-storage.kafka.fetch-concurrency-max int
-	The maximum number of concurrent fetch requests that the ingester makes when reading data from Kafka during startup. Concurrent fetch requests are issued only when there is sufficient backlog of records to consume. Set to 0 to disable.
+	The maximum number of concurrent fetch requests that the ingester makes when reading data from Kafka during startup. Concurrent fetch requests are issued only when there is sufficient backlog of records to consume. Set to 0 to disable. (default 12)
   -ingest-storage.kafka.fetch-max-wait duration
 	The maximum amount of time a Kafka broker waits for some records before a Fetch response is returned. (default 5s)
   -ingest-storage.kafka.ingestion-concurrency-batch-size int
 	The number of timeseries to batch together before ingesting to the TSDB head. Only use this setting when -ingest-storage.kafka.ingestion-concurrency-max is greater than 0. (default 150)
   -ingest-storage.kafka.ingestion-concurrency-estimated-bytes-per-sample int
 	The estimated number of bytes a sample has at time of ingestion. This value is used to estimate the timeseries without decompressing them. Only use this setting when -ingest-storage.kafka.ingestion-concurrency-max is greater than 0. (default 500)
   -ingest-storage.kafka.ingestion-concurrency-max int
-	The maximum number of concurrent ingestion streams to the TSDB head. Every tenant has their own set of streams. 0 to disable.
+	The maximum number of concurrent ingestion streams to the TSDB head. Every tenant has their own set of streams. 0 to disable. (default 8)
   -ingest-storage.kafka.ingestion-concurrency-queue-capacity int
-	The number of batches to prepare and queue to ingest to the TSDB head. Only use this setting when -ingest-storage.kafka.ingestion-concurrency-max is greater than 0. (default 5)
+	The number of batches to prepare and queue to ingest to the TSDB head. Only use this setting when -ingest-storage.kafka.ingestion-concurrency-max is greater than 0. (default 3)
   -ingest-storage.kafka.ingestion-concurrency-target-flushes-per-shard int
-	The expected number of times to ingest timeseries to the TSDB head after batching. With fewer flushes, the overhead of splitting up the work is higher than the benefit of parallelization. Only use this setting when -ingest-storage.kafka.ingestion-concurrency-max is greater than 0. (default 80)
+	The expected number of times to ingest timeseries to the TSDB head after batching. With fewer flushes, the overhead of splitting up the work is higher than the benefit of parallelization. Only use this setting when -ingest-storage.kafka.ingestion-concurrency-max is greater than 0. (default 40)
   -ingest-storage.kafka.last-produced-offset-poll-interval duration
 	How frequently to poll the last produced offset, used to enforce strong read consistency. (default 1s)
   -ingest-storage.kafka.last-produced-offset-retry-timeout duration
 	How long to retry a failed request to get the last produced offset. (default 10s)
   -ingest-storage.kafka.max-buffered-bytes int
-	The maximum number of buffered records ready to be processed. This limit applies to the sum of all inflight requests. Set to 0 to disable the limit. (default 100000000)
+	The maximum number of buffered records ready to be processed. This limit applies to the sum of all inflight requests. Set to 0 to disable the limit. (default 1000000000)
   -ingest-storage.kafka.max-consumer-lag-at-startup duration
 	The guaranteed maximum lag before a consumer is considered to have caught up reading from a partition at startup, becomes ACTIVE in the hash ring and passes the readiness check. Set both -ingest-storage.kafka.target-consumer-lag-at-startup and -ingest-storage.kafka.max-consumer-lag-at-startup to 0 to disable waiting for maximum consumer lag being honored at startup. (default 15s)
   -ingest-storage.kafka.producer-max-buffered-bytes int
```

docs/sources/mimir/configure/configuration-parameters/index.md

Lines changed: 5 additions & 5 deletions

```diff
@@ -5450,7 +5450,7 @@ kafka:
   # only when there is sufficient backlog of records to consume. Set to 0 to
   # disable.
   # CLI flag: -ingest-storage.kafka.fetch-concurrency-max
-  [fetch_concurrency_max: <int> | default = 0]
+  [fetch_concurrency_max: <int> | default = 12]
 
   # When enabled, the fetch request MaxBytes field is computed using the
   # compressed size of previous records. When disabled, MaxBytes is computed
@@ -5462,12 +5462,12 @@ kafka:
   # The maximum number of buffered records ready to be processed. This limit
   # applies to the sum of all inflight requests. Set to 0 to disable the limit.
   # CLI flag: -ingest-storage.kafka.max-buffered-bytes
-  [max_buffered_bytes: <int> | default = 100000000]
+  [max_buffered_bytes: <int> | default = 1000000000]
 
   # The maximum number of concurrent ingestion streams to the TSDB head. Every
   # tenant has their own set of streams. 0 to disable.
   # CLI flag: -ingest-storage.kafka.ingestion-concurrency-max
-  [ingestion_concurrency_max: <int> | default = 0]
+  [ingestion_concurrency_max: <int> | default = 8]
 
   # The number of timeseries to batch together before ingesting to the TSDB
   # head. Only use this setting when
@@ -5479,14 +5479,14 @@ kafka:
   # use this setting when -ingest-storage.kafka.ingestion-concurrency-max is
   # greater than 0.
   # CLI flag: -ingest-storage.kafka.ingestion-concurrency-queue-capacity
-  [ingestion_concurrency_queue_capacity: <int> | default = 5]
+  [ingestion_concurrency_queue_capacity: <int> | default = 3]
 
   # The expected number of times to ingest timeseries to the TSDB head after
   # batching. With fewer flushes, the overhead of splitting up the work is
   # higher than the benefit of parallelization. Only use this setting when
   # -ingest-storage.kafka.ingestion-concurrency-max is greater than 0.
   # CLI flag: -ingest-storage.kafka.ingestion-concurrency-target-flushes-per-shard
-  [ingestion_concurrency_target_flushes_per_shard: <int> | default = 80]
+  [ingestion_concurrency_target_flushes_per_shard: <int> | default = 40]
 
   # The estimated number of bytes a sample has at time of ingestion. This value
   # is used to estimate the timeseries without decompressing them. Only use this
```
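Taken together, the new defaults are equivalent to explicitly setting the following in the configuration file. This is a consolidated sketch of the values changed in this commit, assuming the standard `ingest_storage.kafka` block; leaving these options unset in this version yields the same values:

```yaml
ingest_storage:
  kafka:
    fetch_concurrency_max: 12           # concurrent fetch requests from Kafka during startup
    ingestion_concurrency_max: 8        # concurrent ingestion streams to the TSDB head
    ingestion_concurrency_queue_capacity: 3
    ingestion_concurrency_target_flushes_per_shard: 40
    max_buffered_bytes: 1000000000      # 1 GB buffered-records limit
```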

operations/mimir-tests/test-ingest-storage-auto-client-rack-generated.yaml

Lines changed: 0 additions & 18 deletions

These flags are removed from the generated test manifests because they now match the built-in defaults.

```diff
@@ -1949,14 +1949,8 @@ spec:
         - -ingest-storage.kafka.auto-create-topic-default-partitions=1000
         - -ingest-storage.kafka.client-rack=zone-a
         - -ingest-storage.kafka.consumer-group-offset-commit-interval=5s
-        - -ingest-storage.kafka.fetch-concurrency-max=12
-        - -ingest-storage.kafka.ingestion-concurrency-batch-size=150
         - -ingest-storage.kafka.ingestion-concurrency-estimated-bytes-per-sample=500
-        - -ingest-storage.kafka.ingestion-concurrency-max=8
-        - -ingest-storage.kafka.ingestion-concurrency-queue-capacity=3
-        - -ingest-storage.kafka.ingestion-concurrency-target-flushes-per-shard=40
         - -ingest-storage.kafka.last-produced-offset-poll-interval=500ms
-        - -ingest-storage.kafka.max-buffered-bytes=1000000000
         - -ingest-storage.kafka.topic=ingest
         - -ingester.max-global-metadata-per-metric=10
         - -ingester.max-global-metadata-per-user=30000
@@ -2096,14 +2090,8 @@ spec:
         - -ingest-storage.kafka.auto-create-topic-default-partitions=1000
         - -ingest-storage.kafka.client-rack=zone-b
         - -ingest-storage.kafka.consumer-group-offset-commit-interval=5s
-        - -ingest-storage.kafka.fetch-concurrency-max=12
-        - -ingest-storage.kafka.ingestion-concurrency-batch-size=150
         - -ingest-storage.kafka.ingestion-concurrency-estimated-bytes-per-sample=500
-        - -ingest-storage.kafka.ingestion-concurrency-max=8
-        - -ingest-storage.kafka.ingestion-concurrency-queue-capacity=3
-        - -ingest-storage.kafka.ingestion-concurrency-target-flushes-per-shard=40
         - -ingest-storage.kafka.last-produced-offset-poll-interval=500ms
-        - -ingest-storage.kafka.max-buffered-bytes=1000000000
         - -ingest-storage.kafka.topic=ingest
         - -ingester.max-global-metadata-per-metric=10
         - -ingester.max-global-metadata-per-user=30000
@@ -2237,14 +2225,8 @@ spec:
         - -ingest-storage.kafka.auto-create-topic-default-partitions=1000
         - -ingest-storage.kafka.client-rack=zone-c
         - -ingest-storage.kafka.consumer-group-offset-commit-interval=5s
-        - -ingest-storage.kafka.fetch-concurrency-max=12
-        - -ingest-storage.kafka.ingestion-concurrency-batch-size=150
         - -ingest-storage.kafka.ingestion-concurrency-estimated-bytes-per-sample=500
-        - -ingest-storage.kafka.ingestion-concurrency-max=8
-        - -ingest-storage.kafka.ingestion-concurrency-queue-capacity=3
-        - -ingest-storage.kafka.ingestion-concurrency-target-flushes-per-shard=40
         - -ingest-storage.kafka.last-produced-offset-poll-interval=500ms
-        - -ingest-storage.kafka.max-buffered-bytes=1000000000
         - -ingest-storage.kafka.topic=ingest
         - -ingester.max-global-metadata-per-metric=10
         - -ingester.max-global-metadata-per-user=30000
```
