
Commit 155d9dc

[r387] Revert "HA dedup on every sample (#13665)" (#15023)
Backport 49c3084 from #15018

<!-- CURSOR_SUMMARY -->

---

> [!NOTE]
> **Medium Risk**
> Changes HA deduplication behavior in the distributor by no longer evaluating cluster/replica labels per-series within a write request, which can alter ingestion outcomes for mixed-label batches. Risk is mitigated by updated docs but could impact non-standard Prometheus setups (federation/proxies) that send heterogeneous series in one request.
>
> **Overview**
> Reverts the distributor HA deduplication optimization that evaluated HA state per-series, and replaces it with a single HA check based on the *first* timeseries' cluster/replica labels, applying the decision uniformly across the whole request.
>
> This removes the per-replica state tracking/sorting logic and associated tests/benchmarks, updates `findHALabels` to return `(cluster, replica)` strings, and updates docs/CHANGELOG to reflect the new assumption that all series in a request share the same HA labels.
>
> <sup>Reviewed by [Cursor Bugbot](https://cursor.com/bugbot) for commit 1c41453. Bugbot is set up for automated code reviews on this repo. Configure [here](https://www.cursor.com/dashboard/bugbot).</sup>

<!-- /CURSOR_SUMMARY -->
1 parent e0c3f73 commit 155d9dc
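The post-revert behavior described in the summary can be sketched as follows. The type and function shapes here are illustrative stand-ins, not Mimir's actual code (the real distributor operates on `mimirpb` request types); the sketch only shows the first-series check that replaces the per-series evaluation:

```go
package main

import "fmt"

// label and timeseries are minimal stand-ins for the remote-write
// protobuf types (hypothetical shapes for illustration only).
type label struct{ Name, Value string }
type timeseries struct{ Labels []label }

// findHALabels sketches the reverted behavior: only the FIRST series in
// the request is inspected, and its cluster/replica label values drive a
// single HA-dedup decision applied uniformly to the whole request.
func findHALabels(replicaLabel, clusterLabel string, ts []timeseries) (cluster, replica string) {
	if len(ts) == 0 {
		return "", ""
	}
	for _, l := range ts[0].Labels {
		switch l.Name {
		case clusterLabel:
			cluster = l.Value
		case replicaLabel:
			replica = l.Value
		}
	}
	return cluster, replica
}

func main() {
	req := []timeseries{
		{Labels: []label{{"cluster", "eu-west"}, {"__replica__", "replica-1"}, {"__name__", "up"}}},
		// This second series is never inspected: its labels are assumed
		// to match the first series' labels.
		{Labels: []label{{"cluster", "eu-west"}, {"__replica__", "replica-2"}}},
	}
	c, r := findHALabels("__replica__", "cluster", req)
	fmt.Println(c, r) // eu-west replica-1
}
```

This is why the updated docs below warn about heterogeneous batches: a series with different HA labels later in the request gets the same accept/drop decision as the first one.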

6 files changed: 77 additions & 932 deletions


CHANGELOG.md

Lines changed: 0 additions & 1 deletion
```diff
@@ -154,7 +154,6 @@
 * [ENHANCEMENT] Query-frontend: Extend query blocking to optionally only apply a blocking rule if the query is an unaligned range query. Set `unaligned_range_queries: true` to enable. #14643
 * [ENHANCEMENT] Store-gateway: Add experimental flag `blocks-storage.bucket-store.partitioner-max-gap-bytes-chunks` to specify the gap size for the chunks partitioner. #14649
 * [ENHANCEMENT] Compactor: Add expermental `-compactor.first-level-compaction-ooo-wait-period` to configure a separate compaction wait period for out-of-order blocks. It's an analogue of `-compactor.first-level-compaction-wait-period`, which currently ignores out-of-order blocks. #14627
-* [ENHANCEMENT] HA: Deduplicate per sample instead of per batch. #13665
 * [ENHANCEMENT] Usage-tracker: Improve performance of TrackSeriesBatch by preprocessing input data. #14702 #14734
 * [ENHANCEMENT] MQE: Improve per-query memory consumption limit enforcement in histogram function evaluations. #14691
 * [ENHANCEMENT] MQE: Improve per-query memory consumption limit enforcement within aggregation operations. #14735
```

docs/sources/mimir/configure/configure-high-availability-deduplication.md

Lines changed: 1 addition & 8 deletions
```diff
@@ -45,14 +45,7 @@ Incoming samples are considered duplicated (and thus dropped) if they are receiv
 
 If the HA tracker is enabled but incoming samples contain only one or none of the cluster and replica labels, these samples are accepted by default and never deduplicated.
 
-> Note: the HA tracker checks the cluster and replica label of every series in the request to determine whether each series in the request should be deduplicated.
-
-### Error responses
-
-When the HA tracker drops samples, Mimir returns one of the following errors depending on the reason:
-
-- **Replicas did not match**: When samples are received from a non-elected replica, Mimir returns an HTTP `202 Accepted` response with the message `replicas did not match, rejecting sample: replica=<replica>, elected=<elected>`. This indicates that the samples were successfully deduplicated and can be safely ignored.
-- **Too many HA clusters**: When the number of HA clusters for a tenant exceeds the configured limit, Mimir returns an HTTP `400 Bad Request` response with the error ID `err-mimir-tenant-too-many-ha-clusters`. To adjust this limit, configure `-distributor.ha-tracker.max-clusters` or contact your service administrator.
+> Note: for performance reasons, the HA tracker only checks the cluster and replica label of the first series in the request to determine whether all series in the request should be deduplicated. This assumes that all series inside the request have the same cluster and replica labels, which is typically true when Prometheus is configured with external labels. Ensure this requirement is honored if you have a non-standard Prometheus setup (for example, you're using Prometheus federation or have a metrics proxy in between).
 
 ## Configuration
 
```
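The doc note above hinges on all series in a request carrying the same HA labels, which holds when they are set as Prometheus external labels. A minimal configuration sketch (`cluster` and `__replica__` are Mimir's default HA tracker label names; the values are placeholders):

```yaml
global:
  external_labels:
    # Same value on every replica of this Prometheus HA pair.
    cluster: prom-pair-1
    # Unique per replica; Mimir drops this label after deduplication.
    __replica__: replica-1
```

Because external labels are attached uniformly to every series a Prometheus server sends, the first-series check is safe in this standard setup.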
