operations: add per-zone config for enabling "ingester multi-az"#15000
Merged
operations: add per-zone config for enabling "ingester multi-az"#15000
Conversation
f3fee7d to
4f8ebf6
Compare
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 2 potential issues.
Autofix Details
Bugbot Autofix prepared fixes for both issues found in the latest run.
- ✅ Fixed: Assertion bypass when per-zone flags used without global flag
- Updated ingester multi-AZ detection to OR the three per-zone flags so the assertion triggers whenever any zone multi-AZ flag is enabled.
- ✅ Fixed: Naming convention inconsistent with store-gateway per-zone configs
- Renamed ingester per-zone multi-AZ config keys to the
zone_x_multi_az_enabledpattern and updated the migration test override accordingly.
- Renamed ingester per-zone multi-AZ config keys to the
Or push these changes by commenting:
@cursor push 1778e09841
Preview (1778e09841)
diff --git a/operations/mimir-tests/test-multi-az-read-path-migration-step-1.jsonnet b/operations/mimir-tests/test-multi-az-read-path-migration-step-1.jsonnet
--- a/operations/mimir-tests/test-multi-az-read-path-migration-step-1.jsonnet
+++ b/operations/mimir-tests/test-multi-az-read-path-migration-step-1.jsonnet
@@ -22,6 +22,6 @@
// Enable multi-az config for the ingester zone-a to prevent a restart when
// enabling multi_zone_ingester_multi_az_enabled, but before a second zone
// is created.
- multi_zone_ingester_multi_az_zone_a_enabled: true,
+ multi_zone_ingester_zone_a_multi_az_enabled: true,
},
}
diff --git a/operations/mimir/multi-zone-ingester.libsonnet b/operations/mimir/multi-zone-ingester.libsonnet
--- a/operations/mimir/multi-zone-ingester.libsonnet
+++ b/operations/mimir/multi-zone-ingester.libsonnet
@@ -19,9 +19,9 @@
// Controls whether the multi (virtual) zone ingester should also be deployed multi-AZ.
multi_zone_ingester_multi_az_enabled: $._config.multi_zone_read_path_multi_az_enabled,
- multi_zone_ingester_multi_az_zone_a_enabled: self.multi_zone_ingester_multi_az_enabled,
- multi_zone_ingester_multi_az_zone_b_enabled: self.multi_zone_ingester_multi_az_enabled,
- multi_zone_ingester_multi_az_zone_c_enabled: self.multi_zone_ingester_multi_az_enabled,
+ multi_zone_ingester_zone_a_multi_az_enabled: self.multi_zone_ingester_multi_az_enabled,
+ multi_zone_ingester_zone_b_multi_az_enabled: self.multi_zone_ingester_multi_az_enabled,
+ multi_zone_ingester_zone_c_multi_az_enabled: self.multi_zone_ingester_multi_az_enabled,
},
local container = $.core.v1.container,
@@ -30,10 +30,10 @@
local service = $.core.v1.service,
local podAntiAffinity = $.apps.v1.deployment.mixin.spec.template.spec.affinity.podAntiAffinity,
- local isMultiAZEnabled = $._config.multi_zone_ingester_multi_az_enabled,
- local isZoneAEnabled = $._config.multi_zone_ingester_multi_az_zone_a_enabled && std.length($._config.multi_zone_availability_zones) >= 1,
- local isZoneBEnabled = $._config.multi_zone_ingester_multi_az_zone_b_enabled && std.length($._config.multi_zone_availability_zones) >= 2,
- local isZoneCEnabled = $._config.multi_zone_ingester_multi_az_zone_c_enabled && std.length($._config.multi_zone_availability_zones) >= 3,
+ local isMultiAZEnabled = $._config.multi_zone_ingester_zone_a_multi_az_enabled || $._config.multi_zone_ingester_zone_b_multi_az_enabled || $._config.multi_zone_ingester_zone_c_multi_az_enabled,
+ local isZoneAEnabled = $._config.multi_zone_ingester_zone_a_multi_az_enabled && std.length($._config.multi_zone_availability_zones) >= 1,
+ local isZoneBEnabled = $._config.multi_zone_ingester_zone_b_multi_az_enabled && std.length($._config.multi_zone_availability_zones) >= 2,
+ local isZoneCEnabled = $._config.multi_zone_ingester_zone_c_multi_az_enabled && std.length($._config.multi_zone_availability_zones) >= 3,
assert !isMultiAZEnabled || $._config.multi_zone_ingester_enabled : 'ingester multi-AZ deployment requires ingester multi-zone to be enabled',
assert !$._config.multi_zone_ingester_zpdb_enabled || $._config.rollout_operator_webhooks_enabled : 'zpdb configuration requires rollout_operator_webhooks_enabled=true',You can send follow-ups to the cloud agent here.
pracucci
approved these changes
Apr 13, 2026
Collaborator
pracucci
left a comment
There was a problem hiding this comment.
Overall LGTM, but please see my comments, thanks
4f8ebf6 to
45bac9f
Compare
When enabling `multi_zone_ingester_multi_az_enabled` in the step-3 of the migration, we change the spec of the ingester-zone-a to add the `nodeAffinity` and that cause a rollout. Since zone-b isn't restarted at this point, it can cause a period in which a ingester shard isn't available. This is only true when not using ingester-zone-c for the migrations. We add new configs that allows a more fine grain control and update the migration process to set those specs, before we stop the zone-b, preventing a zone-a rollout when we enable `multi_zone_ingester_multi_az_enabled`. Signed-off-by: Laurent Dufresne <laurent.dufresne@grafana.com>
45bac9f to
c865d59
Compare
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Bugbot Autofix prepared a fix for the issue found in the latest run.
- ✅ Fixed: Assertion uses computed zone values instead of raw configs
- Updated the ingester assertion to use a new raw-config
isMultiAZEnabledOR of per-zone flags so misconfigurations are caught even when availability zones are missing.
- Updated the ingester assertion to use a new raw-config
Or push these changes by commenting:
@cursor push 6654b8630b
Preview (6654b8630b)
diff --git a/operations/mimir/multi-zone-ingester.libsonnet b/operations/mimir/multi-zone-ingester.libsonnet
--- a/operations/mimir/multi-zone-ingester.libsonnet
+++ b/operations/mimir/multi-zone-ingester.libsonnet
@@ -30,12 +30,12 @@
local service = $.core.v1.service,
local podAntiAffinity = $.apps.v1.deployment.mixin.spec.template.spec.affinity.podAntiAffinity,
+ local isMultiAZEnabled = $._config.multi_zone_ingester_zone_a_multi_az_enabled || $._config.multi_zone_ingester_zone_b_multi_az_enabled || $._config.multi_zone_ingester_zone_c_multi_az_enabled,
local isZoneAEnabled = $._config.multi_zone_ingester_zone_a_multi_az_enabled && std.length($._config.multi_zone_availability_zones) >= 1,
local isZoneBEnabled = $._config.multi_zone_ingester_zone_b_multi_az_enabled && std.length($._config.multi_zone_availability_zones) >= 2,
local isZoneCEnabled = $._config.multi_zone_ingester_zone_c_multi_az_enabled && std.length($._config.multi_zone_availability_zones) >= 3,
- local isMultiAZAtLeastOnceEnabled = isZoneAEnabled || isZoneBEnabled || isZoneCEnabled,
- assert !isMultiAZAtLeastOnceEnabled || $._config.multi_zone_ingester_enabled : 'ingester multi-AZ deployment requires ingester multi-zone to be enabled',
+ assert !isMultiAZEnabled || $._config.multi_zone_ingester_enabled : 'ingester multi-AZ deployment requires ingester multi-zone to be enabled',
assert !$._config.multi_zone_ingester_zpdb_enabled || $._config.rollout_operator_webhooks_enabled : 'zpdb configuration requires rollout_operator_webhooks_enabled=true',
//You can send follow-ups to the cloud agent here.
Reviewed by Cursor Bugbot for commit c865d59. Configure here.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.


What this PR does
When enabling
multi_zone_ingester_multi_az_enabledin the step-3 of the migration, we change the spec of the ingester-zone-a to add thenodeAffinityand that cause a rollout. Since zone-b isn't restarted at this point, it can cause a period in which a ingester shard isn't available. This is only true when not using ingester-zone-c for the migrations.We add new configs that allows a more fine grain control and update the migration process to set those specs, before we stop the zone-b, preventing a zone-a rollout when we enable
multi_zone_ingester_multi_az_enabled.Checklist
CHANGELOG.mdupdated - the order of entries should be[CHANGE],[FEATURE],[ENHANCEMENT],[BUGFIX]. If changelog entry is not needed, please add thechangelog-not-neededlabel to the PR.about-versioning.mdupdated with experimental features.Note
Medium Risk
Changes multi-zone ingester deployment configuration and scheduling behavior (node affinity/tolerations), which can affect rollout timing and shard availability during migrations if misconfigured.
Overview
Adds per-zone control for ingester multi-AZ. Introduces
$._config.multi_zone_ingester_zone_(a|b|c)_multi_az_enabled(defaulting tomulti_zone_ingester_multi_az_enabled) and switches multi-zone ingester AZ enablement logic to be driven per zone, with validation requiring multi-zone ingesters when any zone is enabled.Updates multi-AZ read-path migration tests/manifests. Migration step-1 now pre-enables zone-a’s multi-AZ setting, and generated test YAMLs add the resulting
nodeAffinity/tolerationsoningester-zone-aso enabling global ingester multi-AZ later doesn’t trigger an unexpected rollout.Docs/notes. Adds a
CHANGELOG.mdenhancement entry describing the new per-zone ingester multi-AZ config.Reviewed by Cursor Bugbot for commit c865d59. Bugbot is set up for automated code reviews on this repo. Configure here.