Commit bd557c4

Abhishek Kumar authored and committed
feat: implement strict multi-zone pod distribution for StatefulSets
1 parent c30575d commit bd557c4

10 files changed: +509 -0 lines changed

docs/CONFIG-VARS.md

Lines changed: 37 additions & 0 deletions
@@ -332,6 +332,43 @@ Notes:
- For example, defining `V4_CFG_VIYA_STOP_SCHEDULE` and not `V4_CFG_VIYA_START_SCHEDULE` will result in a Viya stop job that runs on a schedule and a suspended Viya start job that you will be able to manually trigger.
- Defining both `V4_CFG_VIYA_START_SCHEDULE` and `V4_CFG_VIYA_STOP_SCHEDULE` will result in a non-suspended Viya start and stop job that runs on the schedule you defined.

## Multi-Zone Pod Distribution

| Name | Description | Type | Default | Required | Notes | Tasks |
| :--- | ---: | ---: | ---: | ---: | ---: | ---: |
| V4_CFG_MULTI_ZONE_ENABLED | Enable strict multi-zone pod distribution and anti-affinity for StatefulSets | bool | true | false | Adds restrictive topology spread constraints (maxSkew: 0) and required pod anti-affinity to prevent StatefulSet quorum loss during zone failures | viya |
| V4_CFG_MULTI_ZONE_RABBITMQ_ENABLED | Enable strict multi-zone distribution for RabbitMQ StatefulSet | bool | true | false | Ensures RabbitMQ pods are strictly distributed across zones with required anti-affinity to maintain quorum during zone failures | viya |
| V4_CFG_MULTI_ZONE_POSTGRES_ENABLED | Enable strict multi-zone distribution for PostgreSQL StatefulSet | bool | true | false | Ensures PostgreSQL pods are strictly distributed across zones for high availability. Only applies to internal PostgreSQL deployments | viya |
| V4_CFG_STATEFUL_NODEPOOL_RESTRICTION | Restrict StatefulSets to dedicated stateful nodepools | bool | true | false | Adds node affinity to ensure StatefulSets only run on nodes labeled with workload.sas.com/class=stateful | viya |
| V4_CFG_STATEFUL_NODEPOOL_LABEL | Label key used to identify stateful workload nodepools | string | workload.sas.com/class | false | Node label used for nodepool restriction. Stateful pods will require this label with value 'stateful' | viya |
| V4_CFG_MULTI_ZONE_AUTO_DETECT | Automatically detect if cluster has multiple zones | bool | true | false | When enabled, automatically detects cluster topology and applies appropriate constraints. Prevents scheduling issues in single-zone clusters | viya |
| V4_CFG_SINGLE_ZONE_FALLBACK | Enable relaxed constraints for single-zone clusters | bool | true | false | Applies relaxed topology constraints and preferred (not required) anti-affinity in single-zone deployments to avoid scheduling failures | viya |

**Notes:**

**Multi-Zone Clusters** (2+ zones detected):
- **Zone Distribution**: `maxSkew: 0` with `DoNotSchedule` - pods must be evenly distributed across zones
- **Required Pod Anti-Affinity**: Pods cannot be scheduled in the same zone as another pod of the same StatefulSet
- **Nodepool Restriction**: StatefulSets are restricted to nodes with `workload.sas.com/class=stateful` label
- **Node Distribution**: `maxSkew: 1` with `DoNotSchedule` - prevents multiple pods on the same node

**Single-Zone Clusters** (1 zone detected):
- **Node Distribution**: `maxSkew: 1` with `ScheduleAnyway` - spreads pods across nodes when possible
- **Preferred Pod Anti-Affinity**: Attempts to avoid co-location on same node (weight: 100)
- **Preferred Node Affinity**: Prefers stateful nodepool but allows scheduling elsewhere if needed

**Automatic Behavior**:
- Detects cluster zones by querying node labels (`topology.kubernetes.io/zone`)
- Applies strict constraints only in true multi-zone environments
- Falls back to relaxed constraints in single-zone clusters
- Ensures compatibility across different infrastructure deployments

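The choice between strict and relaxed constraints described above can be read as a small decision function. The sketch below is a simplified model of the conditions in the Ansible tasks of this commit. It ignores the per-component switches and the `PROVIDER` check, and the names `Flags` and `constraint_mode` are hypothetical, introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Flags:
    """Hypothetical container mirroring three of the V4_CFG_* variables."""
    multi_zone_enabled: bool = True    # V4_CFG_MULTI_ZONE_ENABLED
    auto_detect: bool = True           # V4_CFG_MULTI_ZONE_AUTO_DETECT
    single_zone_fallback: bool = True  # V4_CFG_SINGLE_ZONE_FALLBACK


def constraint_mode(flags: Flags, zone_count: int) -> str:
    """Return 'strict', 'relaxed', or 'none' for a cluster with zone_count zones."""
    if not flags.multi_zone_enabled:
        return "none"
    # With auto-detection off, the tasks treat the cluster as multi-zone.
    is_multi_zone = zone_count > 1 if flags.auto_detect else True
    if is_multi_zone:
        return "strict"
    # Single zone detected: relaxed overlays apply only if the fallback is on.
    return "relaxed" if flags.single_zone_fallback else "none"


for zones in (1, 3):
    print(zones, constraint_mode(Flags(), zones))
```

Under this reading, turning off `V4_CFG_MULTI_ZONE_AUTO_DETECT` on a single-zone cluster still selects the strict overlays, which is worth keeping in mind when overriding the defaults.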
This configuration is intended to:
- Avoid StatefulSet quorum loss during a single-zone failure in multi-zone clusters
- Avoid scheduling failures in single-zone deployments
- Distribute pods according to the detected cluster topology
- Support AKS, EKS, and GKE clusters

## Third-Party Tools

### Cert-manager

docs/user/MultiZoneDistribution.md

Lines changed: 69 additions & 0 deletions
@@ -0,0 +1,69 @@
# Strict Multi-Zone Pod Distribution - Implementation Guide

## Overview
This implementation provides strict multi-zone pod distribution for StatefulSets in AKS, EKS, and GKE clusters to prevent quorum loss during zone failures.

## Features Added

### 1. Configuration Variables (roles/vdm/defaults/main.yaml)
- `V4_CFG_MULTI_ZONE_ENABLED`: Master switch for strict multi-zone distribution (default: true)
- `V4_CFG_MULTI_ZONE_RABBITMQ_ENABLED`: RabbitMQ-specific distribution (default: true)
- `V4_CFG_MULTI_ZONE_POSTGRES_ENABLED`: PostgreSQL-specific distribution (default: true)
- `V4_CFG_STATEFUL_NODEPOOL_RESTRICTION`: Restrict StatefulSets to dedicated nodepools (default: true)
- `V4_CFG_STATEFUL_NODEPOOL_LABEL`: Label for stateful nodepool identification (default: workload.sas.com/class)

### 2. Transformers Created
- `rabbitmq-zone-distribution.yaml`: RabbitMQ strict zone distribution and nodepool restriction
- `postgres-zone-distribution.yaml`: PostgreSQL strict zone distribution and nodepool restriction
- `multi-zone-pod-distribution.yaml`: General StatefulSet strict distribution rules

### 3. Restrictive Implementation Details

#### Strict Topology Spread Constraints
- **Zone Distribution**: `maxSkew: 0` on `topology.kubernetes.io/zone` with `DoNotSchedule`
  - Ensures perfectly even distribution across zones
  - Prevents scheduling if it would create imbalance
- **Node Distribution**: `maxSkew: 1` on `kubernetes.io/hostname` with `DoNotSchedule`
  - Prevents multiple pods on same node

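To make the `DoNotSchedule` behavior above concrete, here is a minimal sketch of the skew test the scheduler applies to one topology key. It is a simplification that assumes every domain is eligible and ignores options such as `minDomains`. The function name `can_place` and the zone names are hypothetical. Note that Kubernetes documents `maxSkew` as a value that must be greater than zero, so the `maxSkew: 0` case is included only to show how the arithmetic behaves, not as a setting the API server is expected to accept.

```python
def can_place(counts: dict[str, int], domain: str, max_skew: int) -> bool:
    """Return True if one more pod in `domain` keeps the skew within max_skew.

    Skew is the pod count in the candidate domain (after placement)
    minus the smallest pod count across all domains.
    """
    global_min = min(counts.values())
    return counts[domain] + 1 - global_min <= max_skew


zones = {"zone-a": 1, "zone-b": 1, "zone-c": 0}

# With maxSkew 1, only the emptiest zone can take the next pod.
print(can_place(zones, "zone-c", 1))
print(can_place(zones, "zone-a", 1))

# With maxSkew 0, even the first pod in an empty cluster fails the test,
# because placing it makes one zone exceed the minimum by 1.
print(can_place({"zone-a": 0, "zone-b": 0}, "zone-a", 0))
```

The last call suggests that a skew limit of 1 is the tightest value under which pods can still be placed one at a time.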
#### Required Pod Anti-Affinity Rules
- **Zone-level**: Required anti-affinity to prevent pods in same zone
- **Node-level**: Preferred anti-affinity to avoid same node (weight: 100)

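A required zone-level anti-affinity rule allows at most one matching pod per zone, which caps how many replicas can be scheduled. The sketch below works through that arithmetic. It assumes quorum means a strict majority of the configured replica count, and the function names are hypothetical.

```python
def schedulable_replicas(replicas: int, zones: int) -> tuple[int, int]:
    """Return (scheduled, pending) when each zone may hold at most one pod."""
    scheduled = min(replicas, zones)
    return scheduled, replicas - scheduled


def keeps_quorum_after_zone_loss(replicas: int, zones: int) -> bool:
    """Check whether a majority of `replicas` survives the loss of one zone."""
    scheduled, _ = schedulable_replicas(replicas, zones)
    majority = replicas // 2 + 1
    # Losing one zone removes exactly one pod under the one-pod-per-zone rule.
    return scheduled - 1 >= majority


print(schedulable_replicas(3, 3), keeps_quorum_after_zone_loss(3, 3))
print(schedulable_replicas(3, 2), keeps_quorum_after_zone_loss(3, 2))
```

With three replicas in a two-zone cluster, one pod stays pending under this rule, so the replica count and the zone count need to be planned together.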
#### Nodepool Restrictions
- **Required Node Affinity**: Stateful workloads must run on nodes with:
  - `workload.sas.com/class: stateful` label
  - `agentpool` label present (AKS applies this label to its managed node pools)

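The two node-affinity requirements above sit in a single `matchExpressions` list, so a node has to satisfy both. The following sketch evaluates such a list against a node's labels. It covers only the `In` and `Exists` operators used in this commit, and the node label sets are hypothetical examples.

```python
def matches_term(labels: dict[str, str], expressions: list[dict]) -> bool:
    """All expressions in one nodeSelectorTerm must hold (logical AND)."""
    for expr in expressions:
        key, op = expr["key"], expr["operator"]
        if op == "In":
            if labels.get(key) not in expr["values"]:
                return False
        elif op == "Exists":
            if key not in labels:
                return False
        else:
            raise ValueError(f"operator not covered by this sketch: {op}")
    return True


# Expressions taken from the multi-zone transformer in this commit.
REQUIRED = [
    {"key": "workload.sas.com/class", "operator": "In", "values": ["stateful"]},
    {"key": "agentpool", "operator": "Exists"},
]

# Hypothetical nodes: one with both labels, one with only the class label.
node_with_both = {"workload.sas.com/class": "stateful", "agentpool": "stateful"}
node_class_only = {"workload.sas.com/class": "stateful"}

print(matches_term(node_with_both, REQUIRED))
print(matches_term(node_class_only, REQUIRED))
```

A node that carries the class label but no `agentpool` label does not qualify, so on providers whose nodes lack that label the pods would have nowhere to schedule unless the label is added.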
### 4. Acceptance Criteria Compliance

**More Restrictive Pod Topology Constraints**:
- `maxSkew: 0` for zone distribution (most restrictive)
- `DoNotSchedule` for both zone and node constraints
- Required anti-affinity rules

**Restrict StatefulSets to One Nodepool**:
- Node affinity requires `workload.sas.com/class=stateful`
- Ensures StatefulSets run only on designated stateful nodepool

### 5. Benefits
- **Lower Quorum-Loss Risk**: Strict zone distribution keeps replicas in separate zones, so a single zone failure does not take down every replica
- **Dedicated Resources**: Stateful workloads isolated to specific nodepool
- **Predictable Scheduling**: Clear constraints for placement decisions
- **Multi-Cloud Support**: Overlays are applied for the AKS, EKS, and GKE providers

## Usage

Enable in your ansible-vars.yaml:
```yaml
V4_CFG_MULTI_ZONE_ENABLED: true
V4_CFG_MULTI_ZONE_RABBITMQ_ENABLED: true
V4_CFG_MULTI_ZONE_POSTGRES_ENABLED: true
V4_CFG_STATEFUL_NODEPOOL_RESTRICTION: true
```

## Nodepool Requirements
Ensure your stateful nodepool is labeled:
```bash
kubectl label nodes <stateful-node> workload.sas.com/class=stateful
```

roles/vdm/defaults/main.yaml

Lines changed: 9 additions & 0 deletions
@@ -114,3 +114,12 @@ V4_WORKLOAD_ORCHESTRATOR_ENABLED: true

## NIST Features
V4_CFG_NIST_FEATURES_ENABLED: false

## Multi-Zone Pod Distribution
V4_CFG_MULTI_ZONE_ENABLED: true
V4_CFG_MULTI_ZONE_RABBITMQ_ENABLED: true
V4_CFG_MULTI_ZONE_POSTGRES_ENABLED: true
V4_CFG_STATEFUL_NODEPOOL_RESTRICTION: true
V4_CFG_STATEFUL_NODEPOOL_LABEL: "workload.sas.com/class"
V4_CFG_MULTI_ZONE_AUTO_DETECT: true
V4_CFG_SINGLE_ZONE_FALLBACK: true

roles/vdm/tasks/main.yaml

Lines changed: 8 additions & 0 deletions
@@ -213,6 +213,14 @@
    - uninstall
    - update

# Include Multi-Zone Pod Distribution configuration
- name: Include Multi-Zone Distribution
  include_tasks: multi_zone_distribution.yaml
  tags:
    - install
    - uninstall
    - update

# Include Sizing configuration and resources
- name: Include Sizing
  include_tasks: sizing.yaml
Lines changed: 130 additions & 0 deletions
@@ -0,0 +1,130 @@
# Copyright © 2020-2025, SAS Institute Inc., Cary, NC, USA. All Rights Reserved.
# SPDX-License-Identifier: Apache-2.0

---
# This file contains tasks for configuring multi-zone pod distribution and anti-affinity rules
# to prevent StatefulSet quorum loss during zone failures in multi-zone clusters.
# For single-zone deployments, applies relaxed constraints to avoid scheduling issues.

# Detect if cluster has multiple zones
- name: Multi-Zone - Detect cluster zones
  kubernetes.core.k8s_info:
    api_version: v1
    kind: Node
    kubeconfig: "{{ KUBECONFIG }}"
  register: cluster_nodes
  when:
    - V4_CFG_MULTI_ZONE_ENABLED | bool
    - V4_CFG_MULTI_ZONE_AUTO_DETECT | bool
  tags:
    - install
    - uninstall
    - update

# Set fact for zone count
- name: Multi-Zone - Calculate zone count
  set_fact:
    cluster_zone_count: "{{ cluster_nodes.resources | map(attribute='metadata.labels') | map('dict2items') | flatten | selectattr('key', 'equalto', 'topology.kubernetes.io/zone') | map(attribute='value') | unique | length }}"
    is_multi_zone: "{{ (cluster_nodes.resources | map(attribute='metadata.labels') | map('dict2items') | flatten | selectattr('key', 'equalto', 'topology.kubernetes.io/zone') | map(attribute='value') | unique | length | int) > 1 }}"
  when:
    - V4_CFG_MULTI_ZONE_ENABLED | bool
    - V4_CFG_MULTI_ZONE_AUTO_DETECT | bool
    - cluster_nodes.resources is defined
  tags:
    - install
    - uninstall
    - update

# Add multi-zone distribution overlay for RabbitMQ StatefulSet (multi-zone clusters)
- name: Multi-Zone - RabbitMQ zone distribution with restrictive constraints
  overlay_facts:
    cadence_name: "{{ V4_CFG_CADENCE_NAME }}"
    cadence_number: "{{ V4_CFG_CADENCE_VERSION }}"
    existing: "{{ vdm_overlays }}"
    add:
      - { transformers: rabbitmq-zone-distribution.yaml, vdm: true }
  when:
    - V4_CFG_MULTI_ZONE_ENABLED | bool
    - V4_CFG_MULTI_ZONE_RABBITMQ_ENABLED | bool
    - PROVIDER in ["AKS", "azure", "EKS", "aws", "GKE", "gcp"]
    - not V4_CFG_MULTI_ZONE_AUTO_DETECT | bool or (is_multi_zone | default(true) | bool)
  tags:
    - install
    - uninstall
    - update

# Add relaxed distribution overlay for RabbitMQ StatefulSet (single-zone clusters)
- name: Multi-Zone - RabbitMQ single-zone distribution fallback
  overlay_facts:
    cadence_name: "{{ V4_CFG_CADENCE_NAME }}"
    cadence_number: "{{ V4_CFG_CADENCE_VERSION }}"
    existing: "{{ vdm_overlays }}"
    add:
      - { transformers: rabbitmq-single-zone-distribution.yaml, vdm: true }
  when:
    - V4_CFG_MULTI_ZONE_ENABLED | bool
    - V4_CFG_MULTI_ZONE_RABBITMQ_ENABLED | bool
    - V4_CFG_SINGLE_ZONE_FALLBACK | bool
    - PROVIDER in ["AKS", "azure", "EKS", "aws", "GKE", "gcp"]
    - V4_CFG_MULTI_ZONE_AUTO_DETECT | bool and not (is_multi_zone | default(true) | bool)
  tags:
    - install
    - uninstall
    - update

# Add multi-zone distribution overlay for PostgreSQL StatefulSet (multi-zone clusters)
- name: Multi-Zone - PostgreSQL zone distribution with restrictive constraints
  overlay_facts:
    cadence_name: "{{ V4_CFG_CADENCE_NAME }}"
    cadence_number: "{{ V4_CFG_CADENCE_VERSION }}"
    existing: "{{ vdm_overlays }}"
    add:
      - { transformers: postgres-zone-distribution.yaml, vdm: true }
  when:
    - V4_CFG_MULTI_ZONE_ENABLED | bool
    - V4_CFG_MULTI_ZONE_POSTGRES_ENABLED | bool
    - V4_CFG_POSTGRES_SERVERS.default.internal | bool
    - PROVIDER in ["AKS", "azure", "EKS", "aws", "GKE", "gcp"]
    - not V4_CFG_MULTI_ZONE_AUTO_DETECT | bool or (is_multi_zone | default(true) | bool)
  tags:
    - install
    - uninstall
    - update

# Add relaxed distribution overlay for PostgreSQL StatefulSet (single-zone clusters)
- name: Multi-Zone - PostgreSQL single-zone distribution fallback
  overlay_facts:
    cadence_name: "{{ V4_CFG_CADENCE_NAME }}"
    cadence_number: "{{ V4_CFG_CADENCE_VERSION }}"
    existing: "{{ vdm_overlays }}"
    add:
      - { transformers: postgres-single-zone-distribution.yaml, vdm: true }
  when:
    - V4_CFG_MULTI_ZONE_ENABLED | bool
    - V4_CFG_MULTI_ZONE_POSTGRES_ENABLED | bool
    - V4_CFG_POSTGRES_SERVERS.default.internal | bool
    - V4_CFG_SINGLE_ZONE_FALLBACK | bool
    - PROVIDER in ["AKS", "azure", "EKS", "aws", "GKE", "gcp"]
    - V4_CFG_MULTI_ZONE_AUTO_DETECT | bool and not (is_multi_zone | default(true) | bool)
  tags:
    - install
    - uninstall
    - update

# Add general multi-zone distribution overlay for other StatefulSets (multi-zone clusters)
- name: Multi-Zone - General StatefulSet zone distribution with restrictive constraints
  overlay_facts:
    cadence_name: "{{ V4_CFG_CADENCE_NAME }}"
    cadence_number: "{{ V4_CFG_CADENCE_VERSION }}"
    existing: "{{ vdm_overlays }}"
    add:
      - { transformers: multi-zone-pod-distribution.yaml, vdm: true }
  when:
    - V4_CFG_MULTI_ZONE_ENABLED | bool
    - V4_CFG_STATEFUL_NODEPOOL_RESTRICTION | bool
    - PROVIDER in ["AKS", "azure", "EKS", "aws", "GKE", "gcp"]
    - not V4_CFG_MULTI_ZONE_AUTO_DETECT | bool or (is_multi_zone | default(true) | bool)
  tags:
    - install
    - uninstall
    - update
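
The filter chain in the `Multi-Zone - Calculate zone count` task above is dense, so a plain Python equivalent may help when reviewing it. This is a sketch of the same idea, not the code the playbook runs: it collects the distinct values of the `topology.kubernetes.io/zone` label across nodes and counts them. The function name `count_zones` and the sample node list are hypothetical.

```python
ZONE_LABEL = "topology.kubernetes.io/zone"


def count_zones(nodes: list[dict]) -> int:
    """Count distinct zone label values across Node objects (as dicts)."""
    zones = set()
    for node in nodes:
        labels = node.get("metadata", {}).get("labels") or {}
        zone = labels.get(ZONE_LABEL)
        if zone is not None:
            zones.add(zone)
    return len(zones)


# Hypothetical node list shaped like the `resources` returned for kind: Node.
sample_nodes = [
    {"metadata": {"labels": {ZONE_LABEL: "zone-a"}}},
    {"metadata": {"labels": {ZONE_LABEL: "zone-a"}}},
    {"metadata": {"labels": {ZONE_LABEL: "zone-b"}}},
    {"metadata": {"labels": {}}},  # unlabeled nodes are ignored
]

zone_count = count_zones(sample_nodes)
is_multi_zone = zone_count > 1
print(zone_count, is_multi_zone)
```

Nodes without the zone label contribute nothing to the count, so a cluster whose nodes are all unlabeled is treated as having zero zones and therefore as not multi-zone.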
Lines changed: 64 additions & 0 deletions
@@ -0,0 +1,64 @@
# Copyright © 2020-2025, SAS Institute Inc., Cary, NC, USA. All Rights Reserved.
# SPDX-License-Identifier: Apache-2.0

# Transformer to add restrictive pod topology spread constraints and anti-affinity rules
# for StatefulSets to ensure strict zone distribution and quorum protection
apiVersion: builtin
kind: PatchTransformer
metadata:
  name: multi-zone-pod-distribution
patch: |-
  - op: add
    path: /spec/template/spec/topologySpreadConstraints
    value:
      - maxSkew: 0
        topologyKey: topology.kubernetes.io/zone
        whenUnsatisfiable: DoNotSchedule
        labelSelector:
          matchLabels:
            app.kubernetes.io/name: PLACEHOLDER_APP_NAME
      - maxSkew: 1
        topologyKey: kubernetes.io/hostname
        whenUnsatisfiable: DoNotSchedule
        labelSelector:
          matchLabels:
            app.kubernetes.io/name: PLACEHOLDER_APP_NAME
  - op: add
    path: /spec/template/spec/affinity
    value:
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
            - matchExpressions:
                - key: workload.sas.com/class
                  operator: In
                  values:
                    - stateful
                - key: agentpool
                  operator: Exists
      podAntiAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchLabels:
                app.kubernetes.io/name: PLACEHOLDER_APP_NAME
            topologyKey: topology.kubernetes.io/zone
        preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchLabels:
                  app.kubernetes.io/name: PLACEHOLDER_APP_NAME
              topologyKey: kubernetes.io/hostname
target:
  kind: StatefulSet
  name: ".*rabbitmq.*|.*postgres.*|.*cas.*"
  version: v1
replacements:
  - source:
      kind: StatefulSet
      fieldPath: metadata.labels['app.kubernetes.io/name']
    targets:
      - select:
          kind: PatchTransformer
        fieldPaths:
          - patch
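
The `target.name` pattern in the transformer above decides which StatefulSets receive the patch, so it is worth checking what it matches. The sketch below tests the same regular expression in Python, assuming the pattern is matched against the whole resource name. The StatefulSet names are hypothetical examples chosen for illustration.

```python
import re

# Pattern copied from the transformer's target.name field.
TARGET_NAME = ".*rabbitmq.*|.*postgres.*|.*cas.*"


def is_targeted(name: str) -> bool:
    """Return True if the whole name matches the target pattern."""
    return re.fullmatch(TARGET_NAME, name) is not None


# Hypothetical StatefulSet names.
for name in (
    "sas-rabbitmq-server",
    "example-postgres-instance",
    "sas-cas-server-default",
    "example-cascade-worker",  # matches only because it contains "cas"
    "sas-consul-server",
):
    print(name, is_targeted(name))
```

Because each alternative is wrapped in `.*`, any name containing the substring `cas` is selected, which may be broader than intended for the general overlay.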
Lines changed: 42 additions & 0 deletions
@@ -0,0 +1,42 @@
# Copyright © 2020-2025, SAS Institute Inc., Cary, NC, USA. All Rights Reserved.
# SPDX-License-Identifier: Apache-2.0

# Transformer for PostgreSQL StatefulSet in single-zone clusters with relaxed constraints
apiVersion: builtin
kind: PatchTransformer
metadata:
  name: postgres-single-zone-distribution
patch: |-
  - op: add
    path: /spec/template/spec/topologySpreadConstraints
    value:
      - maxSkew: 1
        topologyKey: kubernetes.io/hostname
        whenUnsatisfiable: ScheduleAnyway
        labelSelector:
          matchLabels:
            postgres-operator.crunchydata.com/cluster: shared-postgres
  - op: add
    path: /spec/template/spec/affinity
    value:
      nodeAffinity:
        preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            preference:
              matchExpressions:
                - key: workload.sas.com/class
                  operator: In
                  values:
                    - stateful
      podAntiAffinity:
        preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchLabels:
                  postgres-operator.crunchydata.com/cluster: shared-postgres
              topologyKey: kubernetes.io/hostname
target:
  kind: StatefulSet
  name: ".*postgres.*"
  version: v1
