This guide explains how to configure topology placement for Neo4j Enterprise clusters to ensure high availability, optimal performance, and fault tolerance across different failure domains in your Kubernetes cluster.
The Neo4j Kubernetes Operator provides sophisticated topology placement capabilities to distribute Neo4j cluster nodes across different failure domains (zones, regions, or custom topology domains). This ensures your database remains available even when entire zones or racks fail.
- Zones: Physical or logical isolation boundaries (e.g., AWS availability zones, data center racks)
- Nodes: Individual Kubernetes worker nodes
- Regions: Larger geographical boundaries containing multiple zones
Standard Kubernetes topology labels used for placement:
- `topology.kubernetes.io/zone` - Availability zone placement
- `kubernetes.io/hostname` - Node-level anti-affinity
- `topology.kubernetes.io/region` - Regional placement
- Custom labels defined by your infrastructure team
For most production deployments, distributing across availability zones is recommended:
```yaml
apiVersion: neo4j.neo4j.com/v1alpha1
kind: Neo4jEnterpriseCluster
metadata:
  name: production-cluster
spec:
  topology:
    servers: 3  # 3 servers will self-organize into primary/secondary roles
    placement:
      topologySpread:
        enabled: true
        topologyKey: "topology.kubernetes.io/zone"
        maxSkew: 1
        whenUnsatisfiable: "DoNotSchedule"
```

This configuration ensures:

- Server pods are evenly distributed across availability zones
- Maximum difference of 1 pod between any two zones
- Pods won't be scheduled if distribution requirements can't be met
- Servers self-organize to host databases with appropriate primary/secondary topologies
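For context, the operator's `topologySpread` settings most likely translate into a standard Kubernetes `topologySpreadConstraints` stanza on the server pods. A sketch of that native equivalent follows; the exact label selector the operator generates is an assumption:

```yaml
# Illustrative native equivalent of the operator's topologySpread settings.
# The labelSelector shown here is an assumption, not the operator's exact output.
topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: DoNotSchedule
    labelSelector:
      matchLabels:
        app.kubernetes.io/instance: production-cluster
```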
To prevent multiple Neo4j server pods from running on the same node:
```yaml
spec:
  topology:
    servers: 5  # 5 servers will self-organize into primary/secondary roles
    placement:
      antiAffinity:
        enabled: true
        topologyKey: "kubernetes.io/hostname"
        type: "preferred"  # or "required" for strict enforcement
```

Anti-affinity types:

- `preferred`: Best effort - the scheduler tries to avoid co-location
- `required`: Strict - pods will remain unscheduled if constraints can't be met
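In native Kubernetes terms, the `preferred` setting presumably maps to a soft `podAntiAffinity` rule like the sketch below; the weight and label selector are assumptions:

```yaml
# Illustrative native pod anti-affinity for type: "preferred".
# Weight and labelSelector are assumptions about what the operator generates.
affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchLabels:
              app.kubernetes.io/instance: production-cluster
          topologyKey: kubernetes.io/hostname
```

With `type: "required"`, the rule would instead appear under `requiredDuringSchedulingIgnoredDuringExecution` and block scheduling when it cannot be satisfied.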
For maximum resilience, combine zone distribution with node anti-affinity:
```yaml
spec:
  topology:
    servers: 3  # 3 servers will self-organize into primary/secondary roles
    placement:
      topologySpread:
        enabled: true
        topologyKey: "topology.kubernetes.io/zone"
        maxSkew: 1
        whenUnsatisfiable: "ScheduleAnyway"  # More flexible for maintenance
      antiAffinity:
        enabled: true
        topologyKey: "kubernetes.io/hostname"
        type: "preferred"
```

Explicitly define which zones to use:
```yaml
spec:
  topology:
    servers: 3  # 3 servers will self-organize into primary/secondary roles
    availabilityZones:
      - us-east-1a
      - us-east-1b
      - us-east-1c
    enforceDistribution: true  # Ensures servers are distributed across zones
    placement:
      topologySpread:
        enabled: true
        topologyKey: "topology.kubernetes.io/zone"
```

Ensure scheduling only when sufficient domains are available:
```yaml
spec:
  topology:
    servers: 6  # 6 servers will self-organize into appropriate roles
    placement:
      topologySpread:
        enabled: true
        topologyKey: "topology.kubernetes.io/zone"
        maxSkew: 1
        minDomains: 3  # Require at least 3 zones
        whenUnsatisfiable: "DoNotSchedule"
```

The operator also supports standard Kubernetes placement options for fine-grained control:
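The `maxSkew` and `minDomains` checks can be sketched with a toy model. This is a deliberate simplification (the real scheduler also counts eligible domains that currently hold zero pods); the function name is illustrative:

```python
from collections import Counter


def satisfies_spread(pod_zones, max_skew=1, min_domains=3):
    """Toy check of topology-spread semantics for a proposed pod placement.

    pod_zones: list of zone names, one entry per scheduled pod.
    Returns True when at least `min_domains` zones are used and the
    difference between the fullest and emptiest used zone is <= max_skew.
    """
    counts = Counter(pod_zones)
    skew = max(counts.values()) - min(counts.values())
    return len(counts) >= min_domains and skew <= max_skew


# 6 pods balanced across 3 zones: skew 0, 3 domains -> satisfied
print(satisfies_spread(["a", "a", "b", "b", "c", "c"]))
# 3/2/1 split: skew 2 exceeds maxSkew 1 -> violated
print(satisfies_spread(["a", "a", "a", "b", "b", "c"]))
```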
Target specific node pools:
```yaml
spec:
  nodeSelector:
    node-type: "neo4j-optimized"
    storage-type: "nvme"
```

Allow pods to schedule on tainted nodes:
```yaml
spec:
  tolerations:
    - key: "neo4j-dedicated"
      operator: "Equal"
      value: "true"
      effect: "NoSchedule"
```

For complex placement requirements:
```yaml
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: node.kubernetes.io/instance-type
                operator: In
                values:
                  - m5.2xlarge
                  - m5.4xlarge
    podAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 100
          podAffinityTerm:
            labelSelector:
              matchExpressions:
                - key: app
                  operator: In
                  values:
                    - monitoring
            topologyKey: topology.kubernetes.io/zone
```

For production clusters requiring maximum availability:
```yaml
spec:
  topology:
    servers: 5  # 5 servers for high availability (odd number recommended)
    enforceDistribution: true
    placement:
      topologySpread:
        enabled: true
        topologyKey: "topology.kubernetes.io/zone"
        maxSkew: 1
        whenUnsatisfiable: "DoNotSchedule"
      antiAffinity:
        enabled: true
        topologyKey: "kubernetes.io/hostname"
        type: "required"
```

Servers will automatically organize to host databases with appropriate primary/secondary topologies based on database requirements.
For development or cost-sensitive deployments:
```yaml
spec:
  topology:
    servers: 3  # 3 servers for basic high availability
    placement:
      topologySpread:
        enabled: true
        topologyKey: "topology.kubernetes.io/zone"
        maxSkew: 2  # Allow more imbalance
        whenUnsatisfiable: "ScheduleAnyway"
      antiAffinity:
        enabled: true
        topologyKey: "kubernetes.io/hostname"
        type: "preferred"  # Soft constraint
```

Servers will self-organize, and databases can be created with varying topologies as needed.
For development environments without zone requirements:
```yaml
spec:
  topology:
    servers: 2  # Minimum 2 servers for clustering
    # No placement configuration needed for single-zone
```

For true single-node development, use Neo4jEnterpriseStandalone instead of Neo4jEnterpriseCluster.
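A minimal standalone manifest might look like the sketch below. Only the `kind` and `apiVersion` come from this guide; any further `spec` fields for Neo4jEnterpriseStandalone are assumptions and should be checked against the operator's CRD reference:

```yaml
# Hypothetical minimal standalone manifest; spec fields omitted because
# they are not documented in this guide.
apiVersion: neo4j.neo4j.com/v1alpha1
kind: Neo4jEnterpriseStandalone
metadata:
  name: dev-neo4j
```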
Check topology constraints:
```bash
# Describe the pending pod
kubectl describe pod <pod-name>

# Check available nodes and their zones
kubectl get nodes -L topology.kubernetes.io/zone

# Verify node capacity
kubectl top nodes
```

Verify zone labels and capacity:
```bash
# List each pod with the node it runs on
kubectl get pods -l app.kubernetes.io/instance=<cluster-name> \
  -o custom-columns=NAME:.metadata.name,NODE:.spec.nodeName

# Count pods per zone by resolving each node's zone label
for node in $(kubectl get pods -l app.kubernetes.io/instance=<cluster-name> \
    -o jsonpath='{.items[*].spec.nodeName}'); do
  kubectl get node "$node" -L topology.kubernetes.io/zone --no-headers
done | awk '{print $NF}' | sort | uniq -c

# Check topology spread constraints
kubectl get pod <pod-name> -o yaml | grep -A10 topologySpreadConstraints
```

The operator emits warnings for suboptimal configurations:
```bash
# Check operator events
kubectl get events --field-selector reason=TopologyWarning

# View cluster status
kubectl describe neo4jenterprisecluster <cluster-name>
```

- Use Odd Numbers of Servers: 3, 5, or 7 servers provide optimal fault tolerance for database quorum behavior
- Plan for Zone Distribution: Ensure you have enough zones to satisfy your topology spread constraints
- Enable enforceDistribution: Ensures servers are distributed across zones for maximum availability
- Start with Soft Constraints: Use `preferred` anti-affinity and `ScheduleAnyway` during initial deployment
- Monitor Zone Capacity: Ensure each zone has sufficient resources for your server topology
- Test Failure Scenarios: Verify cluster behavior and database availability when zones become unavailable
- Consider Database Requirements: Plan server count based on expected database topology requirements
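The odd-number recommendation follows from quorum arithmetic: a cluster of n voting members needs a majority to stay writable, so it tolerates floor((n - 1) / 2) failures, and an even count buys no extra tolerance over the odd count below it. A small sketch:

```python
def fault_tolerance(servers: int) -> int:
    """Simultaneous server failures a majority-quorum cluster survives."""
    return (servers - 1) // 2


# An even member count adds cost but no additional fault tolerance:
for n in (2, 3, 4, 5, 6, 7):
    quorum = n // 2 + 1
    print(f"{n} servers -> quorum {quorum}, tolerates {fault_tolerance(n)} failure(s)")
```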
Complete example for a production-grade deployment:
```yaml
apiVersion: neo4j.neo4j.com/v1alpha1
kind: Neo4jEnterpriseCluster
metadata:
  name: production-neo4j
spec:
  image:
    repo: neo4j
    tag: "5.26-enterprise"
  topology:
    servers: 5  # 5 servers for high availability
    availabilityZones:
      - us-east-1a
      - us-east-1b
      - us-east-1c
    enforceDistribution: true
    placement:
      topologySpread:
        enabled: true
        topologyKey: "topology.kubernetes.io/zone"
        maxSkew: 1
        whenUnsatisfiable: "DoNotSchedule"
        minDomains: 3
      antiAffinity:
        enabled: true
        topologyKey: "kubernetes.io/hostname"
        type: "required"
  # Authentication
  auth:
    provider: native
    adminSecret: neo4j-admin-secret
  # TLS for production
  tls:
    mode: cert-manager
    issuerRef:
      name: ca-cluster-issuer
      kind: ClusterIssuer
  # Node selection for database workloads
  nodeSelector:
    workload-type: "database"
  # Tolerate database-dedicated nodes
  tolerations:
    - key: "dedicated"
      operator: "Equal"
      value: "database"
      effect: "NoSchedule"
  storage:
    className: "fast-ssd"
    size: "100Gi"
  resources:
    requests:
      memory: "8Gi"
      cpu: "2"
    limits:
      memory: "16Gi"
      cpu: "4"
```

This creates a cluster with 5 servers that will automatically organize to host databases with appropriate primary/secondary topologies.
If you have existing clusters without topology placement:
1. Add Soft Constraints First:

   ```yaml
   placement:
     antiAffinity:
       enabled: true
       type: "preferred"
   ```

2. Gradually Introduce Zone Spreading:

   ```yaml
   placement:
     topologySpread:
       enabled: true
       whenUnsatisfiable: "ScheduleAnyway"
   ```

3. Tighten Constraints After Validation:

   - Change `preferred` to `required`
   - Change `ScheduleAnyway` to `DoNotSchedule`