This guide provides a comprehensive overview of the Neo4j Enterprise Operator's architecture, design principles, and current implementation status as of August 2025.
The Neo4j Enterprise Operator follows cloud-native best practices with a focus on:
- Production Stability: Optimized reconciliation frequency and efficient resource management
- Performance: Intelligent rate limiting and status update optimization
- Server-Based Architecture: Unified server deployments with self-organizing roles
- Resource Efficiency: Centralized backup system (70% resource reduction)
- Observability: Comprehensive monitoring and operational insights
- Validation: Proactive resource validation and recommendations
The operator has evolved to use a unified server-based architecture where Neo4j servers self-organize into primary/secondary roles:
- Before: Separate primary/secondary StatefulSets with complex orchestration
- After: Single `{cluster-name}-server` StatefulSet with self-organizing servers
- Benefit: Simplified resource management, improved scaling, reduced complexity
```yaml
# Neo4jEnterpriseCluster topology
topology:
  servers: 3
# Creates: my-cluster-server StatefulSet (replicas: 3)
# Pods: my-cluster-server-0, my-cluster-server-1, my-cluster-server-2
```

```yaml
# Neo4jEnterpriseStandalone deployment
# Creates: my-standalone StatefulSet (replicas: 1)
# Pod: my-standalone-0
```

Major Efficiency Improvement: Replaced expensive per-pod backup sidecars with a centralized backup architecture:
- Resource Efficiency: 100m CPU/256Mi memory per cluster vs N×200m CPU/512Mi per sidecar
- Resource Savings: ~70% reduction in backup-related resource usage
- Architecture: Single `{cluster-name}-backup-0` StatefulSet per cluster
- Connectivity: Connects to the cluster via the client service using the Bolt protocol
- Neo4j 5.26+ Support: Modern backup syntax with automated path creation
The operator defines six core CRDs, located in `api/v1alpha1/`:
- Purpose: High-availability clustered Neo4j Enterprise deployments
- Architecture: Server-based with `{cluster-name}-server` StatefulSet
- Minimum Topology: 2+ servers (enforced by validation)
- Server Organization: Servers self-organize into primary/secondary roles for databases
- Scaling: Horizontal scaling supported with topology validation
- Discovery: LIST resolver with static pod FQDNs; V2_ONLY explicitly set for 5.26.x, implicit for 2025.x+
- Resource Pattern: Single StatefulSet replaces complex multi-StatefulSet architecture
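The LIST resolver with static pod FQDNs can be sketched as follows. This is an illustrative derivation, not the operator's actual code: the headless discovery service name, the DNS suffix, and the function name are assumptions, while the pod naming and the tcp-tx port 6000 come from the conventions described in this guide.

```go
package main

import "fmt"

// serverPodFQDNs sketches how static discovery endpoints could be derived:
// one FQDN per server pod, addressed through a headless discovery service
// on the tcp-tx port (6000). Names here are illustrative.
func serverPodFQDNs(cluster, namespace string, servers int) []string {
	endpoints := make([]string, 0, servers)
	for i := 0; i < servers; i++ {
		endpoints = append(endpoints, fmt.Sprintf(
			"%s-server-%d.%s-discovery.%s.svc.cluster.local:6000",
			cluster, i, cluster, namespace))
	}
	return endpoints
}

func main() {
	for _, e := range serverPodFQDNs("my-cluster", "default", 3) {
		fmt.Println(e)
	}
}
```

Because the endpoint list is fully determined by the cluster name and server count, no external discovery mechanism is needed.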
Key Fields:
```go
type Neo4jEnterpriseClusterSpec struct {
    Image    ImageSpec             `json:"image"`
    Topology TopologyConfiguration `json:"topology"` // servers: N
    Storage  StorageSpec           `json:"storage"`
    // ... additional fields
}
```

- Purpose: Single-node Neo4j Enterprise deployments
- Architecture: Uses clustering infrastructure but fixed at 1 replica
- Use Cases: Development, testing, simple production workloads
- StatefulSet: `{standalone-name}` (no "-server" suffix)
- Configuration: Modern clustering approach with a single member (Neo4j 5.26+)
- Restrictions: Cannot scale beyond 1 replica
- Purpose: Manages database lifecycle within clusters and standalone deployments
- Dual Support: Works with both Neo4jEnterpriseCluster and Neo4jEnterpriseStandalone
- Enhanced Validation: DatabaseValidator supports automatic deployment type detection
- Neo4j 5.26+ Syntax: Uses the modern `TOPOLOGY` clause for database creation
- Standalone Fix: Added NEO4J_AUTH environment variable for automatic authentication
Key Features:
```go
type Neo4jDatabaseSpec struct {
    ClusterRef  string           `json:"clusterRef"`  // References cluster OR standalone
    Name        string           `json:"name"`        // Database name
    Topology    DatabaseTopology `json:"topology"`    // Primary/secondary counts
    IfNotExists bool             `json:"ifNotExists"` // CREATE IF NOT EXISTS
}
```

- Purpose: Manages Neo4j plugin installation and configuration
- Dual Architecture Support: Enhanced for server-based cluster + standalone compatibility
- Deployment Detection: Automatic cluster vs standalone recognition
- Resource Naming: Handles `{cluster-name}-server` vs `{standalone-name}` patterns
- Plugin Sources: Official, community, custom registry, and direct URL support
- Purpose: Manages backup operations for both clusters and standalone deployments
- Centralized Architecture: Uses single backup pod per cluster (not sidecars)
- Target Support: Can backup both cluster and standalone deployments
- Neo4j 5.26+ Support: Modern backup syntax with the `--to-path` parameter
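A minimal sketch of assembling the modern backup invocation referenced above. The `--to-path` flag and the `neo4j-admin database backup` form are Neo4j 5.x syntax; the `--from` address, port, and path layout below are assumptions for illustration, not the operator's actual command.

```go
package main

import (
	"fmt"
	"strings"
)

// backupCommand sketches a Neo4j 5.26+ backup invocation. The --from host
// and port (6362, the default backup port) and the target path are
// illustrative assumptions.
func backupCommand(database, fromHost, toPath string) string {
	args := []string{
		"neo4j-admin", "database", "backup",
		"--from=" + fromHost + ":6362",
		"--to-path=" + toPath,
		database,
	}
	return strings.Join(args, " ")
}

func main() {
	fmt.Println(backupCommand("neo4j", "my-cluster-client", "/backups/neo4j"))
}
```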
- Purpose: Manages database restoration from backups
- Point-in-Time Recovery: Supports `--restore-until` for precise recovery
- Cross-Deployment Support: Can restore to different deployment types
Primary cluster management controller with server-based architecture:
Performance Optimizations:
- Efficient Reconciliation: Reduced from ~18,000 to ~34 reconciliations per minute
- Smart Status Updates: Only updates when cluster state changes
- ConfigMap Debouncing: 2-minute debounce prevents restart loops
- Resource Version Conflict Handling: Retry logic for concurrent updates
Server-Based Implementation:
- Single StatefulSet: Creates `{cluster-name}-server` instead of separate primary/secondary StatefulSets
- Self-Organizing Servers: Neo4j servers automatically assign database hosting roles
- Simplified Resource Management: Unified pod templates and configuration
- Certificate DNS: Includes all server pod names in TLS certificates
Split-Brain Detection:
- Location: `internal/controller/splitbrain_detector.go`
- Multi-Pod Analysis: Connects to each server to compare cluster views
- Automatic Repair: Restarts orphaned pods to rejoin main cluster
- Production Ready: Comprehensive logging and fallback mechanisms
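The multi-pod analysis above can be sketched as a majority-view comparison: each pod reports the members it can see, and pods whose view disagrees with the most common view are treated as orphaned. This is a hypothetical shape, not the detector's actual API.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// orphanedPods returns pods whose reported cluster view differs from the
// most common (majority) view. Each view is normalized by sorting so that
// member order does not matter. Illustrative sketch only.
func orphanedPods(views map[string][]string) []string {
	counts := map[string]int{}
	viewKey := map[string]string{}
	for pod, members := range views {
		sorted := append([]string(nil), members...)
		sort.Strings(sorted)
		key := strings.Join(sorted, ",")
		counts[key]++
		viewKey[pod] = key
	}
	// Find the majority cluster view.
	majority := ""
	for key, n := range counts {
		if majority == "" || n > counts[majority] {
			majority = key
		}
	}
	var orphans []string
	for pod, key := range viewKey {
		if key != majority {
			orphans = append(orphans, pod)
		}
	}
	sort.Strings(orphans)
	return orphans
}

func main() {
	views := map[string][]string{
		"server-0": {"server-0", "server-1"},
		"server-1": {"server-1", "server-0"},
		"server-2": {"server-2"}, // sees only itself: orphaned
	}
	fmt.Println(orphanedPods(views))
}
```

Restarting the orphaned pods then lets them rediscover and rejoin the majority partition.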
Single-node deployment controller:
Key Features:
- Clustering Infrastructure: Uses same infrastructure as clusters (Neo4j 5.26+ approach)
- Single Member Configuration: Sets up clustering with single server
- Resource Management: Handles ConfigMap, Service, and StatefulSet
- Status Tracking: Comprehensive status updates for standalone instances
Enhanced for dual deployment support:
- Automatic Detection: Tries cluster lookup first, then standalone fallback
- Neo4j Client Creation: `NewClientForEnterprise()` vs `NewClientForEnterpriseStandalone()`
- Authentication Handling: Manages NEO4J_AUTH for standalone deployments
- Syntax Support: Neo4j 5.26+ and 2025.x database creation syntax
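The cluster-first, standalone-fallback detection can be sketched as below. The lookup functions stand in for Kubernetes API reads; all names are illustrative, not the controller's actual signatures.

```go
package main

import (
	"errors"
	"fmt"
)

// errNotFound stands in for the Kubernetes not-found error.
var errNotFound = errors.New("not found")

// detectDeployment tries the cluster CRD first; only on a not-found error
// does it fall back to the standalone CRD. Any other API error is surfaced
// rather than masked by the fallback.
func detectDeployment(lookupCluster, lookupStandalone func(name string) error, name string) (string, error) {
	if err := lookupCluster(name); err == nil {
		return "cluster", nil
	} else if !errors.Is(err, errNotFound) {
		return "", err
	}
	if err := lookupStandalone(name); err == nil {
		return "standalone", nil
	}
	return "", fmt.Errorf("no cluster or standalone named %q", name)
}

// lookupIn builds a stub lookup over a set of known names.
func lookupIn(set map[string]bool) func(string) error {
	return func(name string) error {
		if set[name] {
			return nil
		}
		return errNotFound
	}
}

func main() {
	clusters := map[string]bool{"my-cluster": true}
	standalones := map[string]bool{"my-standalone": true}
	kind, _ := detectDeployment(lookupIn(clusters), lookupIn(standalones), "my-standalone")
	fmt.Println(kind)
}
```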
Manages plugin lifecycle with architecture compatibility:
- DeploymentInfo Abstraction: Unified handling of cluster/standalone types
- Resource Naming: Correct StatefulSet names (`{cluster-name}-server` vs `{standalone-name}`)
- Pod Labels: Applies appropriate labels for each deployment type
- Plugin Sources: Official, community, custom registries, direct URLs
Centralized backup management:
- Architecture: Single backup StatefulSet per cluster
- Resource Efficiency: 70% reduction in backup resource usage
- Cross-Deployment Support: Backs up both clusters and standalone deployments
- Modern Syntax: Neo4j 5.26+ compatible backup commands
Database restoration management:
- Point-in-Time Recovery: Supports precise timestamp restoration
- Flexible Targets: Can restore to different deployment types
- Validation: Ensures target deployment compatibility
- TopologyValidator (`topology_validator.go`): Cluster topology and server count validation
- ClusterValidator (`cluster_validator.go`): Cluster-specific configuration validation
- MemoryValidator (`memory_validator.go`): Neo4j memory settings vs container limits
- ResourceValidator (`resource_validator.go`): CPU, memory, and storage validation
- TLSValidator (`tls_validator.go`): TLS/SSL configuration validation
- DatabaseValidator (`database_validator.go`): Database creation and topology validation
- Dual CRD Validation: Separate validation rules for cluster vs standalone
- Server-Based Topology: Validates server counts instead of primary/secondary counts
- Resource Recommendations: Suggests optimal resource allocation
- Configuration Restrictions: Prevents clustering settings in standalone deployments
- Neo4j Version Compatibility: Validates settings against Neo4j 5.26+ and 2025.x
- Automatic Deployment Detection: Tries cluster first, then standalone
- Appropriate Client Creation: Uses correct client type for deployment
- Clear Error Messages: Distinguishes between cluster and standalone validation failures
- Neo4j 5.26.x: Last semver LTS release (5.26.0, 5.26.1, etc.) — no 5.27+ semver versions exist
- Neo4j 2025.x+: Calver format (2025.01.0, 2025.02.0, etc.)
| Setting | 5.26.x (SemVer) | 2025.x+ / 2026.x+ (CalVer) |
|---|---|---|
| `dbms.cluster.discovery.resolver_type` | `LIST` | `LIST` |
| `dbms.cluster.discovery.version` | `V2_ONLY` (explicit) | (omitted; V2 is the only protocol) |
| Endpoints key | `dbms.cluster.discovery.v2.endpoints` | `dbms.cluster.endpoints` |
| Endpoint port | 6000 (tcp-tx) | 6000 (tcp-tx) |
| Bootstrap hint | `internal.dbms.cluster.discovery.system_bootstrapping_strategy=me/other` | (not used) |
Port 5000 (tcp-discovery) is the deprecated V1 discovery port — never used by this operator.
CalVer detection: `ParseVersion()` → `IsCalver` (major >= 2025) covers 2026.x+ automatically.
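The detection rule can be sketched in a few lines. The real `ParseVersion()` may differ in details; the function name and error handling below are illustrative.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// isCalVer applies the rule described above: a leading version component of
// 2025 or higher marks a CalVer release (2025.x, 2026.x, ...), while 5.26.x
// remains SemVer.
func isCalVer(version string) (bool, error) {
	major, err := strconv.Atoi(strings.SplitN(version, ".", 2)[0])
	if err != nil {
		return false, fmt.Errorf("bad version %q: %w", version, err)
	}
	return major >= 2025, nil
}

func main() {
	for _, v := range []string{"5.26.1", "2025.01.0", "2026.03.0"} {
		cal, _ := isCalVer(v)
		fmt.Printf("%s calver=%v\n", v, cal)
	}
}
```

Because the check is a simple numeric comparison, future CalVer years need no code changes.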
- Memory: `server.memory.*` (not the deprecated `dbms.memory.*`)
- TLS/SSL: `server.https.*` and `server.bolt.*` (not `dbms.connector.*`)
- Database Format: `db.format: "block"` (not deprecated formats)
- Discovery: managed entirely by the operator startup script — do not set in `spec.config`
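A validator enforcing the discovery restriction above could look like this. The key prefix follows the discovery settings listed in this guide; the function name and error message are illustrative, not the operator's actual validation code.

```go
package main

import (
	"fmt"
	"strings"
)

// rejectDiscoveryOverrides rejects any dbms.cluster.discovery.* key in
// user-supplied config, since discovery is owned by the operator's startup
// script. Illustrative sketch.
func rejectDiscoveryOverrides(config map[string]string) error {
	for key := range config {
		if strings.HasPrefix(key, "dbms.cluster.discovery.") {
			return fmt.Errorf("%s is managed by the operator; remove it from spec.config", key)
		}
	}
	return nil
}

func main() {
	cfg := map[string]string{
		"server.memory.heap.max_size":          "4G",
		"dbms.cluster.discovery.resolver_type": "LIST", // not allowed
	}
	fmt.Println(rejectDiscoveryOverrides(cfg))
}
```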
Neo4j 5.26.x syntax:

```cypher
CREATE DATABASE name [IF NOT EXISTS]
[TOPOLOGY n PRIMAR{Y|IES} [m SECONDAR{Y|IES}]]
[OPTIONS "{" option: value[, ...] "}"]
[WAIT [n [SEC[OND[S]]]]|NOWAIT]
```

Neo4j 2025.x syntax:

```cypher
CREATE DATABASE name [IF NOT EXISTS]
[[SET] DEFAULT LANGUAGE CYPHER {5|25}]
[[SET] TOPOLOGY n PRIMARIES [m SECONDARIES]]
[OPTIONS "{" option: value[, ...] "}"]
[WAIT [n [SEC[OND[S]]]]|NOWAIT]
```

- ClusterBuilder (`cluster.go`): Server-based StatefulSet creation
- StandaloneBuilder (`standalone.go`): Single-node deployment resources
- ConfigMapBuilder: Unified configuration for both deployment types
- ServiceBuilder: Client and discovery services
- BackupBuilder: Centralized backup StatefulSet
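The CREATE DATABASE grammar above could be driven from a database spec roughly as follows. Field meanings mirror the Neo4jDatabaseSpec shown earlier; the function itself is a hypothetical sketch and omits the singular PRIMARY/SECONDARY forms and the OPTIONS/WAIT clauses.

```go
package main

import "fmt"

// createDatabaseCypher assembles a Neo4j 5.26+ CREATE DATABASE statement
// from topology counts. Simplified: always uses the plural PRIMARIES /
// SECONDARIES keywords.
func createDatabaseCypher(name string, primaries, secondaries int, ifNotExists bool) string {
	stmt := "CREATE DATABASE " + name
	if ifNotExists {
		stmt += " IF NOT EXISTS"
	}
	if primaries > 0 {
		stmt += fmt.Sprintf(" TOPOLOGY %d PRIMARIES", primaries)
		if secondaries > 0 {
			stmt += fmt.Sprintf(" %d SECONDARIES", secondaries)
		}
	}
	return stmt
}

func main() {
	fmt.Println(createDatabaseCypher("orders", 3, 2, true))
}
```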
- StatefulSet Naming: `{cluster-name}-server` for clusters, `{standalone-name}` for standalone
- Pod Naming: `{cluster-name}-server-0`, `{cluster-name}-server-1`, etc.
- Service Names: `{cluster-name}-client`, `{cluster-name}-discovery`
- Backup Resources: `{cluster-name}-backup-0` (centralized)
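The naming conventions above reduce to a few pure functions of the cluster name. The helper names below are illustrative; the suffixes follow the conventions listed in this section.

```go
package main

import "fmt"

// Naming helpers for cluster-scoped resources, per the conventions above.
func statefulSetName(cluster string) string { return cluster + "-server" }
func podName(cluster string, ordinal int) string {
	return fmt.Sprintf("%s-server-%d", cluster, ordinal)
}
func clientService(cluster string) string    { return cluster + "-client" }
func discoveryService(cluster string) string { return cluster + "-discovery" }
func backupPod(cluster string) string        { return cluster + "-backup-0" }

func main() {
	fmt.Println(statefulSetName("my-cluster"))
	fmt.Println(podName("my-cluster", 0))
	fmt.Println(clientService("my-cluster"), discoveryService("my-cluster"))
	fmt.Println(backupPod("my-cluster"))
}
```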
- Rate Limiting: Intelligent rate limiting prevents API server overload
- Status Update Efficiency: Only updates when state actually changes
- Event Filtering: Reduces unnecessary reconciliation triggers
- ConfigMap Hashing: Hash-based change detection prevents unnecessary updates
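Hash-based change detection can be sketched as below: configuration keys are sorted so the digest is stable regardless of map iteration order, and an update is applied only when the digest changes. This is an illustrative model; the annotation key and function name in the real controller may differ.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"sort"
)

// configHash returns a stable digest of configuration data. Sorting keys
// first makes the hash deterministic across Go's randomized map iteration.
func configHash(data map[string]string) string {
	keys := make([]string, 0, len(data))
	for k := range data {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	h := sha256.New()
	for _, k := range keys {
		fmt.Fprintf(h, "%s=%s\n", k, data[k])
	}
	return fmt.Sprintf("%x", h.Sum(nil))
}

func main() {
	a := configHash(map[string]string{"server.memory.heap.max_size": "4G"})
	b := configHash(map[string]string{"server.memory.heap.max_size": "8G"})
	fmt.Println(a != b) // changed config yields a different hash
}
```

Comparing the stored digest against the freshly computed one is what lets the controller skip no-op ConfigMap updates.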
- Parallel Pod Management: All server pods start simultaneously
- Minimum Primaries = 1: First pod forms cluster immediately
- PublishNotReadyAddresses: Discovery includes pending pods
- Resource Version Conflict Retry: Handles concurrent updates gracefully
- Principle of Least Privilege: Minimal required permissions
- ClusterRole Design: Cross-namespace operations support
- Service Account Security: Dedicated accounts with specific roles
Each cluster gets automatic RBAC creation:
- ServiceAccount: `{cluster-name}-discovery`
- Role: Services and endpoints permissions
- RoleBinding: Links account to role
- Endpoints Permission: CRITICAL for cluster formation
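The Role described above might carry rules shaped like the following. This is a plain-struct mirror of `k8s.io/api/rbac/v1` for illustration; the exact verbs the operator grants are an assumption, while the services and endpoints resources come from the list above.

```go
package main

import "fmt"

// rule mirrors the shape of an RBAC policy rule; real code would use
// rbacv1.PolicyRule. The read-only verbs here are assumed.
type rule struct {
	APIGroups []string
	Resources []string
	Verbs     []string
}

// discoveryRules sketches the per-cluster discovery Role: the endpoints
// permission is the one called out as critical for cluster formation.
func discoveryRules() []rule {
	return []rule{
		{APIGroups: []string{""}, Resources: []string{"services"}, Verbs: []string{"get", "list", "watch"}},
		{APIGroups: []string{""}, Resources: []string{"endpoints"}, Verbs: []string{"get", "list", "watch"}},
	}
}

func main() {
	for _, r := range discoveryRules() {
		fmt.Println(r.Resources, r.Verbs)
	}
}
```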
- Cert-Manager Integration: Automatic certificate provisioning
- SSL Policy Configuration: Separate policies for `https`, `bolt`, and `cluster` scopes
- Trust All for Cluster: `dbms.ssl.policy.cluster.trust_all=true` for formation
- Certificate DNS Names: Includes all server pod names
- ResourceMonitor (`resource_monitor.go`): Real-time utilization tracking
- Performance Metrics: Controller performance and reconciliation efficiency
- Operational Insights: ConfigMap update patterns and debounce effectiveness
- Enhanced Status Updates: Detailed cluster state tracking
- Condition Management: Comprehensive status conditions with proper transitions
- Event Recording: Structured events for debugging and monitoring
- Connection Examples: Automatic generation of connection strings
- Cert-Manager: TLS certificate lifecycle management
- Prometheus: Metrics collection and alerting
- External Secrets: Secret management integration
- Storage Classes: Persistent volume provisioning
- Cloud Providers: AWS, GCP, Azure LoadBalancer optimizations
- Network Policies: Pod-to-pod communication security
- Service Mesh: Istio/Linkerd compatibility
- Ingress Controllers: External traffic routing with connection examples
- Node Affinity: Topology spread and anti-affinity rules
- Unit Tests: Controller logic and helper functions
- Integration Tests: Full workflow testing with envtest
- End-to-End Tests: Real cluster testing with Kind
- Performance Tests: Reconciliation efficiency validation
- Ginkgo/Gomega: BDD-style testing framework
- Envtest: Kubernetes API server for integration testing
- Kind Clusters: Development and test cluster automation
- Test Cleanup: Automatic finalizer removal and namespace cleanup
- Backward Compatibility: Existing clusters continue to work
- Gradual Migration: No breaking changes for existing deployments
- Resource Name Updates: New deployments use server-based naming
- Configuration Migration: Automatic handling of deprecated settings
- Plugin System: Neo4j plugin management framework
- Custom Metrics: Extensible monitoring capabilities
- Event Handling: Pluggable event system for custom integrations
- Multi-Architecture: Support for different deployment patterns
- Controller Pattern: Standard Kubernetes controller pattern
- Builder Pattern: Resource builders for clean separation
- Validation Framework: Centralized validation with clear error messages
- Testing Strategy: Comprehensive test coverage with multiple levels
- Memory Usage: Optimized for large-scale deployments
- API Efficiency: Minimal API calls with intelligent caching
- Resource Creation: Parallel resource creation where possible
- Error Handling: Graceful error handling with proper recovery
This architecture provides a solid foundation for managing Neo4j Enterprise deployments in Kubernetes with high performance, reliability, and operational simplicity.