Architecture Overview

This guide describes the Neo4j Enterprise Operator's architecture, design principles, and implementation status as of August 2025.

Core Design Principles

The Neo4j Enterprise Operator follows cloud-native best practices with a focus on:

  • Production Stability: Optimized reconciliation frequency and efficient resource management
  • Performance: Intelligent rate limiting and status update optimization
  • Server-Based Architecture: Unified server deployments with self-organizing roles
  • Resource Efficiency: Centralized backup system (~70% reduction in backup-related resource usage)
  • Observability: Comprehensive monitoring and operational insights
  • Validation: Proactive resource validation and recommendations

Current Architecture (August 2025)

Server-Based Architecture

The operator has evolved to use a unified server-based architecture in which Neo4j servers self-organize into primary/secondary roles for each database.

Key Changes from Legacy Architecture

  • Before: Separate primary/secondary StatefulSets with complex orchestration
  • After: Single {cluster-name}-server StatefulSet with self-organizing servers
  • Benefit: Simplified resource management, improved scaling, reduced complexity

Current Implementation

# Neo4jEnterpriseCluster topology
topology:
  servers: 3  # Creates: my-cluster-server StatefulSet (replicas: 3)
  # Pods: my-cluster-server-0, my-cluster-server-1, my-cluster-server-2

# Neo4jEnterpriseStandalone deployment
# Creates: my-standalone StatefulSet (replicas: 1)
# Pod: my-standalone-0
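
The naming pattern above can be sketched in Go. This is an illustration only: `serverStatefulSetName` and `serverPodNames` are hypothetical helpers, not functions taken from the operator's source. The ordinal suffix is standard Kubernetes StatefulSet behavior.

```go
package main

import "fmt"

// serverStatefulSetName follows the {cluster-name}-server pattern
// described above. Hypothetical helper, not the operator's own code.
func serverStatefulSetName(clusterName string) string {
	return clusterName + "-server"
}

// serverPodNames lists the pod names Kubernetes assigns to a StatefulSet:
// the StatefulSet name plus an ordinal suffix starting at 0.
func serverPodNames(clusterName string, servers int) []string {
	var names []string
	for i := 0; i < servers; i++ {
		names = append(names, fmt.Sprintf("%s-%d", serverStatefulSetName(clusterName), i))
	}
	return names
}

func main() {
	fmt.Println(serverStatefulSetName("my-cluster"))
	for _, pod := range serverPodNames("my-cluster", 3) {
		fmt.Println(pod)
	}
}
```

Calling `serverPodNames("my-cluster", 3)` yields the three pod names listed in the topology comment above.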

Centralized Backup System

Major Efficiency Improvement: Expensive per-pod backup sidecars have been replaced with a centralized backup architecture:

  • Resource Efficiency: 100m CPU/256Mi memory per cluster, versus 200m CPU/512Mi for each of N sidecars
  • Resource Savings: ~70% reduction in backup-related resource usage
  • Architecture: Single backup StatefulSet per cluster, running one pod ({cluster-name}-backup-0)
  • Connectivity: Connects to cluster via client service using Bolt protocol
  • Neo4j 5.26+ Support: Modern backup syntax with automated path creation

Custom Resource Definitions (CRDs)

The operator defines six core CRDs located in api/v1alpha1/:

Core Deployment CRDs

Neo4jEnterpriseCluster (neo4jenterprisecluster_types.go)

  • Purpose: High-availability clustered Neo4j Enterprise deployments
  • Architecture: Server-based with {cluster-name}-server StatefulSet
  • Minimum Topology: 2+ servers (enforced by validation)
  • Server Organization: Servers self-organize into primary/secondary roles for databases
  • Scaling: Horizontal scaling supported with topology validation
  • Discovery: Uses V2_ONLY discovery mode exclusively (Neo4j 5.26+ and 2025.x)
  • Resource Pattern: Single StatefulSet replaces complex multi-StatefulSet architecture

Key Fields:

type Neo4jEnterpriseClusterSpec struct {
    Image    ImageSpec             `json:"image"`
    Topology TopologyConfiguration `json:"topology"` // servers: N
    Storage  StorageSpec           `json:"storage"`
    // ... additional fields
}

Neo4jEnterpriseStandalone (neo4jenterprisestandalone_types.go)

  • Purpose: Single-node Neo4j Enterprise deployments
  • Architecture: Uses clustering infrastructure but fixed at 1 replica
  • Use Cases: Development, testing, simple production workloads
  • StatefulSet: {standalone-name} (no "-server" suffix)
  • Configuration: Modern clustering approach with single member (Neo4j 5.26+)
  • Restrictions: Cannot scale beyond 1 replica

Database Management CRDs

Neo4jDatabase (neo4jdatabase_types.go)

  • Purpose: Manages database lifecycle within clusters and standalone deployments
  • Dual Support: Works with both Neo4jEnterpriseCluster and Neo4jEnterpriseStandalone
  • Enhanced Validation: DatabaseValidator supports automatic deployment type detection
  • Neo4j 5.26+ Syntax: Uses modern TOPOLOGY clause for database creation
  • Standalone Fix: Added NEO4J_AUTH environment variable for automatic authentication

Key Fields:

type Neo4jDatabaseSpec struct {
    ClusterRef  string           `json:"clusterRef"`  // References cluster OR standalone
    Name        string           `json:"name"`        // Database name
    Topology    DatabaseTopology `json:"topology"`    // Primary/secondary counts
    IfNotExists bool             `json:"ifNotExists"` // CREATE IF NOT EXISTS
}

Neo4jPlugin (neo4jplugin_types.go)

  • Purpose: Manages Neo4j plugin installation and configuration
  • Dual Architecture Support: Enhanced for server-based cluster + standalone compatibility
  • Deployment Detection: Automatic cluster vs standalone recognition
  • Resource Naming: Handles {cluster-name}-server vs {standalone-name} patterns
  • Plugin Sources: Official, community, custom registry, direct URL support

Backup & Restore CRDs

Neo4jBackup (neo4jbackup_types.go)

  • Purpose: Manages backup operations for both clusters and standalone deployments
  • Centralized Architecture: Uses single backup pod per cluster (not sidecars)
  • Target Support: Can back up both cluster and standalone deployments
  • Neo4j 5.26+ Support: Modern backup syntax with --to-path parameter

Neo4jRestore (neo4jrestore_types.go)

  • Purpose: Manages database restoration from backups
  • Point-in-Time Recovery: Supports --restore-until for precise recovery
  • Cross-Deployment Support: Can restore to different deployment types

Controllers Architecture

Core Controllers (internal/controller/)

Neo4jEnterpriseCluster Controller (neo4jenterprisecluster_controller.go)

Primary cluster management controller with server-based architecture:

Performance Optimizations:

  • Efficient Reconciliation: Reduced from ~18,000 to ~34 reconciliations per minute
  • Smart Status Updates: Only updates when cluster state changes
  • ConfigMap Debouncing: 2-minute debounce prevents restart loops
  • Resource Version Conflict Handling: Retry logic for concurrent updates
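
The debouncing idea can be sketched as follows. This is a minimal illustration that assumes the debounce is a simple "at most one update per window" rule; the `debouncer` type and `configMapDebounce` constant are hypothetical names, and the operator's actual mechanism may differ.

```go
package main

import (
	"fmt"
	"time"
)

// configMapDebounce mirrors the 2-minute window mentioned above.
const configMapDebounce = 2 * time.Minute

// debouncer is a hypothetical helper that lets an update through at most
// once per window. The zero value of last means "never applied".
type debouncer struct {
	window time.Duration
	last   time.Time
}

// allow reports whether an update may proceed at time now and, if so,
// records now as the time of the last applied update.
func (d *debouncer) allow(now time.Time) bool {
	if !d.last.IsZero() && now.Sub(d.last) < d.window {
		return false // still inside the debounce window
	}
	d.last = now
	return true
}

func main() {
	d := &debouncer{window: configMapDebounce}
	start := time.Date(2025, time.August, 1, 12, 0, 0, 0, time.UTC)
	fmt.Println(d.allow(start))                        // true: first update
	fmt.Println(d.allow(start.Add(30 * time.Second)))  // false: inside the window
	fmt.Println(d.allow(start.Add(configMapDebounce))) // true: window has elapsed
}
```

Because a suppressed update does not reset `last`, a steady stream of changes still results in one ConfigMap update per window rather than none at all.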

Server-Based Implementation:

  • Single StatefulSet: Creates {cluster-name}-server instead of separate primary/secondary
  • Self-Organizing Servers: Neo4j servers automatically assign database hosting roles
  • Simplified Resource Management: Unified pod templates and configuration
  • Certificate DNS: Includes all server pod names in TLS certificates

Split-Brain Detection:

  • Location: internal/controller/splitbrain_detector.go
  • Multi-Pod Analysis: Connects to each server to compare cluster views
  • Automatic Repair: Restarts orphaned pods so they rejoin the main cluster
  • Production Ready: Comprehensive logging and fallback mechanisms
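
The multi-pod comparison can be illustrated with a small sketch. It assumes each server's cluster view has already been collected as a list of member names, and it treats the largest group of pods that agree on a view as the main cluster. The function name `orphanedPods` and the tie-breaking rule are assumptions made for this example, not the logic in splitbrain_detector.go.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// orphanedPods groups pods by the set of members each one reports and
// returns the pods outside the largest group. Ties are broken by the
// lexicographically smallest view so the result is deterministic.
func orphanedPods(views map[string][]string) []string {
	groups := map[string][]string{} // canonical view -> pods reporting it
	for pod, members := range views {
		sorted := append([]string(nil), members...)
		sort.Strings(sorted)
		key := strings.Join(sorted, ",")
		groups[key] = append(groups[key], pod)
	}

	mainKey, first := "", true
	for key, pods := range groups {
		larger := len(pods) > len(groups[mainKey])
		tie := len(pods) == len(groups[mainKey]) && key < mainKey
		if first || larger || tie {
			mainKey, first = key, false
		}
	}

	var orphans []string
	for key, pods := range groups {
		if key != mainKey {
			orphans = append(orphans, pods...)
		}
	}
	sort.Strings(orphans)
	return orphans
}

func main() {
	views := map[string][]string{
		"my-cluster-server-0": {"my-cluster-server-0", "my-cluster-server-1"},
		"my-cluster-server-1": {"my-cluster-server-1", "my-cluster-server-0"},
		"my-cluster-server-2": {"my-cluster-server-2"}, // sees only itself
	}
	fmt.Println(orphanedPods(views))
}
```

In this example the third server reports a cluster containing only itself, so it is the one flagged as a restart candidate.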

Neo4jEnterpriseStandalone Controller (neo4jenterprisestandalone_controller.go)

Single-node deployment controller:

Key Features:

  • Clustering Infrastructure: Uses same infrastructure as clusters (Neo4j 5.26+ approach)
  • Single Member Configuration: Sets up clustering with single server
  • Resource Management: Handles ConfigMap, Service, and StatefulSet
  • Status Tracking: Comprehensive status updates for standalone instances

Database Controller (neo4jdatabase_controller.go)

Enhanced for dual deployment support:

  • Automatic Detection: Tries cluster lookup first, then standalone fallback
  • Neo4j Client Creation: NewClientForEnterprise() vs NewClientForEnterpriseStandalone()
  • Authentication Handling: Manages NEO4J_AUTH for standalone deployments
  • Syntax Support: Neo4j 5.26+ and 2025.x database creation syntax
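
The "cluster first, then standalone" lookup can be sketched like this. The lookups are modeled as plain functions standing in for Kubernetes API calls; `resolveDeployment`, `lookupFunc`, `lookupIn`, and `errNotFound` are hypothetical names for illustration only.

```go
package main

import (
	"errors"
	"fmt"
)

// errNotFound stands in for a Kubernetes "not found" API error.
var errNotFound = errors.New("not found")

type deploymentKind string

const (
	kindCluster    deploymentKind = "cluster"
	kindStandalone deploymentKind = "standalone"
)

// lookupFunc returns nil if a resource with the given name exists.
type lookupFunc func(name string) error

// resolveDeployment tries the cluster lookup first and falls back to the
// standalone lookup only when the cluster is genuinely absent.
func resolveDeployment(ref string, getCluster, getStandalone lookupFunc) (deploymentKind, error) {
	err := getCluster(ref)
	if err == nil {
		return kindCluster, nil
	}
	if !errors.Is(err, errNotFound) {
		return "", fmt.Errorf("looking up cluster %q: %w", ref, err)
	}
	if err := getStandalone(ref); err != nil {
		return "", fmt.Errorf("clusterRef %q matches neither a cluster nor a standalone: %w", ref, err)
	}
	return kindStandalone, nil
}

// lookupIn builds a fake lookup over a fixed set of names.
func lookupIn(names ...string) lookupFunc {
	return func(name string) error {
		for _, n := range names {
			if n == name {
				return nil
			}
		}
		return errNotFound
	}
}

func main() {
	clusters, standalones := lookupIn("prod-cluster"), lookupIn("dev-db")
	for _, ref := range []string{"prod-cluster", "dev-db", "missing"} {
		kind, err := resolveDeployment(ref, clusters, standalones)
		fmt.Println(ref, kind, err)
	}
}
```

Falling back only on a "not found" error keeps transient API failures from being misread as "this reference must be a standalone".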

Plugin Controller (plugin_controller.go)

Manages plugin lifecycle with architecture compatibility:

  • DeploymentInfo Abstraction: Unified handling of cluster/standalone types
  • Resource Naming: Correct StatefulSet names ({cluster-name}-server vs {standalone-name})
  • Pod Labels: Applies appropriate labels for each deployment type
  • Plugin Sources: Official, community, custom registries, direct URLs

Backup Controller (neo4jbackup_controller.go)

Centralized backup management:

  • Architecture: Single backup StatefulSet per cluster
  • Resource Efficiency: ~70% reduction in backup resource usage
  • Cross-Deployment Support: Backs up both clusters and standalone deployments
  • Modern Syntax: Neo4j 5.26+ compatible backup commands

Restore Controller (neo4jrestore_controller.go)

Database restoration management:

  • Point-in-Time Recovery: Supports precise timestamp restoration
  • Flexible Targets: Can restore to different deployment types
  • Validation: Ensures target deployment compatibility

Validation Framework (internal/validation/)

Comprehensive Validation Architecture

Core Validators:

  • TopologyValidator (topology_validator.go): Cluster topology and server count validation
  • ClusterValidator (cluster_validator.go): Cluster-specific configuration validation
  • MemoryValidator (memory_validator.go): Neo4j memory settings vs container limits
  • ResourceValidator (resource_validator.go): CPU, memory, and storage validation
  • TLSValidator (tls_validator.go): TLS/SSL configuration validation
  • DatabaseValidator (database_validator.go): Database creation and topology validation

Enhanced Validation Features:

  • Dual CRD Validation: Separate validation rules for cluster vs standalone
  • Server-Based Topology: Validates server counts instead of primary/secondary counts
  • Resource Recommendations: Suggests optimal resource allocation
  • Configuration Restrictions: Prevents clustering settings in standalone deployments
  • Neo4j Version Compatibility: Validates settings against Neo4j 5.26+ and 2025.x
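
Server-count validation can be illustrated with a short sketch. The minimum of 2 servers comes from the cluster CRD description above. The function names, the error wording, and the rule that a database cannot request more hosts than there are servers are assumptions made for this example.

```go
package main

import "fmt"

// minClusterServers is the minimum topology stated for Neo4jEnterpriseCluster.
const minClusterServers = 2

// validateServers checks the cluster-level server count.
// Hypothetical helper; the error text is illustrative.
func validateServers(servers int) error {
	if servers < minClusterServers {
		return fmt.Errorf("topology.servers must be at least %d, got %d", minClusterServers, servers)
	}
	return nil
}

// validateDatabaseTopology checks a database's requested primaries and
// secondaries against the number of servers available to host them
// (an assumed rule for this sketch).
func validateDatabaseTopology(primaries, secondaries, servers int) error {
	if primaries < 1 {
		return fmt.Errorf("a database needs at least 1 primary, got %d", primaries)
	}
	if secondaries < 0 {
		return fmt.Errorf("secondaries cannot be negative, got %d", secondaries)
	}
	if primaries+secondaries > servers {
		return fmt.Errorf("topology needs %d servers but the cluster has %d", primaries+secondaries, servers)
	}
	return nil
}

func main() {
	fmt.Println(validateServers(1))
	fmt.Println(validateServers(3))
	fmt.Println(validateDatabaseTopology(3, 1, 3))
	fmt.Println(validateDatabaseTopology(2, 1, 3))
}
```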

Database Validator Enhancements

  • Automatic Deployment Detection: Tries cluster first, then standalone
  • Appropriate Client Creation: Uses correct client type for deployment
  • Clear Error Messages: Distinguishes between cluster and standalone validation failures

Neo4j Version Compatibility

Supported Versions

  • Neo4j 5.26+: Semver format (5.26.0, 5.27.1, etc.)
  • Neo4j 2025.x+: Calver format (2025.01.0, 2025.02.0, etc.)

Version-Specific Configuration

Discovery Configuration:

  • Neo4j 5.26.x: dbms.kubernetes.discovery.v2.service_port_name=tcp-discovery
  • Neo4j 2025.x: dbms.kubernetes.discovery.service_port_name=tcp-discovery
  • Auto-Detection: getKubernetesDiscoveryParameter() in cluster.go
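
Version-based selection of the discovery setting might look like the sketch below. It is not the body of getKubernetesDiscoveryParameter(); `discoveryPortSetting` is a hypothetical function that only encodes the two rules listed above and assumes the version string starts with `major.minor`.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// discoveryPortSetting returns the setting name used for the discovery
// port, based on the version rules listed above. It accepts semver
// (5.26.0) and calver (2025.01.0) strings.
func discoveryPortSetting(version string) (string, error) {
	parts := strings.Split(version, ".")
	if len(parts) < 2 {
		return "", fmt.Errorf("cannot parse Neo4j version %q", version)
	}
	major, err := strconv.Atoi(parts[0])
	if err != nil {
		return "", fmt.Errorf("cannot parse major version in %q: %w", version, err)
	}
	minor, err := strconv.Atoi(parts[1])
	if err != nil {
		return "", fmt.Errorf("cannot parse minor version in %q: %w", version, err)
	}

	switch {
	case major >= 2025: // calver releases
		return "dbms.kubernetes.discovery.service_port_name", nil
	case major == 5 && minor >= 26: // 5.26.x
		return "dbms.kubernetes.discovery.v2.service_port_name", nil
	default:
		return "", fmt.Errorf("unsupported Neo4j version %q: need 5.26+ or 2025.x", version)
	}
}

func main() {
	for _, v := range []string{"5.26.0", "2025.01.0", "5.25.0"} {
		setting, err := discoveryPortSetting(v)
		fmt.Println(v, setting, err)
	}
}
```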

Modern Configuration Standards:

  • Memory: server.memory.* (not deprecated dbms.memory.*)
  • TLS/SSL: server.https.* and server.bolt.* (not dbms.connector.*)
  • Database Format: db.format: "block" (not deprecated formats)
  • Discovery: dbms.cluster.discovery.resolver_type: "K8S"

Database Creation Syntax

Neo4j 5.26+ (Cypher 5):

CREATE DATABASE name [IF NOT EXISTS]
[TOPOLOGY n PRIMAR{Y|IES} [m SECONDAR{Y|IES}]]
[OPTIONS "{" option: value[, ...] "}"]
[WAIT [n [SEC[OND[S]]]]|NOWAIT]

Neo4j 2025.x (Cypher 25):

CREATE DATABASE name [IF NOT EXISTS]
[[SET] DEFAULT LANGUAGE CYPHER {5|25}]
[[SET] TOPOLOGY n PRIMARIES [m SECONDARIES]]
[OPTIONS "{" option: value[, ...] "}"]
[WAIT [n [SEC[OND[S]]]]|NOWAIT]
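
As an illustration of how a statement matching both grammars could be assembled, here is a minimal sketch. `createDatabaseStatement` is a hypothetical helper. It always uses the plural PRIMARIES/SECONDARIES keywords, which appear in both syntaxes above, and omits the OPTIONS, WAIT, and DEFAULT LANGUAGE clauses.

```go
package main

import (
	"fmt"
	"strings"
)

// createDatabaseStatement builds a CREATE DATABASE statement from the kind
// of fields found in Neo4jDatabaseSpec. The name is backtick-quoted, with
// embedded backticks doubled, so it is treated as an identifier.
func createDatabaseStatement(name string, primaries, secondaries int, ifNotExists bool) string {
	var b strings.Builder
	b.WriteString("CREATE DATABASE `" + strings.ReplaceAll(name, "`", "``") + "`")
	if ifNotExists {
		b.WriteString(" IF NOT EXISTS")
	}
	if primaries > 0 {
		fmt.Fprintf(&b, " TOPOLOGY %d PRIMARIES", primaries)
		if secondaries > 0 {
			fmt.Fprintf(&b, " %d SECONDARIES", secondaries)
		}
	}
	return b.String()
}

func main() {
	fmt.Println(createDatabaseStatement("orders", 3, 1, true))
	fmt.Println(createDatabaseStatement("scratch", 0, 0, false))
}
```

When no primaries are requested, the TOPOLOGY clause is left out entirely and Neo4j's default allocation applies.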

Resource Management Architecture

Intelligent Resource Handling

Resource Builders (internal/resources/):

  • ClusterBuilder (cluster.go): Server-based StatefulSet creation
  • StandaloneBuilder (standalone.go): Single-node deployment resources
  • ConfigMapBuilder: Unified configuration for both deployment types
  • ServiceBuilder: Client and discovery services
  • BackupBuilder: Centralized backup StatefulSet

Server-Based Resource Patterns:

  • StatefulSet Naming: {cluster-name}-server for clusters, {standalone-name} for standalone
  • Pod Naming: {cluster-name}-server-0, {cluster-name}-server-1, etc.
  • Service Names: {cluster-name}-client, {cluster-name}-discovery
  • Backup Pod: {cluster-name}-backup-0 (centralized, one per cluster)

Performance Optimizations

Reconciliation Efficiency:

  • Rate Limiting: Intelligent rate limiting prevents API server overload
  • Status Update Efficiency: Only updates when state actually changes
  • Event Filtering: Reduces unnecessary reconciliation triggers
  • ConfigMap Hashing: Hash-based change detection prevents unnecessary updates
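
Hash-based change detection can be sketched as follows, assuming the hash is computed over the ConfigMap's key/value data. The choice of SHA-256 and the names `configHash` and `needsUpdate` are assumptions for illustration; the operator may hash different inputs.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
)

// configHash returns a stable digest of ConfigMap-style data. Keys are
// sorted first because Go map iteration order is random.
func configHash(data map[string]string) string {
	keys := make([]string, 0, len(data))
	for k := range data {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	h := sha256.New()
	for _, k := range keys {
		// Length prefixes keep ("a", "bc") and ("ab", "c") from colliding.
		fmt.Fprintf(h, "%d:%s=%d:%s;", len(k), k, len(data[k]), data[k])
	}
	return hex.EncodeToString(h.Sum(nil))
}

// needsUpdate compares the hash recorded on the last applied ConfigMap
// with the hash of the desired data.
func needsUpdate(storedHash string, desired map[string]string) bool {
	return storedHash != configHash(desired)
}

func main() {
	applied := map[string]string{"server.memory.heap.max_size": "2g"}
	stored := configHash(applied)

	fmt.Println(needsUpdate(stored, applied)) // false: nothing changed

	applied["server.memory.heap.max_size"] = "4g"
	fmt.Println(needsUpdate(stored, applied)) // true: value changed
}
```

The stored hash would typically be kept in an annotation so that a reconcile can skip the update without comparing full contents.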

Startup Optimization:

  • Parallel Pod Management: All server pods start simultaneously
  • Minimum Primaries = 1: First pod forms cluster immediately
  • PublishNotReadyAddresses: Discovery includes pending pods
  • Resource Version Conflict Retry: Handles concurrent updates gracefully
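
The conflict-retry idea can be shown with a standard-library-only sketch. `retryOnConflict` and `errConflict` here are hypothetical stand-ins. Operators built on client-go commonly use `retry.RetryOnConflict` from `k8s.io/client-go/util/retry` for this, and whether this operator does so is not stated above. Backoff between attempts is omitted for brevity.

```go
package main

import (
	"errors"
	"fmt"
)

// errConflict stands in for the API server's "resource version conflict"
// response to an update made against a stale object.
var errConflict = errors.New("resource version conflict")

// retryOnConflict calls update until it succeeds, fails with a
// non-conflict error, or maxAttempts is reached. In a real controller,
// update would re-read the object before modifying and writing it.
func retryOnConflict(maxAttempts int, update func() error) error {
	var err error
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		err = update()
		if !errors.Is(err, errConflict) {
			return err // success (nil) or a non-retryable error
		}
	}
	return fmt.Errorf("giving up after %d attempts: %w", maxAttempts, err)
}

func main() {
	calls := 0
	err := retryOnConflict(5, func() error {
		calls++
		if calls < 3 {
			return errConflict // simulate two concurrent writers winning first
		}
		return nil
	})
	fmt.Println(calls, err)
}
```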

Security Architecture

RBAC Configuration (config/rbac/)

Core RBAC Resources:

  • Principle of Least Privilege: Minimal required permissions
  • ClusterRole Design: Cross-namespace operations support
  • Service Account Security: Dedicated accounts with specific roles

Discovery RBAC (Critical):

Each cluster gets automatic RBAC creation:

  • ServiceAccount: {cluster-name}-discovery
  • Role: Services and endpoints permissions
  • RoleBinding: Links account to role
  • Endpoints Permission: CRITICAL for cluster formation

TLS/SSL Support:

  • Cert-Manager Integration: Automatic certificate provisioning
  • SSL Policy Configuration: Separate policies for https, bolt, and cluster scopes
  • Trust All for Cluster: dbms.ssl.policy.cluster.trust_all=true for formation
  • Certificate DNS Names: Includes all server pod names
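
To make the certificate DNS names concrete, the sketch below generates one fully qualified name per server pod using the standard Kubernetes form `{pod}.{service}.{namespace}.svc.{cluster-domain}`. The function `serverPodDNSNames` is hypothetical, the governing service name is passed in by the caller rather than assumed, and `cluster.local` is only the Kubernetes default domain.

```go
package main

import "fmt"

// clusterDomain is the Kubernetes default; clusters may use another domain.
const clusterDomain = "cluster.local"

// serverPodDNSNames returns the stable DNS name of each server pod in the
// {cluster-name}-server StatefulSet, suitable for a certificate's
// dnsNames list. governingService is the headless service that governs
// the StatefulSet; its name is supplied by the caller.
func serverPodDNSNames(clusterName, governingService, namespace string, servers int) []string {
	var names []string
	for i := 0; i < servers; i++ {
		pod := fmt.Sprintf("%s-server-%d", clusterName, i)
		names = append(names, fmt.Sprintf("%s.%s.%s.svc.%s", pod, governingService, namespace, clusterDomain))
	}
	return names
}

func main() {
	// "my-cluster-headless" is a placeholder service name for this example.
	for _, name := range serverPodDNSNames("my-cluster", "my-cluster-headless", "default", 3) {
		fmt.Println(name)
	}
}
```

Because the list depends on the server count, scaling the cluster up means the certificate must be reissued with the additional pod names.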

Monitoring & Observability

Resource Monitoring (internal/monitoring/):

  • ResourceMonitor (resource_monitor.go): Real-time utilization tracking
  • Performance Metrics: Controller performance and reconciliation efficiency
  • Operational Insights: ConfigMap update patterns and debounce effectiveness

Status Management:

  • Enhanced Status Updates: Detailed cluster state tracking
  • Condition Management: Comprehensive status conditions with proper transitions
  • Event Recording: Structured events for debugging and monitoring
  • Connection Examples: Automatic generation of connection strings

Integration Architecture

External System Integration:

  • Cert-Manager: TLS certificate lifecycle management
  • Prometheus: Metrics collection and alerting
  • External Secrets: Secret management integration
  • Storage Classes: Persistent volume provisioning
  • Cloud Providers: AWS, GCP, Azure LoadBalancer optimizations

Kubernetes Integration:

  • Network Policies: Pod-to-pod communication security
  • Service Mesh: Istio/Linkerd compatibility
  • Ingress Controllers: External traffic routing with connection examples
  • Node Affinity: Topology spread and anti-affinity rules

Testing Architecture

Test Strategy:

  • Unit Tests: Controller logic and helper functions
  • Integration Tests: Full workflow testing with envtest
  • End-to-End Tests: Real cluster testing with Kind
  • Performance Tests: Reconciliation efficiency validation

Test Infrastructure:

  • Ginkgo/Gomega: BDD-style testing framework
  • Envtest: Kubernetes API server for integration testing
  • Kind Clusters: Development and test cluster automation
  • Test Cleanup: Automatic finalizer removal and namespace cleanup

Migration & Compatibility

Legacy Architecture Migration:

  • Backward Compatibility: Existing clusters continue to work
  • Gradual Migration: No breaking changes for existing deployments
  • Resource Name Updates: New deployments use server-based naming
  • Configuration Migration: Automatic handling of deprecated settings

Future Extensibility:

  • Plugin System: Neo4j plugin management framework
  • Custom Metrics: Extensible monitoring capabilities
  • Event Handling: Pluggable event system for custom integrations
  • Multi-Architecture: Support for different deployment patterns

Development Best Practices

Code Organization:

  • Controller Pattern: Standard Kubernetes controller pattern
  • Builder Pattern: Resource builders for clean separation
  • Validation Framework: Centralized validation with clear error messages
  • Testing Strategy: Comprehensive test coverage with multiple levels

Performance Considerations:

  • Memory Usage: Optimized for large-scale deployments
  • API Efficiency: Minimal API calls with intelligent caching
  • Resource Creation: Parallel resource creation where possible
  • Error Handling: Graceful error handling with proper recovery

This architecture provides a solid foundation for managing Neo4j Enterprise deployments in Kubernetes with high performance, reliability, and operational simplicity.