Testing Guide

This guide explains the comprehensive testing strategy for the Neo4j Enterprise Operator, covering unit tests, integration tests, and end-to-end testing practices.

Testing Strategy Overview

The operator uses a multi-layered testing approach:

Unit Tests: Fast tests for individual functions and components
Integration Tests: Full workflow testing with Kubernetes API server
End-to-End Tests: Real cluster testing with Kind clusters
Performance Tests: Reconciliation efficiency and resource usage validation

Test Infrastructure

Testing Framework

Ginkgo/Gomega: BDD-style testing framework for integration tests
Envtest: Kubernetes API server for integration testing
Kind: Kubernetes in Docker for real cluster testing
Go Testing: Standard Go testing for unit tests

Test Environments

Development: neo4j-operator-dev Kind cluster
Testing: neo4j-operator-test Kind cluster
CI/CD: Automated testing in GitHub Actions

Unit Tests

Unit tests are fast, require no Kubernetes cluster, and test individual functions and components.

Running Unit Tests

# Run all unit tests (no cluster required)
make test-unit

# Run specific package tests
go test ./internal/controller -v
go test ./internal/validation -v
go test ./api/v1alpha1 -v

# Run specific test functions
go test ./internal/controller -run TestGetStatefulSetName -v
go test ./internal/validation -run TestTopologyValidator -v

Unit Test Structure

Unit tests are located alongside the code they test:

internal/controller/
├── neo4jenterprisecluster_controller.go
├── neo4jenterprisecluster_controller_test.go
├── plugin_controller.go
├── plugin_controller_unit_test.go        # Unexported method tests
└── plugin_controller_test.go             # Integration-style tests

Writing Unit Tests

// Imports needed: "testing" and "github.com/stretchr/testify/assert".
func TestGetStatefulSetName(t *testing.T) {
    r := &Neo4jPluginReconciler{}

    tests := []struct {
        name       string
        deployment *DeploymentInfo
        expected   string
    }{
        {
            name: "cluster deployment",
            deployment: &DeploymentInfo{
                Type: "cluster",
                Name: "my-cluster",
            },
            expected: "my-cluster-server",
        },
        // Add more test cases...
    }

    for _, tt := range tests {
        t.Run(tt.name, func(t *testing.T) {
            result := r.getStatefulSetName(tt.deployment)
            assert.Equal(t, tt.expected, result)
        })
    }
}

Integration Tests

Integration tests use envtest to provide a real Kubernetes API server without requiring a full cluster.

Test Cluster Management

# Create test cluster (includes cert-manager for TLS tests)
make test-cluster

# Clean operator resources (keep cluster running)
make test-cluster-clean

# Reset cluster (delete and recreate)
make test-cluster-reset

# Delete test cluster entirely
make test-cluster-delete

# Complete test environment cleanup
make test-destroy

Running Integration Tests

# Full integration test suite (automatically creates cluster and deploys operator)
make test-integration

# Alternative: step-by-step approach
make test-cluster         # Create test cluster
make test-integration     # Run tests (uses existing cluster)
make test-cluster-delete  # Clean up cluster

# Run specific test suites
ginkgo run -focus "Neo4jEnterpriseCluster" ./test/integration
ginkgo run -focus "should create backup" ./test/integration
ginkgo run -focus "Plugin Installation" ./test/integration

# CI-optimized test commands (for advanced use)
make test-integration-ci     # Assumes cluster and operator already deployed
make test-integration-ci-full # Full suite in CI environment

Integration Test Structure

Integration tests are located in test/integration/ and follow consistent patterns:

var _ = Describe("Neo4jPlugin Integration Tests", func() {
    const (
        timeout  = time.Second * 300  // 5-minute timeout for CI
        interval = time.Second * 5
    )

    Context("Plugin Installation on Cluster", func() {
        It("Should install APOC plugin on Neo4jEnterpriseCluster", func() {
            ctx := context.Background()
            namespace := createUniqueNamespace()

            By("Creating namespace")
            Expect(k8sClient.Create(ctx, namespace)).Should(Succeed())

            By("Creating admin secret")
            // Create required secrets...

            By("Creating Neo4jEnterpriseCluster")
            cluster := &neo4jv1alpha1.Neo4jEnterpriseCluster{
                ObjectMeta: metav1.ObjectMeta{
                    Name:      "plugin-test-cluster",
                    Namespace: namespace.Name,
                },
                Spec: neo4jv1alpha1.Neo4jEnterpriseClusterSpec{
                    Image: neo4jv1alpha1.ImageSpec{
                        Repo: "neo4j",
                        Tag:  "5.26.0-enterprise",
                    },
                    Topology: neo4jv1alpha1.TopologyConfiguration{
                        Servers: 2,
                    },
                    // Resource constraints for CI compatibility
                    Resources: &corev1.ResourceRequirements{
                        Requests: corev1.ResourceList{
                            corev1.ResourceCPU:    resource.MustParse("100m"),
                            corev1.ResourceMemory: resource.MustParse("1.5Gi"),
                        },
                        Limits: corev1.ResourceList{
                            corev1.ResourceCPU:    resource.MustParse("500m"),
                            corev1.ResourceMemory: resource.MustParse("1.5Gi"),
                        },
                    },
                    Storage: neo4jv1alpha1.StorageSpec{
                        Size:      "1Gi",
                        ClassName: "standard",
                    },
                },
            }
            Expect(k8sClient.Create(ctx, cluster)).Should(Succeed())

            By("Waiting for cluster to be ready")
            Eventually(func() string {
                currentCluster := &neo4jv1alpha1.Neo4jEnterpriseCluster{}
                err := k8sClient.Get(ctx, types.NamespacedName{
                    Name:      "plugin-test-cluster",
                    Namespace: namespace.Name,
                }, currentCluster)
                if err != nil {
                    return ""
                }
                return currentCluster.Status.Phase
            }, timeout, interval).Should(Equal("Ready"))

            // Continue with plugin testing...
        })
    })
})

Current Architecture Testing (August 2025)

Server-Based Architecture Tests

Tests verify the new server-based architecture:

By("Verifying server StatefulSet exists with correct name")
serverSts := &appsv1.StatefulSet{}
Eventually(func() error {
    return k8sClient.Get(ctx, types.NamespacedName{
        Name:      clusterName + "-server",  // Server-based naming
        Namespace: namespace.Name,
    }, serverSts)
}, timeout, interval).Should(Succeed())
Expect(*serverSts.Spec.Replicas).To(Equal(int32(2)))

Centralized Backup Testing

Tests verify centralized backup architecture:

By("Verifying centralized backup StatefulSet")
backupSts := &appsv1.StatefulSet{}
Eventually(func() error {
    return k8sClient.Get(ctx, types.NamespacedName{
        Name:      clusterName + "-backup",  // Centralized backup
        Namespace: namespace.Name,
    }, backupSts)
}, timeout, interval).Should(Succeed())
Expect(*backupSts.Spec.Replicas).To(Equal(int32(1)))  // Single backup pod

Dual Deployment Support Testing

Tests verify both cluster and standalone support:

Context("Plugin Installation on Standalone", func() {
    It("Should install GDS plugin on Neo4jEnterpriseStandalone", func() {
        // Test standalone deployment with plugin installation
        standalone := &neo4jv1alpha1.Neo4jEnterpriseStandalone{
            ObjectMeta: metav1.ObjectMeta{
                Name:      standaloneName,
                Namespace: namespace.Name,
            },
            Spec: neo4jv1alpha1.Neo4jEnterpriseStandaloneSpec{
                Image: neo4jv1alpha1.ImageSpec{
                    Repo: "neo4j",
                    Tag:  "5.26.0-enterprise",
                },
                // Standalone-specific configuration...
            },
        }
        // Test plugin installation on standalone...
    })
})

Test Configuration Guidelines

Resource Requirements for CI

All integration tests use minimal resources to avoid CI scheduling issues:

resources:
  requests:
    cpu: "100m"           # Minimal CPU for CI compatibility
    memory: "1.5Gi"       # Required for Neo4j Enterprise database operations
  limits:
    cpu: "500m"           # Reasonable limit for testing
    memory: "1.5Gi"       # Neo4j Enterprise minimum for database operations

Storage Configuration

storage:
  size: "1Gi"            # Minimal size for testing
  className: "standard"  # Default storage class in Kind

Timeout Configuration

const (
    timeout  = time.Second * 300  // 5-minute timeout for CI environments
    interval = time.Second * 5    // Check every 5 seconds
)

Resource Cleanup Patterns

Critical Cleanup Requirements

Proper resource cleanup is critical to prevent CI failures and resource exhaustion:

1. MANDATORY AfterEach Pattern

All integration tests MUST include AfterEach blocks to prevent resource leaks:

AfterEach(func() {
    // Critical: Clean up resources immediately to prevent CI resource exhaustion
    if cluster != nil {
        By("Cleaning up cluster resource")
        // Remove finalizers first
        if len(cluster.GetFinalizers()) > 0 {
            cluster.SetFinalizers([]string{})
            _ = k8sClient.Update(ctx, cluster)
        }
        // Delete the resource
        _ = k8sClient.Delete(ctx, cluster)
        cluster = nil
    }
    // Clean up any remaining resources in namespace
    if testNamespace != "" {
        cleanupCustomResourcesInNamespace(testNamespace)
    }
})

Why This Pattern Is Critical:

Prevents resource accumulation that causes "Insufficient memory" errors
Ensures cleanup even if tests fail (inline cleanup won't run on failure)
Removes finalizers to prevent resources stuck in Terminating state
Cleans namespace resources that might not have owner references
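The difference between registered cleanup and inline cleanup can be illustrated without Ginkgo. The sketch below is a plain-Go analogy, not the suite's code: a deferred function plays the role of AfterEach, and a panic stands in for a failed assertion (Ginkgo aborts a failing spec by panicking). The tracker type and runSpec function are hypothetical names.

```go
package main

import "fmt"

// tracker records which cleanup paths actually ran.
type tracker struct {
	registeredCleanupRan bool
	inlineCleanupRan     bool
}

// runSpec simulates a test body. When fail is true the body aborts with a
// panic before it reaches its inline cleanup.
func runSpec(tr *tracker, fail bool) (failed bool) {
	// Registered cleanup: runs whether or not the body fails,
	// which is the guarantee AfterEach provides.
	defer func() {
		tr.registeredCleanupRan = true
		if r := recover(); r != nil {
			failed = true
		}
	}()

	if fail {
		panic("assertion failed")
	}

	// Inline cleanup: only reached when nothing above failed.
	tr.inlineCleanupRan = true
	return false
}

func main() {
	tr := &tracker{}
	failed := runSpec(tr, true)
	fmt.Printf("failed=%v registered=%v inline=%v\n",
		failed, tr.registeredCleanupRan, tr.inlineCleanupRan)
}
```

In the failing run only the registered cleanup executes, which is why cleanup that lives at the end of an It block leaks resources when the spec fails.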

2. Common Mistakes to Avoid

❌ No AfterEach block - Causes resource leaks if tests fail
❌ Inline cleanup only - Won't execute if test panics or fails
❌ Missing namespace cleanup - Leaves behind ConfigMaps, Services, etc.
❌ Not removing finalizers - Resources stay in Terminating state
❌ Relying on test suite cleanup - Not sufficient for resource-intensive tests

3. Handle All Resource Types

Clean up all resources that might have finalizers:

// Neo4j resources
Expect(k8sClient.Delete(ctx, cluster)).Should(Succeed())
Expect(k8sClient.Delete(ctx, standalone)).Should(Succeed())
Expect(k8sClient.Delete(ctx, plugin)).Should(Succeed())
Expect(k8sClient.Delete(ctx, database)).Should(Succeed())
Expect(k8sClient.Delete(ctx, backup)).Should(Succeed())

// Kubernetes resources (usually auto-cleaned by owner references)
// PVCs, Services, StatefulSets are cleaned automatically

4. Use Helper Functions

// Helper function to create unique namespace
func createUniqueNamespace() *corev1.Namespace {
    return &corev1.Namespace{
        ObjectMeta: metav1.ObjectMeta{
            Name: fmt.Sprintf("test-%d", time.Now().UnixNano()),
        },
    }
}

Test Suite Cleanup Helpers

The integration test suite provides cleanup utilities:

// Clean up all custom resources in namespace
cleanupCustomResourcesInNamespace(namespace)

// Force remove finalizers if needed
forceRemoveFinalizers(resource)

Testing Best Practices

Resource Naming Patterns

Test resources should use predictable naming:

// Unique suffix per test (any collision-free value works)
uniqueSuffix := fmt.Sprintf("%d", time.Now().UnixNano())

// Cluster naming
clusterName := "test-cluster-" + uniqueSuffix

// Expected StatefulSet names (server-based architecture)
expectedServerSts := clusterName + "-server"
expectedBackupSts := clusterName + "-backup"

// Standalone naming
standaloneName := "test-standalone-" + uniqueSuffix
expectedStandaloneSts := standaloneName  // No suffix for standalone
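The naming rules above can be collected in small helpers so that tests do not repeat string concatenation. This is a sketch under the conventions stated in this guide; statefulSetName and podName are hypothetical names, not functions from the operator. The pod naming follows the standard StatefulSet rule of appending the ordinal to the StatefulSet name.

```go
package main

import "fmt"

// statefulSetName returns the expected StatefulSet name for a deployment.
// Clusters use the "-server" suffix; standalone deployments use the bare name.
func statefulSetName(kind, name string) string {
	if kind == "cluster" {
		return name + "-server"
	}
	return name
}

// podName returns the name of the pod with the given ordinal in a StatefulSet.
func podName(stsName string, ordinal int) string {
	return fmt.Sprintf("%s-%d", stsName, ordinal)
}

func main() {
	sts := statefulSetName("cluster", "my-cluster")
	fmt.Println(sts)             // my-cluster-server
	fmt.Println(podName(sts, 0)) // my-cluster-server-0
	fmt.Println(statefulSetName("standalone", "my-standalone")) // my-standalone
}
```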

Memory Requirements

Critical for Neo4j Enterprise: Tests must allocate sufficient memory:

Resources: &corev1.ResourceRequirements{
    Requests: corev1.ResourceList{
        corev1.ResourceMemory: resource.MustParse("1.5Gi"),  // MINIMUM for Enterprise
    },
    Limits: corev1.ResourceList{
        corev1.ResourceMemory: resource.MustParse("1.5Gi"),  // Prevent OOMKill
    },
},

Why 1.5Gi is required:

Neo4j Enterprise needs minimum memory for database operations
Lower values cause Out of Memory kills (exit code 137)
Database creation and topology operations fail with insufficient memory
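Exit code 137 follows the common convention that a process killed by a signal exits with 128 plus the signal number, so 137 corresponds to signal 9 (SIGKILL), which is what the kernel OOM killer sends. A small sketch of that arithmetic, using a hypothetical signalFromExitCode helper:

```go
package main

import "fmt"

// signalFromExitCode returns the signal number encoded in an exit code that
// follows the "128 + signal" convention, and false for ordinary exit codes.
func signalFromExitCode(code int) (int, bool) {
	if code > 128 && code < 256 {
		return code - 128, true
	}
	return 0, false
}

func main() {
	sig, killed := signalFromExitCode(137)
	fmt.Println(sig, killed) // 9 true
}
```

An exit code of 137 on its own only says the container was killed with SIGKILL; the OOMKilled reason in the pod status confirms that memory was the cause.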

Waiting Patterns

Cluster Readiness (Phase-Based)

Eventually(func() string {
    cluster := &neo4jv1alpha1.Neo4jEnterpriseCluster{}
    err := k8sClient.Get(ctx, clusterKey, cluster)
    if err != nil {
        return ""
    }
    return cluster.Status.Phase
}, timeout, interval).Should(Equal("Ready"))

Standalone Readiness (Boolean-Based)

Eventually(func() bool {
    standalone := &neo4jv1alpha1.Neo4jEnterpriseStandalone{}
    err := k8sClient.Get(ctx, standaloneKey, standalone)
    if err != nil {
        return false
    }
    return standalone.Status.Ready
}, timeout, interval).Should(BeTrue())

Neo4j Cluster Formation Verification

By("Verifying Neo4j cluster formation")
Eventually(func() error {
    // Connect to the first server pod and list cluster members.
    // exec is os/exec; -n targets the test namespace.
    return exec.Command("kubectl", "exec",
        "-n", namespace.Name,
        clusterName+"-server-0", "--",
        "cypher-shell", "-u", "neo4j", "-p", password,
        "SHOW SERVERS").Run()
}, timeout, interval).Should(Succeed())

Performance Testing

Reconciliation Efficiency Tests

It("Should maintain efficient reconciliation rates", func() {
    // Monitor reconciliation frequency
    // Verify <100 reconciliations per minute under normal conditions
})

Resource Usage Tests

It("Should use optimal resource patterns", func() {
    // Verify centralized backup uses <30% resources of sidecar approach
    // Check server-based StatefulSet efficiency
})

CI/CD Testing

GitHub Actions Integration

Tests run automatically in CI with:

Parallel Execution: Multiple test suites run concurrently
Resource Constraints: CI-optimized resource limits
Timeout Handling: Extended timeouts for image pull delays
Cleanup Automation: Automatic test environment cleanup

CI-Specific Configuration

# Environment variables for CI
export CI=true
export KUBEBUILDER_ASSETS="$(pwd)/bin/k8s/1.31.0-linux-amd64"
export KUBECONFIG=~/.kube/config

Troubleshooting Test Failures

Common Test Issues

1. Namespace Stuck in Terminating

Symptoms: Test namespaces remain in "Terminating" state indefinitely

Diagnosis:

# Check for resources with finalizers
kubectl get all,neo4jenterpriseclusters,neo4jenterprisestandalones,neo4jplugins -n <namespace> -o yaml | grep -A5 finalizers

# Check for PVCs
kubectl get pvc -n <namespace>

Solutions:

# Force cleanup test resources
make test-cluster-clean

# Reset test cluster entirely
make test-cluster-reset

# Manual finalizer removal (if needed)
kubectl patch neo4jenterprisecluster <name> -n <namespace> \
  -p '{"metadata":{"finalizers":[]}}' --type=merge
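The same merge patch body can be built in Go when finalizers need to be cleared from test code rather than from the shell. This sketch only constructs the JSON; the type and function names are hypothetical. Note that the slice must be empty rather than nil so that it serializes as [] instead of null.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// metadataPatch and finalizerPatch mirror the shape of the JSON merge patch
// used in the kubectl command above.
type metadataPatch struct {
	Finalizers []string `json:"finalizers"`
}

type finalizerPatch struct {
	Metadata metadataPatch `json:"metadata"`
}

// clearFinalizersPatch returns a merge patch body that empties metadata.finalizers.
func clearFinalizersPatch() ([]byte, error) {
	return json.Marshal(finalizerPatch{
		Metadata: metadataPatch{Finalizers: []string{}},
	})
}

func main() {
	body, err := clearFinalizersPatch()
	if err != nil {
		panic(err)
	}
	fmt.Println(string(body)) // {"metadata":{"finalizers":[]}}
}
```

With controller-runtime, a body like this can be sent as a merge patch through client.RawPatch with types.MergePatchType; whether the suite's forceRemoveFinalizers helper works this way is not stated in this guide.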

2. Out of Memory (OOMKilled) Failures

Symptoms: Pods exit with code 137, "OOMKilled" in pod status

Diagnosis:

# Check pod status
kubectl describe pod <pod-name> | grep -E "(OOMKilled|Memory|Exit.*137)"

# Monitor memory usage
kubectl top pod <pod-name> --containers

Solutions:

Increase memory limits to minimum 1.5Gi for Neo4j Enterprise
Reduce concurrent test execution
Use minimal storage and CPU allocations

3. Test Timeouts

Symptoms: Tests fail with "Timed out after 300s"

Common Causes:

Image pull delays in CI environments
Insufficient resources for cluster formation
Missing RBAC permissions

Solutions:

# Check operator status
kubectl get pods -n neo4j-operator

# Check operator logs
kubectl logs -n neo4j-operator deployment/neo4j-operator-controller-manager

# Verify cert-manager (required for TLS tests)
kubectl get pods -n cert-manager

# Check cluster formation
kubectl get events --sort-by='.firstTimestamp'

4. Ginkgo Test Suite Conflicts

Symptoms: "Ginkgo does not support rerunning suites" error

Cause: Multiple RunSpecs() calls in same package

Solution: Ensure only one test suite per package:

// Correct: One RunSpecs per package
func TestControllers(t *testing.T) {
    RegisterFailHandler(Fail)
    RunSpecs(t, "Controller Suite")
}

// Include all tests in the same suite via Describe blocks

Test Coverage and Quality

Coverage Targets

# Generate coverage report
make test-coverage

# View coverage in browser
go tool cover -html=coverage.out

Coverage Goals

Unit Tests: >80% coverage for controller logic
Integration Tests: All major workflows covered
E2E Tests: Critical user journeys verified

Quality Checks

Integration tests should verify:

Resource Creation: All expected Kubernetes resources created
Status Updates: Proper status conditions and phase transitions
Error Handling: Graceful handling of failure scenarios
Resource Cleanup: Proper finalizer handling and cleanup
Performance: Efficient reconciliation and resource usage

CI Workflow Emulation for Troubleshooting (Added 2025-08-22)

When encountering CI failures or testing memory-constrained environments, use the comprehensive CI workflow emulation:

Quick Start

# Emulate complete CI workflow with debug logging
make test-ci-local

What It Does

The test-ci-local target provides a complete emulation of the GitHub Actions CI workflow:

Environment Setup
- Sets the CI=true and GITHUB_ACTIONS=true environment variables
- Creates logs/ directory for comprehensive debug output
- Cleans up any previous test environment
Unit Test Phase
- Runs unit tests with CI environment variables
- Logs Go version, kubectl version, and environment details
- Saves output to logs/ci-local-unit.log
Integration Test Phase
- Creates test cluster with CI-appropriate resource constraints
- Deploys Neo4j operator
- Runs integration tests with 512Mi memory limits (same as CI)
- Saves output to logs/ci-local-integration.log
Cleanup Phase
- Complete environment destruction
- Saves cleanup output to logs/ci-local-cleanup.log

Key Differences from Local Testing

Aspect	Local Development	CI Environment	CI Emulation
Memory Limits	1.5Gi	512Mi	512Mi ✅
Environment Variables	Local defaults	CI=true, GITHUB_ACTIONS=true	CI=true, GITHUB_ACTIONS=true ✅
Resource Constraints	Generous	Limited (~7GB total)	Limited ✅
Debug Logging	Console only	Limited	Comprehensive files ✅
Troubleshooting	Manual	Minimal	Auto-provided commands ✅

Debug Output Files

Generated debug files provide comprehensive troubleshooting information:

logs/ci-local-unit.log
- Unit test output with environment information
- Go version and tool versions
- Complete test execution logs
logs/ci-local-integration.log
- Test cluster creation and operator deployment
- Integration test execution with CI constraints
- Resource allocation and memory limit information
logs/ci-local-cleanup.log
- Environment cleanup operations
- Resource removal confirmation

Automatic Troubleshooting Commands

If integration tests fail, the target automatically provides troubleshooting commands:

# Check operator logs
kubectl logs -n neo4j-operator-system deployment/neo4j-operator-controller-manager

# Check pod status
kubectl get pods --all-namespaces

# Check events
kubectl get events --all-namespaces --sort-by='.lastTimestamp'

Usage Scenarios

1. Debugging CI Failures

# CI failed with memory issues? Reproduce locally:
make test-ci-local

# Check specific integration logs
grep -E "(OOMKilled|Memory|Insufficient)" logs/ci-local-integration.log

2. Testing Resource Constraints

# Test with CI memory limits before pushing
make test-ci-local

# Verify resource requirements are appropriate
grep -A5 "memory" logs/ci-local-integration.log

3. Validating CI Fixes

# After fixing CI issues, validate locally
make test-ci-local

# Confirm tests pass with CI constraints
echo "Exit code: $?"

Performance Analysis

The CI emulation includes performance timing information:

# View test execution timeline
grep "Started at\|Finished at" logs/ci-local-*.log

# Analyze test duration by phase
grep -E "PHASE|✅|❌" logs/ci-local-integration.log

Best Practices

Use Before CI Push: Run make test-ci-local before pushing changes that affect tests
Review All Logs: Check all three log files for complete understanding
Memory Optimization: Use findings to optimize resource requirements
Document Issues: Add findings to troubleshooting guides

Writing New Tests

Adding Unit Tests

Create test file alongside source code
Follow naming conventions: *_test.go for integration, *_unit_test.go for unit tests
Test unexported methods from within package
Use table-driven tests for multiple scenarios

Adding Integration Tests

Add to test/integration/ directory
Use Ginkgo BDD style for readability
Include proper cleanup with finalizer removal
Set appropriate timeouts (5 minutes for CI)
Use minimal resources for CI compatibility
Test both success and failure scenarios

Test Documentation

Document test scenarios:

Purpose: What functionality is being tested
Setup: Required resources and configuration
Expected Results: What should happen in success case
Cleanup: How resources are cleaned up
CI Considerations: Any special requirements for CI

This comprehensive testing strategy ensures the Neo4j Enterprise Operator works reliably across different environments and deployment scenarios.
