This guide explains the comprehensive testing strategy for the Neo4j Enterprise Operator, covering unit tests, integration tests, and end-to-end testing practices.
The operator uses a multi-layered testing approach:
- Unit Tests: Fast tests for individual functions and components
- Integration Tests: Full workflow testing with Kubernetes API server
- End-to-End Tests: Real cluster testing with Kind clusters
- Performance Tests: Reconciliation efficiency and resource usage validation
Key tools and frameworks:
- Ginkgo/Gomega: BDD-style testing framework for integration tests
- Envtest: Kubernetes API server for integration testing
- Kind: Kubernetes in Docker for real cluster testing
- Go Testing: Standard Go testing for unit tests
Test environments:
- Development: `neo4j-operator-dev` Kind cluster
- Testing: `neo4j-operator-test` Kind cluster
- CI/CD: Automated testing in GitHub Actions
Unit tests are fast, require no Kubernetes cluster, and test individual functions and components.
```bash
# Run all unit tests (no cluster required)
make test-unit

# Run specific package tests
go test ./internal/controller -v
go test ./internal/validation -v
go test ./api/v1alpha1 -v

# Run specific test functions
go test ./internal/controller -run TestGetStatefulSetName -v
go test ./internal/validation -run TestTopologyValidator -v
```

Unit tests are located alongside the code they test:
```
internal/controller/
├── neo4jenterprisecluster_controller.go
├── neo4jenterprisecluster_controller_test.go
├── plugin_controller.go
├── plugin_controller_unit_test.go   # Unexported method tests
└── plugin_controller_test.go        # Integration-style tests
```

A typical table-driven unit test:
```go
func TestGetStatefulSetName(t *testing.T) {
	r := &Neo4jPluginReconciler{}
	tests := []struct {
		name       string
		deployment *DeploymentInfo
		expected   string
	}{
		{
			name: "cluster deployment",
			deployment: &DeploymentInfo{
				Type: "cluster",
				Name: "my-cluster",
			},
			expected: "my-cluster-server",
		},
		// Add more test cases...
	}

	for _, tt := range tests {
		t.Run(tt.name, func(t *testing.T) {
			result := r.getStatefulSetName(tt.deployment)
			assert.Equal(t, tt.expected, result)
		})
	}
}
```

Integration tests use envtest to provide a real Kubernetes API server without requiring a full cluster.
```bash
# Create test cluster (includes cert-manager for TLS tests)
make test-cluster

# Clean operator resources (keep cluster running)
make test-cluster-clean

# Reset cluster (delete and recreate)
make test-cluster-reset

# Delete test cluster entirely
make test-cluster-delete

# Complete test environment cleanup
make test-destroy
```

```bash
# Full integration test suite (automatically creates cluster and deploys operator)
make test-integration

# Alternative: step-by-step approach
make test-cluster          # Create test cluster
make test-integration      # Run tests (uses existing cluster)
make test-cluster-delete   # Clean up cluster

# Run specific test suites
ginkgo run -focus "Neo4jEnterpriseCluster" ./test/integration
ginkgo run -focus "should create backup" ./test/integration
ginkgo run -focus "Plugin Installation" ./test/integration

# CI-optimized test commands (for advanced use)
make test-integration-ci        # Assumes cluster and operator already deployed
make test-integration-ci-full   # Full suite in CI environment
```

Integration tests are located in `test/integration/` and follow consistent patterns:
```go
var _ = Describe("Neo4jPlugin Integration Tests", func() {
	const (
		timeout  = time.Second * 300 // 5-minute timeout for CI
		interval = time.Second * 5
	)

	Context("Plugin Installation on Cluster", func() {
		It("Should install APOC plugin on Neo4jEnterpriseCluster", func() {
			ctx := context.Background()
			namespace := createUniqueNamespace()

			By("Creating namespace")
			Expect(k8sClient.Create(ctx, namespace)).Should(Succeed())

			By("Creating admin secret")
			// Create required secrets...

			By("Creating Neo4jEnterpriseCluster")
			cluster := &neo4jv1alpha1.Neo4jEnterpriseCluster{
				ObjectMeta: metav1.ObjectMeta{
					Name:      "plugin-test-cluster",
					Namespace: namespace.Name,
				},
				Spec: neo4jv1alpha1.Neo4jEnterpriseClusterSpec{
					Image: neo4jv1alpha1.ImageSpec{
						Repo: "neo4j",
						Tag:  "5.26.0-enterprise",
					},
					Topology: neo4jv1alpha1.TopologyConfiguration{
						Servers: 2,
					},
					// Resource constraints for CI compatibility
					Resources: &corev1.ResourceRequirements{
						Requests: corev1.ResourceList{
							corev1.ResourceCPU:    resource.MustParse("100m"),
							corev1.ResourceMemory: resource.MustParse("1.5Gi"),
						},
						Limits: corev1.ResourceList{
							corev1.ResourceCPU:    resource.MustParse("500m"),
							corev1.ResourceMemory: resource.MustParse("1.5Gi"),
						},
					},
					Storage: neo4jv1alpha1.StorageSpec{
						Size:      "1Gi",
						ClassName: "standard",
					},
				},
			}
			Expect(k8sClient.Create(ctx, cluster)).Should(Succeed())

			By("Waiting for cluster to be ready")
			Eventually(func() string {
				currentCluster := &neo4jv1alpha1.Neo4jEnterpriseCluster{}
				err := k8sClient.Get(ctx, types.NamespacedName{
					Name:      "plugin-test-cluster",
					Namespace: namespace.Name,
				}, currentCluster)
				if err != nil {
					return ""
				}
				return currentCluster.Status.Phase
			}, timeout, interval).Should(Equal("Ready"))

			// Continue with plugin testing...
		})
	})
})
```

Tests verify the new server-based architecture:
```go
By("Verifying server StatefulSet exists with correct name")
serverSts := &appsv1.StatefulSet{}
Eventually(func() error {
	return k8sClient.Get(ctx, types.NamespacedName{
		Name:      clusterName + "-server", // Server-based naming
		Namespace: namespace.Name,
	}, serverSts)
}, timeout, interval).Should(Succeed())
Expect(*serverSts.Spec.Replicas).To(Equal(int32(2)))
```

Tests verify the centralized backup architecture:
```go
By("Verifying centralized backup StatefulSet")
backupSts := &appsv1.StatefulSet{}
Eventually(func() error {
	return k8sClient.Get(ctx, types.NamespacedName{
		Name:      clusterName + "-backup", // Centralized backup
		Namespace: namespace.Name,
	}, backupSts)
}, timeout, interval).Should(Succeed())
Expect(*backupSts.Spec.Replicas).To(Equal(int32(1))) // Single backup pod
```

Tests verify both cluster and standalone support:
```go
Context("Plugin Installation on Standalone", func() {
	It("Should install GDS plugin on Neo4jEnterpriseStandalone", func() {
		// Test standalone deployment with plugin installation
		standalone := &neo4jv1alpha1.Neo4jEnterpriseStandalone{
			ObjectMeta: metav1.ObjectMeta{
				Name:      standaloneName,
				Namespace: namespace.Name,
			},
			Spec: neo4jv1alpha1.Neo4jEnterpriseStandaloneSpec{
				Image: neo4jv1alpha1.ImageSpec{
					Repo: "neo4j",
					Tag:  "5.26.0-enterprise",
				},
				// Standalone-specific configuration...
			},
		}
		// Test plugin installation on standalone...
	})
})
```

All integration tests use minimal resources to avoid CI scheduling issues:
```yaml
resources:
  requests:
    cpu: "100m"      # Minimal CPU for CI compatibility
    memory: "1.5Gi"  # Required for Neo4j Enterprise database operations
  limits:
    cpu: "500m"      # Reasonable limit for testing
    memory: "1.5Gi"  # Neo4j Enterprise minimum for database operations

storage:
  size: "1Gi"            # Minimal size for testing
  className: "standard"  # Default storage class in Kind
```

```go
const (
	timeout  = time.Second * 300 // 5-minute timeout for CI environments
	interval = time.Second * 5   // Check every 5 seconds
)
```

Proper resource cleanup is critical to prevent CI failures and resource exhaustion:
All integration tests MUST include AfterEach blocks to prevent resource leaks:
```go
AfterEach(func() {
	// Critical: clean up resources immediately to prevent CI resource exhaustion
	if cluster != nil {
		By("Cleaning up cluster resource")
		// Remove finalizers first
		if len(cluster.GetFinalizers()) > 0 {
			cluster.SetFinalizers([]string{})
			_ = k8sClient.Update(ctx, cluster)
		}
		// Delete the resource
		_ = k8sClient.Delete(ctx, cluster)
		cluster = nil
	}

	// Clean up any remaining resources in the namespace
	if testNamespace != "" {
		cleanupCustomResourcesInNamespace(testNamespace)
	}
})
```

Why This Pattern Is Critical:
- Prevents resource accumulation that causes "Insufficient memory" errors
- Ensures cleanup even if tests fail (inline cleanup won't run on failure)
- Removes finalizers to prevent resources stuck in Terminating state
- Cleans namespace resources that might not have owner references
- ❌ No AfterEach block: causes resource leaks if tests fail
- ❌ Inline cleanup only: won't execute if the test panics or fails
- ❌ Missing namespace cleanup: leaves behind ConfigMaps, Services, etc.
- ❌ Not removing finalizers: resources stay in Terminating state
- ❌ Relying on test suite cleanup: not sufficient for resource-intensive tests
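Why a registered AfterEach beats inline cleanup can be shown in plain Go: a deferred (registered) function runs even when the test body panics, while statements placed after the failure point never execute. A minimal stdlib sketch of the idea (illustrative only, not Ginkgo's implementation):

```go
package main

import "fmt"

// runTest simulates a test body that may panic. The deferred closure plays
// the role of a registered cleanup (like AfterEach): it runs even when the
// body fails partway through.
func runTest(body func()) (cleaned bool) {
	defer func() { recover() }()      // swallow the simulated failure, like the framework does
	defer func() { cleaned = true }() // registered cleanup, analogous to AfterEach
	body()
	return true
}

func main() {
	ok := runTest(func() { panic("simulated test failure") })
	fmt.Println("registered cleanup ran despite failure:", ok)
}
```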
Clean up all resources that might have finalizers:

```go
// Neo4j resources
Expect(k8sClient.Delete(ctx, cluster)).Should(Succeed())
Expect(k8sClient.Delete(ctx, standalone)).Should(Succeed())
Expect(k8sClient.Delete(ctx, plugin)).Should(Succeed())
Expect(k8sClient.Delete(ctx, database)).Should(Succeed())
Expect(k8sClient.Delete(ctx, backup)).Should(Succeed())

// Kubernetes resources (usually auto-cleaned by owner references)
// PVCs, Services, StatefulSets are cleaned automatically
```

```go
// Helper function to create a unique namespace
func createUniqueNamespace() *corev1.Namespace {
	return &corev1.Namespace{
		ObjectMeta: metav1.ObjectMeta{
			Name: fmt.Sprintf("test-%d", time.Now().UnixNano()),
		},
	}
}
```

The integration test suite provides cleanup utilities:
```go
// Clean up all custom resources in a namespace
cleanupCustomResourcesInNamespace(namespace)

// Force remove finalizers if needed
forceRemoveFinalizers(resource)
```

Test resources should use predictable naming:
```go
// Cluster naming
clusterName := "test-cluster-" + uniqueSuffix

// Expected StatefulSet names (server-based architecture)
expectedServerSts := clusterName + "-server"
expectedBackupSts := clusterName + "-backup"

// Standalone naming
standaloneName := "test-standalone-" + uniqueSuffix
expectedStandaloneSts := standaloneName // No suffix for standalone
```

Critical for Neo4j Enterprise: tests must allocate sufficient memory:
```go
Resources: &corev1.ResourceRequirements{
	Requests: corev1.ResourceList{
		corev1.ResourceMemory: resource.MustParse("1.5Gi"), // MINIMUM for Enterprise
	},
	Limits: corev1.ResourceList{
		corev1.ResourceMemory: resource.MustParse("1.5Gi"), // Prevent OOMKill
	},
},
```

Why 1.5Gi is required:
- Neo4j Enterprise needs minimum memory for database operations
- Lower values cause Out of Memory kills (exit code 137)
- Database creation and topology operations fail with insufficient memory
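For intuition on the figure itself: `1.5Gi` is a binary (gibibyte) quantity, i.e. 1.5 x 2^30 bytes, roughly 1.61 GB. A quick sketch of the conversion (hypothetical helper, not the apimachinery `resource.Quantity` parser):

```go
package main

import "fmt"

// giBytes converts a Gi (gibibyte) quantity to bytes: 1Gi = 2^30 bytes.
// Hypothetical helper for illustration only.
func giBytes(gi float64) float64 {
	return gi * (1 << 30)
}

func main() {
	b := giBytes(1.5)
	fmt.Printf("1.5Gi = %.0f bytes (about %.2f GB)\n", b, b/1e9)
}
```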
```go
// Wait for the cluster to reach the Ready phase
Eventually(func() string {
	cluster := &neo4jv1alpha1.Neo4jEnterpriseCluster{}
	err := k8sClient.Get(ctx, clusterKey, cluster)
	if err != nil {
		return ""
	}
	return cluster.Status.Phase
}, timeout, interval).Should(Equal("Ready"))
```

```go
// Wait for the standalone instance to become ready
Eventually(func() bool {
	standalone := &neo4jv1alpha1.Neo4jEnterpriseStandalone{}
	err := k8sClient.Get(ctx, standaloneKey, standalone)
	if err != nil {
		return false
	}
	return standalone.Status.Ready
}, timeout, interval).Should(BeTrue())
```

```go
By("Verifying Neo4j cluster formation")
Eventually(func() error {
	// Connect to the first server and check cluster status
	return exec.Command("kubectl", "exec",
		clusterName+"-server-0", "--",
		"cypher-shell", "-u", "neo4j", "-p", password,
		"SHOW SERVERS").Run()
}, timeout, interval).Should(Succeed())
```

```go
It("Should maintain efficient reconciliation rates", func() {
	// Monitor reconciliation frequency
	// Verify <100 reconciliations per minute under normal conditions
})

It("Should use optimal resource patterns", func() {
	// Verify centralized backup uses <30% of the resources of the sidecar approach
	// Check server-based StatefulSet efficiency
})
```

Tests run automatically in CI with:
- Parallel Execution: Multiple test suites run concurrently
- Resource Constraints: CI-optimized resource limits
- Timeout Handling: Extended timeouts for image pull delays
- Cleanup Automation: Automatic test environment cleanup
CI runs set the following environment variables:

```bash
# Environment variables for CI
export CI=true
export KUBEBUILDER_ASSETS="$(pwd)/bin/k8s/1.31.0-linux-amd64"
export KUBECONFIG=~/.kube/config
```

Symptoms: Test namespaces remain in "Terminating" state indefinitely
Diagnosis:

```bash
# Check for resources with finalizers
kubectl get all,neo4jenterpriseclusters,neo4jenterprisestandalones,neo4jplugins -n <namespace> -o yaml | grep -A5 finalizers

# Check for PVCs
kubectl get pvc -n <namespace>
```

Solutions:

```bash
# Force cleanup of test resources
make test-cluster-clean

# Reset the test cluster entirely
make test-cluster-reset

# Manual finalizer removal (if needed)
kubectl patch neo4jenterprisecluster <name> -n <namespace> \
  -p '{"metadata":{"finalizers":[]}}' --type=merge
```

Symptoms: Pods exit with code 137, "OOMKilled" in pod status
Diagnosis:

```bash
# Check pod status
kubectl describe pod <pod-name> | grep -E "(OOMKilled|Memory|Exit.*137)"

# Monitor memory usage
kubectl top pod <pod-name> --containers
```

Solutions:
- Increase memory limits to minimum 1.5Gi for Neo4j Enterprise
- Reduce concurrent test execution
- Use minimal storage and CPU allocations
Symptoms: Tests fail with "Timed out after 300s"
Common Causes:
- Image pull delays in CI environments
- Insufficient resources for cluster formation
- Missing RBAC permissions
Solutions:
```bash
# Check operator status
kubectl get pods -n neo4j-operator

# Check operator logs
kubectl logs -n neo4j-operator deployment/neo4j-operator-controller-manager

# Verify cert-manager (required for TLS tests)
kubectl get pods -n cert-manager

# Check cluster formation events
kubectl get events --sort-by='.firstTimestamp'
```

Symptoms: "Ginkgo does not support rerunning suites" error
Cause: Multiple RunSpecs() calls in same package
Solution: Ensure only one test suite per package:
```go
// Correct: one RunSpecs per package
func TestControllers(t *testing.T) {
	RegisterFailHandler(Fail)
	RunSpecs(t, "Controller Suite")
}

// Include all tests in the same suite via Describe blocks
```

```bash
# Generate coverage report
make test-coverage

# View coverage in browser
go tool cover -html=coverage.out
```

- Unit Tests: >80% coverage for controller logic
- Integration Tests: All major workflows covered
- E2E Tests: Critical user journeys verified
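A coverage gate for the >80% unit-test target could parse the `total:` line of `go tool cover -func=coverage.out` output; the following is a hypothetical sketch (the real gate, if any, would live in the Makefile or CI workflow):

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseTotalCoverage extracts the percentage from the "total:" line of
// `go tool cover -func` output. Hypothetical helper for a CI gate.
func parseTotalCoverage(out string) (float64, error) {
	for _, line := range strings.Split(out, "\n") {
		if strings.HasPrefix(line, "total:") {
			fields := strings.Fields(line)
			pct := strings.TrimSuffix(fields[len(fields)-1], "%")
			return strconv.ParseFloat(pct, 64)
		}
	}
	return 0, fmt.Errorf("no total line in coverage output")
}

func main() {
	sample := "internal/controller/foo.go:10:\tReconcile\t85.0%\ntotal:\t(statements)\t83.4%"
	pct, _ := parseTotalCoverage(sample)
	fmt.Printf("total coverage: %.1f%%, meets 80%% target: %v\n", pct, pct > 80)
}
```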
Integration tests should verify:
- Resource Creation: All expected Kubernetes resources created
- Status Updates: Proper status conditions and phase transitions
- Error Handling: Graceful handling of failure scenarios
- Resource Cleanup: Proper finalizer handling and cleanup
- Performance: Efficient reconciliation and resource usage
When encountering CI failures or testing memory-constrained environments, use the comprehensive CI workflow emulation:
```bash
# Emulate complete CI workflow with debug logging
make test-ci-local
```

The `test-ci-local` target provides a complete emulation of the GitHub Actions CI workflow:
1. Environment Setup
   - Sets `CI=true` and `GITHUB_ACTIONS=true` environment variables
   - Creates a `logs/` directory for comprehensive debug output
   - Cleans up any previous test environment
2. Unit Test Phase
   - Runs unit tests with CI environment variables
   - Logs Go version, kubectl version, and environment details
   - Saves output to `logs/ci-local-unit.log`
3. Integration Test Phase
   - Creates a test cluster with CI-appropriate resource constraints
   - Deploys the Neo4j operator
   - Runs integration tests with 512Mi memory limits (same as CI)
   - Saves output to `logs/ci-local-integration.log`
4. Cleanup Phase
   - Complete environment destruction
   - Saves cleanup output to `logs/ci-local-cleanup.log`
| Aspect | Local Development | CI Environment | CI Emulation |
|---|---|---|---|
| Memory Limits | 1.5Gi | 512Mi | 512Mi ✅ |
| Environment Variables | Local defaults | CI=true, GITHUB_ACTIONS=true | CI=true, GITHUB_ACTIONS=true ✅ |
| Resource Constraints | Generous | Limited (~7GB total) | Limited ✅ |
| Debug Logging | Console only | Limited | Comprehensive files ✅ |
| Troubleshooting | Manual | Minimal | Auto-provided commands ✅ |
Generated debug files provide comprehensive troubleshooting information:
- `logs/ci-local-unit.log`: unit test output with environment information, Go and tool versions, and complete test execution logs
- `logs/ci-local-integration.log`: test cluster creation and operator deployment, integration test execution with CI constraints, and resource allocation and memory limit information
- `logs/ci-local-cleanup.log`: environment cleanup operations and resource removal confirmation
If integration tests fail, the target automatically provides troubleshooting commands:
```bash
# Check operator logs
kubectl logs -n neo4j-operator-system deployment/neo4j-operator-controller-manager

# Check pod status
kubectl get pods --all-namespaces

# Check events
kubectl get events --all-namespaces --sort-by='.lastTimestamp'
```

1. Debugging CI Failures
```bash
# CI failed with memory issues? Reproduce locally:
make test-ci-local

# Check the integration log for memory problems
grep -E "(OOMKilled|Memory|Insufficient)" logs/ci-local-integration.log
```

2. Testing Resource Constraints
```bash
# Test with CI memory limits before pushing
make test-ci-local

# Verify resource requirements are appropriate
grep -A5 "memory" logs/ci-local-integration.log
```

3. Validating CI Fixes
```bash
# After fixing CI issues, validate locally
make test-ci-local

# Confirm tests pass with CI constraints
echo "Exit code: $?"
```

The CI emulation includes performance timing information:
```bash
# View test execution timeline
grep "Started at\|Finished at" logs/ci-local-*.log

# Analyze test duration by phase
grep -E "PHASE|✅|❌" logs/ci-local-integration.log
```

- Use Before CI Push: Run `make test-ci-local` before pushing changes that affect tests
- Review All Logs: Check all three log files for complete understanding
- Memory Optimization: Use findings to optimize resource requirements
- Document Issues: Add findings to troubleshooting guides
- Create test file alongside source code
- Follow naming conventions: `*_test.go` for integration tests, `*_unit_test.go` for unit tests
- Test unexported methods from within the package
- Use table-driven tests for multiple scenarios
- Add tests to the `test/integration/` directory
- Use Ginkgo BDD style for readability
- Include proper cleanup with finalizer removal
- Set appropriate timeouts (5 minutes for CI)
- Use minimal resources for CI compatibility
- Test both success and failure scenarios
Document test scenarios:
- Purpose: What functionality is being tested
- Setup: Required resources and configuration
- Expected Results: What should happen in success case
- Cleanup: How resources are cleaned up
- CI Considerations: Any special requirements for CI
This comprehensive testing strategy ensures the Neo4j Enterprise Operator works reliably across different environments and deployment scenarios.