This guide provides comprehensive troubleshooting information for Neo4j backup and restore operations using the Kubernetes operator. It covers common issues, diagnostic steps, and solutions for various backup and restore scenarios.
Note: Cluster deployments use a centralized {cluster}-backup pod (container name: backup); backup sidecar references in this guide apply to standalone deployments.
Before troubleshooting, ensure you have:
- Neo4j Enterprise cluster running version 5.26.0+ (semver) or 2025.01.0+ (calver)
- Appropriate RBAC permissions for backup/restore operations
- Access to cluster logs and events
- Understanding of your storage backend configuration
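The version prerequisite can be checked mechanically before creating any backup resources. Below is a minimal sketch in plain shell (no cluster access required) of the semver/calver minimum-version rule described in this guide; the helper name meets_minimum and the sample tags are illustrative only:

```shell
# meets_minimum TAG -> exit 0 if TAG satisfies the operator's minimum version.
# Assumes tags look like "5.26.0-enterprise" or "2025.01.0-enterprise".
meets_minimum() {
  ver="${1%%-*}"                 # strip the "-enterprise" suffix, if any
  major="${ver%%.*}"
  rest="${ver#*.}"
  minor="${rest%%.*}"
  if [ "$major" -ge 2025 ]; then                        # calver: 2025.01.0+
    return 0
  elif [ "$major" -eq 5 ] && [ "$minor" -ge 26 ]; then  # semver: 5.26.x only
    return 0
  fi
  return 1
}

meets_minimum "5.26.0-enterprise" && echo supported || echo unsupported
meets_minimum "5.25.0-enterprise" && echo supported || echo unsupported
```

Run it against the tag reported by the cluster resource (see the version diagnosis commands below).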
# Check backup resource status
kubectl get neo4jbackups
kubectl get neo4jrestores
# View detailed resource information
kubectl describe neo4jbackup <backup-name>
kubectl describe neo4jrestore <restore-name>
# Check events
kubectl get events --sort-by=.metadata.creationTimestamp
# View operator logs
kubectl logs -n neo4j-operator deployment/neo4j-operator-controller-manager

# List backup/restore jobs
kubectl get jobs -l app.kubernetes.io/component=backup
kubectl get jobs -l app.kubernetes.io/component=restore
# Check job logs
kubectl logs job/<backup-name>-backup
kubectl logs job/<restore-name>-restore
# Check pod status and logs
kubectl get pods -l app.kubernetes.io/component=backup
kubectl logs <backup-pod-name>

Error: Neo4j version 5.25.0 is not supported. Minimum required version is 5.26.0
Diagnosis:
# Check cluster image version
kubectl get neo4jenterprisecluster <cluster-name> -o jsonpath='{.spec.image.tag}'
# Check backup/restore resource events
kubectl describe neo4jbackup <backup-name>

Solutions:
- Update Neo4j Version:
  spec:
    image:
      tag: "5.26.0-enterprise"  # or a later version
- Verify Supported Versions:
  - Semver: 5.26.0, 5.26.1 (5.26.x is the last semver LTS; no 5.27+ exists)
  - Calver: 2025.01.0, 2025.06.1, 2026.01.0+
Error: invalid Neo4j version format: latest. Expected semver (5.26+) or calver (2025.01+)
Solution:
Use a specific version tag instead of latest:
spec:
  image:
    tag: "5.26.0-enterprise"

Error: AccessDenied: Access Denied
Diagnosis:
# Check AWS credentials
kubectl get secret aws-credentials -o yaml
# Verify IAM permissions
aws sts get-caller-identity
aws s3 ls s3://your-backup-bucket/

Solutions:
- Verify IAM Permissions:
  {
    "Version": "2012-10-17",
    "Statement": [
      {
        "Effect": "Allow",
        "Action": [
          "s3:GetObject",
          "s3:PutObject",
          "s3:DeleteObject",
          "s3:ListBucket"
        ],
        "Resource": [
          "arn:aws:s3:::your-backup-bucket",
          "arn:aws:s3:::your-backup-bucket/*"
        ]
      }
    ]
  }

- Update Service Account Annotations:
  metadata:
    annotations:
      eks.amazonaws.com/role-arn: arn:aws:iam::ACCOUNT:role/neo4j-backup-role
- Check Secret Format:
  apiVersion: v1
  kind: Secret
  metadata:
    name: aws-credentials
  data:
    AWS_ACCESS_KEY_ID: <base64-key>
    AWS_SECRET_ACCESS_KEY: <base64-secret>
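The base64 values in the Secret's data block are plain base64 of the raw credential strings. A quick sketch of producing and verifying them; the credential value here is a fabricated placeholder, never commit real keys:

```shell
# Fabricated placeholder credential -- for illustration only
ACCESS_KEY='AKIAEXAMPLEKEY'

# printf (not echo) avoids base64-encoding a trailing newline
ENCODED=$(printf '%s' "$ACCESS_KEY" | base64)
echo "$ENCODED"

# Round-trip check: decoding must return the original value
DECODED=$(printf '%s' "$ENCODED" | base64 -d)
echo "$DECODED"
```

Note that kubectl create secret generic --from-literal encodes values for you; hand-encoding is only needed when writing the manifest directly.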
Error: 403 Forbidden: Permission denied
Solutions:
- Verify Service Account Key:
  # Check service account secret
  kubectl get secret gcs-credentials -o yaml
  # Test GCS access
  gsutil ls gs://your-backup-bucket/
- Required GCS Permissions:
  - storage.objects.create
  - storage.objects.delete
  - storage.objects.get
  - storage.objects.list
  - storage.buckets.get
Error: AuthenticationFailed: Server failed to authenticate the request
Solutions:
- Check Storage Account Key:
  apiVersion: v1
  kind: Secret
  metadata:
    name: azure-credentials
  data:
    AZURE_STORAGE_ACCOUNT: <base64-account-name>
    AZURE_STORAGE_KEY: <base64-storage-key>
- Verify Container Permissions:
  # Test Azure CLI access
  az storage blob list --container-name your-container --account-name your-account
Error: pod has unbound immediate PersistentVolumeClaims
Diagnosis:
# Check PVC status
kubectl get pvc
kubectl describe pvc <pvc-name>
# Check storage class
kubectl get storageclass

Solutions:
- Verify Storage Class:
  spec:
    storage:
      type: pvc
      pvc:
        name: backup-storage
        size: 100Gi
        storageClassName: fast-ssd  # Ensure this storage class exists
- Check Available Storage:
  # List nodes and storage
  kubectl describe nodes
  kubectl get pv
Status: Failed
Message: Failed to create backup job: pods "backup-job-xyz" is forbidden
Diagnosis:
# Check RBAC permissions for backup job service account
kubectl auth can-i create pods/exec --as=system:serviceaccount:<namespace>:neo4j-backup-sa
# Check service account
kubectl get serviceaccount neo4j-backup-sa -o yaml

Solutions:
- Verify RBAC:
  apiVersion: rbac.authorization.k8s.io/v1
  kind: Role
  metadata:
    name: neo4j-backup-role
  rules:
    - apiGroups: ["batch"]
      resources: ["jobs", "cronjobs"]
      verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
- Check Service Account Binding:
  apiVersion: rbac.authorization.k8s.io/v1
  kind: RoleBinding
  metadata:
    name: neo4j-backup-rolebinding
  subjects:
    - kind: ServiceAccount
      name: neo4j-backup-sa
      namespace: <namespace>
  roleRef:
    kind: Role
    name: neo4j-backup-role
    apiGroup: rbac.authorization.k8s.io
Status: Failed
Message: org.neo4j.cli.CommandFailedException: Path '/data/backups/test-backup' does not exist
Note: This issue has been fixed in the latest operator version. The backup sidecar now automatically creates the backup path before executing the backup command.
If you encounter this with an older operator version:
Diagnosis (standalone only):
# Check operator version
kubectl get deployment -n neo4j-operator neo4j-operator-controller-manager -o jsonpath='{.spec.template.spec.containers[0].image}'
# Check backup sidecar logs
kubectl logs <neo4j-pod> -c backup-sidecar

Solutions:
- Upgrade to Latest Operator:
  - The latest operator version automatically creates backup paths
  - Neo4j 5.26+ and 2025.x+ require paths to exist
- Temporary Workaround (if upgrade is not possible):
  # Manually create the backup directory in the pod
  kubectl exec <neo4j-pod> -c backup-sidecar -- mkdir -p /data/backups/<backup-name>
- Verify Fix in New Version:
  # Check that the backup sidecar includes the mkdir command
  kubectl get pod <neo4j-pod> -o jsonpath='{.spec.containers[?(@.name=="backup-sidecar")].command}' | grep "mkdir -p"
For clusters, check the centralized backup pod instead:
kubectl logs <cluster>-backup-0 -c backup
Status: Failed
Message: Backup job timed out after 2h0m0s
Solutions:
- Increase Timeout:
  spec:
    timeout: "4h"  # Increase timeout for large databases
- Tune Backup Arguments:
  spec:
    options:
      additionalArgs:
        - "--parallel-recovery"
        - "--temp-path=/tmp/backup"
- Monitor Disk I/O:
  # Check node resources
  kubectl top nodes
  kubectl top pods
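When choosing a timeout value, a rough starting point is store size divided by sustained backup throughput, with headroom. The sketch below shows the arithmetic; the 100 GiB store size and 50 MiB/s throughput are made-up example figures, so substitute your own measurements:

```shell
STORE_GIB=100        # database store size in GiB (example figure)
THROUGHPUT_MIBS=50   # sustained backup throughput in MiB/s (example figure)

# hours = (GiB * 1024 MiB) / (MiB/s * 3600 s); double it and round up for headroom
SUGGESTED_H=$(awk -v g="$STORE_GIB" -v t="$THROUGHPUT_MIBS" \
  'BEGIN { print int(2 * g * 1024 / (t * 3600)) + 1 }')
echo "suggested timeout: ${SUGGESTED_H}h"
```

For the example figures this suggests a 2h timeout, which matches the default used in the error above; larger stores or slower storage push the value toward the 4h setting shown in the first solution.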
Status: Failed
Message: Backup verification failed: inconsistent data detected
Solutions:
- Check Database Consistency:
  // Connect to Neo4j and run:
  CALL dbms.checkConsistency()
- Disable Verification Temporarily:
  spec:
    options:
      verify: false  # Disable for problematic databases
- Use Force Flag:
  spec:
    force: true
Status: Waiting
Message: Target cluster is not ready
Diagnosis:
# Check cluster status
kubectl get neo4jenterprisecluster <cluster-name>
kubectl describe neo4jenterprisecluster <cluster-name>
# Check pod status
kubectl get pods -l app.kubernetes.io/instance=<cluster-name>

Solutions:
- Wait for Cluster Readiness:
  # Monitor cluster status
  kubectl get neo4jenterprisecluster <cluster-name> -w
- Check Cluster Configuration:
  # Ensure the cluster has adequate resources
  spec:
    resources:
      requests:
        memory: "2Gi"
        cpu: "1"
Status: Failed
Message: database myapp already exists. Use replaceExisting option or force flag
Solutions:
- Use Replace Existing:
  spec:
    options:
      replaceExisting: true
- Use Force Flag:
  spec:
    force: true
- Drop Database First:
  // Connect to Neo4j and run:
  DROP DATABASE myapp IF EXISTS
Status: Failed
Message: transaction log validation failed: missing log segment
Diagnosis:
# Check transaction log storage
kubectl describe neo4jrestore <restore-name>
# Verify log storage accessibility
aws s3 ls s3://transaction-logs/production/logs/

Solutions:
- Check Log Retention:
  spec:
    source:
      pitr:
        logRetention: "14d"  # Increase retention period
- Disable Log Validation:
  spec:
    source:
      pitr:
        validateLogIntegrity: false
- Use a Different Recovery Point:
  spec:
    source:
      pointInTime: "2025-01-04T10:00:00Z"  # An earlier time
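To step the recovery point back when the original timestamp fails validation, GNU date can do the timestamp arithmetic. A small sketch, reusing the example timestamp above (BSD/macOS date uses different flags, e.g. -v-2H):

```shell
POINT='2025-01-04T10:00:00Z'   # recovery point that fails validation (example)

# Step back two hours; -u keeps the result in UTC (GNU date syntax)
EARLIER=$(date -u -d "$POINT - 2 hours" +%Y-%m-%dT%H:%M:%SZ)
echo "$EARLIER"
```

Paste the printed value into spec.source.pointInTime and retry; repeat with a larger offset until the restore finds an intact log segment.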
Status: Failed
Message: failed to create Neo4j client: connection refused
Diagnosis:
# Check Neo4j service
kubectl get svc -l app.kubernetes.io/instance=<cluster-name>
# Test connectivity
kubectl port-forward svc/<cluster-name>-client 7687:7687
neo4j-client -u neo4j -p password bolt://localhost:7687

Solutions:
- Check Service Configuration:
  # Ensure the service is properly exposed
  spec:
    services:
      neo4j:
        enabled: true
        type: ClusterIP
- Verify Network Policies:
  kubectl get networkpolicies
  kubectl describe networkpolicy <policy-name>
- Check Firewall Rules:
  # Ensure port 7687 is accessible
  telnet <cluster-ip> 7687
Status: Failed
Message: backup job killed due to memory limit
Solutions:
- Increase Job Resources:
  # Add to the backup job template (requires operator modification)
  resources:
    requests:
      memory: "4Gi"
      cpu: "2"
    limits:
      memory: "8Gi"
      cpu: "4"
- Use Incremental Backup:
  spec:
    options:
      additionalArgs:
        - "--incremental"
- Optimize Backup Path:
  spec:
    options:
      additionalArgs:
        - "--temp-path=/tmp/backup"
        - "--parallel-recovery"
Status: Running (for an extended time)
Solutions:
- Enable Compression:
  spec:
    options:
      compress: true
- Use Parallel Processing:
  spec:
    options:
      additionalArgs:
        - "--parallel-recovery"
- Check Storage Performance:
  # Test storage I/O
  kubectl exec -it <backup-pod> -- dd if=/dev/zero of=/backup/test bs=1M count=1000
Status: Failed
Message: Pre-restore hooks failed: hook job failed
Diagnosis:
# Check hook job status
kubectl get jobs -l app.kubernetes.io/component=pre-restore
# Check hook job logs
kubectl logs job/<restore-name>-pre-restore-hook

Solutions:
- Increase Hook Timeout:
  spec:
    options:
      preRestore:
        job:
          timeout: "30m"  # Increase timeout
- Fix Hook Script:
  spec:
    options:
      preRestore:
        job:
          template:
            container:
              command: ["/bin/sh"]
              args: ["-c", "set -e; /scripts/pre-restore.sh"]  # Add error handling
Status: Failed
Message: failed to execute Cypher statement: syntax error
Solutions:
- Validate Cypher Syntax:
  spec:
    options:
      postRestore:
        cypherStatements:
          - "CALL db.awaitIndexes(600)"  # Add timeout
          - "MATCH (n:User) WHERE n.created IS NULL SET n.created = datetime()"
- Check Database State:
  // Verify the database is accessible
  CALL db.ping()
Enable debug logging in the operator:
# Restart operator with debug logging
kubectl patch deployment neo4j-operator-controller-manager \
-n neo4j-operator \
  -p '{"spec":{"template":{"spec":{"containers":[{"name":"manager","args":["--zap-log-level=debug"]}]}}}}'

Monitor resource usage during operations:
# Watch resource usage
watch kubectl top pods
watch kubectl top nodes
# Monitor storage usage
kubectl exec -it <backup-pod> -- df -h

Test network connectivity:
# DNS resolution
kubectl exec -it <backup-pod> -- nslookup <cluster-name>-client
# Port connectivity
kubectl exec -it <backup-pod> -- telnet <cluster-name>-client 7687
# Network policies
kubectl get networkpolicies --all-namespaces

- Set up Alerts:
  # Prometheus alert for backup failures
  - alert: BackupFailed
    expr: increase(neo4j_backup_failures_total[1h]) > 0
    for: 5m
    annotations:
      summary: "Neo4j backup failed"
- Regular Health Checks:
  # Weekly backup validation
  kubectl get neo4jbackups -o json | jq '.items[] | select(.status.phase != "Completed")'
- Storage Monitoring:
  # Monitor backup storage growth
  kubectl get pvc -o jsonpath='{.items[*].status.capacity.storage}'
- Performance Baselines:
  # Establish backup performance baselines
  kubectl get neo4jbackup -o jsonpath='{.items[*].status.stats.duration}'
- Backup Validation:
  # Monthly restore tests
  kubectl apply -f test-restore.yaml

- Disaster Recovery Drills:
  # Quarterly DR tests
  kubectl apply -f disaster-recovery-test.yaml
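The Storage Monitoring jsonpath above prints raw Kubernetes quantities such as 100Gi. For threshold checks in monitoring scripts it helps to normalize them to bytes; a small sketch of that conversion (binary suffixes only, sample values made up):

```shell
# Convert a Kubernetes binary-suffix quantity (Ki/Mi/Gi/Ti) to bytes
to_bytes() {
  case "$1" in
    *Ki) echo $(( ${1%Ki} * 1024 )) ;;
    *Mi) echo $(( ${1%Mi} * 1024 * 1024 )) ;;
    *Gi) echo $(( ${1%Gi} * 1024 * 1024 * 1024 )) ;;
    *Ti) echo $(( ${1%Ti} * 1024 * 1024 * 1024 * 1024 )) ;;
    *)   echo "$1" ;;   # already a plain byte count
  esac
}

to_bytes 100Gi   # prints 107374182400
to_bytes 512Mi   # prints 536870912
```

Feed the PVC capacities through this helper and compare against a byte threshold to drive storage-growth alerts.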
#!/bin/bash
# backup-restore-debug.sh - Collect diagnostic information
echo "=== Neo4j Backup/Restore Diagnostic Report ==="
echo "Generated: $(date)"
echo
echo "=== Cluster Information ==="
kubectl get neo4jenterpriseclusters
echo
echo "=== Backup Resources ==="
kubectl get neo4jbackups
echo
echo "=== Restore Resources ==="
kubectl get neo4jrestores
echo
echo "=== Recent Events ==="
kubectl get events --sort-by=.metadata.creationTimestamp | tail -20
echo
echo "=== Operator Logs (last 100 lines) ==="
kubectl logs -n neo4j-operator deployment/neo4j-operator-controller-manager --tail=100
echo
echo "=== Storage Classes ==="
kubectl get storageclass
echo
echo "=== PVCs ==="
kubectl get pvc

- Documentation: Backup and Restore Guide
- API Reference: Neo4jBackup, Neo4jRestore
- Community: Neo4j Community Forum
- Enterprise Support: Neo4j Support Portal
Contact support when:
- Data corruption is suspected
- Backup/restore operations consistently fail
- Performance is significantly degraded
- Security incidents occur
- Complex PITR scenarios need assistance
Provide the diagnostic report and specific error messages when contacting support.