This guide explains how to create Neo4j databases from existing backups or dumps using the seed URI feature in the Neo4j Kubernetes Operator.
The seed URI feature allows you to create new Neo4j databases by restoring them from existing backup files stored in cloud storage or accessible via HTTP/FTP. This is useful for:
- Database Migration: Moving databases between environments
- Testing with Production Data: Creating test databases from production backups
- Disaster Recovery: Restoring databases to specific points in time
- Development Environment Setup: Seeding development databases with sample data
The operator supports the following URI schemes through Neo4j's CloudSeedProvider:
| Scheme | Description | Example |
|---|---|---|
| s3:// | Amazon S3 | s3://my-bucket/backup.backup |
| gs:// | Google Cloud Storage | gs://my-bucket/backup.backup |
| azb:// | Azure Blob Storage | azb://account.blob.core.windows.net/container/backup.backup |
| https:// | HTTPS URLs | https://backup-server.com/backup.backup |
| http:// | HTTP URLs | http://backup-server.com/backup.backup |
| ftp:// | FTP servers | ftp://ftp.server.com/backup.backup |
```yaml
apiVersion: neo4j.neo4j.com/v1alpha1
kind: Neo4jDatabase
metadata:
  name: my-database
spec:
  clusterRef: my-cluster
  name: mydb
  # Seed from S3 backup using system-wide authentication
  seedURI: "s3://my-backups/database.backup"
  # Optional: specify database topology
  topology:
    primaries: 2
    secondaries: 1
  wait: true
  ifNotExists: true
```

Use cloud-native authentication mechanisms that don't require explicit credentials:
AWS S3:
- IAM roles for service accounts (IRSA)
- EC2 instance profiles
- Environment variables on nodes
Google Cloud Storage:
- Workload Identity
- Service account keys via mounted volumes
- Compute Engine default service accounts
Azure Blob Storage:
- Managed identities
- Service principal environment variables
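
As a rough illustration of system-wide authentication, the sketch below shows how a Kubernetes ServiceAccount is commonly bound to a cloud identity: the eks.amazonaws.com/role-arn annotation for IRSA on EKS, and the iam.gke.io/gcp-service-account annotation for GKE Workload Identity. The ServiceAccount names, namespace, role ARN, and Google service account are hypothetical. How the Neo4j pods are assigned this ServiceAccount depends on your cluster configuration, which this guide does not specify.

```yaml
# Sketch only: all names, the role ARN, and the GCP service account are placeholders.
# AWS (EKS): IAM roles for service accounts (IRSA)
apiVersion: v1
kind: ServiceAccount
metadata:
  name: neo4j-seed-reader          # hypothetical name
  namespace: default               # hypothetical namespace
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::123456789012:role/neo4j-backup-reader
---
# Google Cloud (GKE): Workload Identity
apiVersion: v1
kind: ServiceAccount
metadata:
  name: neo4j-seed-reader-gke      # hypothetical name
  namespace: default               # hypothetical namespace
  annotations:
    iam.gke.io/gcp-service-account: neo4j-backup-reader@my-project.iam.gserviceaccount.com
```

With either binding in place, the seedURI can be used without a seedCredentials block, provided the bound identity has read access to the backup location.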
For environments where system-wide authentication isn't available:
```yaml
apiVersion: v1
kind: Secret
metadata:
  name: backup-credentials
data:
  AWS_ACCESS_KEY_ID: <base64-encoded-key>
  AWS_SECRET_ACCESS_KEY: <base64-encoded-secret>
---
apiVersion: neo4j.neo4j.com/v1alpha1
kind: Neo4jDatabase
metadata:
  name: my-database
spec:
  clusterRef: my-cluster
  name: mydb
  seedURI: "s3://my-backups/database.backup"
  seedCredentials:
    secretRef: backup-credentials
```

AWS S3 credentials:

Required:
- AWS_ACCESS_KEY_ID
- AWS_SECRET_ACCESS_KEY

Optional:
- AWS_SESSION_TOKEN (for temporary credentials)
- AWS_REGION

Google Cloud Storage credentials:

Required:
- GOOGLE_APPLICATION_CREDENTIALS (service account JSON key)

Optional:
- GOOGLE_CLOUD_PROJECT

Azure Blob Storage credentials:

Required:
- AZURE_STORAGE_ACCOUNT
- Either AZURE_STORAGE_KEY or AZURE_STORAGE_SAS_TOKEN

HTTP/FTP credentials:

Optional:
- USERNAME
- PASSWORD
- AUTH_HEADER (for custom authentication)
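
For instance, an Azure Blob Storage seed could combine the Azure keys listed above with the seedCredentials field shown earlier. This is a sketch rather than a tested manifest: the secret name azure-backup-credentials and the placeholder values are hypothetical, and stringData is used so that Kubernetes performs the base64 encoding.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: azure-backup-credentials   # hypothetical name
stringData:                        # plain-text values; Kubernetes stores them base64-encoded
  AZURE_STORAGE_ACCOUNT: <storage-account-name>
  AZURE_STORAGE_KEY: <storage-account-key>   # or provide AZURE_STORAGE_SAS_TOKEN instead
---
apiVersion: neo4j.neo4j.com/v1alpha1
kind: Neo4jDatabase
metadata:
  name: my-database
spec:
  clusterRef: my-cluster
  name: mydb
  seedURI: "azb://account.blob.core.windows.net/container/backup.backup"
  seedCredentials:
    secretRef: azure-backup-credentials
```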
```yaml
seedConfig:
  # Restore to specific timestamp
  restoreUntil: "2025-01-15T10:30:00Z"
  # Or restore to a specific transaction ID (use instead of the timestamp above)
  # restoreUntil: "txId:12345"
```

```yaml
seedConfig:
  config:
    # Compression: gzip, lz4, none
    compression: "gzip"
    # Validation: strict, lenient
    validation: "strict"
    # Buffer size for processing
    bufferSize: "128MB"
```

Neo4j backup format (.backup):
- Performance: Much faster restore times
- Features: Support for point-in-time recovery, compression
- Use Cases: Production workloads, large datasets

Neo4j dump format (.dump):
- Performance: Slower restore times for large datasets
- Compatibility: Cross-version compatibility, human-readable
- Use Cases: Development, testing, cross-version migrations
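
To show where the seedConfig options sit in a full resource, here is a sketch that combines a .backup seed with point-in-time recovery. It assumes seedConfig is nested directly under spec, alongside seedURI, which the fragments in this guide do not state explicitly. The resource name is hypothetical. Note that the version table in this guide lists point-in-time recovery only for Neo4j 2025.x.

```yaml
apiVersion: neo4j.neo4j.com/v1alpha1
kind: Neo4jDatabase
metadata:
  name: my-database-pitr           # hypothetical name
spec:
  clusterRef: my-cluster
  name: mydb
  seedURI: "s3://my-backups/database.backup"
  seedConfig:                      # assumed to sit under spec
    # Stop replaying the backup at this timestamp
    restoreUntil: "2025-01-15T10:30:00Z"
    config:
      compression: "gzip"
      validation: "strict"
```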
The operator will warn when using dump files:

```
Warning: Using dump file format. For better performance with large databases,
consider using Neo4j backup format (.backup) instead.
```
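
A seed that would presumably trigger this warning is one whose URI points at a dump file. The sketch below assumes the operator recognizes the format from the .dump extension, which this guide implies but does not state. The URI, database name, and resource name are placeholders.

```yaml
apiVersion: neo4j.neo4j.com/v1alpha1
kind: Neo4jDatabase
metadata:
  name: dev-database               # hypothetical name
spec:
  clusterRef: my-cluster
  name: devdb                      # hypothetical database name
  # Dump file served over HTTPS; suited to development and testing
  seedURI: "https://backup-server.com/backup.dump"
  wait: true
  ifNotExists: true
```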
You can specify how the restored database should be distributed across your cluster:
```yaml
topology:
  primaries: 2    # Number of primary servers
  secondaries: 3  # Number of secondary servers
```

The operator validates that your topology doesn't exceed cluster capacity and provides warnings for suboptimal configurations.
The operator prevents conflicting configurations:
```yaml
spec:
  # ERROR: Cannot specify both seedURI and initialData
  seedURI: "s3://my-backups/database.backup"
  initialData:
    cypherStatements:
      - "CREATE (:Person {name: 'Alice'})"
```

When seedURI is specified, initialData is ignored since the seed provides the initial data.
The operator provides detailed status and events during seed restoration:
Events:
- DatabaseCreatedFromSeed: Database successfully created from seed URI
- DataSeeded: Database seeded from URI successfully
- ValidationWarning: Validation warnings (e.g., suboptimal topology)

Status Conditions:
- Ready: Database is ready and available
- ValidationFailed: Configuration validation failed
- CreationFailed: Database creation failed
- Authentication Failures
  - Verify credentials in referenced secret
  - Check IAM roles/permissions for system-wide auth
  - Ensure workload identity is properly configured
- URI Access Failures
  - Verify the backup file exists at the specified URI
  - Check network connectivity from Neo4j pods
  - Ensure URI format is correct
- Validation Errors
  - Check that referenced cluster exists and is ready
  - Verify topology doesn't exceed cluster capacity
  - Ensure no conflicts between seedURI and initialData
- Performance Issues
  - Consider using .backup format instead of .dump
  - Adjust bufferSize in seedConfig
  - Ensure adequate resources for restoration
```bash
# Check database status
kubectl get neo4jdatabase my-database -o yaml

# View operator logs
kubectl logs -n neo4j-operator-system deployment/neo4j-operator-controller-manager

# Check events
kubectl describe neo4jdatabase my-database

# Verify database in Neo4j
kubectl exec -it <neo4j-pod> -- cypher-shell -u neo4j -p <password> "SHOW DATABASES"
```

- Use System-Wide Authentication: Prefer IAM roles, workload identity, and managed identities over explicit credentials
- Rotate Credentials: Regularly rotate any explicit credentials stored in secrets
- Least Privilege: Grant minimal required permissions for backup access
- Network Security: Use private endpoints and VPNs for sensitive backup access
- Audit Access: Monitor and log backup access for compliance
See the examples/databases/ directory for comprehensive examples:
- database-from-s3-seed.yaml - S3 with explicit credentials
- database-from-gcs-seed.yaml - Google Cloud Storage with workload identity
- database-from-azure-seed.yaml - Azure Blob Storage with both key and SAS token auth
- database-from-http-seed.yaml - HTTP/HTTPS/FTP examples
- database-dump-vs-backup-seed.yaml - Performance comparison between formats
| Feature | Neo4j 5.26+ | Neo4j 2025.x |
|---|---|---|
| Basic seed URI | ✅ | ✅ |
| CloudSeedProvider | ✅ | ✅ |
| Point-in-time recovery | ❌ | ✅ |
| All URI schemes | ✅ | ✅ |
| Topology specification | ✅ | ✅ |
The operator uses Neo4j's modern CloudSeedProvider instead of the deprecated S3SeedProvider:
- ✅ Use: CloudSeedProvider with system-wide authentication
- ❌ Don't Use: S3SeedProvider (deprecated in Neo4j 5.x)
This approach provides better security, broader cloud support, and future compatibility.