Neo4j Database Seed URI Feature Guide

This guide explains how to create Neo4j databases from existing backups or dumps using the seed URI feature in the Neo4j Kubernetes Operator.

Overview

The seed URI feature lets you create new Neo4j databases by restoring them from existing backup or dump files stored in cloud storage or reachable over HTTP, HTTPS, or FTP. This is useful for:

Database Migration: Moving databases between environments
Testing with Production Data: Creating test databases from production backups
Disaster Recovery: Restoring databases to specific points in time
Development Environment Setup: Seeding development databases with sample data

Supported URI Schemes

The operator supports the following URI schemes through Neo4j's CloudSeedProvider:

Scheme	Description	Example
`s3://`	Amazon S3	`s3://my-bucket/backup.backup`
`gs://`	Google Cloud Storage	`gs://my-bucket/backup.backup`
`azb://`	Azure Blob Storage	`azb://account.blob.core.windows.net/container/backup.backup`
`https://`	HTTPS URLs	`https://backup-server.com/backup.backup`
`http://`	HTTP URLs	`http://backup-server.com/backup.backup`
`ftp://`	FTP servers	`ftp://ftp.server.com/backup.backup`

Basic Usage

Simple Seed URI Database

apiVersion: neo4j.neo4j.com/v1alpha1
kind: Neo4jDatabase
metadata:
  name: my-database
spec:
  clusterRef: my-cluster
  name: mydb

  # Seed from S3 backup using system-wide authentication
  seedURI: "s3://my-backups/database.backup"

  # Optional: specify database topology
  topology:
    primaries: 2
    secondaries: 1

  wait: true
  ifNotExists: true

Authentication Methods

1. System-Wide Authentication (Recommended)

Use cloud-native authentication mechanisms that don't require explicit credentials:

AWS S3:

IAM roles for service accounts (IRSA)
EC2 instance profiles
Environment variables on nodes

Google Cloud Storage:

Workload Identity
Service account keys via mounted volumes
Compute Engine default service accounts

Azure Blob Storage:

Managed identities
Service principal environment variables
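
As an illustration of the IRSA option listed above: on Amazon EKS, an IAM role is bound to a Kubernetes ServiceAccount through the eks.amazonaws.com/role-arn annotation. The sketch below assumes a hypothetical ServiceAccount name and role ARN. How the Neo4j pods are made to run under that ServiceAccount depends on your cluster setup and is not shown here.

```yaml
# Sketch only: the ServiceAccount name and the role ARN are hypothetical placeholders.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: neo4j-seed-reader
  annotations:
    # EKS annotation that links this ServiceAccount to an IAM role (IRSA)
    eks.amazonaws.com/role-arn: "arn:aws:iam::123456789012:role/neo4j-seed-reader"
```

On GKE, Workload Identity uses the same pattern with the iam.gke.io/gcp-service-account annotation. In both cases no Secret is needed, so the Neo4jDatabase resource can omit seedCredentials.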

2. Explicit Credentials via Secrets

For environments where system-wide authentication isn't available:

apiVersion: v1
kind: Secret
metadata:
  name: backup-credentials
data:
  AWS_ACCESS_KEY_ID: <base64-encoded-key>
  AWS_SECRET_ACCESS_KEY: <base64-encoded-secret>
---
apiVersion: neo4j.neo4j.com/v1alpha1
kind: Neo4jDatabase
metadata:
  name: my-database
spec:
  clusterRef: my-cluster
  name: mydb
  seedURI: "s3://my-backups/database.backup"

  seedCredentials:
    secretRef: backup-credentials

Credential Requirements by Provider

Amazon S3

Required:

AWS_ACCESS_KEY_ID
AWS_SECRET_ACCESS_KEY

Optional:

AWS_SESSION_TOKEN (for temporary credentials)
AWS_REGION
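
A Secret carrying temporary S3 credentials could look like the sketch below. It uses the stringData field so that values can be written unencoded and Kubernetes performs the base64 encoding. The Secret name, region, and placeholder values are assumptions for illustration.

```yaml
# Sketch only: all values are placeholders.
apiVersion: v1
kind: Secret
metadata:
  name: backup-credentials-temporary   # hypothetical name
type: Opaque
stringData:
  AWS_ACCESS_KEY_ID: "<access-key-id>"
  AWS_SECRET_ACCESS_KEY: "<secret-access-key>"
  AWS_SESSION_TOKEN: "<session-token>"   # only needed for temporary credentials
  AWS_REGION: "eu-west-1"                # example region
```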

Google Cloud Storage

Required:

GOOGLE_APPLICATION_CREDENTIALS (service account JSON key)

Optional:

GOOGLE_CLOUD_PROJECT
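
The same pattern applies to Google Cloud Storage. In the sketch below the Secret name is hypothetical, and it is assumed that GOOGLE_APPLICATION_CREDENTIALS holds the contents of the service account JSON key. Check the operator's examples if a file path is expected instead.

```yaml
# Sketch only: Secret name and values are placeholders.
apiVersion: v1
kind: Secret
metadata:
  name: gcs-backup-credentials   # hypothetical name
type: Opaque
stringData:
  # Assumption: the value is the JSON key content, not a path to it
  GOOGLE_APPLICATION_CREDENTIALS: |
    <contents of the service account JSON key file>
  GOOGLE_CLOUD_PROJECT: "<project-id>"   # optional
---
apiVersion: neo4j.neo4j.com/v1alpha1
kind: Neo4jDatabase
metadata:
  name: my-database
spec:
  clusterRef: my-cluster
  name: mydb
  seedURI: "gs://my-bucket/backup.backup"
  seedCredentials:
    secretRef: gcs-backup-credentials
```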

Azure Blob Storage

Required:

AZURE_STORAGE_ACCOUNT
Either AZURE_STORAGE_KEY OR AZURE_STORAGE_SAS_TOKEN
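
For Azure Blob Storage, a minimal sketch with a storage account key follows. The Secret name is hypothetical and the values are placeholders. To authenticate with a SAS token, swap the commented line in for the key.

```yaml
# Sketch only: Secret name and values are placeholders.
apiVersion: v1
kind: Secret
metadata:
  name: azure-backup-credentials   # hypothetical name
type: Opaque
stringData:
  AZURE_STORAGE_ACCOUNT: "<storage-account-name>"
  # Provide one of the following two keys
  AZURE_STORAGE_KEY: "<storage-account-key>"
  # AZURE_STORAGE_SAS_TOKEN: "<sas-token>"
---
apiVersion: neo4j.neo4j.com/v1alpha1
kind: Neo4jDatabase
metadata:
  name: my-database
spec:
  clusterRef: my-cluster
  name: mydb
  seedURI: "azb://account.blob.core.windows.net/container/backup.backup"
  seedCredentials:
    secretRef: azure-backup-credentials
```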

HTTP/HTTPS/FTP

Optional:

USERNAME
PASSWORD
AUTH_HEADER (for custom authentication)
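
For an HTTPS server protected by a username and password, the credentials can be supplied the same way. The Secret name below is hypothetical. AUTH_HEADER is left out because its expected value format is not described in this guide.

```yaml
# Sketch only: Secret name and values are placeholders.
apiVersion: v1
kind: Secret
metadata:
  name: http-backup-credentials   # hypothetical name
type: Opaque
stringData:
  USERNAME: "<username>"
  PASSWORD: "<password>"
---
apiVersion: neo4j.neo4j.com/v1alpha1
kind: Neo4jDatabase
metadata:
  name: my-database
spec:
  clusterRef: my-cluster
  name: mydb
  seedURI: "https://backup-server.com/backup.backup"
  seedCredentials:
    secretRef: http-backup-credentials
```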

Advanced Configuration

Point-in-Time Recovery (Neo4j 2025.x only)

seedConfig:
  # Restore to a specific timestamp
  restoreUntil: "2025-01-15T10:30:00Z"

  # Or restore to a specific transaction ID.
  # Set restoreUntil only once: a duplicate key is invalid YAML.
  # restoreUntil: "txId:12345"
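
For context, a complete resource using point-in-time recovery might look like the sketch below. It assumes that seedConfig sits directly under spec, alongside seedURI, and reuses the names from the basic example.

```yaml
# Sketch only: assumes seedConfig is a direct child of spec.
apiVersion: neo4j.neo4j.com/v1alpha1
kind: Neo4jDatabase
metadata:
  name: my-database
spec:
  clusterRef: my-cluster
  name: mydb
  seedURI: "s3://my-backups/database.backup"
  seedConfig:
    # Restore the state as of this timestamp (Neo4j 2025.x only)
    restoreUntil: "2025-01-15T10:30:00Z"
  wait: true
  ifNotExists: true
```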

CloudSeedProvider Options

seedConfig:
  config:
    # Compression: gzip, lz4, none
    compression: "gzip"

    # Validation: strict, lenient
    validation: "strict"

    # Buffer size for processing
    bufferSize: "128MB"
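
Placed in a full resource, these options might be combined as in the sketch below. It assumes that seedConfig sits directly under spec. The option values are illustrative picks from the choices listed in the comments above, not tuning recommendations.

```yaml
# Sketch only: option values are illustrative.
apiVersion: neo4j.neo4j.com/v1alpha1
kind: Neo4jDatabase
metadata:
  name: my-database
spec:
  clusterRef: my-cluster
  name: mydb
  seedURI: "s3://my-backups/database.backup"
  seedConfig:
    config:
      compression: "lz4"      # one of: gzip, lz4, none
      validation: "strict"    # one of: strict, lenient
      bufferSize: "128MB"
```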

File Format Considerations

Backup Files (.backup) - Recommended

Performance: Much faster restore times
Features: Support for point-in-time recovery, compression
Use Cases: Production workloads, large datasets

Dump Files (.dump) - Legacy

Performance: Slower restore times for large datasets
Compatibility: Cross-version compatibility
Use Cases: Development, testing, cross-version migrations

The operator will warn when using dump files:

Warning: Using dump file format. For better performance with large databases,
         consider using Neo4j backup format (.backup) instead.
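
A resource that would trigger this warning is one whose seedURI points at a dump file. In the sketch below the bucket path and resource names are hypothetical, and it is assumed that the format is recognized from the .dump extension.

```yaml
# Sketch only: bucket path and names are hypothetical.
apiVersion: neo4j.neo4j.com/v1alpha1
kind: Neo4jDatabase
metadata:
  name: dev-database
spec:
  clusterRef: my-cluster
  name: devdb
  # A .dump seed is accepted but is expected to produce the warning above
  seedURI: "s3://my-backups/sample-data.dump"
  wait: true
  ifNotExists: true
```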

Database Topology with Seed URIs

You can specify how the restored database should be distributed across your cluster:

topology:
  primaries: 2    # Number of primary servers
  secondaries: 3  # Number of secondary servers

The operator validates that your topology doesn't exceed cluster capacity and provides warnings for suboptimal configurations.
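
As a worked case, assume the cluster referenced by clusterRef has five servers and that capacity is counted in servers. A topology of three primaries and two secondaries then uses all five and should pass the capacity check, while three primaries and three secondaries would exceed it. The server count and resource names here are assumptions.

```yaml
# Sketch only: assumes my-cluster has five servers.
apiVersion: neo4j.neo4j.com/v1alpha1
kind: Neo4jDatabase
metadata:
  name: my-database
spec:
  clusterRef: my-cluster
  name: mydb
  seedURI: "s3://my-backups/database.backup"
  topology:
    primaries: 3     # 3 + 2 = 5 servers, within the assumed capacity
    secondaries: 2
```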

Conflict Prevention

The operator prevents conflicting configurations:

spec:
  # ERROR: Cannot specify both seedURI and initialData
  seedURI: "s3://my-backups/database.backup"
  initialData:
    cypherStatements:
      - "CREATE (:Person {name: 'Alice'})"

Specify only one of the two. When seedURI is set, the seed provides the initial data, so initialData is not applied.
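
A configuration that avoids the conflict keeps a single data source. The sketch below keeps seedURI and drops initialData. Any Cypher that should run on top of the seeded data would then be executed separately once the database is ready.

```yaml
# Sketch only: same names as the basic example, with a single data source.
apiVersion: neo4j.neo4j.com/v1alpha1
kind: Neo4jDatabase
metadata:
  name: my-database
spec:
  clusterRef: my-cluster
  name: mydb
  # The seed is the only source of initial data; initialData is omitted
  seedURI: "s3://my-backups/database.backup"
```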

Status and Events

The operator provides detailed status and events during seed restoration:

Events:

DatabaseCreatedFromSeed: Database successfully created from seed URI
DataSeeded: Database seeded from URI successfully
ValidationWarning: Validation warnings (e.g., suboptimal topology)

Status Conditions:

Ready: Database is ready and available
ValidationFailed: Configuration validation failed
CreationFailed: Database creation failed

Troubleshooting

Common Issues

Authentication Failures:

- Verify credentials in referenced secret
- Check IAM roles/permissions for system-wide auth
- Ensure workload identity is properly configured

URI Access Failures:

- Verify the backup file exists at the specified URI
- Check network connectivity from Neo4j pods
- Ensure URI format is correct

Validation Errors:

- Check that referenced cluster exists and is ready
- Verify topology doesn't exceed cluster capacity
- Ensure no conflicts between seedURI and initialData

Performance Issues:

- Consider using .backup format instead of .dump
- Adjust bufferSize in seedConfig
- Ensure adequate resources for restoration

Debugging Commands

# Check database status
kubectl get neo4jdatabase my-database -o yaml

# View operator logs
kubectl logs -n neo4j-operator-system deployment/neo4j-operator-controller-manager

# Check events
kubectl describe neo4jdatabase my-database

# Verify database in Neo4j
kubectl exec -it <neo4j-pod> -- cypher-shell -u neo4j -p <password> "SHOW DATABASES"

Security Best Practices

Use System-Wide Authentication: Prefer IAM roles, workload identity, and managed identities over explicit credentials
Rotate Credentials: Regularly rotate any explicit credentials stored in secrets
Least Privilege: Grant minimal required permissions for backup access
Network Security: Use private endpoints and VPNs for sensitive backup access
Audit Access: Monitor and log backup access for compliance

Examples

See the examples/databases/ directory for comprehensive examples:

database-from-s3-seed.yaml - S3 with explicit credentials
database-from-gcs-seed.yaml - Google Cloud Storage with workload identity
database-from-azure-seed.yaml - Azure Blob Storage with both key and SAS token auth
database-from-http-seed.yaml - HTTP/HTTPS/FTP examples
database-dump-vs-backup-seed.yaml - Performance comparison between formats

Neo4j Version Compatibility

Feature	Neo4j 5.26+	Neo4j 2025.x
Basic seed URI	✅	✅
CloudSeedProvider	✅	✅
Point-in-time recovery	❌	✅
All URI schemes	✅	✅
Topology specification	✅	✅

Migration from S3SeedProvider

The operator uses Neo4j's modern CloudSeedProvider instead of the deprecated S3SeedProvider:

✅ Use: CloudSeedProvider with system-wide authentication
❌ Don't Use: S3SeedProvider (deprecated in Neo4j 5.x)

This approach provides better security, broader cloud support, and future compatibility.
