Property Sharding is an advanced Neo4j feature that separates graph structure (nodes and relationships) from properties, enabling better scalability for property-heavy workloads by distributing properties across multiple databases.
Property Sharding decouples data into:
- Graph Shard: Single database containing nodes and relationships WITHOUT properties
- Property Shards: Multiple databases containing properties distributed via hash function
- Virtual Database: Logical database presenting unified view of graph + property shards
Neo4j Version Requirements:
- Minimum: Neo4j 2025.12-enterprise (introduced in 2025.12)
- Note: Property sharding is an enterprise-only feature and requires valid licensing
- Not available on Aura
Cluster Infrastructure Requirements:
- Minimum Servers: 2 servers minimum (3+ recommended for HA graph shard primaries)
- Recommended Servers: 3-7 servers (odd numbers provide better consensus characteristics)
- Maximum Recommended: 20+ servers for very large deployments
Resource Requirements per Server:
| Component | Minimum (Basic) | Recommended (Production) | Notes |
|---|---|---|---|
| Memory | 4GB total | 8GB+ total | 4GB minimum, 8GB+ recommended |
| CPU | 1 core | 2+ cores | Cross-shard queries are CPU intensive |
| Storage | 10GB | 100GB+ | Depends on data volume and shard count |
| Network | 1Gbps | 10Gbps+ | Low latency critical for transaction log sync |
Additional Requirements:
- Authentication: Admin secret required (property sharding requires authenticated cluster access)
- Storage Class: Persistent storage class must be specified (e.g.,
standard,fast-ssd) - Kubernetes Version: 1.24+ for full operator compatibility
- Network Policy: Allow inter-pod communication on discovery and bolt ports
- Cypher Version: Must use Cypher 25 for sharded database operations
Performance Considerations:
- Memory Overhead: 20-30% additional memory required for shard coordination
- CPU Overhead: 20-30% additional CPU required for cross-shard operations
- Storage Growth: Linear growth with number of property shards
- Network Traffic: 2-3x increase in inter-server communication
Capacity Planning Guidelines:
Development/Testing:
topology:
servers: 3
resources:
requests:
memory: 4Gi # Absolute minimum for dev/test
cpu: 2000m # 2 cores for cross-shard queries
limits:
memory: 8Gi # Recommended for production
cpu: 2000mProduction:
topology:
servers: 3 # or 5+ for larger datasets
resources:
requests:
memory: 4Gi # Minimum for dev/test
cpu: 2000m # Cross-shard performance
limits:
memory: 8Gi # Recommended for production
cpu: 4000m # Handle peak loadsHigh-Performance Production:
topology:
servers: 7 # Better distribution
resources:
requests:
memory: 8Gi # Production recommendation
cpu: 4000m # Maximum throughput
limits:
memory: 20Gi # Peak load handling
cpu: 6000m # Burst capability✅ Tested Configuration: The examples below have been tested with Neo4j 2025.12-enterprise on Kubernetes clusters with the specified resource requirements. Property sharding cluster creation typically completes in 2-3 minutes with proper resources.
Before creating a property sharding cluster, create the required admin secret:
kubectl create secret generic neo4j-admin-secret \
--from-literal=username=neo4j \
--from-literal=password=your-secure-passwordapiVersion: neo4j.neo4j.com/v1alpha1
kind: Neo4jEnterpriseCluster
metadata:
name: property-sharding-cluster
namespace: default
spec:
image:
repo: neo4j
tag: 2025.12-enterprise # Property sharding requires 2025.12+
# Authentication required for property sharding
auth:
adminSecret: neo4j-admin-secret
topology:
servers: 3 # 3+ recommended for HA graph shard primaries
storage:
size: 10Gi
className: standard # Storage class must be specified
resources:
requests:
memory: 4Gi # Minimum 4GB for dev/test environments
cpu: 2000m # 2+ cores required for cross-shard queries
limits:
memory: 8Gi # Recommended 8GB for production
cpu: 4000m # Higher CPU for shard coordination overhead (20-30% increase)
# Enable property sharding
propertySharding:
enabled: true
config:
# Required configuration (applied automatically if not specified)
internal.dbms.sharded_property_database.enabled: "true"
db.query.default_language: "CYPHER_25"
internal.dbms.sharded_property_database.allow_external_shard_access: "false"
# Performance tuning (optional)
db.tx_log.rotation.retention_policy: "7 days"
internal.dbms.sharded_property_database.property_pull_interval: "10ms"apiVersion: neo4j.neo4j.com/v1alpha1
kind: Neo4jShardedDatabase
metadata:
name: products-sharded-db
namespace: default
spec:
# Reference to property sharding enabled cluster
clusterRef: property-sharding-cluster
# Virtual database name (what users connect to)
name: products
# Required for property sharding
defaultCypherLanguage: "25"
# Property sharding configuration
propertySharding:
propertyShards: 4 # Number of property shards
# Graph shard topology (stores nodes/relationships)
graphShard:
primaries: 3
secondaries: 2
# Property shard topology (stores properties)
propertyShardTopology:
replicas: 2
# Database creation options
wait: true # Wait for creation to complete
ifNotExists: true # Don't fail if database existsUse seedURI for a single backup location that contains shard-suffixed artifacts
(for example, products-g000 and products-p000). Use seedURIs for per-shard
URIs when seeding from dumps or multiple locations.
spec:
seedURI: "s3://backups/products/"
seedConfig:
restoreUntil: "2025-06-01T10:30:00Z"
seedCredentials:
secretRef: "seed-credentials"| Field | Type | Required | Description |
|---|---|---|---|
enabled |
boolean | Yes | Enable property sharding support on cluster |
config |
map[string]string | No | Advanced property sharding configuration |
When propertySharding.enabled is true, these settings are automatically applied:
config:
internal.dbms.sharded_property_database.enabled: "true"
db.query.default_language: "CYPHER_25"
internal.dbms.sharded_property_database.allow_external_shard_access: "false"config:
# Transaction log retention (critical for shard sync)
db.tx_log.rotation.retention_policy: "7 days"
# Property pull interval
internal.dbms.sharded_property_database.property_pull_interval: "10ms"
# Memory configuration
server.memory.heap.max_size: "8G"
server.memory.pagecache.size: "4G"| Field | Type | Required | Description |
|---|---|---|---|
propertyShards |
int32 | Yes | Number of property shards (1-1000) |
graphShard |
DatabaseTopology | Yes | Topology for graph shard |
propertyShardTopology |
PropertyShardTopology | Yes | Replica topology for property shards |
| Field | Type | Description |
|---|---|---|
primaries |
int32 | Number of primary replicas |
secondaries |
int32 | Number of secondary replicas |
| Field | Type | Description |
|---|---|---|
replicas |
int32 | Number of replicas per property shard |
Development/Testing:
- 2 servers minimum (3+ recommended for HA graph shard primaries)
- 4GB heap per server (absolute minimum for property sharding)
- 8GB+ total RAM per server (recommended for stable production operation)
- Property shards: 2-4
Production:
- 3+ servers recommended (HA graph shard primaries)
- 4-8GB+ heap per server (4GB minimum, 8GB for production)
- 20GB+ total RAM per server (accounting for system overhead)
- 2+ CPU cores per server (cross-shard query requirements)
- Property shards: 4-16 (start conservatively)
Property distribution across shards is automatic; the operator does not provide per-property controls.
| Dataset Size | Property Shards | Reasoning |
|---|---|---|
| < 1M nodes | 2-4 | Minimal overhead |
| 1M-10M nodes | 4-8 | Balanced distribution |
| 10M-100M nodes | 8-16 | Better parallelization |
| 100M+ nodes | 16-32 | Maximum distribution |
Note: Resharding requires an offline neo4j-admin database copy and recreating the database; the operator does not automate resharding.
Check if property sharding is ready:
kubectl get neo4jenterpriseclusters property-sharding-cluster -o yamlLook for:
status:
phase: Ready
propertyShardingReady: trueCheck sharded database health:
kubectl get neo4jshardeddatabases products-sharded-db -o yamlKey status fields:
status:
phase: Ready
shardingReady: true
graphShard:
name: products-g000
ready: true
state: online
propertyShards:
- name: products-p000
ready: true
state: online
virtualDatabase:
name: products
ready: trueConnect to the virtual database:
kubectl port-forward svc/property-sharding-cluster-client 7687:7687Query the virtual database (automatically routes to appropriate shards):
// Connect to virtual database
:use products
// Standard queries work transparently
MATCH (n:Product) RETURN n.name, n.description LIMIT 10;
// Create data (properties distributed automatically)
CREATE (p:Product {
id: "12345", // Stays in graph shard
name: "Widget", // Stays in graph shard
description: "...", // Goes to property shard
metadata: "{...}" // Goes to property shard
});Check individual shards:
// Graph shard (nodes/relationships only)
:use system
SHOW DATABASES
WHERE name STARTS WITH "products-g"
// Property shards (properties only)
SHOW DATABASES
WHERE name STARTS WITH "products-p"1. Version Mismatch
Error: property sharding requires Neo4j 2025.12+
Solution: Upgrade to Neo4j 2025.12 or later.
2. Insufficient Memory
Error: property sharding requires minimum 4GB memory for basic operation, got XXXMB (recommended: 8GB+)
Solution: Increase memory allocation to at least 4GB, recommended 8GB+ for production:
resources:
requests:
memory: 4Gi # Minimum requirement
limits:
memory: 8Gi # Recommended for production3. Insufficient CPU Resources
Error: property sharding requires minimum 1 CPU core, got 500m (recommended: 2+ cores)
Solution: Increase CPU allocation for cross-shard query performance:
resources:
requests:
cpu: 2000m # Recommended minimum for cross-shard queries
limits:
cpu: 4000m # Allow burst capacity4. Invalid Server Count
Error: spec.topology.servers in body should be greater than or equal to 2
Solution: Set at least 2 servers, and consider 3+ for HA graph shard primaries:
topology:
servers: 3 # 3+ recommended for HA graph shard primaries
# Consider 7 servers for larger datasets5. Cluster Not Configured for Property Sharding
Error: referenced cluster does not have property sharding enabled
Solution: Ensure propertySharding.enabled: true on the referenced cluster and that it is Ready.
# Check cluster configuration
kubectl describe neo4jenterprisecluster property-sharding-cluster
# Check sharded database events
kubectl describe neo4jshardeddatabase products-sharded-db
# View operator logs
kubectl logs -n neo4j-operator deployment/neo4j-operator-controller-manager
# Check individual database status
kubectl exec property-sharding-cluster-server-0 -- \
cypher-shell -u neo4j -p password "SHOW DATABASES"
# Check virtual database
kubectl exec property-sharding-cluster-server-0 -- \
cypher-shell -u neo4j -p password "SHOW DATABASES"Note: The operator does not orchestrate backupConfig for Neo4jShardedDatabase. Use explicit Neo4jBackup resources to back up each shard.
Property sharding backup coordinates across all shards:
apiVersion: neo4j.neo4j.com/v1alpha1
kind: Neo4jBackup
metadata:
name: sharded-backup
spec:
clusterRef: property-sharding-cluster
# Multi-database backup
databases:
- name: "products-g000" # Graph shard
type: "full+differential"
- name: "products-p000" # Property shards
type: "full"
- name: "products-p001"
type: "full"
- name: "products-p002"
type: "full"
- name: "products-p003"
type: "full"
schedule: "0 2 * * *"
consistency: "cross-database" # Ensure consistent backup pointFast Queries (Graph Shard Only):
// Structure queries - very fast (single shard)
MATCH (n)-[r]->(m) RETURN count(r)
// Index-based lookups on graph properties - fast
MATCH (p:Product) WHERE p.id = "12345" RETURN p.name, p.categoryModerate Queries (Mixed Access):
// Graph structure + property shard access - moderate speed
MATCH (p:Product)
WHERE p.category = "electronics" // Graph shard
RETURN p.name, p.description // Mixed: graph + property shardSlower Queries (Heavy Property Access):
// Full property shard scans - slower
MATCH (p:Product)
WHERE p.description CONTAINS "high-performance" // Property shard scan
RETURN p.name, p.specifications // Property shard data1. Automatic Property Distribution:
Property distribution is automatic for sharded databases. Focus on shard count, replica topology, and resource sizing.
2. Resource Scaling Recommendations:
Development (Basic Testing):
- Memory: 6-8GB per server (basic functionality)
- CPU: 1-2 cores per server (acceptable for development)
- Network: Standard Kubernetes networking (1Gbps)
- Servers: 2 minimum (3+ recommended for HA)
Production (Recommended):
- Memory: 4-8GB per server (4GB minimum, 8GB recommended)
- CPU: 2-4 cores per server (cross-shard query performance)
- Network: High-speed networking (10Gbps+, low latency)
- Servers: 3-7 servers (better shard distribution)
High-Performance (Enterprise):
- Memory: 16-20GB+ per server (maximum throughput)
- CPU: 4-6+ cores per server (concurrent cross-shard queries)
- Network: Ultra-low latency networking (<1ms)
- Servers: 7+ servers (optimal shard placement)
3. Monitoring Key Metrics:
Resource Utilization:
# Monitor memory usage per server
kubectl top pods -l app.kubernetes.io/name=your-cluster --containers
# Check CPU utilization during peak queries
kubectl top pods -l app.kubernetes.io/name=your-cluster --containersQuery Performance:
// Monitor cross-shard query latencies
CALL dbms.queryJmx("org.neo4j:instance=kernel#0,name=Transactions")
YIELD attributes
RETURN attributes.NumberOfOpenTransactions;
// Check transaction log positions
SHOW DATABASES
WHERE name STARTS WITH "your-db-"
RETURN name, currentStatus, requestedStatus;Network and I/O:
# Monitor network traffic between pods
kubectl exec your-cluster-server-0 -- ss -tuln
# Check storage I/O patterns
kubectl exec your-cluster-server-0 -- iostat -x 1Graph Shard Growth:
- Grows with number of nodes and relationships
- Index storage for graph properties
- Transaction logs for consensus
Property Shard Growth:
- Grows with property volume per shard
- Distributed based on hash function
- Each shard has independent transaction logs
Total Storage Planning:
Total Storage = Graph_Shard + (Property_Shards × Avg_Property_Size) + Overhead
Example:
- Graph: 10GB (structure + graph properties)
- Property Shards: 4 × 25GB = 100GB (distributed properties)
- Overhead: 20GB (transaction logs, indexes, system)
- Total: ~130GB per server (with replication)
Network Performance Requirements:
Transaction Log Synchronization:
- Latency: <10ms between servers (critical)
- Bandwidth: 100MB/s+ sustained (busy clusters)
- Consistency: All shards must stay within transaction log window
Cross-Shard Query Traffic:
- Latency: <5ms for responsive queries
- Bandwidth: Proportional to result set size
- Concurrent Queries: Multiple cross-shard queries increase traffic
Slow Query Performance:
- Check property distribution strategy
- Monitor cross-shard query patterns
- Verify adequate CPU allocation
- Check network latency between servers
High Memory Usage:
- Monitor heap usage during peak loads
- Check transaction log retention settings
- Verify property shard distribution balance
- Consider increasing memory limits
Network Saturation:
- Monitor inter-pod network traffic
- Check for transaction log lag
- Verify network bandwidth capacity
- Consider network policy optimizations
Property sharding cannot be enabled on existing databases. Migration approaches:
- Create new sharded database and migrate data
- Export/import with property distribution
- Application-level migration with dual-write
See migration guide for detailed procedures.
- No in-operator resharding: Resharding requires offline
neo4j-admin database copyand recreation - Neo4j version: Requires 2025.12+ enterprise
- Cypher version: Must use Cypher 25
- No online resharding: Plan shard count carefully
- Increased complexity: More monitoring and operational overhead