Description
Implementation ideas
Overview
Currently, all *.ods and *.q4 datasets are stored under a single blocks/ path without any filesystem partitioning. This monolithic storage approach creates scalability challenges and potential performance bottlenecks.
Risks and Challenges
- Storage Capacity Constraints
  - Since all datasets are stored under the same root path, the backing volume must keep growing as blocks accumulate, which becomes a major challenge for storage distribution and volume scalability.
- Performance Bottlenecks
  - A single directory containing all blocks degrades file lookup performance
  - Potential I/O contention when multiple processes access the same directory
  - Limited ability to optimize for specific storage hardware characteristics
Proposed Enhancement
Implement a partitioning strategy that would:
- Introduce a two-level hierarchical layout based on block hash prefixes
- Create 256 primary partitions using the first two hexadecimal characters of the hash (00-FF)
- Enable flexible volume distribution across storage resources
Partitioning Example
Existing structure (no partitioning; all files are stored directly under the root path blocks/):
├── 00E1584FF07A13371E6A293EAC970EF42F753C474E0737D93EF1430944227441.ods
├── 10E2584FF07A13371E6A293EAC970EF42F753C474E0737D93EF1430944227441.ods
Partitioning on the first two hexadecimal characters of the hash (one byte, 00->FF) would create a structure of 256 partitions (in the tree below, X stands for any hexadecimal digit):
blocks/
├── 0X/
│ ├── E1584FF07A13371E6A293EAC970EF42F753C474E0737D93EF1430944227441.ods
│ ├── E2584FF07A13371E6A293EAC970EF42F753C474E0737D93EF1430944227441.ods
├── 1X/
│ ├── E1584FF07A13371E6A293EAC970EF42F753C474E0737D93EF1430944227441.ods
│ ├── E2584FF07A13371E6A293EAC970EF42F753C474E0737D93EF1430944227441.ods
├── XX/
│ ├── E1584FF07A13371E6A293EAC970EF42F753C474E0737D93EF1430944227441.ods
│ ├── E2584FF07A13371E6A293EAC970EF42F753C474E0737D93EF1430944227441.ods
Example of volume distribution with partitioning enabled:
Volume 0: 0X/ (Blocks starting with 0)
Volume 1: 1X/ (Blocks starting with 1)
Volume 2: 2X/ (Blocks starting with 2)
Volume 3: 3X/ (Blocks starting with 3)
Volume 4: 4X/ (Blocks starting with 4)
Volume 5: 5X/ (Blocks starting with 5)
Volume 6: 6X/ (Blocks starting with 6)
Volume 7: 7X/ (Blocks starting with 7)
Volume 8: 8X/ (Blocks starting with 8)
Volume 9: 9X/ (Blocks starting with 9)
Volume 10: AX/ (Blocks starting with A)
Volume 11: BX/ (Blocks starting with B)
Volume 12: CX/ (Blocks starting with C)
Volume 13: DX/ (Blocks starting with D)
Volume 14: EX/ (Blocks starting with E)
Volume 15: FX/ (Blocks starting with F)
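The table above amounts to routing each block by the value of the first hex digit of its hash. A sketch of that mapping (the function name is hypothetical, not from the codebase):

```go
package main

import (
	"fmt"
	"strconv"
)

// volumeFor maps a hex-encoded block hash to a volume index (0-15),
// one volume per leading hex digit, matching the table above.
// Illustrative only; the real routing may differ.
func volumeFor(hexHash string) (int, error) {
	if len(hexHash) == 0 {
		return 0, fmt.Errorf("empty hash")
	}
	v, err := strconv.ParseUint(hexHash[:1], 16, 8)
	if err != nil {
		return 0, fmt.Errorf("invalid hex prefix %q: %w", hexHash[:1], err)
	}
	return int(v), nil
}

func main() {
	v, _ := volumeFor("A7E2584F") // truncated hash, shown for brevity
	fmt.Println(v)                // 10: hashes starting with 'A' land on volume 10
}
```

Fewer than 16 volumes also works by assigning multiple leading digits per volume; the prefix scheme only fixes the partition layout, not the volume count.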
How would this fix existing limitations?
When deploying a Data Availability (DA) node on cloud infrastructure, service providers face a critical limitation: cloud platforms typically impose a hard storage limit per volume. Since DA nodes currently store all *.ods datasets in a single root path, this limit creates an absolute ceiling that cannot be bypassed.
The proposed partitioning strategy provides a robust solution to these storage constraints:
Volume Distribution:
- Creates 256 distinct partitions (00-FF) based on block hash prefixes
- Distributes these partitions across up to 16 separate volumes (0-F)
- Each volume handles blocks with specific prefix ranges
Example of storage capacity benefits, assuming a limit of 10TB per block storage volume:
- Number of volumes: 16 (one for each hex prefix)
- Total theoretical capacity: 160TB per DA node
- Scalability factor: 16x increase from baseline
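The capacity math above can be checked directly (10TB per volume is the example figure from this issue, not a fixed platform limit):

```go
package main

import "fmt"

// theoreticalCapacityTB returns the aggregate capacity when each of
// the prefix-mapped volumes has the same per-volume limit.
func theoreticalCapacityTB(perVolumeTB, volumes int) int {
	return perVolumeTB * volumes
}

func main() {
	// 16 volumes, one per leading hex digit (0-F), 10TB each.
	total := theoreticalCapacityTB(10, 16)
	fmt.Printf("total capacity: %d TB (16x baseline)\n", total)
	// total capacity: 160 TB (16x baseline)
}
```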