FRC Proposal: Self-Describing Data Aggregation (Data Segment Index v2) #1216
---
Initial thoughts after a first pass:
We could consider an approach similar to PieceCIDv2, where the raw size is encoded as a relationship to the full size, rather than having two complete size values taking up ~64 bits each, if space is a concern at all.
Since we have 64 bits, maybe we shouldn't bother with uvarint encoding.
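As a hypothetical illustration of that trade-off (not part of the proposal): storing the padding delta `Size - RawSize` as a uvarint usually takes 1 byte, versus 8 bytes for a second fixed-width u64.

```python
# Illustrative sketch: encode the (usually tiny) padding delta as a
# LEB128 unsigned varint instead of a second full 64-bit size field.

def uvarint(n: int) -> bytes:
    """Standard LEB128 unsigned varint encoding."""
    out = bytearray()
    while True:
        b = n & 0x7F
        n >>= 7
        out.append(b | (0x80 if n else 0))
        if not n:
            return bytes(out)

size, raw_size = 1 << 30, (1 << 30) - 100  # 1 GiB piece, 100 bytes of padding
delta = uvarint(size - raw_size)
assert len(delta) == 1  # vs. 8 bytes for a fixed u64
```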
Can you elaborate on how you imagine this being useful in the future? If we don't have a good imagined use case for this, then it might end up being as useless as the "capabilities" field in CARv2.
I'm not sure we have enough space here for robust ACL implementations, although I could imagine ACL multicodecs telling us that a certain other piece contains the extended ACL information, so maybe these fields can be used as a redirect there. E.g. the ACL node tells us that the ACL is defined by UCAN and has a CID for the UCAN; then one of the other pieces packaged is identified by a
---
Still reading and processing, but I will start commenting on things that jump out.
These properties are not verifiable by the client, as the client receives only the inclusion proof for their index entry. So while we can say that implementations MUST follow this, the reader implementation MUST NOT refuse to use these indexes.
---
Slides from FDS BA: https://docs.google.com/presentation/d/14jgME8p9O2FLx44930cOhE7riNDcWMV8jdrMHRp-Re0/edit?usp=sharing
Misaligned CommP inclusion proofs: red nodes are the path which needs to be proven; purple nodes are other data in the sector and zero-commitments in the CommP computation. The red merkle path just needs to prove the top-level node of each blue tree to prove inclusion of the entire misaligned piece, meaning the cost of the entire inclusion proof is only ~2x that of a normal single-subtree inclusion proof. That comes with the obvious downside that the client must receive offset information to compute the final CommP.
---
CommPat hasher code is now in filecoin-project/go-fil-commp-hashhash#30; it works under the assumption that
---
Another interesting use of Proof of Data Segment Inclusion v2 is that it can be applied to existing data packed in .car files, to prove that CIDs stored in a car file are actually stored in a given sector (possibly very interesting to FIDL).
---
Notes from a call about this:
---
The current proposal for CommPv2 also has the ability to work in a semi-tightly packed mode, where the data start offset is aligned to the start of data, giving users a CommP which matches CommP v1. Note the output CommP from filecoin-project/go-fil-commp-hashhash#30 for
This is visible in the (slightly jank) visualizer at https://ipfs.io/ipfs/bafkreiacuhuhzbfvplw7d3lzei5vg4uyczoebpq3ihhffplij4he3opply, in the bottom visualization, when sliding the 'Segment Offset' slider.
---
Alternate idea which makes client hash/CID calculation much simpler and removes the need for communication, coming from the insight that:
The structure would go from: To:
With that, the format becomes spiritually a .car file, except without a header which breaks data alignment, without metadata woven into the data which obliterates alignment, and with a real, useful index at the end of the payload. It is as easy to handle as a .car, since most tooling can easily be adjusted, and fundamentally the CIDs are just normal IPLD CIDs.

---
Starting this as a discussion so that it's easier to gather and respond to feedback. The overarching goals for this proposal are to support more advanced aggregation strategies and unlock innovation for onramps which have those needs.
Self-Describing Data Aggregation (Data Segment Index v2)
Simple Summary
This FRC extends FRC-0058 Verifiable Data Aggregation by adding content type information, raw data size, access control signaling, and flexible alignment to Data Segment Index entries. It also introduces Proof of Data Segment Inclusion v2 which supports arbitrary alignments and enables tighter data packing while maintaining verifiability.
Abstract
FRC-0058 established a Data Segment Index format for proving inclusion of client data within aggregated deals. This proposal extends the index entry format to include:
The proposal also introduces Proof of Data Segment Inclusion v2, which uses dual inclusion proofs (leftmost and rightmost leaves) to support various alignment strategies while maintaining compatibility with v1 for traditionally aligned segments.
Change Motivation
The original FRC-0058 Data Segment Index has several limitations:
By extending the index format and proof mechanism, we enable:
Specification
Data Segment Index Entry v2
The Data Segment Index Entry v2 extends the v1 format by expanding to 256 bytes post-Fr32-padding (4 nodes). All offset and size fields represent pre-Fr32-padding byte positions and lengths.
Each entry consists of 256 bytes after Fr32 bit padding, providing 1016 bits of usable space across four 254-bit nodes. Each entry is aligned to a 256-byte boundary after Fr32 bit padding.
Entry structure (1016 bits total):
Field Details
Offset (64 bits)
- The segment occupies the byte range `Offset` to `Offset + Size`

Size (64 bits)

RawSize (62 bits)
- The raw data occupies `Offset` to `Offset + RawSize`
- MUST satisfy `RawSize <= Size` (the trailing `Size - RawSize` bytes are padding)

Multicodec (64 bits)
- `0x55`: Raw binary data
- `0x0202`: CAR format (IPLD)

MulticodecDependent (254 bits)
- For raw (`0x55`) and CAR (`0x0202`) codecs: MUST be set to zero

ACLType (8 bits)
- `0`: No ACL, data is publicly retrievable

ACLData (64 bits)

Reserved (56 bits)

Checksum (126 bits)
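The field widths above can be sketched as a simple bit-packing routine. This is a hypothetical illustration only: the spec's actual serialization (bit order, Fr32 node boundaries) may differ, and the fields listed here account for 762 of the 1016 usable bits.

```python
# Hypothetical MSB-first packing of the v2 entry fields listed above.

FIELDS = [  # (name, width in bits)
    ("Offset", 64),
    ("Size", 64),
    ("RawSize", 62),
    ("Multicodec", 64),
    ("MulticodecDependent", 254),
    ("ACLType", 8),
    ("ACLData", 64),
    ("Reserved", 56),
    ("Checksum", 126),
]

def pack_entry(values: dict) -> int:
    """Concatenate field values into one big integer, first field highest."""
    acc = 0
    for name, width in FIELDS:
        v = values.get(name, 0)
        assert v < (1 << width), f"{name} overflows {width} bits"
        acc = (acc << width) | v
    return acc

def unpack_entry(acc: int) -> dict:
    """Inverse of pack_entry."""
    out = {}
    for name, width in reversed(FIELDS):
        out[name] = acc & ((1 << width) - 1)
        acc >>= width
    return out

entry = pack_entry({"Offset": 127, "Size": 4096, "RawSize": 4000, "Multicodec": 0x55})
assert unpack_entry(entry)["RawSize"] == 4000
```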
Index Entry Validation
A Data Segment Index entry is defined as valid if:
- The entry falls within the index region (from the start of the index to the end of the deal)
- `RawSize <= Size`
- `Offset + Size` does not exceed the start of the index

Alignment Recommendations
While this specification allows arbitrary alignment, the following is RECOMMENDED:
Offset alignment: 127 bytes (pre-Fr32-padding)
Size padding: Minimal padding to maintain TreeD structure
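The recommended offset alignment can be sketched as follows (the helper name `align_up` is illustrative, not from the spec):

```python
# Round a pre-Fr32 byte offset up to the next 127-byte boundary,
# per the RECOMMENDED alignment above.

ALIGN = 127  # pre-Fr32-padding bytes; expands to 128 bytes post-Fr32

def align_up(offset: int, quantum: int = ALIGN) -> int:
    """Round offset up to the next multiple of quantum (ceiling division)."""
    return -(-offset // quantum) * quantum

assert align_up(0) == 0
assert align_up(1) == 127
assert align_up(127) == 127
assert align_up(128) == 254
```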
Storage Provider Data Processing
After receiving deal data, a Storage Provider SHOULD:
When serving retrievals:
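The entry-validation rules from the Index Entry Validation section can be sketched as a minimal check (field names follow the v2 entry layout; `index_start`, the byte offset where the index region begins, is an assumed parameter):

```python
# Sketch of the Index Entry Validation rules: RawSize must not exceed
# Size, and the segment must end before the index region begins.

def entry_is_valid(entry: dict, index_start: int) -> bool:
    return (
        entry["RawSize"] <= entry["Size"]
        and entry["Offset"] + entry["Size"] <= index_start
    )

assert entry_is_valid({"Offset": 0, "Size": 128, "RawSize": 100}, 1024)
assert not entry_is_valid({"Offset": 1000, "Size": 128, "RawSize": 100}, 1024)
```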
Proof of Data Segment Inclusion v2
Proof of Data Segment Inclusion v2 (PoDSIv2) extends the original PoDSI to support flexible alignment while maintaining verifiability. The proof structure adapts based on segment alignment characteristics.
Proof Structure
The proof consists of:
Proof Components
The aggregator provides:
Verification Algorithm
The client possesses $\mathrm{CommDS}$, $\mathrm{rawSize}$, and optionally the raw data.
Verification steps:
Verify index entry inclusion:
Verify data inclusion (method depends on alignment):
Case A: Power-of-two aligned with v1-compatible padding
If $\mathrm{size} = 2^{\lceil \log_2(\mathrm{rawSize}) \rceil}$ and the offset is suitably aligned:
Case B: Power-of-two aligned without padding
If the offset is power-of-two aligned but $\mathrm{rawSize} < \mathrm{size}$:
Case C: Arbitrary alignment
For non-power-of-two aligned segments:
Verify size consistency:
Verify on-chain:
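The three-way case split above can be sketched as a dispatch function; the exact alignment predicates are assumptions inferred from the case names, not normative definitions:

```python
# Illustrative selection between verification cases A, B, and C.

def next_pow2(n: int) -> int:
    """Smallest power of two >= n."""
    return 1 << max(0, (n - 1).bit_length())

def verification_case(offset: int, raw_size: int, size: int) -> str:
    pow2_aligned = size == next_pow2(size) and offset % size == 0
    if pow2_aligned and size == next_pow2(raw_size):
        return "A"  # v1-compatible padding: size is the ceil-pow2 of rawSize
    if pow2_aligned:
        return "B"  # power-of-two aligned, but rawSize < size without padding
    return "C"      # arbitrary alignment: dual boundary proofs required

assert verification_case(0, 100, 128) == "A"
assert verification_case(0, 100, 256) == "B"
assert verification_case(5, 100, 128) == "C"
```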
Piece CID v2 Computation
With the RawSize field, clients can compute Piece CID v2 (FRC-0069):
Where:
- `CommDS` is the commitment from the index
- `rawSize` determines the data size
- `padding` is the null bytes added
- `treeHeight` is derived from the padded size after Fr32 conversion

Compatibility with v1
PoDSIv2 maintains backward compatibility with v1:
Design Rationale
Why 4 Nodes (256 bytes)?
Expanding from 2 nodes (64 bytes) to 4 nodes (256 bytes) provides:
Why Multicodec-Dependent Space?
The 254-bit multicodec-dependent field (Node 3) enables:
Why ACL Signal (9 bytes)?
The ACL signal fields (ACLType + ACLData) provide:
Why Recommend 127-byte Alignment?
127 bytes (pre-Fr32-padding):
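A sketch of the arithmetic behind this recommendation: Fr32 padding inserts 2 zero bits per 254-bit field element, so every 127 bytes of pre-padding data expand to exactly 128 bytes post-padding, and the padded size and tree height follow directly. Helper names are illustrative.

```python
import math

def fr32_expand(pre_pad: int) -> int:
    """Fr32 expansion: 127 pre-padding bytes become 128 post-padding bytes."""
    assert pre_pad % 127 == 0
    return pre_pad // 127 * 128

def tree_height(raw_size: int) -> int:
    """Illustrative tree-height derivation from raw size (32-byte leaves)."""
    pre = -(-raw_size // 127) * 127                # null-pad to a 127-byte multiple
    padded = 1 << math.ceil(math.log2(fr32_expand(pre)))  # round up to power of two
    return int(math.log2(padded // 32))            # height over 32-byte merkle leaves

assert fr32_expand(127) == 128
assert tree_height(127) == 2  # 128 padded bytes -> 4 leaves -> height 2
```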
Storage Efficiency Trade-offs
Backwards Compatibility
Reading v1 Indexes
v2-aware implementations MUST support reading v1 indexes:
- `RawSize = Size`
- `Multicodec = 0x55`
- `ACLType = 0`
- `ACLData = 0`
- `Reserved = 0`

Index Version Detection
Implementations can detect index version by:
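The v1 defaulting rules above can be sketched as follows (the function name is hypothetical):

```python
# Lift a v1 index entry (which carries only offset and size) into the
# v2 field set using the mandated defaults.

def entry_from_v1(offset: int, size: int) -> dict:
    return {
        "Offset": offset,
        "Size": size,
        "RawSize": size,     # v1 has no raw size; treat the segment as fully used
        "Multicodec": 0x55,  # raw binary
        "ACLType": 0,        # no ACL, data is publicly retrievable
        "ACLData": 0,
        "Reserved": 0,
    }

e = entry_from_v1(0, 128)
assert e["RawSize"] == 128 and e["Multicodec"] == 0x55
```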
Test Cases
Test Case 1: v1-Compatible Segment
Expected: PoDSIv2 should produce functionally equivalent proof to v1
Test Case 2: Tightly Packed Segment
Expected: Leftmost and rightmost proofs verify correct boundaries, Piece CID v2 computable
Test Case 3: Arbitrary Alignment
Expected: Boundary proofs verify segment extent, no specific Piece CID v2 proof
Test Case 4: Unknown ACL Type
Expected: Entry is valid but Storage Provider MUST refuse retrievals for this piece
Security Considerations
Increased Entry Size
Expanding entries from 64 bytes to 256 bytes (4× increase):
Malicious Padding
An aggregator could set `RawSize < Size` to hide data in padding:

MulticodecDependent Field
The 254-bit multicodec-dependent field could be misused:
ACL Signal Fields
The ACL signal fields introduce access control considerations:
Proof Complexity
More complex proofs (arbitrary alignment) could introduce verification bugs:
Incentive Considerations
Storage Provider Benefits
Client Benefits
Product Considerations
Aggregator Implementation
Aggregators implementing v2 should:
Retrieval Integration
Retrieval systems can:
Implementation
Reference implementations:
Required implementations in ecosystem:
Future Work
ACL Type Definitions
Future FRCs can define specific ACL types and their semantics:
Index Sections
Special data sections which are indexes over data in other data sections
EC Section
Erasure coding sections enable efficient storage and retrieval of data with built-in redundancy:
Cross-Sector Striping: EC can be distributed across multiple sectors, allowing a single piece to contain erasure coding stripes from multiple related deals:
EC sections would specify:
Copyright
Copyright and related rights waived via CC0.