feat(ipip-0499): add singularity to divergence table
include singularity as example showing balanced layout has implementation
variants that affect CID determinism for large files:
- document balanced-packed DAG layout variant
(data-preservation-programs/singularity#525)
- note boxo defaults for HAMT parameters
- note rclone defaults for hidden files and symlinks
src/ipips/ipip-0499.md (+59 -18: 59 additions & 18 deletions)
@@ -75,7 +75,8 @@ The following [UnixFS](https://specs.ipfs.tech/unixfs/) parameters were identifi
 1. UnixFS file chunking algorithm and chunk size (e.g., fixed-size chunks of 256KiB)
 1. UnixFS DAG layout:
    - `balanced`: builds a balanced tree where all leaf nodes are at the same depth. Optimized for random access, seeking, and range requests within files (e.g., video).
-   - `trickle`: builds a tree optimized for streaming, where data can be consumed before the entire file is available. Useful for logs and other append-only data structures where random access is not important.
+   - `balanced-packed`: variant of `balanced` that may produce different tree structure for large files. See [Balanced DAG layout variants](#balanced-dag-layout-variants) below.
+   - `trickle`: builds a tree optimized for on-the-fly one-time streaming, where data can be consumed before the entire file is available. Useful for logs and other append-only data structures where random access is not important.
 1. UnixFS DAG width (max number of links per `File` node)
 1. [HAMTDirectory](https://specs.ipfs.tech/unixfs/#dag-pb-hamtdirectory) fanout: the branching factor at each level of the HAMT tree (e.g., 256 leaves).
 1. [HAMTDirectory threshold](https://specs.ipfs.tech/unixfs/#when-to-use-hamt-sharding): max `Directory` size before converting to `HAMTDirectory`, based on `PBNode.Links` count or estimated serialized [dag-pb](https://ipld.io/specs/codecs/dag-pb/spec/) size:
@@ -91,27 +92,57 @@ The following [UnixFS](https://specs.ipfs.tech/unixfs/) parameters were identifi
+The `balanced` DAG layout has implementation variants that affect CID determinism for large files. CID mismatches have been [observed](https://discuss.ipfs.tech/t/should-we-profile-cids/18507/41) and [investigated](https://discuss.ipfs.tech/t/should-we-profile-cids/18507/44) when comparing [kubo][] and [Singularity][singularity] outputs for files exceeding 1 GiB. This IPIP introduces the name `balanced-packed` to distinguish Singularity's variant from the original `balanced` layout.
+
+Implementations adopting a profile SHOULD specify which balanced variant they use. The `unixfs-v1-2025` profile uses `balanced` for maximum compatibility with existing implementations.
+
+##### `balanced`
+
+The original balanced layout used by [kubo][]/[boxo][], [helia][], and others in the ecosystem. Builds the tree incrementally as chunks stream in:
+- Starts with first chunk as root, grows tree upward as needed
+- Uses explicit depth tracking to fill nodes recursively
+- All leaf nodes end up at the **same depth** from the root
+According to [Singularity issue #525](https://github.com/data-preservation-programs/singularity/issues/525):
+> "In Singularity's DAG, the last leaf node is not at the same distance from the root as the others."
+
+This structural difference causes CID mismatches for files larger than `chunk_size * dag_width` (e.g., >1 GiB with 1 MiB chunks and 1024 links per node), even when all other parameters match.
+
 ### Divergence in current implementations
 
 We analyzed the default settings across the most popular UnixFS implementations in the ecosystem. The table below documents the divergence that prevents deterministic CID generation today:
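Review note on the hunk above: the `chunk_size * dag_width` threshold can be checked with a few lines of arithmetic. The sketch below is illustrative only. It assumes fixed-size chunking and a `balanced` tree in which every intermediate node holds up to `dag_width` links; the function names are hypothetical and it does not model Singularity's `balanced-packed` structure.

```python
MIB = 1024 * 1024


def single_level_capacity(chunk_size: int, dag_width: int) -> int:
    """Largest file size (bytes) whose chunks all fit under one root node."""
    return chunk_size * dag_width


def balanced_depth(file_size: int, chunk_size: int, dag_width: int) -> int:
    """Levels of link nodes above the leaves in an idealized balanced tree.

    0 means the file is a single chunk; 1 means one root linking to leaves.
    """
    chunks = max(1, -(-file_size // chunk_size))  # ceiling division
    depth = 0
    capacity = 1  # number of leaves reachable at the current depth
    while capacity < chunks:
        capacity *= dag_width
        depth += 1
    return depth


threshold = single_level_capacity(MIB, 1024)  # 1 GiB for the example in the text
at_threshold = balanced_depth(threshold, MIB, 1024)
past_threshold = balanced_depth(threshold + 1, MIB, 1024)
```

Under these assumptions a file of exactly 1 GiB still fits under a single root, while one extra byte forces a second level of link nodes, which is the point where layout variants can start to disagree.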
@@ -121,6 +152,9 @@ We analyzed the default settings across the most popular UnixFS implementations
 - `opt-out`: Included by default; implementations provide a flag to exclude
 - `preserved`: Symlinks stored as UnixFS Type=4 nodes with target path (per [UnixFS spec](https://specs.ipfs.tech/unixfs/)). Note: Kubo (v0.39) `--dereference-args` only follows symlinks passed as CLI arguments; symlinks found during recursive traversal are always preserved.
 - `followed`: Symlinks dereferenced and treated as target files/directories
+- `skipped`: Symlinks ignored during traversal (not included in DAG)
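Review note on the symlink legend above: the three modes differ only in what a directory walker does when it meets a symlink. The sketch below is a minimal illustration, not the behavior of kubo, Singularity, or rclone. `SymlinkMode` and `plan_entries` are hypothetical names, and the `followed` branch assumes there are no symlink cycles.

```python
import os
from enum import Enum


class SymlinkMode(Enum):
    PRESERVED = "preserved"  # record the link itself with its target path
    FOLLOWED = "followed"    # treat the link as the file/directory it points to
    SKIPPED = "skipped"      # leave the link out of the result entirely


def plan_entries(root: str, mode: SymlinkMode) -> list:
    """Return (relative_path, kind, link_target) tuples in sorted walk order."""
    entries = []

    def visit(directory: str) -> None:
        for name in sorted(os.listdir(directory)):
            path = os.path.join(directory, name)
            rel = os.path.relpath(path, root)
            if os.path.islink(path):
                if mode is SymlinkMode.SKIPPED:
                    continue
                if mode is SymlinkMode.PRESERVED:
                    entries.append((rel, "symlink", os.readlink(path)))
                    continue
                # FOLLOWED: fall through and classify the link's target.
            if os.path.isdir(path):
                entries.append((rel, "directory", None))
                visit(path)  # assumption: no cycles when following links
            elif os.path.isfile(path):
                entries.append((rel, "file", None))
            # Broken links in FOLLOWED mode match neither branch and are dropped.

    visit(root)
    return entries
```

Because each mode yields a different set of entries for the same directory, two tools that agree on every other parameter can still produce different root CIDs for a tree that contains symlinks.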