
Commit 37132f1

feat(ipip-0499): add singularity to divergence table
include singularity as example showing balanced layout has implementation variants that affect CID determinism for large files:

- document balanced-packed DAG layout variant (data-preservation-programs/singularity#525)
- note boxo defaults for HAMT parameters
- note rclone defaults for hidden files and symlinks
1 parent 26162e2 commit 37132f1

1 file changed

Lines changed: 59 additions & 18 deletions


src/ipips/ipip-0499.md

@@ -75,7 +75,8 @@ The following [UnixFS](https://specs.ipfs.tech/unixfs/) parameters were identifi
7575
1. UnixFS file chunking algorithm and chunk size (e.g., fixed-size chunks of 256KiB)
7676
1. UnixFS DAG layout:
7777
- `balanced`: builds a balanced tree where all leaf nodes are at the same depth. Optimized for random access, seeking, and range requests within files (e.g., video).
78-
- `trickle`: builds a tree optimized for streaming, where data can be consumed before the entire file is available. Useful for logs and other append-only data structures where random access is not important.
78+
- `balanced-packed`: variant of `balanced` that may produce a different tree structure for large files. See [Balanced DAG layout variants](#balanced-dag-layout-variants) below.
79+
- `trickle`: builds a tree optimized for on-the-fly one-time streaming, where data can be consumed before the entire file is available. Useful for logs and other append-only data structures where random access is not important.
7980
1. UnixFS DAG width (max number of links per `File` node)
8081
1. [HAMTDirectory](https://specs.ipfs.tech/unixfs/#dag-pb-hamtdirectory) fanout: the branching factor at each level of the HAMT tree (e.g., 256 leaves).
8182
1. [HAMTDirectory threshold](https://specs.ipfs.tech/unixfs/#when-to-use-hamt-sharding): max `Directory` size before converting to `HAMTDirectory`, based on `PBNode.Links` count or estimated serialized [dag-pb](https://ipld.io/specs/codecs/dag-pb/spec/) size:
@@ -91,27 +92,57 @@ The following [UnixFS](https://specs.ipfs.tech/unixfs/) parameters were identifi
9192
1. [Mode](https://specs.ipfs.tech/unixfs/#mode-field): optional POSIX file permissions.
9293
1. [Mtime](https://specs.ipfs.tech/unixfs/#mtime-field): optional modification timestamp.
9394

95+
#### Balanced DAG layout variants
96+
97+
The `balanced` DAG layout has implementation variants that affect CID determinism for large files. CID mismatches have been [observed](https://discuss.ipfs.tech/t/should-we-profile-cids/18507/41) and [investigated](https://discuss.ipfs.tech/t/should-we-profile-cids/18507/44) when comparing [kubo][] and [Singularity][singularity] outputs for files exceeding 1 GiB. This IPIP introduces the name `balanced-packed` to distinguish Singularity's variant from the original `balanced` layout.
98+
99+
Implementations adopting a profile SHOULD specify which balanced variant they use. The `unixfs-v1-2025` profile uses `balanced` for maximum compatibility with existing implementations.
100+
101+
##### `balanced`
102+
103+
The original balanced layout used by [kubo][]/[boxo][], [helia][], and others in the ecosystem. Builds the tree incrementally as chunks stream in:
104+
- Starts with first chunk as root, grows tree upward as needed
105+
- Uses explicit depth tracking to fill nodes recursively
106+
- All leaf nodes end up at the **same depth** from the root
107+
- Reference: [`boxo/ipld/unixfs/importer/balanced/builder.go`](https://github.com/ipfs/boxo/blob/v0.35.2/ipld/unixfs/importer/balanced/builder.go)
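
The incremental construction listed above can be sketched as a small model. This is a simplified illustration, not boxo's code: chunks are represented by integer indices instead of real blocks, and `build_balanced`, `fill`, and `leaf_depths` are hypothetical names chosen for this sketch.

```python
from collections import deque
from typing import Deque, List, Union

# A leaf is a chunk index; an internal node is a list of children.
Tree = Union[int, List["Tree"]]


def fill(node: List[Tree], depth: int, chunks: Deque[int], width: int) -> None:
    """Add children to `node` until it holds `width` links or chunks run out."""
    while len(node) < width and chunks:
        if depth == 1:
            # Children of a depth-1 node are leaves.
            node.append(chunks.popleft())
        else:
            # Deeper children are subtrees, filled one level lower.
            child: List[Tree] = []
            fill(child, depth - 1, chunks, width)
            node.append(child)


def build_balanced(n_chunks: int, width: int) -> Tree:
    """Model the incremental balanced layout for `n_chunks` chunks."""
    if n_chunks < 1 or width < 2:
        raise ValueError("need at least one chunk and a width of at least 2")
    chunks = deque(range(n_chunks))
    root: Tree = chunks.popleft()  # the first chunk starts out as the root
    depth = 0
    while chunks:
        # Grow upward: the old root becomes the first child of a new root.
        root = [root]
        depth += 1
        fill(root, depth, chunks, width)
    return root


def leaf_depths(tree: Tree, depth: int = 0) -> List[int]:
    """Return the distance from the root for every leaf, left to right."""
    if isinstance(tree, int):
        return [depth]
    return [d for child in tree for d in leaf_depths(child, depth + 1)]
```

In this model, a trailing chunk that does not fit in a full subtree is still wrapped in its own intermediate node, which is what keeps every leaf at the same depth.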
108+
109+
##### `balanced-packed`
110+
111+
Name introduced by this IPIP for [Singularity][singularity]'s variant. Groups pre-computed links in batch:
112+
- Takes all chunk links as input, then packs them into parent nodes (up to max width)
113+
- Repeats packing level-by-level until a single root remains
114+
- Trailing nodes may have fewer children, causing leaf depth to vary
115+
- Optimized for batch processing of pre-chunked data in CAR files
116+
- Reference: [`singularity/pack/packutil/util.go`](https://github.com/data-preservation-programs/singularity/blob/v0.6.0-RC4/pack/packutil/util.go) `AssembleFileFromLinks()`
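
The level-by-level packing listed above can be modeled in a few lines. This sketch is not taken from Singularity's source. It makes one loudly labeled assumption: a trailing group that contains a single link is carried up to the next level unwrapped. That is one mechanism that would produce the varying leaf depth described here, and the real `AssembleFileFromLinks()` may differ. The names `pack_level_by_level` and `leaf_depths` are hypothetical, and chunks are represented by integer indices.

```python
from typing import List, Union

# A leaf is a chunk index; an internal node is a list of children.
Tree = Union[int, List["Tree"]]


def pack_level_by_level(n_chunks: int, width: int) -> Tree:
    """Pack all chunk links into parents, one level at a time."""
    if n_chunks < 1 or width < 2:
        raise ValueError("need at least one chunk and a width of at least 2")
    level: List[Tree] = list(range(n_chunks))
    while len(level) > 1:
        parents: List[Tree] = []
        for start in range(0, len(level), width):
            group = level[start:start + width]
            if len(group) == 1:
                # ASSUMPTION: a lone trailing link is promoted as-is
                # instead of being wrapped in a one-child parent node.
                parents.append(group[0])
            else:
                parents.append(group)
        level = parents
    return level[0]


def leaf_depths(tree: Tree, depth: int = 0) -> List[int]:
    """Return the distance from the root for every leaf, left to right."""
    if isinstance(tree, int):
        return [depth]
    return [d for child in tree for d in leaf_depths(child, depth + 1)]
```

Under that assumption, five chunks with a width of 4 yield four leaves at depth 2 and a last leaf at depth 1, while eight chunks pack evenly and all leaves share one depth.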
117+
118+
##### Observed differences
119+
120+
According to [Singularity issue #525](https://github.com/data-preservation-programs/singularity/issues/525):
121+
> "In Singularity's DAG, the last leaf node is not at the same distance from the root as the others."
122+
123+
This structural difference causes CID mismatches for files larger than `chunk_size * dag_width`, the point at which a single parent node can no longer link every chunk (e.g., >1 GiB with 1 MiB chunks and 1024 links per node), even when all other parameters match.
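
The `chunk_size * dag_width` threshold can be checked with simple arithmetic. The sketch below is illustrative only; the function names are hypothetical, and the second example reuses the 256KiB chunk size and 174 links listed for kubo (CIDv0) elsewhere in this document.

```python
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB


def single_parent_capacity(chunk_size: int, dag_width: int) -> int:
    """Largest file (in bytes) whose chunks all fit under one parent node."""
    return chunk_size * dag_width


def needs_multiple_levels(file_size: int, chunk_size: int, dag_width: int) -> bool:
    """True when the file is large enough for the layout variants to diverge."""
    return file_size > single_parent_capacity(chunk_size, dag_width)


# 1 MiB chunks with 1024 links per node: the threshold is exactly 1 GiB.
one_gib_threshold = single_parent_capacity(1 * MIB, 1024)

# 256 KiB chunks with 174 links per node: the threshold is 43.5 MiB.
kubo_v0_threshold = single_parent_capacity(256 * KIB, 174)
```

A file of exactly 1 GiB still fits under one parent with 1 MiB chunks and 1024 links, so only files strictly larger than the threshold need a second level of internal nodes.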
124+
94125
### Divergence in current implementations
95126

96127
We analyzed the default settings across the most popular UnixFS implementations in the ecosystem. The table below documents the divergence that prevents deterministic CID generation today:
97128

98-
| Parameter | kubo (CIDv0) | helia | storacha | kubo (CIDv1) | dasl |
99-
| ----------------------------- | ------------------------ | -------------------- | ------------------ | ----------------------------- | ------------ |
100-
| Based on | v0.39 (`unixfs-v0-2015`) | @helia/unixfs 6.0.4 | w3cli 7.12.0 | v0.39 (`test-cid-v1` profile) | spec 2025-12 |
101-
| CID version | CIDv0 | CIDv1 | CIDv1 | CIDv1 | CIDv1 |
102-
| Hash function | sha2-256 | sha2-256 | sha2-256 | sha2-256 | sha2-256 |
103-
| Chunking algorithm | fixed-size | fixed-size | fixed-size | fixed-size | N/A |
104-
| Max chunk size | 256KiB | 1MiB | 1MiB | 1MiB | N/A |
105-
| DAG layout | balanced | balanced | balanced | balanced | N/A |
106-
| DAG width (children per node) | 174 | 1024 | 1024 | 174 | N/A |
107-
| HAMTDirectory fanout | 256 blocks | 256 blocks | 256 blocks | 256 blocks | N/A |
108-
| HAMTDirectory threshold | 256KiB (links-bytes) | 256KiB (links-bytes) | 1000 (links-count) | 256KiB (links-bytes) | N/A |
109-
| Leaves | dag-pb | raw | raw | raw | N/A |
110-
| Empty directories | included | included | excluded | included | N/A |
111-
| Hidden entities | excluded (opt-in) | excluded (opt-in) | excluded (opt-in) | excluded (opt-in) | N/A |
112-
| Symlinks | preserved | followed | followed | preserved | N/A |
113-
| Mode (permissions) | excluded (opt-in) | excluded (opt-in) | not supported | excluded (opt-in) | N/A |
114-
| Mtime (modification time) | excluded (opt-in) | excluded (opt-in) | not supported | excluded (opt-in) | N/A |
129+
| Parameter | [kubo][] (CIDv0) | [helia][] | [storacha][] | [kubo][] (CIDv1) | [singularity][] | [dasl][] |
130+
| ----------------------------- | ------------------------ | -------------------- | ------------------ | ----------------------------- | ----------------------------------- | ------------ |
131+
| Based on | v0.39 (`unixfs-v0-2015`) | @helia/unixfs 6.0.4 | w3cli 7.12.0 | v0.39 (`test-cid-v1` profile) | v0.6.0-RC4 (454b630) | spec 2025-12 |
132+
| CID version | CIDv0 | CIDv1 | CIDv1 | CIDv1 | CIDv1 | CIDv1 |
133+
| Hash function | sha2-256 | sha2-256 | sha2-256 | sha2-256 | sha2-256 | sha2-256 |
134+
| Chunking algorithm | fixed-size | fixed-size | fixed-size | fixed-size | fixed-size | N/A |
135+
| Max chunk size | 256KiB | 1MiB | 1MiB | 1MiB | 1MiB | N/A |
136+
| DAG layout | balanced | balanced | balanced | balanced | [balanced-packed](#balanced-packed) | N/A |
137+
| DAG width (children per node) | 174 | 1024 | 1024 | 174 | 1024 | N/A |
138+
| HAMTDirectory fanout | 256 blocks | 256 blocks | 256 blocks | 256 blocks | 256 blocks (boxo) | N/A |
139+
| HAMTDirectory threshold | 256KiB (links-bytes) | 256KiB (links-bytes) | 1000 (links-count) | 256KiB (links-bytes) | 256KiB (links-bytes) (boxo) | N/A |
140+
| Leaves | dag-pb | raw | raw | raw | raw | N/A |
141+
| Empty directories | included | included | excluded | included | included | N/A |
142+
| Hidden entities | excluded (opt-in) | excluded (opt-in) | excluded (opt-in) | excluded (opt-in) | included (rclone) | N/A |
143+
| Symlinks | preserved | followed | followed | preserved | skipped (rclone) | N/A |
144+
| Mode (permissions) | excluded (opt-in) | excluded (opt-in) | not supported | excluded (opt-in) | not supported | N/A |
145+
| Mtime (modification time) | excluded (opt-in) | excluded (opt-in) | not supported | excluded (opt-in) | not supported | N/A |
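
To make the divergence concrete, the sketch below encodes a few rows of the table for three of its columns and lists the parameters on which two implementations disagree. The `ImportDefaults` class and `diverging_fields` helper are hypothetical names for illustration; the values are copied from the table, and only a subset of rows is modeled.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass(frozen=True)
class ImportDefaults:
    """A subset of the table rows for one implementation."""
    chunk_size: str
    dag_layout: str
    dag_width: int
    leaves: str
    hidden_entities: str
    symlinks: str


KUBO_CIDV1 = ImportDefaults(
    "1MiB", "balanced", 174, "raw", "excluded (opt-in)", "preserved"
)
HELIA = ImportDefaults(
    "1MiB", "balanced", 1024, "raw", "excluded (opt-in)", "followed"
)
SINGULARITY = ImportDefaults(
    "1MiB", "balanced-packed", 1024, "raw", "included (rclone)", "skipped (rclone)"
)


def diverging_fields(a: ImportDefaults, b: ImportDefaults) -> List[str]:
    """Names of the parameters on which two sets of defaults differ."""
    return [f.name for f in fields(a) if getattr(a, f.name) != getattr(b, f.name)]
```

For example, `diverging_fields(HELIA, SINGULARITY)` reports the DAG layout, hidden entities, and symlink handling, any one of which is enough to change the resulting CID for affected inputs.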
115146

116147
**Terminology:**
117148

@@ -121,6 +152,9 @@ We analyzed the default settings across the most popular UnixFS implementations
121152
- `opt-out`: Included by default; implementations provide a flag to exclude
122153
- `preserved`: Symlinks stored as UnixFS Type=4 nodes with target path (per [UnixFS spec](https://specs.ipfs.tech/unixfs/)). Note: Kubo (v0.39) `--dereference-args` only follows symlinks passed as CLI arguments; symlinks found during recursive traversal are always preserved.
123154
- `followed`: Symlinks dereferenced and treated as target files/directories
155+
- `skipped`: Symlinks ignored during traversal (not included in DAG)
156+
- `(rclone)`: Singularity delegates file traversal to [rclone](https://rclone.org/); values shown reflect rclone defaults
157+
- `(boxo)`: Singularity overrides some [boxo][] defaults but relies on implicit boxo defaults for these values
124158

125159
## Detailed design
126160

@@ -205,6 +239,13 @@ specification compliance. This section can be skipped if IPIP does not deal
205239
with the way IPFS handles content-addressed data, or the modified specification
206240
file already includes this information.
207241

242+
[kubo]: https://github.com/ipfs/kubo
243+
[boxo]: https://github.com/ipfs/boxo
244+
[helia]: https://github.com/ipfs/helia
245+
[storacha]: https://github.com/storacha/w3cli
246+
[singularity]: https://github.com/data-preservation-programs/singularity
247+
[dasl]: https://dasl.ing
248+
208249
### Copyright
209250

210251
Copyright and related rights waived via [CC0](https://creativecommons.org/publicdomain/zero/1.0/).
