Commit d8b8389

incorporate kubo#10774
Import.* config params for controlling DAG width were added in: ipfs/kubo#10774
1 parent 6cc64cb commit d8b8389

1 file changed

Lines changed: 40 additions & 42 deletions

src/ipips/ipip-0499.md

@@ -6,74 +6,63 @@ date: 2025-04-03
66
ipip: proposal
77
editors:
88
- name: Michelle Lee
9+
github: mishmosh
10+
affiliation:
11+
name: IPFS Foundation
912
relatedIssues:
10-
- n/a
11-
order: 0000
13+
- https://discuss.ipfs.tech/t/should-we-profile-cids/18507
14+
order: 0499
1215
tags: ['ipips']
1316
---
1417

1518
## Summary
1619

1720
<!--One paragraph explanation of the IPIP.-->
18-
This proposal introduces profiles for IPFS CIDs. Profiles explicitly define CID version, hash algorithm, chunk size, DAG width, layout, and other parameters.
21+
This proposal introduces profiles for IPFS CIDs. Profiles explicitly define CID version, hash algorithm, chunk size, DAG width, layout, and other parameters.
1922

2023
## Motivation
2124

2225
Currently, CIDs can be generated with a variety of settings and optimizations for chunking, DAG width, and more. This means the same file can yield multiple, different CIDs depending on which tools and settings are used, and it is not possible to reliably reproduce or verify the CID. With profiles, following the same profile will produce identical CIDs for identical content, which makes verification possible regardless of implementation.
2326

2427
## Detailed design
2528

26-
We introduce a profile naming system,
29+
We introduce a profile naming system,
2730

2831
Each profile must specify the following characteristics:
2932

3033
1. CID version (currently only CIDv0 or CIDv1)
31-
2. Hash algorithm
32-
3. UnixFS Chunk size (explicitly set, not contextual/reactive to input)
33-
4. UnixFS directory DAG width
34-
5. UnixFS directory DAG layout
35-
6. HAMT directory DAG threshold
36-
7. HAMT directory DAG width
37-
8. Leaf Envelope (historically dag-pb, now none/raw)
38-
9. Allow empty directories
39-
6. Required
34+
1. Hash algorithm
35+
1. UnixFS Chunk algorithm (e.g. size-based or content-based)
36+
1. UnixFS directory DAG layout (e.g. balanced, trickle)
37+
1. UnixFS file DAG width (max number of links per `File` node)
38+
1. UnixFS directory DAG width (max number of links per basic `Directory` node)
39+
1. UnixFS HAMT directory DAG threshold (max `Directory` size before switching to `HAMTDirectory`)
40+
1. HAMT directory DAG width (max number of fanout links per internal `HAMTDirectory` node)
41+
1. Leaf Envelope (historically `dag-pb`, CIDv1 introduced `raw` leaves)
42+
1. Empty directories (informative suggestion)
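
The characteristics listed above can be pictured as one immutable record per profile. The following is a minimal sketch, not part of this IPIP: the class name `CidProfile`, the field names, and the string notation for the chunker and threshold are all hypothetical. The example values are taken from the Kubo `test-cid-v1` and `test-cid-v1-wide` columns of the defaults table in this document.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class CidProfile:
    """Hypothetical record holding one value per required characteristic."""
    cid_version: int          # 0 or 1
    hash_algorithm: str       # e.g. "sha-256"
    chunker: str              # chunk algorithm and size, free-form notation
    dag_layout: str           # e.g. "balanced" or "trickle"
    file_max_links: int       # max links per File node
    directory_max_links: int  # max links per basic Directory node
    hamt_threshold: str       # Directory size before switching to HAMTDirectory
    hamt_fanout: int          # max fanout links per internal HAMTDirectory node
    leaf_envelope: str        # "dag-pb" or "raw"
    empty_directories: str    # informative: "allowed" or "disallowed"


# Values copied from the "Kubo test-cid-v1" column of the defaults table.
test_cid_v1 = CidProfile(
    cid_version=1,
    hash_algorithm="sha-256",
    chunker="size-based, 1MiB",
    dag_layout="balanced",
    file_max_links=174,
    directory_max_links=0,
    hamt_threshold="256KiB",
    hamt_fanout=256,
    leaf_envelope="raw",
    empty_directories="allowed",
)

# The "test-cid-v1-wide" column differs only in the three bolded cells.
test_cid_v1_wide = replace(
    test_cid_v1, file_max_links=1024, hamt_fanout=1024, hamt_threshold="1MiB"
)

# Two tools agree on a profile only if every field matches.
print(test_cid_v1 == test_cid_v1_wide)  # False
```

Making the record frozen reflects the intent that a named profile is a fixed set of values: changing any single field yields a different profile and, in general, different CIDs.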
4043

4144
Additional profiles can be added at a future date. Profile names may be chosen from the names of any botanical tree with compound leaves.
4245

43-
| | Helia default | Kubo default | Storacha default | "test-cid-v1" profile | DASL |
44-
|-------------|---------------|-----------------------------|------------------|-----------------------|---------------|
45-
| CID version | CIDv1 | CIDv1 | CIDv1 | CIDv1 | CIDv1 |
46-
| Hash Algo | sha-256 | sha-256 | sha-256 | sha-256 | sha-256 |
47-
| Chunk size | 1MiB | 256KiB | 1MiB | 1MiB | not specified |
48-
| DAG width | 1024 | 174 (but it's complicated*) | 1024 | 174 | not specified |
49-
| DAG layout | balanced | balanced | balanced | balanced | not specified |
50-
| HAMT threshold | 256KiB (est) | 256KiB (est) | 1000 **links** | 256KiB | not specified |
51-
| HAMT width | 256 blocks | 256 blocks | 256 blocks | 256 blocks | not specified |
52-
| Leaves | raw | raw | raw | raw | not specified |
53-
| EmptyDirs | allowed | allowed | disallowed | allowed | not specified |
54-
55-
5646
This would be specified as a table in the (forthcoming) UnixFS spec.
5747

58-
59-
6048
## Design rationale
6149

62-
The profile names are chosen to be easy to pronounce.
63-
64-
Here is a summary table of current defaults, thanks to input & clarifications from @2color @achingbrain @lidel:
50+
The profile names are chosen to be easy to pronounce.
6551

66-
| | Helia default | Kubo default | Storacha default | "test-cid-v1" profile | DASL |
67-
|-------------|---------------|-----------------------------|------------------|-----------------------|---------------|
68-
| CID version | CIDv1 | CIDv1 | CIDv1 | CIDv1 | CIDv1 |
69-
| Hash Algo | sha-256 | sha-256 | sha-256 | sha-256 | sha-256 |
70-
| Chunk size | 1MiB | 256KiB | 1MiB | 1MiB | not specified |
71-
| DAG width | 1024 | 174 (but it's complicated*) | 1024 | 174 | not specified |
72-
| DAG layout | balanced | balanced | balanced | balanced | not specified |
52+
Here is a summary table of current (2025-Q2) defaults, thanks to input & clarifications from @2color @achingbrain @lidel:
7353

74-
* Kubo has 2 different default DAG widths:
75-
* For HAMT-sharded directories, the `DefaultShardWidth` [here](https://github.com/ipfs/boxo/blob/f1d5312e3be45d151bb9c8f11c9283820687bea3/ipld/unixfs/io/directory.go#L30) is 256.
76-
* For files, `DefaultLinksPerBlock` [here](https://github.com/ipfs/boxo/blob/v0.29.0/ipld/unixfs/importer/helpers/helpers.go#L30) is ~174
54+
| | Helia default | Kubo `legacy-cid-v0` (default) | Storacha default | Kubo `test-cid-v1` | Kubo `test-cid-v1-wide` | DASL |
55+
|---------------------------------|---------------|-----------------------------------|------------------|--------------------|---------------------------|---------------|
56+
| CID version | CIDv1 | CIDv0 | CIDv1 | CIDv1 | CIDv1 | CIDv1 |
57+
| Hash Algo | sha-256 | sha-256 | sha-256 | sha-256 | sha-256 | sha-256 |
58+
| Chunk size | 1MiB | 256KiB | 1MiB | 1MiB | 1MiB | not specified |
59+
| Max links `File` node | 1024 | 174 | 1024 | 174 | **1024** | not specified |
60+
| Max links `Directory` node | ? | 0 | ? | 0 | 0 | ? |
61+
| Max fanout `HAMTDirectory` node | 256 blocks | 256 blocks | 256 blocks | 256 blocks | **1024** | not specified |
62+
| `HAMTDirectory` threshold | 256KiB (est) | 256KiB (est:links[name+cid]) | 1000 **links** | 256KiB | **1MiB** | not specified |
63+
| DAG layout | balanced | balanced | balanced | balanced | balanced | not specified |
64+
| Leaves | raw | raw | raw | raw | raw | not specified |
65+
| Empty directories | allowed | allowed | disallowed | allowed | allowed | not specified |
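
To see why the chunk size and `File` node link limits in the table above matter, the sketch below counts nodes per level under a simplified model of a balanced layout: leaves are grouped under parents holding at most `max_links` links, repeatedly, until a single root remains. The function name and the model are assumptions made for illustration; this is not the importer code of Kubo, Helia, or any other implementation.

```python
def balanced_level_sizes(file_size: int, chunk_size: int, max_links: int) -> list[int]:
    """Return node counts per level, leaves first, for a simplified balanced DAG."""
    if file_size < 0 or chunk_size <= 0 or max_links < 2:
        raise ValueError("need file_size >= 0, chunk_size > 0, max_links >= 2")
    # Ceiling division with integers; an empty file still produces one node.
    nodes = max(1, -(-file_size // chunk_size))
    levels = [nodes]
    while nodes > 1:
        nodes = -(-nodes // max_links)
        levels.append(nodes)
    return levels


KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB

# 1 GiB file, 1 MiB chunks, 174 links per node (Kubo `test-cid-v1` column).
print(balanced_level_sizes(GIB, MIB, 174))   # [1024, 6, 1]
# Same file and chunk size, 1024 links per node (`test-cid-v1-wide` column).
print(balanced_level_sizes(GIB, MIB, 1024))  # [1024, 1]
```

Under this model the same 1 GiB input gains an extra level of intermediate nodes when the link limit drops from 1024 to 174, so the root node, and therefore the root CID, differs even though every leaf block is identical.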
7766

7867
See related discussion at https://discuss.ipfs.tech/t/should-we-profile-cids/18507/
7968

@@ -85,7 +74,7 @@ Reliable, deterministic CIDs allow independent verification of content across to
8574

8675
Implementations will need to (1) make CID generation settings configurable and (2) support user setting of profiles.
8776

88-
Kubo currently has no CLI / RPC / Config option to control DAG width in Kubo. https://github.com/ipfs/kubo/issues/10751 is the starting point to add that ability.
77+
Kubo 0.35 will have [`Import.*` configuration](https://github.com/ipfs/kubo/blob/master/docs/config.md#import) options to control DAG width.
8978

9079
### Security
9180

@@ -95,6 +84,15 @@ TODO
9584

9685
Another approach could be to name profiles based on the key UnixFS/CID parameters, e.g. v1-sha256-balanced-1mib-1024w-raw. This is longer and more convoluted.
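
For comparison, the parameter-based naming alternative can be sketched as a small formatter. The function name, argument names, and the order of the parts are assumptions inferred from the single example `v1-sha256-balanced-1mib-1024w-raw`; the proposal does not define this scheme.

```python
def descriptive_profile_name(
    cid_version: int,
    hash_algorithm: str,
    dag_layout: str,
    chunk_size: int,
    file_max_links: int,
    leaves: str,
) -> str:
    """Build a name such as 'v1-sha256-balanced-1mib-1024w-raw' (hypothetical scheme)."""
    mib, kib = 1024 * 1024, 1024
    # Render the chunk size in the largest unit that divides it evenly.
    if chunk_size % mib == 0:
        size = f"{chunk_size // mib}mib"
    elif chunk_size % kib == 0:
        size = f"{chunk_size // kib}kib"
    else:
        size = f"{chunk_size}b"
    parts = [
        f"v{cid_version}",
        hash_algorithm.replace("-", "").lower(),  # "sha-256" -> "sha256"
        dag_layout,
        size,
        f"{file_max_links}w",
        leaves,
    ]
    return "-".join(parts)


print(descriptive_profile_name(1, "sha-256", "balanced", 1024 * 1024, 1024, "raw"))
# v1-sha256-balanced-1mib-1024w-raw
```

Even this short form leaves out the directory and HAMT parameters, which illustrates why a fully descriptive name grows long quickly.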
9786

87+
88+
#### Empty directories
89+
90+
The decision whether empty directories should be included is left out of scope.
91+
92+
Tools can apply arbitrary filtering before passing filesystem entries
93+
to be converted into a DAG, thus for 1:1 CID reproducibility one should
94+
run without any prefilters, or ensure the same prefilters are applied.
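
The effect of prefilters can be illustrated with a toy model. The digest below is a plain SHA-256 over sorted entry paths and is only a stand-in for real DAG construction; it is not a CID and not UnixFS. The helper names and the entry representation (a mapping from path to `"file"` or `"dir"`) are hypothetical.

```python
import hashlib


def toy_tree_digest(entries: dict[str, str]) -> str:
    """Toy stand-in for DAG building: hash the sorted (path, kind) pairs."""
    h = hashlib.sha256()
    for path in sorted(entries):
        h.update(f"{path}\0{entries[path]}\n".encode("utf-8"))
    return h.hexdigest()


def without_empty_dirs(entries: dict[str, str]) -> dict[str, str]:
    """Single-pass prefilter: drop directories that have no entries beneath them."""
    return {
        path: kind
        for path, kind in entries.items()
        if not (kind == "dir" and not any(p.startswith(path + "/") for p in entries))
    }


tree = {"docs": "dir", "docs/readme.md": "file", "cache": "dir"}

unfiltered = toy_tree_digest(tree)
filtered = toy_tree_digest(without_empty_dirs(tree))

print(unfiltered == filtered)  # False: the empty "cache" directory changes the result
print(filtered == toy_tree_digest(without_empty_dirs(dict(tree))))  # True
```

Two tools reproduce each other's result only when both skip the prefilter or both apply the same one, which is the point made in the paragraph above.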
95+
9896
## Test fixtures
9997

10098
TODO
