Summary
Add support for IPIP-499 CID profiles (unixfs-v1-2025 and unixfs-v0-2015) to enable deterministic CID generation across IPFS implementations.
Current State
Helia's @helia/unixfs already uses settings close to unixfs-v1-2025:
- CIDv1, sha2-256, raw leaves
- 1 MiB chunk size
- 1024 links per node (DAG width)
- HAMT fanout of 256
However, the HAMT sharding threshold estimation uses links-bytes (sum of link name + CID lengths) instead of block-bytes (full serialized dag-pb size). This causes CID mismatches when directories are near the 256 KiB threshold.
See: packages/unixfs/src/commands/utils/is-over-shard-threshold.ts
Required Changes
@achingbrain the snippets below are broad-strokes prototypes; feel free to adjust names and the API to be idiomatic for Helia.
1. Update HAMT threshold estimation to block-bytes
The estimateNodeSize() function currently sums link name and CID byte lengths. For unixfs-v1-2025 compliance, it should use the full serialized block size:
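```typescript
import * as dagPb from '@ipld/dag-pb'
// Current (links-bytes): name length + CID byte length, per link
size += link.Name.length + link.Hash.bytes.byteLength
// Needed (block-bytes): byte length of the whole serialized dag-pb node
size = dagPb.encode(node).byteLength
```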
This affects:
- is-over-shard-threshold.ts
- any other code checking directory size for HAMT conversion
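To make the difference between the two methods concrete, here is a dependency-free sketch of both estimators. It computes block-bytes arithmetically from the dag-pb protobuf layout instead of calling dagPb.encode(), so it should match the encoded length for well-formed nodes, but the helper names (linksBytes, blockBytes) and the Link shape are hypothetical; in Helia the real fix would more likely just encode the node.

```typescript
// Hypothetical standalone estimators; the Link shape mirrors dag-pb's PBLink
// (Hash, Name, Tsize) but is not the @ipld/dag-pb type.
interface Link {
  hash: Uint8Array // CID bytes
  name?: string
  tsize?: number
}

const utf8 = new TextEncoder()

// Bytes needed to encode a non-negative integer as a protobuf varint.
function varintLength (n: number): number {
  let len = 1
  while (n >= 0x80) {
    n = Math.floor(n / 0x80)
    len++
  }
  return len
}

// Length-delimited field: 1-byte tag (all dag-pb field numbers are < 16),
// varint length prefix, then the payload itself.
function bytesField (payloadLength: number): number {
  return 1 + varintLength(payloadLength) + payloadLength
}

// Encoded size of one PBLink message (Hash, optional Name, optional Tsize).
function encodedLinkLength (link: Link): number {
  let size = bytesField(link.hash.byteLength)
  if (link.name !== undefined) size += bytesField(utf8.encode(link.name).byteLength)
  if (link.tsize !== undefined) size += 1 + varintLength(link.tsize)
  return size
}

// links-bytes: UTF-8 name length + CID byte length, summed over all links.
// Note: String.prototype.length counts UTF-16 code units, not bytes.
function linksBytes (links: Link[]): number {
  return links.reduce((sum, link) => sum + utf8.encode(link.name ?? '').byteLength + link.hash.byteLength, 0)
}

// block-bytes: size of the serialized dag-pb node (Links, then the Data field).
function blockBytes (links: Link[], data?: Uint8Array): number {
  let size = 0
  for (const link of links) size += bytesField(encodedLinkLength(link))
  if (data !== undefined) size += bytesField(data.byteLength)
  return size
}
```

For a directory with one 36-byte CIDv1 link named a.txt (Tsize 100) and the two-byte UnixFS directory Data (0x08 0x01), links-bytes counts 41 bytes while block-bytes counts 53. Because block-bytes also counts Tsize and protobuf framing, it crosses 256 KiB with fewer entries, which is why the two methods disagree for directories near the threshold.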
2. Add profile option
Add a single profile option that applies all relevant settings internally:
```typescript
// simple usage - just pick a profile
const cidV1 = await fs.addBytes(data, { profile: 'unixfs-v1-2025' })
// or use the legacy profile for CIDv0 compatibility
const cidV0 = await fs.addBytes(data, { profile: 'unixfs-v0-2015' })
```
Users who need custom behavior can still override individual settings:
```typescript
import { fixedSize } from 'ipfs-unixfs-importer/chunker'

// start with a profile, then tweak specific knobs
const cid = await fs.addBytes(data, {
  profile: 'unixfs-v1-2025',
  chunker: fixedSize({ chunkSize: 512 * 1024 }) // override just this one setting
})
```
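One way to implement the profile option is a table of per-profile defaults that explicitly passed options are spread over. The sketch below is hypothetical: the option names are loosely modeled on ipfs-unixfs-importer's settings, the values come from the profile comparison in this issue, and a plain chunkSize number stands in for a full chunker.

```typescript
// Hypothetical profile names and option keys; not the current @helia/unixfs API.
type ProfileName = 'unixfs-v0-2015' | 'unixfs-v1-2025'
type HamtSizeEstimation = 'links-bytes' | 'block-bytes'

interface ImportSettings {
  cidVersion: 0 | 1
  rawLeaves: boolean
  chunkSize: number
  maxChildrenPerNode: number // DAG width
  shardFanoutBits: number // 8 bits = 256-way HAMT fanout
  shardSplitThresholdBytes: number
  hamtSizeEstimation: HamtSizeEstimation
}

const PROFILES: Record<ProfileName, ImportSettings> = {
  'unixfs-v0-2015': {
    cidVersion: 0, rawLeaves: false, chunkSize: 256 * 1024, maxChildrenPerNode: 174,
    shardFanoutBits: 8, shardSplitThresholdBytes: 256 * 1024, hamtSizeEstimation: 'links-bytes'
  },
  'unixfs-v1-2025': {
    cidVersion: 1, rawLeaves: true, chunkSize: 1024 * 1024, maxChildrenPerNode: 1024,
    shardFanoutBits: 8, shardSplitThresholdBytes: 256 * 1024, hamtSizeEstimation: 'block-bytes'
  }
}

// Start from the profile's defaults; any explicitly set option wins.
// Keys whose value is undefined are dropped so they cannot clobber a default.
function resolveSettings (profile: ProfileName, overrides: Partial<ImportSettings> = {}): ImportSettings {
  const defined = Object.fromEntries(
    Object.entries(overrides).filter(([, value]) => value !== undefined)
  )
  return { ...PROFILES[profile], ...defined }
}
```

Filtering out undefined values matters because callers often forward option objects with unset keys, and a naive spread would overwrite the profile default with undefined.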
3. Expose individual knobs for advanced users
The shardSplitThresholdBytes option exists but there's no way to choose between links-bytes and block-bytes estimation. Add a hamtSizeEstimation option so advanced users can control this independently of profiles.
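To keep hamtSizeEstimation usable on its own, one possible precedence is: the explicit option first, then the chosen profile, then a library default. The sketch assumes the default stays links-bytes so that existing CIDs do not change when no profile is given; that default, the option shape and resolveHamtOptions are all hypothetical.

```typescript
// Hypothetical option shape; hamtSizeEstimation does not exist in @helia/unixfs yet.
type HamtSizeEstimation = 'links-bytes' | 'block-bytes'
type ProfileName = 'unixfs-v0-2015' | 'unixfs-v1-2025'

interface HamtOptions {
  profile?: ProfileName
  hamtSizeEstimation?: HamtSizeEstimation
  shardSplitThresholdBytes?: number
}

const PROFILE_ESTIMATION: Record<ProfileName, HamtSizeEstimation> = {
  'unixfs-v0-2015': 'links-bytes',
  'unixfs-v1-2025': 'block-bytes'
}

// Assumption: without a profile, keep today's behaviour so existing CIDs are stable.
const DEFAULT_ESTIMATION: HamtSizeEstimation = 'links-bytes'
// Both profiles use the same 256 KiB threshold.
const DEFAULT_THRESHOLD_BYTES = 256 * 1024

// Precedence: explicit option > profile default > library default.
function resolveHamtOptions (opts: HamtOptions): { estimation: HamtSizeEstimation, thresholdBytes: number } {
  const fromProfile = opts.profile !== undefined ? PROFILE_ESTIMATION[opts.profile] : undefined
  return {
    estimation: opts.hamtSizeEstimation ?? fromProfile ?? DEFAULT_ESTIMATION,
    thresholdBytes: opts.shardSplitThresholdBytes ?? DEFAULT_THRESHOLD_BYTES
  }
}
```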
4. Add IPIP-499 test vectors
Add tests that verify CIDs match the spec fixtures for both profiles. See the Test fixtures section in IPIP-499 for reference CIDs covering small files, multi-level DAGs, and HAMT threshold boundary cases.
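When picking inputs for the multi-level DAG cases, it helps to know at which file sizes each profile adds a tree level. This hypothetical dagShape helper assumes the balanced layout, which both Kubo and Helia use by default, and ignores any per-node size limits; depth counts the internal (non-leaf) levels above the leaves.

```typescript
// Hypothetical helper for choosing fixture sizes; assumes a balanced DAG layout.
interface Layout {
  chunkSize: number
  maxChildrenPerNode: number // DAG width
}

const V0_2015: Layout = { chunkSize: 256 * 1024, maxChildrenPerNode: 174 }
const V1_2025: Layout = { chunkSize: 1024 * 1024, maxChildrenPerNode: 1024 }

function dagShape (fileSize: number, layout: Layout): { leaves: number, depth: number } {
  // An empty file is treated as a single (empty) leaf.
  const leaves = Math.max(1, Math.ceil(fileSize / layout.chunkSize))
  let depth = 0
  let nodes = leaves
  // Each level groups up to maxChildrenPerNode nodes under one parent.
  while (nodes > 1) {
    nodes = Math.ceil(nodes / layout.maxChildrenPerNode)
    depth++
  }
  return { leaves, depth }
}
```

With these numbers, a 43.5 MiB file (174 chunks of 256 KiB) still fits under a single root in unixfs-v0-2015, while one byte more forces a second level; under unixfs-v1-2025 the same file is only 44 leaves, and a second level starts just above 1 GiB.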
5. Verify HAMT threshold comparison uses >
The threshold comparison should be strictly greater than (>), not >=. A directory exactly at 262144 bytes remains a basic directory.
Current code in is-over-shard-threshold.ts (line 31) uses size > threshold, which is correct as far as we know. Please double-check that we did not flip-flop and later decide that it is Go that is wrong and JS that did the right thing.
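The boundary rule fits in a one-line helper, which makes it easy to pin with tests at 262144 and 262145 bytes. exceedsShardThreshold is a hypothetical name, not the existing export of is-over-shard-threshold.ts, and it takes a size already estimated with either method.

```typescript
// Same 256 KiB threshold for both profiles.
const SHARD_THRESHOLD_BYTES = 256 * 1024 // 262144

// Hypothetical helper: a directory becomes a HAMT shard only when its
// estimated size is strictly greater than the threshold.
function exceedsShardThreshold (estimatedBytes: number, thresholdBytes: number = SHARD_THRESHOLD_BYTES): boolean {
  return estimatedBytes > thresholdBytes
}
```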
Profile Comparison
| Setting | unixfs-v0-2015 | unixfs-v1-2025 |
|---|---|---|
| CID Version | 0 | 1 |
| Raw Leaves | false | true |
| Hash | sha2-256 | sha2-256 |
| Chunk size | 256 KiB | 1 MiB |
| DAG width | 174 | 1024 |
| HAMT fanout | 256 | 256 |
| HAMT threshold | 256 KiB | 256 KiB |
| HAMT estimation | links-bytes | block-bytes |
Related Work