It looks like you are implementing the Kubo defaults. Those defaults are nearing 10 years old and lack the newest features we support; I want to change them, so I am poking around the ecosystem for places that rely on them.
https://github.com/ethereum/solidity/blob/develop/libsolutil/IpfsHash.cpp
Unixfs is an open format which allows multiple writer implementations to implement their own linking logic, such as append logs, content-aware chunking (cutting around logical boundaries in the content, such as iframes in video files, content in archive formats, ...), more packed representations, ... while all of those remain automatically compatible with all reader implementations.
By design, this leads to inconsistent hashes in the ecosystem. Examples of implementations that produce different CIDs:
- github.com/Jorropo/linux2ipfs uses 2MiB raw leaves with 2MiB roots (instead of 174 links).
- github.com/ipld/go-car/cmd/car uses a different TSize logic.
- github.com/ipfs/boxo/mfs (which is available in Kubo with `ipfs files ...`) has different defaults and can produce identical files with different CIDs if you use a different sequence of copy, write, append, ... operations.
- github.com/filecoin-project/lotus (I believe) uses raw leaves with 1MiB chunks and 1024 links with some variant of blake2.
- web3.storage & nft.storage use raw leaves with 1MiB chunks.
- github.com/bmwiedemann/ipfs-iso-jigsaw chunks each file in an ISO separately and then concatenates the resulting files with the ISO metadata in a unixfs root, allowing different versions of similar ISOs to share the blocks for the unchanged files (incremental file updates).
- ... more I can't remember off the top of my head
Hopefully this serves as a demonstration that unixfs is good at tailoring to use cases, not at repeatable hashing of data.
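To make this concrete, here is a minimal sketch (in Go, using go-cid and go-multihash) of how the chunk size alone already changes the set of leaf blocks, and therefore every CID above them. Real unixfs builders additionally wrap nodes in dag-pb with TSize and layout parameters, all omitted here:

```go
package main

import (
	"fmt"

	"github.com/ipfs/go-cid"
	mh "github.com/multiformats/go-multihash"
)

// leafCIDs splits data into fixed-size chunks and returns one raw CIDv1 per
// chunk, mimicking (in simplified form) a unixfs chunker producing raw leaves.
func leafCIDs(data []byte, chunkSize int) []cid.Cid {
	var leaves []cid.Cid
	for off := 0; off < len(data); off += chunkSize {
		end := off + chunkSize
		if end > len(data) {
			end = len(data)
		}
		h, err := mh.Sum(data[off:end], mh.SHA2_256, -1)
		if err != nil {
			panic(err)
		}
		leaves = append(leaves, cid.NewCidV1(cid.Raw, h))
	}
	return leaves
}

func main() {
	data := make([]byte, 3<<20) // 3 MiB stand-in for a file

	// Same bytes, two chunker configurations: 12 leaves vs 3 leaves.
	// Different leaf sets mean different parent nodes and a different root.
	fmt.Println("256KiB chunks:", len(leafCIDs(data, 256<<10)), "leaves")
	fmt.Println("1MiB chunks:  ", len(leafCIDs(data, 1<<20)), "leaves")
}
```

Both runs describe the same 3 MiB of data, yet no block of one DAG appears in the other.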
I see 3 potential fixes:
- Add an option to the compiler to output a `.car` file. Instead of relying on `ipfs add` magically outputting the exact same CID, you do not run 2 chunkers: the solc chunker outputs the blocks into an archive, and the user then runs `ipfs dag import` (which reads the blocks as-is instead of chunking). This is how chunkers are meant to work (this, or using some transport other than car). A minimal sketch follows after this list.
- Write a proposal and a new spec for repeatable unixfs chunkers inside ipfs/specs and implement it. You could then use a single-link inline CID with metadata to embed the parameters into the CID itself, so the CID would encode `unixfs-balanced-chunksize-256KiB-dag-pb-leaves-...` and could be fed into another implementation to reproduce the same result.
- Replace all the multiblock and dag-pb logic with a `raw-blake3` CID. The reason we use the unixfs merkle-dag format is that, unlike plain sha256, it supports easy incremental verification and seeking (downloading random parts of the file without having to download the full file), and has very high exponential fanout (allowing parallel multipeer downloads). All of those features are built into some well-specified hash functions, blake3 being one of them. This removes support for the most esoteric features such as custom chunking, but in exchange adding the same file multiple times always produces the same CID.
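To make the first fix concrete, here is a minimal sketch of the compiler-side block emission, assuming go-car v2's read-write blockstore API and, for simplicity, a metadata file small enough to fit in a single raw block (the file names are illustrative only):

```go
package main

import (
	"context"
	"os"

	blocks "github.com/ipfs/go-block-format"
	"github.com/ipfs/go-cid"
	carbs "github.com/ipld/go-car/v2/blockstore"
	mh "github.com/multiformats/go-multihash"
)

func main() {
	// Hypothetical input: the compiler's metadata JSON, small enough that the
	// whole "dag" is a single raw block.
	data, err := os.ReadFile("metadata.json")
	if err != nil {
		panic(err)
	}

	// The CID is fully determined here, by our chunker, not by `ipfs add`.
	h, err := mh.Sum(data, mh.SHA2_256, -1)
	if err != nil {
		panic(err)
	}
	root := cid.NewCidV1(cid.Raw, h)
	blk, err := blocks.NewBlockWithCid(data, root)
	if err != nil {
		panic(err)
	}

	// Write the block into a CAR file with the root recorded in the header.
	bs, err := carbs.OpenReadWrite("metadata.car", []cid.Cid{root})
	if err != nil {
		panic(err)
	}
	if err := bs.Put(context.Background(), blk); err != nil {
		panic(err)
	}
	if err := bs.Finalize(); err != nil {
		panic(err)
	}

	// The user now runs `ipfs dag import metadata.car`, which loads these
	// blocks verbatim instead of re-chunking, so the CID is reproducible.
}
```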
Regarding the third fix: blake3 is also used by default by the new github.com/n0-computer/iroh implementation.
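And a corresponding sketch of the third fix, hashing the file with blake3 and wrapping the digest in a `raw` CIDv1 so the CID depends only on the file bytes (assuming lukechampine.com/blake3 together with go-cid and go-multihash):

```go
package main

import (
	"fmt"
	"os"

	"github.com/ipfs/go-cid"
	mh "github.com/multiformats/go-multihash"
	"lukechampine.com/blake3"
)

func main() {
	data, err := os.ReadFile(os.Args[1])
	if err != nil {
		panic(err)
	}

	// blake3 is internally a fixed merkle tree, so this one digest already
	// supports incremental verification and seeking by construction.
	sum := blake3.Sum256(data)

	// Wrap the digest in a multihash and a CIDv1 with the `raw` codec:
	// no unixfs envelope, no chunking parameters, nothing use-case dependent.
	h, err := mh.Encode(sum[:], mh.BLAKE3)
	if err != nil {
		panic(err)
	}
	fmt.Println(cid.NewCidV1(cid.Raw, h))
}
```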
TL;DR:
You implement unixfs, which is not a specified repeatable hash function (the same input can hash to different hashes depending on how the internal merkle data structure is built, which is use-case dependent).
Given that your use case is simple, usually small, text files, I believe you should switch to plain blake3 instead, which is a well-specified fixed merkle tree (instead of the loose merkle dag unixfs is).
Note 0
Out of all the IPFS implementations I know, only iroh knows how to handle blake3 incremental verification yet; the others, Kubo & friends, support blake3 but as a dumb hash, so they still use unixfs + blake3 to handle files above the block limit (1~4MiB). We are interested in adding this capability in the future.
Note 1
Even though there is a one-to-many `file bytes → CID` unixfs relationship, assuming cryptographically secure hash functions there is always a unique `CID → bytes` relationship.
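This unique direction is what keeps the mapping verifiable despite the writer-side freedom: given a CID and candidate bytes, anyone can re-hash and compare. A minimal check with go-cid (the `verify` helper is just for illustration):

```go
package main

import (
	"fmt"

	"github.com/ipfs/go-cid"
	mh "github.com/multiformats/go-multihash"
)

// verify re-hashes data with the same hash function and codec encoded in c
// and checks the result matches: the unique CID → bytes direction.
func verify(c cid.Cid, data []byte) bool {
	got, err := c.Prefix().Sum(data)
	if err != nil {
		return false
	}
	return got.Equals(c)
}

func main() {
	data := []byte("hello world")
	h, err := mh.Sum(data, mh.SHA2_256, -1)
	if err != nil {
		panic(err)
	}
	c := cid.NewCidV1(cid.Raw, h)

	fmt.Println(verify(c, data))               // true
	fmt.Println(verify(c, []byte("tampered"))) // false
}
```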
Note 2
Blake3 might not be the best solution; what I am sure of is that relying on random unspecified behaviours of some old piece of software is definitely wrong. :)