
fix: ipip-499 profile name, perf. and tests #458

Merged
achingbrain merged 7 commits into main from
fix/ipip-499-profile-name-and-tests
Mar 9, 2026

@lidel
Member

@lidel commented Feb 27, 2026

This PR

The intention here was to improve test coverage of ipfs/specs#499 by testing exactly the same DAGs Kubo does, with the same coverage of HAMT and DAG fanout border behaviors. Along the way I found a typo and a performance issue, which are also fixed here:

  • fix the v0 profile name typo: unixfs-v0-2025 -> unixfs-v0-2015 (matching the spec and Kubo)
  • add tests verifying CID parity with Kubo for both unixfs-v0-2015 and unixfs-v1-2025 profiles
  • fix O(N^2) directory size estimation in the block-bytes shard split strategy

Added tests

All expected CIDs match Kubo's cidProfileExpectations character-for-character, covering files (chunk and max-links boundaries), directories (HAMT threshold boundaries for both links-bytes and block-bytes), empty directories, and profile option overrides.

Test helpers port Kubo's ChaCha20-based deterministic PRNG and filename generator, so the JS tests use the same inputs as the Go tests.

This may feel like overkill, but we really don't want to sink time N months from now if/when a regression occurs. These things are extremely tricky to debug when something goes wrong, even with LLMs. This regression suite guards the IPIP-499 test fixtures, so future refactors aimed at performance can be done much more safely.

Fixes

Avoiding re-encoding protobuf

The block-bytes strategy re-encoded the full protobuf directory node after every file insert (O(N^2) in total). Now the serialized size is computed arithmetically and updated incrementally: O(1) per insert.

  • v1-2025 directory threshold tests: ~150 SECONDS (37229b3) -> ~150 MS (22f3606)
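A minimal sketch of the incremental idea, assuming the standard dag-pb layout where every PBLink sets Hash, Name and Tsize (protobuf fields 1, 2, 3). The class and method names are hypothetical, not the importer's actual API. Each `put()` adds one link's encoded contribution (or swaps it, when a name is re-added) instead of re-encoding the whole node:

```typescript
// Hypothetical sketch: track a dag-pb directory's encoded size incrementally.
// Assumes every link sets Hash, Name and Tsize (protobuf fields 1, 2, 3).

/** Bytes needed to encode a non-negative integer as a protobuf varint. */
function varintLen (n: number): number {
  let len = 1
  while (n >= 0x80) {
    n = Math.floor(n / 128) // each varint byte carries 7 bits
    len++
  }
  return len
}

/** Bytes one link adds to the encoded PBNode: Links tag + length prefix + PBLink body. */
function linkContribution (hashLen: number, name: string, tsize: number): number {
  const nameLen = new TextEncoder().encode(name).length // UTF-8 bytes (TextEncoder for brevity)
  const body =
    (1 + varintLen(hashLen) + hashLen) + // Hash: tag + length + bytes
    (1 + varintLen(nameLen) + nameLen) + // Name: tag + length + bytes
    (1 + varintLen(tsize)) // Tsize: tag + varint
  return 1 + varintLen(body) + body
}

class IncrementalDirSize {
  private readonly contributions = new Map<string, number>()
  private size: number

  /** dataFieldSize: encoded size of the PBNode Data field (tag + length + payload). */
  constructor (dataFieldSize: number) {
    this.size = dataFieldSize
  }

  /** O(1) per call: adjust the running total instead of re-encoding all links. */
  put (name: string, hashLen: number, tsize: number): void {
    const previous = this.contributions.get(name)
    if (previous !== undefined) {
      this.size -= previous // same name re-added: remove the old link's bytes first
    }
    const next = linkContribution(hashLen, name, tsize)
    this.contributions.set(name, next)
    this.size += next
  }

  get byteLength (): number {
    return this.size
  }
}
```

Keeping each name's contribution in the map is what keeps replacement O(1) too; otherwise re-adding an existing name would force the old link to be measured again.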

Link name length and UTF-8

This is the change I'm least confident about. In Go, filenames can be arbitrary bytes, but in JS @ipld/dag-pb seems to force/limit filenames to UTF-8(?), while JS represents strings as UTF-16, which makes my head hurt a bit.

With that in mind: the links-bytes estimation used the JS string length (UTF-16 code units) instead of the UTF-8 byte length for link names. It now uses the UTF-8 byte length to match Go's len(name). This has no effect on ASCII-only names, but is safer for any data onboarded with non-ASCII characters.

@achingbrain thanks in advance for a sanity check on this (tests pass and produce the same CIDs as Kubo/IPIP-499, but maybe we don't cover all edge cases?)
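To make the UTF-16 vs UTF-8 difference concrete, here is a small sketch. The helper names and the "name bytes + CID bytes per link" shape of the links-bytes estimate are assumptions for illustration, not the importer's code:

```typescript
// Hypothetical helpers for illustration; not the importer's actual API.

/** UTF-8 byte length, which is what Go's len(name) reports for a valid UTF-8 string. */
function utf8Len (s: string): number {
  return new TextEncoder().encode(s).length
}

interface LinkInfo {
  name: string
  cidByteLength: number // assumption: CID bytes are counted per link
}

/** links-bytes style estimate: UTF-8 name bytes plus CID bytes for every link. */
function linksBytesEstimate (links: LinkInfo[]): number {
  let total = 0
  for (const { name, cidByteLength } of links) {
    total += utf8Len(name) + cidByteLength
  }
  return total
}

// String.prototype.length counts UTF-16 code units, not bytes:
// 'readme.txt' -> 10 vs 10, 'caf\u00e9.txt' -> 8 vs 9, '\u{1F600}.png' -> 6 vs 8
for (const name of ['readme.txt', 'caf\u00e9.txt', '\u{1F600}.png']) {
  console.log(JSON.stringify(name), 'length:', name.length, 'utf8 bytes:', utf8Len(name))
}
```

One caveat: TextEncoder replaces lone surrogates with U+FFFD, so a JS string cannot carry the arbitrary non-UTF-8 byte sequences Go allows in names. The sketch only covers names that are valid Unicode.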

every expected CID matches kubo's cidProfileExpectations structs
character-for-character, covering both links-bytes (v0) and
block-bytes (v1) shard split strategies.

- src/index.ts: rename profile 'unixfs-v0-2025' to 'unixfs-v0-2015'
  to match the spec and kubo
- test/helpers/deterministic.ts: port kubo's ChaCha20 PRNG and
  AlphabetEasy filename generation for reproducible test vectors
- test/ipip-499-profiles.spec.ts: 21 tests verifying both profiles
  against kubo 0.40 reference CIDs for files (chunk boundary,
  max-links boundary), directories (HAMT threshold boundary),
  empty dirs, option overrides, and strategy divergence
- package.json: add @noble/ciphers devDependency for ChaCha20
@lidel requested a review from achingbrain February 27, 2026 00:40
the block-bytes shard split strategy called marshal().byteLength after
every file insert, re-serializing the full protobuf directory node each
time. for a 4766-file directory this meant ~11.3M link serializations
total (O(N^2)), causing the v1-2025 directory threshold tests to take
~150s each.
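The ~11.3M figure is consistent with a triangular sum, assuming one full re-serialization after each of the N inserts, where re-serializing a directory holding k links touches k links:

```math
\sum_{k=1}^{N} k = \frac{N(N+1)}{2}
\quad\Rightarrow\quad
\frac{4766 \cdot 4767}{2} = 11\,359\,761 \approx 1.14 \times 10^{7}
```

With incremental updates, the same N inserts cost N constant-time size adjustments, i.e. 4766 here.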

src/utils/pb-size.ts:
- new file with pure arithmetic functions ported from @ipld/dag-pb's
  pb-encode.js (varintLen, linkSerializedSize, dataFieldSerializedSize)
- utf8ByteLength computes UTF-8 byte count without TextEncoder
  allocation, matching what @ipld/dag-pb uses for PBLink.Name encoding
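A sketch of what such pure helpers can look like, assuming the dag-pb protobuf layout (PBLink fields 1/2/3 for Hash/Name/Tsize, PBNode field 1 for Data) and that Name and Tsize are always set. The names follow the list above, but the signatures are guesses, not the file's actual exports:

```typescript
// Sketch only: signatures are assumptions; sizes follow protobuf wire encoding.

/** Bytes for an unsigned protobuf varint (exact for integers up to 2^53 - 1). */
export function varintLen (n: number): number {
  let len = 1
  while (n >= 0x80) {
    n = Math.floor(n / 128) // avoid 32-bit bitwise ops on large values
    len++
  }
  return len
}

/** UTF-8 byte count without allocating; lone surrogates count as U+FFFD (3 bytes), like TextEncoder. */
export function utf8ByteLength (s: string): number {
  let bytes = 0
  for (let i = 0; i < s.length; i++) {
    const c = s.charCodeAt(i)
    if (c < 0x80) {
      bytes += 1
    } else if (c < 0x800) {
      bytes += 2
    } else if (c >= 0xd800 && c <= 0xdbff && i + 1 < s.length) {
      const next = s.charCodeAt(i + 1)
      if (next >= 0xdc00 && next <= 0xdfff) {
        bytes += 4 // surrogate pair = one astral code point
        i++
      } else {
        bytes += 3 // lone high surrogate -> U+FFFD
      }
    } else {
      bytes += 3 // rest of the BMP, including lone surrogates
    }
  }
  return bytes
}

/** Encoded size of one PBLink body: Hash, Name, Tsize (each tag is 1 byte). */
export function linkSerializedSize (hashLen: number, name: string, tsize: number): number {
  const nameLen = utf8ByteLength(name)
  return (1 + varintLen(hashLen) + hashLen) +
    (1 + varintLen(nameLen) + nameLen) +
    (1 + varintLen(tsize))
}

/** Encoded size of the PBNode Data field: tag + length prefix + payload. */
export function dataFieldSerializedSize (dataLen: number): number {
  return 1 + varintLen(dataLen) + dataLen
}
```

Under these assumptions, a whole node is `dataFieldSerializedSize(dataLen)` plus, for each link body of size `l`, `1 + varintLen(l) + l`. Comparing `utf8ByteLength(s)` with `new TextEncoder().encode(s).length` over ASCII, accented, CJK, emoji and lone-surrogate strings is a cheap way to confirm that the two agree.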

src/dir-flat.ts:
- estimateNodeSize() now computes exact serialized size arithmetically
  instead of calling marshal(), matching pb-encode.js byte-for-byte
- put() incrementally adjusts nodeSize (O(1) per insert) instead of
  invalidating it (which forced O(N) recomputation on next estimate)
- both strategies use explicit if/else with throw on unknown value
- links-bytes: use utf8ByteLength(name) instead of name.length
  (correctness fix for non-ASCII names, matches Go's len(name))
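One way to write the explicit if/else with a throw, assuming `put()` already maintains both running totals. The type and field names are hypothetical. The `never` assignment makes TypeScript flag any newly added strategy at compile time, while the throw still catches bad values passed in from untyped JavaScript:

```typescript
// Hypothetical names; only the two strategy strings come from the PR.
type ShardSplitStrategy = 'links-bytes' | 'block-bytes'

interface SizeTotals {
  linksBytes: number // running links-bytes estimate (name + CID bytes per link, assumed)
  blockBytes: number // running exact dag-pb encoded size
}

function estimateNodeSize (strategy: ShardSplitStrategy, totals: SizeTotals): number {
  if (strategy === 'links-bytes') {
    return totals.linksBytes
  } else if (strategy === 'block-bytes') {
    return totals.blockBytes
  } else {
    // Compile-time exhaustiveness check; runtime guard for untyped callers.
    const unhandled: never = strategy
    throw new Error(`Unknown shard split strategy: ${String(unhandled)}`)
  }
}
```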

test/pb-size.spec.ts:
- unit tests for all four pb-size.ts functions, verified against
  @ipld/dag-pb's encode(prepare(node)) output

test/ipip-499-profiles.spec.ts:
- pre-compute shared filename arrays once in before() hook instead
  of calling deterministicFilenames 7 times with duplicate params
- per-describe blockstores to reduce memory accumulation
- removed explicit timeouts (no longer needed)
- updated stale comments about block-bytes re-serialization

v1-2025 directory tests: ~150s -> ~150ms (1000x faster)
full IPIP-499 suite (21 tests): minutes -> ~5s
@lidel force-pushed the fix/ipip-499-profile-name-and-tests branch from 22f3606 to 1a82209 February 27, 2026 00:48
@lidel marked this pull request as ready for review February 27, 2026 01:03
@achingbrain
Member

achingbrain commented Feb 27, 2026

Thanks for opening this. I'm going to break this PR into smaller chunks as some stuff is high confidence and some needs some eyeballing.

Review thread on packages/ipfs-unixfs-importer/package.json (outdated)
Review thread on packages/ipfs-unixfs-importer/test/helpers/deterministic.ts (outdated)
Review thread on packages/ipfs-unixfs-importer/src/dir-flat.ts
achingbrain added a commit that referenced this pull request Mar 1, 2026
Extracts the IPIP-499 related tests from #458
@achingbrain merged commit b569843 into main Mar 9, 2026
36 of 37 checks passed
@achingbrain deleted the fix/ipip-499-profile-name-and-tests branch March 9, 2026 12:41
github-actions bot pushed a commit that referenced this pull request Mar 9, 2026
## [ipfs-unixfs-importer-v16.1.2](ipfs-unixfs-importer-16.1.1...ipfs-unixfs-importer-16.1.2) (2026-03-09)

### Bug Fixes

* import performance of directories ([#458](#458)) ([b569843](b569843))

### Trivial Changes

* add ipip499 tests ([#460](#460)) ([73de3d1](73de3d1)), closes [#458](#458)
@github-actions

github-actions bot commented Mar 9, 2026

🎉 This PR is included in version ipfs-unixfs-importer-v16.1.2 🎉

The release is available on:

Your semantic-release bot 📦🚀
