fix: ipip-499 profile name, perf. and tests#458
Merged
achingbrain merged 7 commits intomainfrom Mar 9, 2026
Merged
Conversation
every expected CID matches kubo's cidProfileExpectations structs character-for-character, covering both links-bytes (v0) and block-bytes (v1) shard split strategies. - src/index.ts: rename profile 'unixfs-v0-2025' to 'unixfs-v0-2015' to match the spec and kubo - test/helpers/deterministic.ts: port kubo's ChaCha20 PRNG and AlphabetEasy filename generation for reproducible test vectors - test/ipip-499-profiles.spec.ts: 21 tests verifying both profiles against kubo 0.40 reference CIDs for files (chunk boundary, max-links boundary), directories (HAMT threshold boundary), empty dirs, option overrides, and strategy divergence - package.json: add @noble/ciphers devDependency for ChaCha20
the block-bytes shard split strategy called marshal().byteLength after every file insert, re-serializing the full protobuf directory node each time. for a 4766-file directory this meant ~11.3M link serializations total (O(N^2)), causing the v1-2025 directory threshold tests to take ~150s each. src/utils/pb-size.ts: - new file with pure arithmetic functions ported from @ipld/dag-pb's pb-encode.js (varintLen, linkSerializedSize, dataFieldSerializedSize) - utf8ByteLength computes UTF-8 byte count without TextEncoder allocation, matching what @ipld/dag-pb uses for PBLink.Name encoding src/dir-flat.ts: - estimateNodeSize() now computes exact serialized size arithmetically instead of calling marshal(), matching pb-encode.js byte-for-byte - put() incrementally adjusts nodeSize (O(1) per insert) instead of invalidating it (which forced O(N) recomputation on next estimate) - both strategies use explicit if/else with throw on unknown value - links-bytes: use utf8ByteLength(name) instead of name.length (correctness fix for non-ASCII names, matches Go's len(name)) test/pb-size.spec.ts: - unit tests for all four pb-size.ts functions, verified against @ipld/dag-pb's encode(prepare(node)) output test/ipip-499-profiles.spec.ts: - pre-compute shared filename arrays once in before() hook instead of calling deterministicFilenames 7 times with duplicate params - per-describe blockstores to reduce memory accumulation - removed explicit timeouts (no longer needed) - updated stale comments about block-bytes re-serialization v1-2025 directory tests: ~150s -> ~150ms (1000x faster) full IPIP-499 suite (21 tests): minutes -> ~5s
22f3606 to
1a82209
Compare
Member
|
Thanks for opening this. I'm going to break this PR into smaller chunks as some stuff is high confidence and some needs some eyeballing. |
achingbrain
reviewed
Feb 27, 2026
achingbrain
reviewed
Feb 27, 2026
achingbrain
reviewed
Feb 27, 2026
achingbrain
added a commit
that referenced
this pull request
Mar 1, 2026
Extracts the IPIP-499 related tests from #458
This was referenced Mar 5, 2026
|
🎉 This PR is included in version ipfs-unixfs-importer-v16.1.2 🎉 The release is available on: Your semantic-release bot 📦🚀 |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
This PR
the intention here was to improve test coverage of ipfs/specs#499 by testing exactly the same DAGs Kubo does, and have exactly the same coverage of border behavior of HAMT and DAG fanout behaviors, but I've found a typo and a performance issue which also got fixed:
unixfs-v0-2025->unixfs-v0-2015(matching the spec and kubo)unixfs-v0-2015andunixfs-v1-2025profilesblock-bytesshard split strategyAdded tests
All expected CIDs match kubo's
cidProfileExpectationscharacter-for-character, covering files (chunk and max-links boundaries), directories (HAMT threshold boundaries for bothlinks-bytesandblock-bytes), empty directories, and profile option overrides.Test helpers port kubo's ChaCha20-based deterministic PRNG and filename generator so JS tests use identical inputs as the Go tests.
This may feel like overkill, but we really don't want to be sinking time N months from now if/when some regression occurs. These things are extremely tricky to debug if something goes wrong, even with LLMs. This regression test suite ensures IPIP-499 test fixtures are guarded: any future refactors to improve performance can be done with way more safety.
Fixes
Avoiding re-encoding protobuf
The
block-bytesstrategy re-encoded the full protobuf directory node after every file insert (O(N^2) total). Now the serialized size is computed arithmetically and updated incrementally -- O(1) per insert.Link name length and UTF-8
This is change i'm least confident about, in GO filenames can be arbitrary bytes, but in JS @ipld/dag-pb seems to force/limit filenames to UTF-8(?) while JS represents strings as UTF-16 which makes my head hurt a bit
(with that in mind)
links-bytesestimation used JS string length (UTF-16 code units) instead of UTF-8 byte length for link names. Now uses UTF-8 byte length to match Go'slen(name). No effect on ASCII-only names but safer for any data onboarded with UTF-8 characters.@achingbrain thanks in advance for a sanity check on this (tests pass, same CIDs as in Kubo/IPIP-499, but maybe we dont cover all edge cases?)