I've been trying to understand some of the implementation details of this, but I think there is some contextual knowledge that I lack. I'll try to ask those questions here in the hope of finding answers and contributing some improvements, or at least code comments, for future readers of this code.
-
I am confused by why ConsumableBuffer starts reading from the last byte as opposed to the first byte:
```ts
// js-hamt-sharding/src/consumable-buffer.ts, lines 28 to 31 in 44b6e6b
constructor (value: Uint8Array) {
  this._value = value
  this._currentBytePos = value.length - 1
  this._currentBitPos = 7
}
```
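To make the consumption order concrete, here is a tiny standalone sketch (my own illustration, not library code) of what starting at `value.length - 1` implies:

```typescript
// Minimal illustration (not library code): a reader that starts at
// value.length - 1 and walks downward yields the buffer's bytes last-first.
function * consumeBytes (value: Uint8Array): Generator<number> {
  for (let pos = value.length - 1; pos >= 0; pos--) {
    yield value[pos]
  }
}

const order = [...consumeBytes(Uint8Array.from([10, 20, 30]))]
// order is [30, 20, 10]: the last byte of the buffer comes out first
```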
What is even more confusing is that while it reads bytes from the right side, it reads bits from the left side, and the way it keeps track of _currentBitPos would make you think bits are also read from the right side. But as it turns out that is deceiving, because it actually reverses the start position just before reading bits:
```ts
// js-hamt-sharding/src/consumable-buffer.ts, lines 47 to 49 in 44b6e6b
const availableBits = this._currentBitPos + 1
const taking = Math.min(availableBits, pendingBits)
const value = byteBitsToInt(byte, availableBits - taking, taking)
```
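For what it's worth, here is a standalone reimplementation of the helper (mine, inferred from how the snippet uses it, so treat the exact signature and semantics as an assumption) showing why passing `availableBits - taking` reads the top still-unread bits of the byte:

```typescript
// Assumed semantics of byteBitsToInt(byte, start, length): `start` is the
// offset of the extracted bit run from the LEAST significant bit, so passing
// availableBits - taking extracts the `taking` highest still-unread bits.
function byteBitsToInt (byte: number, start: number, length: number): number {
  const mask = (1 << length) - 1
  return (byte >> start) & mask
}

// With a fresh byte (availableBits = 8) and taking = 3:
// start = 8 - 3 = 5, so we get the top three bits of 0b10110100 → 0b101
const top3 = byteBitsToInt(0b10110100, 5, 3) // 5
```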
Perhaps most confusing about all of this is the fact that js-ipfs-unixfs reverses the hash bytes, which makes me wonder whether that would have been unnecessary if ConsumableBuffer just read bytes from left to right instead.
https://github.com/ipfs/js-ipfs-unixfs/blob/a5b9cad0cddd391e7dd93904bc4d3366f8e79202/packages/ipfs-unixfs-importer/src/options.js#L8-L16
```js
async function hamtHashFn (buf) {
  return (await murmur3128.encode(buf))
    // Murmur3 outputs 128 bit but, accidentally, IPFS Go's
    // implementation only uses the first 64, so we must do the same
    // for parity..
    .slice(0, 8)
    // Invert buffer because that's how Go impl does it
    .reverse()
}
```
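If the conjecture above is right, the `reverse()` and the read-from-the-end would cancel each other out. A quick standalone check of that equivalence (my own illustration, not library code):

```typescript
// Reading bytes from the END of a reversed buffer visits the same bytes in
// the same order as reading the ORIGINAL buffer from the start.
const original = Uint8Array.from([1, 2, 3, 4])
const reversed = original.slice().reverse() // [4, 3, 2, 1]

const fromEndOfReversed: number[] = []
for (let pos = reversed.length - 1; pos >= 0; pos--) {
  fromEndOfReversed.push(reversed[pos])
}
// fromEndOfReversed is [1, 2, 3, 4], i.e. `original` read left to right
```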
-
InfiniteHash seems to represent the hash of some seed bytes, in the following pattern:
[...hash(seed), ...hash([...seed, 1]), ...hash([...seed, 2]), ...]
If I understand its code correctly, the (default) hash will return 8 bytes, after which a new hash is generated from the seed bytes and the depth, producing another 8 bytes, etc.:
```ts
// js-hamt-sharding/src/consumable-hash.ts, lines 80 to 89 in 44b6e6b
async _produceMoreBits () {
  this._depth++

  const value = this._depth > 0 ? uint8ArrayConcat([this._value, Uint8Array.from([this._depth])]) : this._value
  const hashValue = await this._hashFn(value)
  const buffer = new ConsumableBuffer(hashValue)

  this._buffers.push(buffer)
  this._availableBits += buffer.availableBits()
}
```
I suspect things will eventually repeat (because Uint8Array.from truncates modulo 256, so Uint8Array.from([256]) would be [0] and Uint8Array.from([257]) would be [1], the same suffix as depth 1). I'm not sure if that is intentional or a bug, but it is a scenario that will not occur in practice, so it probably does not matter.
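Putting my reading of the pattern into a runnable sketch (with a stand-in `fakeHash` rather than the real configured hashFn, so the concrete byte values are illustrative only):

```typescript
// Stand-in for the configured 8-byte hash function; NOT the real murmur3.
function fakeHash (bytes: Uint8Array): Uint8Array {
  const sum = bytes.reduce((acc, b) => (acc + b) & 0xff, 0)
  return Uint8Array.from({ length: 8 }, (_, i) => (sum + i) & 0xff)
}

// The stream InfiniteHash appears to produce:
// [...hash(seed), ...hash([...seed, 1]), ...hash([...seed, 2]), ...]
function * infiniteHashBytes (seed: Uint8Array): Generator<number> {
  for (let depth = 0; ; depth++) {
    // depth 0 hashes the bare seed; deeper levels append the depth as a
    // single byte (which Uint8Array.from truncates modulo 256)
    const input = depth === 0 ? seed : Uint8Array.from([...seed, depth])
    yield * fakeHash(input)
  }
}

// take the first 24 bytes: three 8-byte blocks for depths 0, 1, 2
const it = infiniteHashBytes(Uint8Array.from([9, 9]))
const first24: number[] = []
for (let i = 0; i < 24; i++) {
  first24.push(it.next().value as number)
}
```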
Is this overall hashing behavior described somewhere? I can't seem to find anything similar in the go code base, and in fact I got the impression that go will error with an out-of-bounds error.
Then again, could it be that none of our tests run into conflicting keys, and go and js might not align when that occurs?
-
I'm under the impression that _findPlace basically traverses the trie by deriving an index from 8 bits of the key hash; if a bucket is encountered, the next 8 bits of the key hash are used to descend one level deeper, etc. And since our hashes are deterministically derived from key + depth, we will eventually encounter an index without a bucket in it:
```ts
// js-hamt-sharding/src/bucket.ts, lines 154 to 170 in 44b6e6b
async _findPlace (key: string | InfiniteHash): Promise<BucketPosition<T>> {
  const hashValue = this._options.hash(typeof key === 'string' ? uint8ArrayFromString(key) : key)
  const index = await hashValue.take(this._options.bits)

  const child = this._children.get(index)

  if (child instanceof Bucket) {
    return await child._findPlace(hashValue)
  }

  return {
    bucket: this,
    pos: index,
    hash: hashValue,
    existingChild: child
  }
}
```
Is this accurate? And if so, I feel like the whole hash-cycling behavior probably needs to be part of the unixfs spec.