bug: bucket_id hash is non-deterministic on big-endian targets #127

@Troublor

Summary

The bucket_id hash function produces platform-dependent results because the vendored AHash implementation interprets bytes in native-endian order. On big-endian targets, the same key bytes produce different bucket IDs than on little-endian targets, violating the consensus determinism guarantee.

Severity

Low in practice (all current deployment targets are little-endian), but high in principle — a consensus-critical hash function must be deterministic across all platforms.

Affected Code

File: salt/src/state/ahash/convert.rs

macro_rules! convert {
    ($a:ty, $b:ty) => {
        impl Convert<$b> for $a {
            fn convert(self) -> $b {
                zerocopy::transmute!(self)  // ← native-endian bit reinterpretation
            }
        }
    };
}

Like core::mem::transmute, zerocopy::transmute! reinterprets the raw bits without any byte-order conversion; it only adds compile-time size and validity checks. On a little-endian target, [0x01, 0x02, ..., 0x08] becomes 0x0807060504030201. On a big-endian target, the same bytes become 0x0102030405060708.
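To make the difference concrete, here is a minimal sketch using only the standard library. The helper names (`read_native`, `read_le`, `read_be`) are hypothetical and not part of the salt crate; `u64::from_ne_bytes` stands in for the raw bit reinterpretation, since it yields the value the host's byte order dictates.

```rust
/// Hypothetical helper: interprets 8 bytes in the host's native byte order.
/// This stands in for a raw bit reinterpretation of [u8; 8] as u64.
fn read_native(bytes: [u8; 8]) -> u64 {
    u64::from_ne_bytes(bytes)
}

/// Hypothetical helper: always little-endian, same result on every target.
fn read_le(bytes: [u8; 8]) -> u64 {
    u64::from_le_bytes(bytes)
}

/// Hypothetical helper: always big-endian. This is the value a native read
/// would produce on a big-endian target.
fn read_be(bytes: [u8; 8]) -> u64 {
    u64::from_be_bytes(bytes)
}

/// Returns true when the native read matches the little-endian read,
/// which holds only on little-endian hosts.
fn native_matches_le(bytes: [u8; 8]) -> bool {
    read_native(bytes) == read_le(bytes)
}
```

With the bytes `[1, 2, 3, 4, 5, 6, 7, 8]`, `read_le` gives `0x0807060504030201` and `read_be` gives `0x0102030405060708` on any host, while `read_native` matches one or the other depending on the target.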

This feeds through ReadFromSlice::read_u64/u128/u32/u16 → Hasher::write → hash() → bucket_id(), meaning the same key produces different bucket IDs on LE vs BE.

Suggested Fix

Replace zerocopy::transmute! with explicit little-endian conversion:

// Before (native-endian, non-deterministic):
impl Convert<u64> for [u8; 8] {
    fn convert(self) -> u64 {
        zerocopy::transmute!(self)
    }
}

// After (always little-endian, deterministic):
impl Convert<u64> for [u8; 8] {
    fn convert(self) -> u64 {
        u64::from_le_bytes(self)
    }
}

Apply to all ReadFromSlice conversion paths: u16, u32, u64, u128.
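One way to cover all four widths at once is a small macro, sketched below. The macro name `convert_le!` and the local `Convert` trait definition are assumptions made for illustration; the real trait in convert.rs may have a different shape, and the existing `convert!` macro may cover other type pairs that this sketch does not handle.

```rust
/// Assumed shape of the conversion trait; the real definition lives in
/// convert.rs and may differ.
trait Convert<To> {
    fn convert(self) -> To;
}

/// Hypothetical macro: implements byte-array-to-integer conversion with an
/// explicit little-endian interpretation for each listed integer width.
macro_rules! convert_le {
    ($($int:ty => $n:literal),* $(,)?) => {
        $(
            impl Convert<$int> for [u8; $n] {
                #[inline(always)]
                fn convert(self) -> $int {
                    // from_le_bytes gives the same value on every target.
                    <$int>::from_le_bytes(self)
                }
            }
        )*
    };
}

// The four widths read by the ReadFromSlice paths named above.
convert_le!(u16 => 2, u32 => 4, u64 => 8, u128 => 16);
```

On little-endian targets `from_le_bytes` typically compiles down to a plain load, so the change should not cost anything on current deployment targets.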

This also allows dropping the zerocopy dependency entirely since it's only used for these transmute! calls in the hasher module.

Context

Discovered during megaeth-labs/mega-evm#225, where the salt crate's bucket_id hasher was inlined into mega-evm. The inlined version already uses from_le_bytes, making it deterministic across all platforms.
