Description
If I test the hash3_int
method against php8.1.2 murmur3a
it appears to have divergences.
I found this after stumbling across a divergence by chance, and then testing 300000
randomly generated keys in roughly the same format as the problematic key I'd discovered.
e.g.
Given the keys
vdegpxazixnfmxxrshidlvnqyqh20301009315950
xlznhcnqpwpmfpxihtnrgz20290705241970
kuieegiqdkciqyxpuheemjtxcb202709178016583
fdthgabmvundazjxxu202810222044209
eqtlxyiianhpkjupowha200709169794430
tzlvesrdvngxoatqowmvjwdxuey201808121086444
wexigdatedfbjvdrbnwwc202602132914266
hash3_int
and murmur3a
give different hash values.
All the other tested keys matched exactly.
Running: 300000 iterations
PHP version: 8.1.2-1ubuntu2.20
Seed: 0
Progress: 300000/300000
Divergent hashes found:
Array
(
[0] => Array
(
[key] => vdegpxazixnfmxxrshidlvnqyqh20301009315950
[hash3Int] => 331996093
[murmur3a] => 2187344450
)
[1] => Array
(
[key] => xlznhcnqpwpmfpxihtnrgz20290705241970
[hash3Int] => 2026460930
[murmur3a] => 1939335100
)
[2] => Array
(
[key] => kuieegiqdkciqyxpuheemjtxcb202709178016583
[hash3Int] => 1380540917
[murmur3a] => 2945654283
)
[3] => Array
(
[key] => fdthgabmvundazjxxu202810222044209
[hash3Int] => 2427300608
[murmur3a] => 2986937244
)
[4] => Array
(
[key] => eqtlxyiianhpkjupowha200709169794430
[hash3Int] => 3932939188
[murmur3a] => 3428012532
)
[5] => Array
(
[key] => tzlvesrdvngxoatqowmvjwdxuey201808121086444
[hash3Int] => 1240384036
[murmur3a] => 2145074505
)
[6] => Array
(
[key] => wexigdatedfbjvdrbnwwc202602132914266
[hash3Int] => 4187875406
[murmur3a] => 245478873
)
)
This is using
<?php
function hash3Int(string $key, int $seed = 0): int
{
$key = array_values(unpack('C*', $key));
$klen = count($key);
$h1 = $seed < 0 ? -$seed : $seed;
$remainder = $i = 0;
for ($bytes = $klen - ($remainder = $klen & 3); $i < $bytes; ) {
$k1 = $key[$i]
| ($key[++$i] << 8)
| ($key[++$i] << 16)
| ($key[++$i] << 24);
++$i;
$k1 = (((($k1 & 0xffff) * 0xcc9e2d51) + ((((($k1 >= 0 ? $k1 >> 16 : (($k1 & 0x7fffffff) >> 16) | 0x8000)) * 0xcc9e2d51) & 0xffff) << 16))) & 0xffffffff;
$k1 = $k1 << 15 | ($k1 >= 0 ? $k1 >> 17 : (($k1 & 0x7fffffff) >> 17) | 0x4000);
$k1 = (((($k1 & 0xffff) * 0x1b873593) + ((((($k1 >= 0 ? $k1 >> 16 : (($k1 & 0x7fffffff) >> 16) | 0x8000)) * 0x1b873593) & 0xffff) << 16))) & 0xffffffff;
$h1 ^= $k1;
$h1 = $h1 << 13 | ($h1 >= 0 ? $h1 >> 19 : (($h1 & 0x7fffffff) >> 19) | 0x1000);
$h1b = (((($h1 & 0xffff) * 5) + ((((($h1 >= 0 ? $h1 >> 16 : (($h1 & 0x7fffffff) >> 16) | 0x8000)) * 5) & 0xffff) << 16))) & 0xffffffff;
$h1 = ((($h1b & 0xffff) + 0x6b64) + ((((($h1b >= 0 ? $h1b >> 16 : (($h1b & 0x7fffffff) >> 16) | 0x8000)) + 0xe654) & 0xffff) << 16));
}
$k1 = 0;
switch ($remainder) {
case 3:
$k1 ^= $key[$i + 2] << 16;
case 2:
$k1 ^= $key[$i + 1] << 8;
case 1:
$k1 ^= $key[$i];
$k1 = ((($k1 & 0xffff) * 0xcc9e2d51) + ((((($k1 >= 0 ? $k1 >> 16 : (($k1 & 0x7fffffff) >> 16) | 0x8000)) * 0xcc9e2d51) & 0xffff) << 16)) & 0xffffffff;
$k1 = $k1 << 15 | ($k1 >= 0 ? $k1 >> 17 : (($k1 & 0x7fffffff) >> 17) | 0x4000);
$k1 = ((($k1 & 0xffff) * 0x1b873593) + ((((($k1 >= 0 ? $k1 >> 16 : (($k1 & 0x7fffffff) >> 16) | 0x8000)) * 0x1b873593) & 0xffff) << 16)) & 0xffffffff;
$h1 ^= $k1;
}
$h1 ^= $klen;
$h1 ^= ($h1 >= 0 ? $h1 >> 16 : (($h1 & 0x7fffffff) >> 16) | 0x8000);
$h1 = ((($h1 & 0xffff) * 0x85ebca6b) + ((((($h1 >= 0 ? $h1 >> 16 : (($h1 & 0x7fffffff) >> 16) | 0x8000)) * 0x85ebca6b) & 0xffff) << 16)) & 0xffffffff;
$h1 ^= ($h1 >= 0 ? $h1 >> 13 : (($h1 & 0x7fffffff) >> 13) | 0x40000);
$h1 = (((($h1 & 0xffff) * 0xc2b2ae35) + ((((($h1 >= 0 ? $h1 >> 16 : (($h1 & 0x7fffffff) >> 16) | 0x8000)) * 0xc2b2ae35) & 0xffff) << 16))) & 0xffffffff;
$h1 ^= ($h1 >= 0 ? $h1 >> 16 : (($h1 & 0x7fffffff) >> 16) | 0x8000);
return $h1;
}
function murmur3a(string $key, int $seed = 0): int
{
return (int) base_convert(hash('murmur3a', $key, false, ["seed" => $seed]), 16, 10);
}
function generateRandomKey()
{
$stringPart = substr(str_shuffle(str_repeat("abcdefghijklmnopqrstuvwxyz", 10)), 0, rand(10, 30));
$datePart = date("Ymd", timestamp: rand(strtotime("2000-01-01"), strtotime("2030-12-31")));
$intPart = rand(1000, 9999999);
return $stringPart . $datePart . $intPart;
}
function runTests($numTests, $seed)
{
echo "Running: $numTests iterations\n";
echo "PHP version: " . phpversion() . "\n";
echo "Seed: $seed\n";
$divergentHashes = [];
for ($i = 0; $i < $numTests; $i++) {
echo "Progress: " . ($i + 1) . "/$numTests\r";
$key = generateRandomKey();
$hash3Int = hash3Int($key, $seed);
$murmur3a = murmur3a($key, $seed);
if ($hash3Int !== $murmur3a) {
$divergentHashes[] = [
'key' => $key,
'hash3Int' => $hash3Int,
'murmur3a' => $murmur3a
];
}
}
echo "\n";
return $divergentHashes;
}
$numTests = isset($argv[1]) ? (int) $argv[1] : 1000;
$seed = isset($argv[2]) ? (int) $argv[2] : 0;
$divergentHashes = runTests($numTests, $seed);
if (empty($divergentHashes)) {
echo "All hashes matched!\n";
} else {
echo "Divergent hashes found:\n";
print_r($divergentHashes);
}
I also tested against other versions of murmurhash3 (in various languages; go, JavaScript, Python, etc) and they all do give the same hash as murmur3a
- which leads me to suspect there is a subtle bug in the hash3_int
implementation within this repo.
This seems to be consistent, if I run the test for a few hundred thousand iterations it finds a small number of divergent keys.
Happy to give more context and information if you need it, and apologies if I am overlooking something here!