I ran the following code, built with GCC, in a Linux VM:
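#include <stdint.h>
#include <stdio.h>
#include <time.h>
#include <tsl/robin_set.h>
int main() {
    // Double the element count from 2^20 up to 2^25.
    for (uint64_t size = 1 << 20; size <= 1 << 25; size <<= 1) {
        // mode 0 ('M'): keys are multiples of 0xDEADBEEF; mode 1 ('B'): multiples of 2^20.
        for (int mode = 0; mode < 2; mode++) {
            tsl::robin_set<uint64_t> x;
            uint64_t startTime = clock();
            for (uint64_t i = 0; i < size; i++)
                x.insert(i * (mode == 0 ? 0xDEADBEEF : 1 << 20));
            uint64_t endTime = clock();
            double deltaTime = double(endTime - startTime) / CLOCKS_PER_SEC;
            // Checksum over the stored values, so the inserts cannot be optimized away.
            uint64_t sum = 0;
            for (uint64_t val : x)
                sum += val;
            // Cast to unsigned long long so that %llu matches on every platform.
            printf("N = %llu %c: time = %0.3lf chk = %llu\n", (unsigned long long)size,
                   (mode ? 'B' : 'M'), deltaTime, (unsigned long long)sum);
        }
    }
    return 0;
}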
and I got:
N = 1048576 M: time = 0.043 chk = 6257894696195457024
N = 1048576 B: time = 8.280 chk = 576460202547609600
N = 2097152 M: time = 0.115 chk = 6588752116096958464
N = 2097152 B: time = 19.110 chk = 2305841909702066176
N = 4194304 M: time = 0.169 chk = 7916099200727646208
N = 4194304 B: time = 35.500 chk = 9223369837831520256
N = 8388608 M: time = 0.354 chk = 13233322349299761152
Killed
So I guess inserting integers divisible by 2^20 takes quadratic time.
Moreover, the process gets killed while inserting 8M such values (the N = 8388608 B run above).
The most likely cause is that std::hash<uint64_t>(x) = x on GCC.
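To illustrate the suspected cause, here is a minimal sketch that counts how many distinct buckets the mode B keys can reach. It assumes an identity hash and a power-of-two table that picks the bucket by masking; the helper name `distinct_buckets_identity` is hypothetical and this is not robin_set's actual code.

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_set>

// Hypothetical helper: counts the distinct bucket indices reached by the keys
// 0, 1 << 20, 2 << 20, ... when the hash is the identity and the bucket index
// is the hash masked by a power-of-two bucket count.
std::size_t distinct_buckets_identity(std::uint64_t key_count,
                                      std::uint64_t bucket_count) {
    std::unordered_set<std::uint64_t> used;
    for (std::uint64_t i = 0; i < key_count; ++i) {
        std::uint64_t key = i << 20;             // multiple of 2^20, as in mode B
        std::uint64_t hash = key;                // identity hash (assumption)
        used.insert(hash & (bucket_count - 1));  // bucket_count must be a power of two
    }
    return used.size();
}
```

Under these assumptions, `distinct_buckets_identity(4096, 1 << 21)` returns 2: the low 20 bits of every key are zero, so only bit 20 of the index can vary.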
Note that I used default settings and got no warnings!
An awful default hash function is a rather critical issue for people who don't know much about hashing (and who would probably do worse if they tried to implement their own hash function or hash table). Given that the TSL interface is very STL-like, I think that's the audience it is targeted at.
A proper hash function usually consists of three steps:
1. Combining: getting one integer out of many values/tuples/sequences/etc.
2. Finalizing: doing some transformation for good statistical properties after step 1.
3. Reduction: reducing the domain from something like the whole range of uint64_t to an index in the hash table.
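The three parts can be sketched as separate functions. This is only an illustration under stated assumptions: the function names are hypothetical, the combiner follows the style of Boost's hash_combine, the finalizer is the 64-bit finalizer from MurmurHash3, and the reduction assumes a power-of-two bucket count.

```cpp
#include <cstddef>
#include <cstdint>

// Combining: fold a second value into a running seed
// (in the style of Boost's hash_combine, with a 64-bit constant).
inline std::uint64_t combine(std::uint64_t seed, std::uint64_t value) {
    return seed ^ (value + 0x9e3779b97f4a7c15ULL + (seed << 6) + (seed >> 2));
}

// Finalizing: spread the entropy over all bits
// (64-bit finalizer from MurmurHash3).
inline std::uint64_t finalize(std::uint64_t h) {
    h ^= h >> 33;
    h *= 0xff51afd7ed558ccdULL;
    h ^= h >> 33;
    h *= 0xc4ceb9fe1a85ec53ULL;
    h ^= h >> 33;
    return h;
}

// Reduction: map the 64-bit hash to a bucket index.
// bucket_count is assumed to be a power of two.
inline std::size_t reduce(std::uint64_t h, std::size_t bucket_count) {
    return static_cast<std::size_t>(h & (bucket_count - 1));
}

// All three parts chained for a pair of integers.
inline std::size_t bucket_of(std::uint64_t a, std::uint64_t b,
                             std::size_t bucket_count) {
    return reduce(finalize(combine(a, b)), bucket_count);
}
```

If the finalizing part is skipped, the reduction only sees the low bits of the combined value, which is harmless for well-mixed input and disastrous for keys whose low bits are all equal.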
As usual, the C++ standard is not precise enough, and STL implementations behave differently across platforms.
On MSVC, std::hash performs steps 1 and 2, while std::unordered_set only does step 3.
On GCC, std::hash only performs step 1, while std::unordered_set does steps 2 and 3.
This means that if you use std::hash directly, you should run your own hash finalizer. The TSL hash table only does step 3 but uses std::hash, so the crucial step 2 is missing.
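One possible workaround on the user side is a hash functor that runs a finalizer on top of std::hash. The sketch below is an assumption, not something the library ships: `MixedHash` and `distinct_buckets_mixed` are hypothetical names, and the finalizer is the 64-bit one from MurmurHash3.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <unordered_set>

// Hypothetical functor: std::hash followed by a finalizer.
struct MixedHash {
    std::size_t operator()(std::uint64_t x) const noexcept {
        std::uint64_t h = std::hash<std::uint64_t>{}(x);
        h ^= h >> 33;                 // MurmurHash3 64-bit finalizer
        h *= 0xff51afd7ed558ccdULL;
        h ^= h >> 33;
        h *= 0xc4ceb9fe1a85ec53ULL;
        h ^= h >> 33;
        return static_cast<std::size_t>(h);
    }
};

// Hypothetical check: counts the distinct bucket indices reached by the keys
// 0, 1 << 20, 2 << 20, ... when MixedHash is masked by a power-of-two
// bucket count.
std::size_t distinct_buckets_mixed(std::uint64_t key_count,
                                   std::uint64_t bucket_count) {
    std::unordered_set<std::uint64_t> used;
    MixedHash hasher;
    for (std::uint64_t i = 0; i < key_count; ++i)
        used.insert(hasher(i << 20) & (bucket_count - 1));
    return used.size();
}
```

The functor goes in the second template parameter, as with std::unordered_set, e.g. `tsl::robin_set<uint64_t, MixedHash> x;`. With it, the multiples of 2^20 should spread over nearly as many buckets as there are keys instead of piling into a handful.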