<locale>: std::collate_byname<_Elem>::hash() yields different hashes for strings that collate the same #5212
Open
Description
[locale.collate.virtuals]/3 specifies that collate<_Elem>::do_hash() returns the same hash for all strings that collate the same. However, collate_byname<_Elem>::(do_)hash() does not produce such hashes for non-C locales.
Test case
#include <iostream>
#include <iterator>
#include <locale>
using namespace std;

int main() {
    const locale loc("de_DE");
    auto& coll = use_facet<collate<wchar_t>>(loc);
    const wchar_t ex1[] = L"Straße";
    const wchar_t ex2[] = L"Strasse";
    // size() counts the terminating null character, so subtract 1 to exclude it.
    cout << "collate the same: " << (coll.compare(ex1, ex1 + size(ex1) - 1, ex2, ex2 + size(ex2) - 1) == 0) << '\n';
    cout << "hash the same: " << (coll.hash(ex1, ex1 + size(ex1) - 1) == coll.hash(ex2, ex2 + size(ex2) - 1)) << '\n';
    return 0;
}
prints
collate the same: 1
hash the same: 0
Expected result
This should print
collate the same: 1
hash the same: 1
Additional remarks
For non-C locales, I think the hash function should essentially do:
return hash(transform(_First, _Last));
Alternatively, LCMapStringA/W/Ex with LCMAP_HASH could be used. It's probably faster, but according to the API documentation, LCMAP_HASH is not guaranteed to produce the same hash for all strings that collate the same, so it seems this also wouldn't fully conform to [locale.collate.virtuals]/3.
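To make the transform-based suggestion above more concrete, here is a minimal sketch written as a free function rather than as the actual do_hash() override. The name hash_via_transform is hypothetical and not part of the STL, and the sketch returns std::size_t from std::hash, whereas do_hash() returns long. It relies on transform() producing sort keys that compare equal exactly when compare() returns 0.

```cpp
#include <cstddef>
#include <functional>
#include <locale>
#include <string>

// Hypothetical helper (not part of the STL): hashes the sort key produced by
// collate::transform() instead of hashing the raw characters.
template <class Elem>
std::size_t hash_via_transform(const std::collate<Elem>& coll,
                               const Elem* first, const Elem* last) {
    // transform() returns a key whose lexicographical comparison with another
    // key matches the result of compare() on the original strings, so strings
    // that collate the same yield identical keys.
    const std::basic_string<Elem> key = coll.transform(first, last);

    // Identical keys hash identically, which is the property required by
    // [locale.collate.virtuals]/3.
    return std::hash<std::basic_string<Elem>>{}(key);
}
```

With the facet from the test case, this would be called as hash_via_transform(coll, ex1, ex1 + size(ex1) - 1). Whether the two example strings then hash the same depends on transform() being consistent with compare() for that locale.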