<locale>: std::collate_byname<_Elem>::hash() yields different hashes for strings that collate the same #5212

Open
@muellerj2

Description

[locale.collate.virtuals]/3 specifies that collate<_Elem>::do_hash() returns the same hash for all strings that collate the same. However, collate_byname<_Elem>::(do_)hash() does not produce such hashes for non-C locales.

Test case

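#include <iostream>
#include <iterator> // std::size
#include <locale>

using namespace std;

int main() {
	const locale loc("de_DE");
	auto& coll = use_facet<collate<wchar_t>>(loc);
	const wchar_t ex1[] = L"Straße";
	const wchar_t ex2[] = L"Strasse";

	// End pointers exclude the terminating null character.
	const wchar_t* const ex1_end = ex1 + size(ex1) - 1;
	const wchar_t* const ex2_end = ex2 + size(ex2) - 1;

	// Both strings collate the same in de_DE, so their hashes should match as well.
	cout << "collate the same: " << (coll.compare(ex1, ex1_end, ex2, ex2_end) == 0) << '\n';
	cout << "hash the same: " << (coll.hash(ex1, ex1_end) == coll.hash(ex2, ex2_end)) << '\n';
	return 0;
}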

prints

collate the same: 1
hash the same: 0

Godbolt link

Expected result

This should print

collate the same: 1
hash the same: 1

Additional remarks

For non-C locales, I think the hash function should essentially do:

return hash(transform(_First, _Last));
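
To make the suggestion above concrete, here is a minimal sketch written against the public `std::collate` interface. It is not the STL's implementation: `hash_via_sort_key` is a hypothetical name, and the use of `std::hash` over the transformed key (and the `std::size_t` return type, where `collate::hash()` returns `long`) is an assumption made for illustration. The inner hash has to be a plain hash over the key's elements, not the facet's own `hash()`.

```cpp
#include <cstddef>
#include <functional>
#include <locale>
#include <string>

// Hypothetical helper (not part of the STL): hashes the sort key produced by
// collate::transform() instead of hashing the raw characters.
template <class Elem>
std::size_t hash_via_sort_key(const std::collate<Elem>& coll, const Elem* first, const Elem* last) {
    // transform() returns a key whose lexicographical comparison gives the
    // same result as compare() on the original strings.
    const std::basic_string<Elem> key = coll.transform(first, last);

    // Identical keys always produce identical hashes, so strings whose keys
    // are equal share a hash value.
    return std::hash<std::basic_string<Elem>>{}(key);
}
```

With this approach, two strings hash the same whenever `transform()` yields identical keys for them, which the standard requires for strings that `compare()` reports as equal. The cost is one temporary string allocation per call.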

Alternatively, LCMapStringA/W/Ex with LCMAP_HASH could be used. This is probably faster, but according to the API documentation, LCMAP_HASH is not guaranteed to produce the same hash for all strings that collate the same, so it seems this also wouldn't fully conform to [locale.collate.virtuals]/3.
