<locale>: std::collate_byname<_Elem>::hash() yields different hashes for strings that collate the same #5212

Closed
@muellerj2

Description

[locale.collate.virtuals]/3 specifies that collate<_Elem>::do_hash() returns the same hash for all strings that collate the same. However, collate_byname<_Elem>::(do_)hash() does not produce such hashes for non-C locales.

Test case

#include <iostream>
#include <iterator> // std::size
#include <locale>

using namespace std;

int main() {
	const locale loc("de_DE");
	auto& coll = use_facet<collate<wchar_t>>(loc);
	const wchar_t ex1[] = L"Straße";
	const wchar_t ex2[] = L"Strasse";

	// size() counts the null terminator, so subtract 1 to get the end of the text.
	cout << "collate the same: " << (coll.compare(ex1, ex1 + size(ex1) - 1, ex2, ex2 + size(ex2) - 1) == 0) << '\n';
	cout << "hash the same: " << (coll.hash(ex1, ex1 + size(ex1) - 1) == coll.hash(ex2, ex2 + size(ex2) - 1)) << '\n';
	return 0;
}

prints

collate the same: 1
hash the same: 0

Godbolt link

Expected result

This should print

collate the same: 1
hash the same: 1

Additional remarks

For non-C locales, I think the hash function should essentially do:

return hash(transform(_First, _Last));
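
A minimal sketch of that idea, written against the public collate interface rather than the STL internals. The helper name hash_via_transform is hypothetical, and std::hash is only a stand-in for whatever hash the library would apply to the sort key:

```cpp
#include <cstddef>
#include <functional>
#include <locale>
#include <string>

// Hypothetical helper, not part of the STL: hashes the sort key returned by
// collate::transform() instead of the raw characters. Strings for which
// transform() yields the same key therefore get the same hash.
template <class Elem>
std::size_t hash_via_transform(const std::collate<Elem>& coll, const Elem* first, const Elem* last) {
    // transform() returns a key whose lexicographic order matches compare().
    const std::basic_string<Elem> key = coll.transform(first, last);
    // Assumption: any deterministic hash of the key works; std::hash is used here.
    return std::hash<std::basic_string<Elem>>{}(key);
}
```

Called with the de_DE facet from the test case, this should return equal values for L"Straße" and L"Strasse", provided transform() produces identical keys for strings that compare equal. The cost is one temporary string per call.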

Alternatively, LCMapStringA/W/Ex with LCMAP_HASH could be used, which is probably faster. According to the API documentation, however, LCMAP_HASH is not guaranteed to produce the same hash for all strings that collate the same, so this approach would not fully conform to [locale.collate.virtuals]/3 either.

Labels: bug (Something isn't working), fixed (Something works now, yay!)