@GulaschLulatsch GulaschLulatsch commented Jul 5, 2025

Sorted UI entries by animal name before drawing
[Screenshot: 20250705162917_1]

@GulaschLulatsch
Author

@Toxin4ick Bumping after a week of inactivity

Comment on lines +116 to +118
std::sort(sortedNames.begin(), sortedNames.end(), [](const auto* a, const auto* b) {
    return a->second < b->second;
});
Owner

The problem: you're comparing UTF-8-encoded std::string values byte by byte, not character by character.

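When two std::string values are compared with <, operator< compares their bytes one at a time, in lexicographical order. That matches alphabetical order only for plain ASCII text such as English (and even there only within a single letter case, because all uppercase letters sort before all lowercase ones).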

But UTF-8 is variable-length:

English characters = 1 byte (ASCII)

Russian (Cyrillic) = 2 bytes, Korean (Hangul) = 3 bytes; other characters take up to 4 bytes

Byte-wise comparison breaks down with multi-byte characters.
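To make the byte counts concrete, here is a small sketch. The helper names are hypothetical, and the strings are written as hex escapes so the result does not depend on the source file encoding. It assumes the input is valid UTF-8.

```cpp
#include <cstddef>
#include <string>

// Number of bytes std::string stores (this is what size() reports).
std::size_t utf8ByteLength(const std::string& s) {
    return s.size();
}

// Number of Unicode code points, assuming s holds valid UTF-8:
// every byte that is not a continuation byte (10xxxxxx) starts a code point.
std::size_t countCodePoints(const std::string& s) {
    std::size_t n = 0;
    for (unsigned char c : s) {
        if ((c & 0xC0) != 0x80) {
            ++n;
        }
    }
    return n;
}

const std::string latinA   = "A";                 // U+0041, 1 byte
const std::string cyrYo    = "\xD0\x81";          // U+0401 (Ё), 2 bytes
const std::string hangulGa = "\xEA\xB0\x80";      // U+AC00 (가), 3 bytes
const std::string hedgehog = "\xF0\x9F\xA6\x94";  // U+1F994, 4 bytes
```

Each of the four strings holds exactly one code point, but size() reports 1, 2, 3 and 4 respectively.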

  1. UTF-8 byte order ≠ alphabetic order

Here’s a concrete example in Russian:

cpp

std::string a = "Ёж";   // U+0401 (Ё), U+0436 (ж)
std::string b = "Енот"; // U+0415 (Е), U+043D (н), ...

In the UTF-8 byte representation:

"Ё" = 0xD0 0x81

"Е" = 0xD0 0x95

Now, 0x81 < 0x95, so byte-wise "Ёж" < "Енот", but that is not correct in the Russian alphabet.
Alphabetically, Ё follows Е rather than preceding it.
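The comparison can be checked directly. This is a minimal sketch: the variable and function names are made up for illustration, and the two words are spelled out as explicit UTF-8 bytes so the source encoding does not matter.

```cpp
#include <string>

// "Ёж": Ё = D0 81, ж = D0 B6
const std::string yozh = "\xD0\x81\xD0\xB6";
// "Енот": Е = D0 95, н = D0 BD, о = D0 BE, т = D1 82
const std::string yenot = "\xD0\x95\xD0\xBD\xD0\xBE\xD1\x82";

// Same comparison the comparator in the PR performs on ->second.
bool byteWiseLess(const std::string& a, const std::string& b) {
    return a < b;
}
```

byteWiseLess(yozh, yenot) is true: the first bytes are equal (0xD0), and the second bytes decide it (0x81 < 0x95).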

So...

Your comparator:

cpp

return a->second < b->second;

Means: compare the UTF-8 strings byte by byte, without regard to what those bytes mean as Unicode text. This can produce incorrect ordering for any text outside ASCII.
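One caveat worth checking: for this pair, decoding to code points first would not change the outcome, because U+0401 (Ё) is still numerically below U+0415 (Е). A sketch with std::u32string (names are hypothetical) suggests that what is missing is a collation table, not merely a decoder:

```cpp
#include <string>

// The same two words as sequences of Unicode code points.
const std::u32string yozh32  = U"\u0401\u0436";              // Ёж
const std::u32string yenot32 = U"\u0415\u043D\u043E\u0442";  // Енот

// Compares code point by code point instead of byte by byte.
bool codePointLess(const std::u32string& a, const std::u32string& b) {
    return a < b;
}
```

codePointLess(yozh32, yenot32) is still true, so the two words come out in the same relative order as with the byte-wise comparison.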

✅ When does this code work?

| Language | Works? | Why |
|---|---|---|
| English (ASCII) | ✅ | 1-byte chars match alphabetical order |
| Russian (UTF-8) | ❌ | Multi-byte chars break proper order |
| Korean (UTF-8) | ❌ | Same: Hangul characters won't sort correctly |

❌ Example of incorrect order:

With your code:

cpp

{ "Ёж", "Енот", "Дятел" } → Sorted as: "Ёж", "Дятел", "Енот" ❌
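The ordering above can be reproduced with a plain std::sort over the three names. This is only a sketch: sortByBytes and the constants are illustrative names, not code from the PR, and the strings are given as explicit UTF-8 bytes.

```cpp
#include <algorithm>
#include <string>
#include <vector>

const std::string kYozh   = "\xD0\x81\xD0\xB6";                          // Ёж
const std::string kYenot  = "\xD0\x95\xD0\xBD\xD0\xBE\xD1\x82";          // Енот
const std::string kDyatel = "\xD0\x94\xD1\x8F\xD1\x82\xD0\xB5\xD0\xBB";  // Дятел

// Sorts with the default operator<, i.e. byte-wise lexicographical order.
std::vector<std::string> sortByBytes(std::vector<std::string> names) {
    std::sort(names.begin(), names.end());
    return names;
}
```

Calling sortByBytes({kYozh, kYenot, kDyatel}) returns Ёж, Дятел, Енот, because the second bytes of the leading letters are 0x81, 0x94 and 0x95.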

Correct order alphabetically (Unicode-aware):

cpp
"Дятел", "Енот", "Ёж" ✅

Author

Alright, I understand the problem.

Thank you for your detailed explanation.

To confirm a possible alternative solution:

Sorting Unicode text correctly is not possible without knowing the user's locale. The locale is already set indirectly through the ini file: its langFile entry could be used to determine it.

The C++ standard library does not provide sufficient collation support for language-aware string sorting, so I propose statically linking against ICU and using an ICU Collator, chosen from the configured locale, as a comparator that gives culture-specific alphabetical ordering.

This takes some additional effort, especially the inclusion of a third-party binary library. If that is fine with you, I would use the next few days to come up with a draft.
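For comparison, a standard-library-only variant might look roughly like the sketch below. The helper names and the locale name used in the usage note are assumptions, not anything taken from the PR or the ini file. Whether a named locale is available, and how good its collation data is, depends on the platform, which is one reason a bundled library such as ICU is attractive.

```cpp
#include <algorithm>
#include <locale>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: tries to construct the named locale and falls back
// to the classic "C" locale if the platform does not provide it.
std::locale makeSortLocale(const std::string& name) {
    try {
        return std::locale(name.c_str());
    } catch (const std::runtime_error&) {
        return std::locale::classic();
    }
}

// std::locale is usable as a comparator: its operator() compares two
// strings through the locale's std::collate facet.
std::vector<std::string> sortWithLocale(std::vector<std::string> names,
                                        const std::locale& loc) {
    std::sort(names.begin(), names.end(), loc);
    return names;
}
```

A call could look like sortWithLocale(names, makeSortLocale("ru_RU.UTF-8")), where the locale name would be derived from the configured langFile. With the fallback locale the result is plain byte order again, so this does not replace a real collator.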
