sort animals in UI alphabetically #2

GulaschLulatsch · 2025-07-05T14:21:33Z

Sorted UI entries by animal name before drawing

GulaschLulatsch · 2025-07-14T08:43:58Z

@Toxin4ick Bumping after a week of inactivity

Toxin4ick · 2025-07-23T00:14:06Z

AnimalsFinder/animalsFinder.cpp

+	std::sort(sortedNames.begin(), sortedNames.end(), [](const auto* a, const auto* b) {
+		return a->second < b->second;
+	});


The problem: You're comparing std::string values that use UTF-8 encoding by bytes, not by characters.

When comparing two std::string values with <, operator< compares their bytes, one at a time, in lexicographical order. That matches alphabetical order only for plain ASCII text such as English, and even there only within a single letter case.

But UTF-8 is variable-length:

English characters = 1 byte (ASCII)

Russian, Korean, etc. = 2-4 bytes

Byte-wise comparison breaks down with multi-byte characters.
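To make the variable-length point concrete, here is a minimal sketch that counts Unicode code points in a UTF-8 string by skipping continuation bytes. The helper name `countCodePoints` is made up for illustration and is not part of this PR; it assumes the input is valid UTF-8.

```cpp
#include <cstddef>
#include <string>

// Counts code points in a UTF-8 string (assumes valid UTF-8 input).
// Continuation bytes have the bit pattern 10xxxxxx and are skipped,
// so only the leading byte of each encoded character is counted.
std::size_t countCodePoints(const std::string& utf8) {
    std::size_t count = 0;
    for (unsigned char byte : utf8) {
        if ((byte & 0xC0) != 0x80) {
            ++count;
        }
    }
    return count;
}
```

For "Fox" the byte length and the code point count agree, while for a Cyrillic or Hangul name `size()` is larger than the number of characters, which is why a byte-wise comparison does not see whole letters.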

UTF-8 byte order ≠ alphabetic order

Here’s a concrete example in Russian:

cpp

std::string a = "Ёж";   // U+0401 (Ё), U+0436 (ж)
std::string b = "Енот"; // U+0415 (Е), U+043D (н), ...

In the UTF-8 byte representation:

"Ё" = 0xD0 0x81

"Е" = 0xD0 0x95

Now, 0x81 < 0x95, so "Ёж" < "Енот" — but that’s not correct in the Russian alphabet.
Alphabetically, Ё follows Е, not precedes it.
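The comparison above can be checked in isolation. This is only a sketch: the names `hedgehog`, `raccoon` and `byteLess` are illustrative, and the strings are written as UTF-8 byte escapes so the result does not depend on the source file's encoding.

```cpp
#include <string>

// UTF-8 bytes spelled out explicitly to avoid any source-encoding ambiguity.
const std::string hedgehog = "\xD0\x81\xD0\xB6";                  // "Ёж"
const std::string raccoon  = "\xD0\x95\xD0\xBD\xD0\xBE\xD1\x82";  // "Енот"

// std::string::operator< goes through std::char_traits<char>, which orders
// chars as unsigned byte values. It has no notion of UTF-8 or of alphabets.
bool byteLess(const std::string& a, const std::string& b) {
    return a < b;
}
```

Both strings start with 0xD0, so the second byte decides: 0x81 is smaller than 0x95, and `byteLess(hedgehog, raccoon)` is true.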

So...

Your comparator:

cpp

return a->second < b->second;

Means: compare UTF-8 strings byte-by-byte, without regard to their actual Unicode meaning. This causes incorrect ordering in any language beyond ASCII.

✅ When does this code work?

| Language | Works? | Why |
| --- | --- | --- |
| English (ASCII) | ✅ | 1-byte chars match alphabetical order |
| Russian (UTF-8) | ❌ | Multi-byte chars break proper order |
| Korean (UTF-8) | ❌ | Same: Hangul characters won't sort correctly |

❌ Example of incorrect order:

With your code:

cpp

{ "Ёж", "Енот", "Дятел" } → Sorted as: "Ёж", "Дятел", "Енот" ❌

Correct order alphabetically (Unicode-aware):

cpp
"Дятел", "Енот", "Ёж" ✅

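The ordering reported above can be reproduced with a plain `std::sort` over the three names. This is a sketch for illustration only; `sortBytewise` is a made-up helper, not code from the PR, and it uses the default `operator<`, which behaves like the comparator under review.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Returns a copy of the names sorted with std::string's default operator<,
// i.e. by raw UTF-8 byte values.
std::vector<std::string> sortBytewise(std::vector<std::string> names) {
    std::sort(names.begin(), names.end());
    return names;
}
```

The second bytes of the first letters are 0x81 (Ё), 0x94 (Д) and 0x95 (Е), so the result comes out as "Ёж", "Дятел", "Енот".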
GulaschLulatsch · 2025-07-23

Alright, I understand the problem.

Thank you for your detailed explanation.

To confirm a possible alternative solution:

Sorting Unicode text correctly is not possible without knowing the user's locale. The locale is already set indirectly through the ini file, where the langFile entry could be used to determine it.

The C++ standard library does not provide collation that is sufficient for language-aware sorting. As a proposal, I would statically link against ICU and use an ICU Collator, chosen from the configured locale, to provide a comparator that allows culture-specific alphabetical sorting.

This is a bit of additional effort, especially the inclusion of a third-party binary library. If that is fine with you, I would use the following days to come up with a draft.
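One possible shape for such a draft, sketched with the standard library only: the sort takes an injected three-way collation function, so a locale-aware collator can be swapped in later without touching the sorting code. The names `Entry`, `Collate`, `byteOrder` and `sortByName` are hypothetical, and the entry type is an assumption based on the `a->second` access in the diff. An ICU-backed `Collate` would presumably wrap a call such as `icu::Collator::compareUTF8`; ICU itself is deliberately left out here.

```cpp
#include <algorithm>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical entry type: (animal id, display name). The real container may differ.
using Entry = std::pair<int, std::string>;

// Three-way comparison in the style of strcmp: negative, zero, or positive.
using Collate = std::function<int(const std::string&, const std::string&)>;

// Fallback when no locale-aware collator is available: plain byte order.
int byteOrder(const std::string& a, const std::string& b) {
    return a.compare(b);
}

// Sorts entry pointers by display name using whichever collation was injected.
void sortByName(std::vector<const Entry*>& entries, const Collate& collate) {
    std::sort(entries.begin(), entries.end(),
              [&collate](const Entry* a, const Entry* b) {
                  return collate(a->second, b->second) < 0;
              });
}
```

With `byteOrder` this behaves like the current comparator. Passing a collator built from the configured locale would change only the second argument at the call site.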

sort animals in UI alphabetically

91ae800

Toxin4ick requested changes Jul 23, 2025
