Fix TrailingCase::Unchanged handling for Dutch by Manishearth · Pull Request #7863 · unicode-org/icu4x

Manishearth · 2026-04-09T19:51:39Z

I couldn't figure out an easy way to keep the Dutch handling within full_helper (where all the other locale-sensitive logic lives), since this is the one case that actually affects the titlecasing uppercase-to-lowercase state change.

I think this implementation works nicely. The dutch_i_at_beginning and dutch_ij_pair_at_beginning_count split comes from when I was trying to retain the old code and refactor it to be reusable, but I realized that I could do this without retaining the old code. I can merge these two functions, but I kind of like how it turned out. They're more testable this way.

They're not implemented on CaseMapContext since they run at the beginning of a string.

Changelog

icu_casemapping: Fix TrailingCase::Unchanged handling for Dutch
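For readers unfamiliar with the interaction described above, here is a minimal, self-contained sketch of how a Dutch IJ digraph and the trailing-case option can interact during titlecasing. It is not the icu_casemapping implementation: `toy_titlecase` and the local `TrailingCase` enum are hypothetical stand-ins, accents are ignored, and the assumed behavior (the j is uppercased regardless of the trailing-case setting) is only one reading of the intended result.

```rust
/// Hypothetical stand-in for the crate's trailing-case option.
#[derive(Clone, Copy, PartialEq, Eq)]
pub enum TrailingCase {
    Lower,
    Unchanged,
}

/// Toy titlecasing of a single word: plain i/j only, no accent handling.
pub fn toy_titlecase(word: &str, is_dutch: bool, tail: TrailingCase) -> String {
    let mut out = String::with_capacity(word.len());
    let mut chars = word.chars().peekable();

    // Uppercase the first character; an empty word stays empty.
    let Some(first) = chars.next() else {
        return out;
    };
    out.extend(first.to_uppercase());

    // Assumption: in Dutch, an initial i/I followed by j/J is one digraph,
    // so the j is uppercased too, whatever the trailing-case option says.
    if is_dutch && matches!(first, 'i' | 'I') {
        if let Some(j) = chars.next_if(|&c| matches!(c, 'j' | 'J')) {
            out.extend(j.to_uppercase());
        }
    }

    // Everything after the titlecased prefix follows the trailing-case option.
    for c in chars {
        match tail {
            TrailingCase::Lower => out.extend(c.to_lowercase()),
            TrailingCase::Unchanged => out.push(c),
        }
    }
    out
}
```

The point of the sketch is that the Dutch prefix decision happens before the trailing-case option applies, which is why it is awkward to fold into a per-character helper.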

sffc · 2026-04-09T23:39:56Z

                } else {
                    break;
                }


The code path with dutch_titlecase_count is a little confusing but fine

Optional: I would have expected something more like

Suggested change

    } else if i == 0 && is_dutch && is_ij_start(c) {
        // pass
    } else if i == 1 && is_dutch && is_ij_end(c) && self.titlecase_tail_casing == TrailingCase::Lower {
        mapping = MappingKind::Lower;
    } else {
        break;
    }

There's a lot more state between ij_start and ij_end, because you have to check for accents and stuff.

sffc · 2026-04-09T23:44:18Z

-                _ => return false,
+/// Is there an i at the beginning of the string which may be relevant
+/// for Dutch titlecasing?
+fn dutch_i_at_beginning(s: &'_ str) -> Option<DutchIData<'_>> {


Optional:

Suggested change

    fn dutch_i_at_beginning(s: &'_ str) -> Option<DutchIData<'_>> {
        let mut chars = s.chars();
        let has_accent = match chars.next() {
            Some('i') | Some('I') => match chars.peek() {
                Some(ACUTE) => {
                    chars.next();
                    true
                }
                _ => false
            },
            Some('í') | Some('Í') => true,
            _ => return None
        }
        Some(DutchIData { /* ... */ }
    )

I considered this, and waffled on the choice for a while.

Firstly, .peek() doesn't work that way: you have to call .peekable() to make an intermediate iterator and then peek.

I think

    let mut peekable = chars.peekable(); match peekable.peek() { Some(&ACUTE) => { peekable.next(); true } ... }

is not very different from having a rest local variable. The main improvement is that it avoids the extra verbosity my version has with the structs, which I don't like, though I think it's fine. I like not having to track control flow in my version, because each branch returns. Sometimes I wish Rust had field-order construction for local structs....
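To make the `.peekable()` point concrete, here is a small sketch of what a peekable-based variant might look like. The `DutchIData` fields shown here are hypothetical (the thread never shows the real struct), and the function name is made up to avoid implying this is the code in the PR.

```rust
/// Combining acute accent (U+0301).
const ACUTE: char = '\u{301}';

/// Hypothetical stand-in for the PR's `DutchIData`; the real fields are
/// not shown in the thread.
#[derive(Debug, PartialEq, Eq)]
pub struct DutchIData<'a> {
    pub has_accent: bool,
    /// Remainder of the string after the i and its accent, if any.
    pub rest: &'a str,
}

/// Hypothetical peekable-based variant of the helper under discussion.
pub fn dutch_i_at_beginning_peekable(s: &str) -> Option<DutchIData<'_>> {
    // `Chars` has no `peek()`; it must be wrapped in `Peekable` first.
    let mut chars = s.chars().peekable();
    let has_accent = match chars.next()? {
        'i' | 'I' => match chars.peek() {
            Some(&ACUTE) => {
                chars.next();
                true
            }
            _ => false,
        },
        // Precomposed í (U+00ED) and Í (U+00CD).
        '\u{ed}' | '\u{cd}' => true,
        _ => return None,
    };
    // `Peekable` has no `as_str()`, so recover the remainder by byte length.
    let consumed = s.len() - chars.map(char::len_utf8).sum::<usize>();
    Some(DutchIData {
        has_accent,
        rest: &s[consumed..],
    })
}
```

One cost of this shape is visible at the end: wrapping in `Peekable` loses `Chars::as_str()`, so the remainder has to be recomputed, which is roughly the bookkeeping a `rest` local variable would have done anyway.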

sffc · 2026-04-09T23:48:35Z

+///
+/// In dutch titlecasing mode, the first N characters should be uppercased:
+/// `ijabc` should titlecase to `IJabc`.
+fn dutch_ij_pair_at_beginning_count(s: &str, mapping: &CaseMap) -> Option<usize> {


Suggestion (optional): make a helper function that takes a &str and returns a struct similar to DutchIData but works for either I or J, so you can re-use the logic. It can take const generics for the four magic chars i, I, í, Í

It's not reusable logic; the j logic is pretty different.
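A rough sketch of why the two halves differ: the i can appear as a plain letter, as a precomposed í/Í, or as i followed by a combining acute, whereas Unicode has no precomposed j with acute, so an accented j can only be j followed by U+0301. The function below is a hypothetical illustration, not the PR's `dutch_ij_pair_at_beginning_count`; it assumes an accented i must be paired with an accented j, and it omits any check for further combining marks after the pair.

```rust
/// Combining acute accent (U+0301).
const ACUTE: char = '\u{301}';

/// Hypothetical sketch: number of leading `char`s that form a Dutch IJ
/// digraph, or `None` if the string does not start with one.
pub fn ij_prefix_char_count(s: &str) -> Option<usize> {
    let mut chars = s.chars().peekable();
    let mut count = 1;

    // The i has three spellings: plain, precomposed, or i + combining acute.
    let accented = match chars.next()? {
        'i' | 'I' => {
            if chars.next_if_eq(&ACUTE).is_some() {
                count += 1;
                true
            } else {
                false
            }
        }
        // Precomposed í (U+00ED) and Í (U+00CD).
        '\u{ed}' | '\u{cd}' => true,
        _ => return None,
    };

    // The j has no precomposed accented form, so only j/J matches here.
    chars.next_if(|&c| matches!(c, 'j' | 'J'))?;
    count += 1;

    // Assumption: an accented i must be followed by j + combining acute.
    if accented {
        chars.next_if_eq(&ACUTE)?;
        count += 1;
    }
    Some(count)
}
```

Under these assumptions `ijabc` yields a prefix of two characters, matching the `IJabc` example in the doc comment, while the fully decomposed accented form yields four.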

Manishearth requested a review from sffc April 9, 2026 19:51

Fix TrailingCase::Unchanged handling for Dutch

cf41d72

Manishearth force-pushed the dutch-trailing-case branch from d7abc2e to cf41d72 Compare April 9, 2026 20:05

sffc approved these changes Apr 9, 2026

Manishearth merged commit 5a3c265 into unicode-org:main Apr 10, 2026
34 checks passed
