Fix TrailingCase::Unchanged handling for Dutch#7863
Fix TrailingCase::Unchanged handling for Dutch#7863Manishearth merged 1 commit intounicode-org:mainfrom
Conversation
d7abc2e to
cf41d72
Compare
| } else { | ||
| break; | ||
| } |
There was a problem hiding this comment.
The code path with dutch_titlecase_count is a little confusing but fine
Optional: I would have expected something more like
| } else if i == 0 && is_dutch && is_ij_start(c) { | |
| // pass | |
| } else if i == 1 && is_dutch && is_ij_end(c) && self.titlecase_tail_casing == TrailingCase::Lower { | |
| mapping = MappingKind::Lower; | |
| } else { | |
| break; | |
| } |
There was a problem hiding this comment.
there's a lot more state in between ij_start and ij_end because you have to check for accents and stuff.
| _ => return false, | ||
| /// Is there an i at the beginning of the string which may be relevant | ||
| /// for Dutch titlecasing? | ||
| fn dutch_i_at_beginning(s: &'_ str) -> Option<DutchIData<'_>> { |
There was a problem hiding this comment.
Optional:
| fn dutch_i_at_beginning(s: &'_ str) -> Option<DutchIData<'_>> { | |
| fn dutch_i_at_beginning(s: &'_ str) -> Option<DutchIData<'_>> { | |
| let mut chars = s.chars(); | |
| let has_accent = match chars.next() { | |
| Some('i') | Some('I') => match chars.peek() { | |
| Some(ACUTE) => { | |
| chars.next(); | |
| true | |
| } | |
| _ => false | |
| }, | |
| Some('í') | Some('Í') => true, | |
| _ => return None | |
| } | |
| Some(DutchIData { /* ... */ } | |
| ) |
There was a problem hiding this comment.
I considered this, and waffled on choice for a while
Firstly, .peek() doesn't work that way, you have to call .peekable() to make an intermediate iterator and then peek.
I think
let peekable = chars.peekable();
match peekable.peek() {
Some(ACUTE) => peekable.next(), true,
...
}is not very different from having a rest local variable. The main improvement is that my version is a little bit more verbose with the structs, which I don't like but I think it's fine. I like not having to track control flow in my version because each branch returns. Sometimes I wish Rust had field-order construction for local structs....
| /// | ||
| /// In dutch titlecasing mode, the first N characters should be uppercased: | ||
| /// `ijabc` should titlecase to `IJabc`. | ||
| fn dutch_ij_pair_at_beginning_count(s: &str, mapping: &CaseMap) -> Option<usize> { |
There was a problem hiding this comment.
Suggestion (optional): make a helper function that takes a &str and returns a struct similar to DutchIData but works for either I or J, so you can re-use the logic. It can take const generics for the four magic chars i, I, í, Í
There was a problem hiding this comment.
It's not reusable logic, the j logic is pretty different
Fixes #7681
I couldn't figure out an easy way to keep the dutch handling within
full_helper(where all other locale-sensitive stuff lives), since this is the one case that actually affects the titlecasing uppercase-to-lowercase state change.I think this implementation works nicely. The
dutch_i_at_beginninganddutch_ij_pair_at_beginning_countsplit comes from when I was trying to retain the old code and refactor it to be reusable, but I realized that I could do this without retaining the old code. I can merge these two functions, but I kind of like how it turned out. They're more testable this way.They're not implemented on CaseMapContext since they run at the beginning of a string.
Changelog
icu_casemapping: Fix
TrailingCase::Unchangedhandling for Dutch