@sffc
Member

@sffc sffc commented Dec 17, 2025

CLDR-17223

  • This PR completes the ticket.

NOTE: The CLDR implementation of locale display names needs to be updated with the new algorithm.

ALLOW_MANY_COMMITS=true

@sffc
Member Author

sffc commented Dec 17, 2025

Do I need to add this to any of the following files where I find moreInformation referenced:

  • TestCoverageLevel.txt
  • missingOk.txt
  • prettyPath.txt
  • PathHeader.txt
  • PathDescriptions.md
  • IdToPath.java
  • ExampleDependencies.java
  • TestXPathTable.java

Also, about this test failure:

Error:  (TestExampleGenerator.java:327)  Error: No example:	<[>	"//ldml/characters/nestedBracketReplacement[@source=\"([^\"]*+)\"]",

I want this not to appear in the Survey Tool, because it's easy to get wrong and it hardly ever changes. How should I do this? Should I add it to TestExampleGenerator::DELIBERATE_EXCLUDED_EXAMPLES, as moreInformation is?

@sffc sffc requested a review from macchiati December 17, 2025 02:23
@sffc sffc marked this pull request as ready for review December 17, 2025 02:23
@conradarcturus
Contributor

I was worried that not every language uses parentheses (in which case a substitution could fail), but parentheses are pretty universal for localePattern: https://www.unicode.org/cldr/charts/48/by_type/locale_display_names.locale_name_patterns.html#4c3249dbb329101c

Can you test out the algorithm that would implement this on RTL locales? I want to make sure it doesn't cause problems.

Naming-wise, I like "innerBracket" (a bracket that is in a bracket) over "nestedBracketReplacement", but I defer to you.

Per usual, that's my 2 cents but I am happy to follow your lead. I'm deferring the accept to someone who is a more frequent Design group member.

@macchiati
Member

The stuff to update is in https://cldr.unicode.org/development/updating-dtds.

The way to hide stuff in the Survey Tool is not obvious: add a ; HIDE suffix to the path's line in PathHeader.txt.

Example:
//ldml/dates/timeZoneNames/zone[@type="Etc/(GMT|UTC)(.*)"]/exemplarCity ; Special ; Suppress ; Etc/$1$2 ; exemplarCity ; HIDE

I'll file a ticket to fix that.
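
To make the suffix concrete, here is a minimal Python sketch that splits a PathHeader.txt-style line into fields and checks for a trailing HIDE marker. The helper names `split_fields` and `is_hidden` are hypothetical, and the assumption that fields are separated by `;` with optional surrounding whitespace is inferred only from the example line above; the real CLDR parser may behave differently.

```python
def split_fields(line: str) -> list[str]:
    """Split a PathHeader.txt-style line into trimmed fields.

    Assumption: fields are separated by ';' (inferred from the example line).
    """
    return [field.strip() for field in line.split(";")]


def is_hidden(line: str) -> bool:
    """Return True if the last field is the HIDE marker (hypothetical helper)."""
    fields = split_fields(line)
    return bool(fields) and fields[-1] == "HIDE"


example = (
    '//ldml/dates/timeZoneNames/zone[@type="Etc/(GMT|UTC)(.*)"]/exemplarCity'
    " ; Special ; Suppress ; Etc/$1$2 ; exemplarCity ; HIDE"
)

print(is_hidden(example))
```

A line for the new element would presumably need the same trailing field, but its section, page, and header values are not given in this thread.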

Member

@macchiati macchiati left a comment

The ticket has been accepted, so this is not blocked by that.

I have one open question; otherwise this looks good.


<!-- Moved up as part of change to moderate -->
<coverageLevel value="moderate" match="characters/ellipsis[@type='%ellipsisTypes']"/>
<coverageLevel value="moderate" match="characters/nestedBracketReplacement"/>
Member

I think the coverage match might need to be complete, that is, include the source attribute. But the value can be '%A'.

Member Author

(this comment is on an old version of the PR before I figured out how to do it)

| hi-u-nu-latn-t-en-h0-hybrid | Hindi (Hybrid: English, Western Digits) |
| en-u-nu-deva-t-de | English (Transform: German, Devanagari Digits) |
| fr-z-zz-zzz-v-vv-vvv-u-uu-uuu-t-ru-Cyrl-s-ss-sss-a-aa-aaa-x-u-x | French (Transform: Russian \[Cyrillic\], uu: uuu, a: aa-aaa, s: ss-sss, v: vv-vvv, x: u-x, z: zz-zzz) |
| hi-u-nu-latn-t-en-h0-hybrid | Hindi (Western Digits, Hybrid: English) |
Member

Interesting. The question I have is: when there are both T and U extensions, which is "most important" for the user? For example, I think that for hi-Latn, the fact that it is a hybrid might be more important.

Also need to check; I think we might have a specialized name for hi-Latn, namely Hinglish.

Member Author

I changed the order to make it clearer that the Unicode extension keywords apply to the main language and not the transform language. Example:

my-IN-t-en-MM-u-nu-latn
| Current Spec | Burmese (India, Transform: English [Myanmar [Burma]], Latin Digits) |
| Flatten, current order | Burmese (India, Transform: English, Myanmar [Burma], Latin Digits) |
| Flatten, new order | Burmese (India, Latin Digits, Transform: English, Myanmar [Burma]) |

With "Flatten, current order", it's not clear that "Latin Digits" applies to "Burmese (India)" and not "English (Myanmar [Burma])"
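
As a rough illustration of the "Flatten, new order" form, the sketch below assembles the display name from already-localized pieces. Everything here is an assumption made for the example rather than the spec's algorithm: the function names, the English pattern strings, the simplification of the separator to a plain join, and the `)` to `]` mapping (the thread only shows `(` mapping to `[`).

```python
# Assumed replacement data; only the "(" -> "[" pair appears in this thread.
NESTED_BRACKET_REPLACEMENTS = {"(": "[", ")": "]"}

# Assumed English patterns, simplified for the sketch.
LOCALE_PATTERN = "{0} ({1})"
SEPARATOR = ", "
TRANSFORM_PATTERN = "Transform: {0}"


def replace_nested_brackets(name: str) -> str:
    """Swap brackets in a name that will itself be placed inside brackets."""
    return name.translate(str.maketrans(NESTED_BRACKET_REPLACEMENTS))


def flatten_new_order(language, main_parts, extension_parts, transform_parts):
    """Order: main-language parts, then Unicode extensions, then the transform."""
    transform = list(transform_parts)
    if transform:
        # Label the first transform piece; the rest follow as flat items.
        transform[0] = TRANSFORM_PATTERN.format(transform[0])
    inner = [
        replace_nested_brackets(part)
        for part in [*main_parts, *extension_parts, *transform]
    ]
    return LOCALE_PATTERN.format(language, SEPARATOR.join(inner))


print(
    flatten_new_order(
        "Burmese", ["India"], ["Latin Digits"], ["English", "Myanmar (Burma)"]
    )
)
# prints: Burmese (India, Latin Digits, Transform: English, Myanmar [Burma])
```

Placing the extension items before the transform keeps "Latin Digits" next to the main language's own qualifiers, which is the point of the reordering.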

Member

Ok, makes sense

@sffc
Member Author

sffc commented Jan 15, 2026

> Can you test out the algorithm that would implement this on RTL locales? I want to make sure it doesn't cause problems.

I don't see how this would adversely impact RTL locales relative to not performing the replacement. We're swapping one bracket for another, with the same bidi classes.
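
A quick way to check the bidi claim for the ASCII brackets is to compare their Unicode properties with Python's standard `unicodedata` module. This is only a sketch: the helper name `bidi_profile` is hypothetical, and it covers only `(`, `)`, `[`, and `]`, whereas locale data could in principle use other bracket characters.

```python
import unicodedata


def bidi_profile(ch: str) -> tuple[str, int]:
    """Return (Bidi_Class, Bidi_Mirrored flag) for a single character."""
    return (unicodedata.bidirectional(ch), unicodedata.mirrored(ch))


for source, replacement in [("(", "["), (")", "]")]:
    same = bidi_profile(source) == bidi_profile(replacement)
    print(source, bidi_profile(source), replacement, bidi_profile(replacement), same)
```

All four characters report class `ON` (Other Neutral) and are mirrored, so the swap should not change how the bidi algorithm resolves the surrounding text.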

> Naming-wise, I like "innerBracket" (a bracket that is in a bracket) over "nestedBracketReplacement", but I defer to you.

Were you thinking something like:

<innerBracket bracket="(">[</innerBracket>

I feel very neutral on the name of the XML tag.
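
Either spelling carries the same information: a source bracket in an attribute and its replacement as the element text. The sketch below reads both forms into a mapping with Python's standard `xml.etree.ElementTree`. The `<characters>` wrapper and the `nestedBracketReplacement` element shape are assumptions inferred from the xpath in the test failure above, not copied from the actual data files, and `load_replacements` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Assumed shape, inferred from //ldml/characters/nestedBracketReplacement[@source=...]
CURRENT_NAME = """
<characters>
    <nestedBracketReplacement source="(">[</nestedBracketReplacement>
    <nestedBracketReplacement source=")">]</nestedBracketReplacement>
</characters>
"""

# The alternative spelling floated in this comment.
ALTERNATIVE_NAME = """
<characters>
    <innerBracket bracket="(">[</innerBracket>
</characters>
"""


def load_replacements(xml_text: str, tag: str, attr: str) -> dict[str, str]:
    """Map each element's attribute value to its text content."""
    root = ET.fromstring(xml_text.strip())
    return {el.get(attr): (el.text or "") for el in root.iter(tag)}


print(load_replacements(CURRENT_NAME, "nestedBracketReplacement", "source"))
print(load_replacements(ALTERNATIVE_NAME, "innerBracket", "bracket"))
```

Since both forms parse to the same kind of mapping, the choice between them is a naming question only.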

@sffc sffc force-pushed the nestedBracketReplacement branch from 09c4419 to 5bf384b Compare January 17, 2026 09:09
@jira-pull-request-webhook

Hooray! The files in the branch are the same across the force-push. 😃

~ Your Friendly Jira-GitHub PR Checker Bot

@sffc
Member Author

sffc commented Jan 17, 2026

@conradarcturus did you wish to reply to my reply above, or shall I merge this?
