Skip to content

Don't output conflicting fields in datagen #6477

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

sffc
Copy link
Member

@sffc sffc commented Apr 21, 2025

Fixes #6205

@@ -11,7 +11,7 @@
"elements": [
"LLLL U",
"MM.y",
"r(U) MMMM",
"r(U) LLLL",
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Curious.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The data is strange. availableFormats has:

                "Gy": "U",
                "GyMMM": "LLL U",
                "GyMMMd": "d MMM U",
                "GyMMMEd": "E, d MMM U",
                "GyMMMM": "r(U) MMMM",
                "GyMMMMd": "r(U) MMMM d",
                "GyMMMMEd": "r(U) MMMM d, E",
...
                "y": "U",
                "yyyy": "U",
                "yyyyM": "MM.y",
                "yyyyMd": "dd.MM.y",
                "yyyyMEd": "E, dd.MM.y",
                "yyyyMMM": "LLL U",
                "yyyyMMMd": "d MMM U",
                "yyyyMMMEd": "E, d MMM U",
                "yyyyMMMM": "LLLL U",
                "yyyyMMMMd": "r(U) MMMM d",
                "yyyyMMMMEd": "r(U) MMMM d, E",

So, we get variant patterns for LLLL U (without era) and r(U) MMMM (with era), but only for the long form: the medium (abbreviated) form has LLL U for both.

Not sure what's right. Seems like more of a CLDR problem than an ICU4X problem at the current time.

@@ -15,6 +15,6 @@
"ccc, dd.MM.y г.",
"EEEE, d MMMM y г. GGG",
"E, d MMM y г. GGG",
"E, dd MMM y г. GGG"
"ccc, dd MMM y г. GGG"
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one is the motivating issue.

Manishearth
Manishearth previously approved these changes Apr 21, 2025
Copy link
Member

@Manishearth Manishearth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

general shape lgtm. looks like the algorithm needs some work still from your comments

@@ -19,6 +21,12 @@ use icu_provider::prelude::*;

use super::DatagenCalendar;

type Trio<'a> = (
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: some docs on what this is?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I renamed it to VariantPatterns, made it a struct, and gave it some docs.

for pattern_item in pattern_items.iter_mut() {
let PatternItem::Field(field) = pattern_item else {
continue; // nothing to do
while let Err(e) = names.load_for_pattern(&DebugProvider, &date_time_pattern) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue: can we guarantee that this terminates? PErhaps we should document why this terminates OR have some kind of debug tracker that ensures this only runs for X iterations

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I think it terminates, assuming that the code is written correctly. The loop only evaluates if there was a ConflictingField error. Each time the loop evaluates, it should reduce the ConflictingField errors by one, because it replaces the conflicting field with a non-conflicting field.

I'll add a comment about this.

@@ -213,6 +213,9 @@ impl SourceDataProvider {
})
.map(|trio| {
enforce_consistent_field_lengths(trio, |previous_field, field| {
if attributes.as_str() == "ej" {
return; // too noisy
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: what's going on here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed it to filter by quality instead of a blanket filter on "ej".

@@ -11,7 +11,7 @@
"elements": [
"LLLL U",
"MM.y",
"r(U) MMMM",
"r(U) LLLL",
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: Same issue as in Chinese.

@sffc
Copy link
Member Author

sffc commented Apr 22, 2025

@Manishearth Let me know if you want me to improve the commit history

@sffc sffc marked this pull request as ready for review April 22, 2025 23:53
@sffc sffc requested review from robertbastian, zbraniecki and a team as code owners April 22, 2025 23:53
@sffc
Copy link
Member Author

sffc commented Apr 22, 2025

The warnings in full datagen are:

gregorian/ie/ym0de: conflicting field: EEEE <=> E
gregorian/hsb/ym0: conflicting field: LLLL <=> MMMM
gregorian/el-polyton/ym0: conflicting field: MMM <=> LLL
gregorian/rw/ym0d: conflicting field: MMMM <=> MMM
gregorian/et/ym0de: conflicting field: MMMM <=> MMM
gregorian/pl/ym0: conflicting field: LLL <=> MMM
gregorian/ast/ym0: conflicting field: LLLL <=> MMMM
gregorian/ie/ym0: conflicting field: MMM <=> LLL
japanese/el-CY/ym0d: conflicting field: GGGGG <=> GGG
gregorian/hy/ym0: conflicting field: LLLL <=> MMMM
gregorian/hy/ym0: conflicting field: LLL <=> MMM
gregorian/dsb/ym0: conflicting field: LLLL <=> MMMM
gregorian/el/ym0: conflicting field: MMM <=> LLL
dangi/gd/ym0: conflicting field: LLLL <=> MMMM
dangi/ru-BY/ym0: conflicting field: LLLL <=> MMMM
dangi/ru-KG/ym0: conflicting field: LLLL <=> MMMM
dangi/ru-UA/ym0: conflicting field: LLLL <=> MMMM
gregorian/bs/ym0: conflicting field: LLLL <=> MMMM
dangi/ru-MD/ym0: conflicting field: LLLL <=> MMMM
gregorian/el-CY/ym0: conflicting field: MMM <=> LLL
dangi/ru-KZ/ym0: conflicting field: LLLL <=> MMMM
chinese/ru-BY/ym0: conflicting field: LLLL <=> MMMM
dangi/ru/ym0: conflicting field: LLLL <=> MMMM
japanese/el-CY/ym0d: conflicting field: GGGGG <=> GGG
japanese/el/ym0d: conflicting field: GGGGG <=> GGG
japanese/el-polyton/ym0d: conflicting field: GGGGG <=> GGG
japanese/el/ym0d: conflicting field: GGGGG <=> GGG
japanese/el-polyton/ym0d: conflicting field: GGGGG <=> GGG
chinese/ru-UA/ym0: conflicting field: LLLL <=> MMMM
chinese/ru/ym0: conflicting field: LLLL <=> MMMM
chinese/ru-KG/ym0: conflicting field: LLLL <=> MMMM
chinese/ru-MD/ym0: conflicting field: LLLL <=> MMMM
chinese/gd/ym0: conflicting field: LLLL <=> MMMM
chinese/ru-KZ/ym0: conflicting field: LLLL <=> MMMM

@Manishearth
Copy link
Member

Let me know if you want me to improve the commit history

Yes, just squashing commits such that there is one for the code changes and one for the data changes would be great

@sffc
Copy link
Member Author

sffc commented Apr 23, 2025

It's a bit hard to squash commits since there are merge commits involved. I'll rewrite the branch.

@sffc sffc force-pushed the weekday-bug-6205 branch from ba9cc6c to cb1569e Compare April 23, 2025 01:19
@sffc
Copy link
Member Author

sffc commented Apr 23, 2025

Branch rewrite done.

There isn’t anything to compare.
ba9cc6c and cb1569e are identical.

@sffc sffc requested a review from Manishearth April 23, 2025 01:20
@Manishearth
Copy link
Member

Thank you so much!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Debug panic in datetime
2 participants