CLDR-17851 SupplementalCalendarData parser, update ExampleGenerator #4642
- minimal, modern parser
- minimal test case
- commented-out 'dump' function
- update example generator to use
- move TestEraMap into TestSDI
@sffc this gives us a data-driven approach that should be less crashy on era data changes.
I don't understand enough about how CLDR works to be able to give this a thorough review.
Mostly just an FYI that CLDR could now have data quality checks on
Example generator issue, will bring back soon.
- use new supplemental data info
- BH, added in CLDR-18464, is out of order… attempt to find the prev and next era by date.
I think this is ready for review. Maybe the BH is a formatter issue.
Thank you for doing this! I'm not familiar enough either to review this, unfortunately, although I could try applying it in my branch and check whether it fixes the error, if that works?
Already did and it does work
(Email reply to Annemarie Apple's comment of Fri, 25 Apr 2025, 6:24 p.m.)
tools/cldr-code/src/main/java/org/unicode/cldr/test/ExampleGenerator.java
A few items
tools/cldr-code/src/main/java/org/unicode/cldr/util/SupplementalCalendarData.java
type = Integer.parseInt(xpath.getAttributeValue(INDEX, LDMLConstants.TYPE));

start = xpath.getAttributeValue(INDEX, LDMLConstants.START);
startCalendar = forDateString(start);
A Calendar is pretty heavyweight for this. It would be simplest to just extract the millis, or use an Instant if we want type safety.
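A minimal sketch of the lighter-weight alternative, assuming the start/end strings are year-month-day values that can be read as proleptic ISO dates at midnight UTC (that interpretation is an assumption here, as are the class and method names, which are hypothetical and not part of the PR):

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneOffset;

/** Hypothetical helper: turns an era date string such as "2019-5-1" into an Instant. */
final class EraDates {
    private EraDates() {}

    static Instant toInstant(String ymd) {
        // Split from the right so that a leading '-' on a negative year survives.
        int daySep = ymd.lastIndexOf('-');
        int monthSep = ymd.lastIndexOf('-', daySep - 1);
        int year = Integer.parseInt(ymd.substring(0, monthSep));
        int month = Integer.parseInt(ymd.substring(monthSep + 1, daySep));
        int day = Integer.parseInt(ymd.substring(daySep + 1));
        // Assumption: the date denotes midnight UTC in the proleptic ISO calendar.
        return LocalDate.of(year, month, day).atStartOfDay(ZoneOffset.UTC).toInstant();
    }
}
```

Callers that only need a number can take `toInstant(start).toEpochMilli()`; keeping the Instant preserves type safety.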
Actually, an even better reason is that you (and your clients) can then avoid complications with null. The earliest start date is Integer.MIN_VALUE and the latest end date is Integer.MAX_VALUE.
There is one further complication. Normally, missing start/end dates get the above values. However, for the Japanese calendar, the end date is derived from the next line's start date minus one day.
So
<era type="0" start="645-6-19"/>
<era type="1" start="650-2-15"/>
...
<era type="235" start="1989-1-8" code="heisei"/>
<era type="236" start="2019-5-1" code="reiwa"/>
is equivalent to
<era type="0" start="645-6-19" end="650-2-14" />
<era type="1" start="650-2-15" end="671-12-31"/>
...
<era type="235" start="1989-1-8" end="2019-4-30" code="heisei"/>
<era type="236" start="2019-5-1" code="reiwa"/>
So that means you don't know the end date for an era until the next one gets added. And that's assuming that you read them in file order. Otherwise, you need to read them all in and sort by start date.
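The rule described in this comment (sort by start date, then set each era's end to the day before the next era's start) could be sketched as follows. This is an illustration only, not the parser in this PR: the class and method names are hypothetical, and it assumes the date strings can be read as proleptic ISO dates. The last era stays open-ended, represented here as null.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper that derives implied era end dates from a series of start dates. */
final class EraEndDates {
    // year-month-day, where the year may be negative
    private static final Pattern DATE = Pattern.compile("(-?\\d+)-(\\d+)-(\\d+)");

    private EraEndDates() {}

    static LocalDate parse(String s) {
        Matcher m = DATE.matcher(s);
        if (!m.matches()) {
            throw new IllegalArgumentException("Bad era date: " + s);
        }
        return LocalDate.of(
                Integer.parseInt(m.group(1)),
                Integer.parseInt(m.group(2)),
                Integer.parseInt(m.group(3)));
    }

    /**
     * Returns the implied end date for each era, in start-date order. The final entry is null
     * because the end of the last era is not known until another era is added.
     */
    static List<LocalDate> impliedEnds(List<String> starts) {
        List<LocalDate> sorted = new ArrayList<>();
        for (String s : starts) {
            sorted.add(parse(s));
        }
        Collections.sort(sorted); // do not rely on file order
        List<LocalDate> ends = new ArrayList<>();
        for (int i = 0; i < sorted.size(); i++) {
            boolean hasNext = i + 1 < sorted.size();
            ends.add(hasNext ? sorted.get(i + 1).minusDays(1) : null);
        }
        return ends;
    }
}
```

Given only the starts "1989-1-8" and "2019-5-1", the first era's implied end is 2019-4-30, matching the heisei line in the example above.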
Do we need more clarity in the spec here?
Before this, I couldn't find any code in CLDR that read these lines.
"So that means you don't know the end data for an era until the next one gets added. And that's assuming that you read them in file order. Otherwise you need to read them in, and sort by start date."
That's why I had the end time be null there.
import org.unicode.cldr.icu.LDMLConstants;

public class SupplementalCalendarData implements Iterable<String> {
    /** an <era> element */
Suggested change, replacing the comment /** an <era> element */ with:
/** an <era> element, which represents the era information from a calendar such as:
 * <pre>
 * <supplementalData><calendarData>
 *   <calendar type="islamic">
 *     <calendarSystem type="lunar" />
 *     <eras>
 *       <era type="0" start="622-7-15" code="islamic" aliases="ah"/>
 *       <era type="1" end="622-7-14" code="islamic-inverse" aliases="bh"/>
 *     </eras>
 *   </calendar>
 * </calendarData></supplementalData>
 * </pre>
 */
Or just link to the spec?
The behavior of start/end is documented in https://www.unicode.org/reports/tr35/tr35-69/tr35-dates.html#Calendar_Data
(It could use some wordsmithing to clarify that "open-ended" applies when there isn't a series; probably start that first. I'll file a ticket for that.)
}

/** only works within the same cal system */
@Override
Suggested change, adding a comment after @Override:
@Override
/* Normally comparisons require all of the fields to be listed, to ensure that no
 * two unequal values a and b will ever have a.compareTo(b) == 0.
 * However, the structure of the survey tool guarantees that no two eras (for different calendarTypes)
 * will overlap in start/end dates.
 */
The survey tool doesn't guarantee this. Do you mean the spec? The unit tests ought to validate this, but I'm not sure what all of the constraints are.
Right, we don't make that clear. Filed https://unicode-org.atlassian.net/browse/CLDR-18576
You're right. It is the spec not the Survey Tool.
.compare(getLatest(), o.getLatest())
.compare(getType(), o.getType())
Suggested change, replacing
.compare(getLatest(), o.getLatest())
.compare(getType(), o.getType())
with
.compare(getEnd(), o.getEnd())
Because of the above feature, you will never need the type for comparison; just the calendarType and end.
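To illustrate the suggestion, here is a sketch of a comparison that uses only the calendar type and the end date. It uses java.util.Comparator rather than the chained .compare(...) style in the diff, and the EraKey class, its fields, and the use of Long.MAX_VALUE for an open-ended era are all assumptions made for this example, not code from the PR. It is only a valid total order if eras within one calendar type never share an end date, which is the constraint discussed above.

```java
import java.util.Comparator;

/** Hypothetical, simplified stand-in for an era entry. */
final class EraKey {
    final String calendarType;
    final long endMillis; // assumption: Long.MAX_VALUE stands for an open-ended era
    final int type;

    EraKey(String calendarType, long endMillis, int type) {
        this.calendarType = calendarType;
        this.endMillis = endMillis;
        this.type = type;
    }

    /** Orders by calendar type, then by end date; the era type is deliberately not consulted. */
    static final Comparator<EraKey> BY_CALENDAR_THEN_END =
            Comparator.comparing((EraKey e) -> e.calendarType)
                    .thenComparingLong(e -> e.endMillis);
}
```

With this ordering, the islamic era that has an explicit end sorts before the open-ended one even though its type number is higher.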
@macchiati this is getting pretty heavyweight, and it's not working. Could I go back to the lightweight parser that succeeds for purposes of examples, and split out the heavier API questions?
Specifically, back to Friday's version (but memoize it so the object is only created once). And split out a PR for the more heavyweight processor that tries to sort out the situation for Japanese, etc. I was trying to make a parser that just reflected what was in the XML, to enable unit testing, but it is kind of turning into a general-purpose implementation of the algorithm.
- address some concerns
- split other issues out to a separate PR: unicode-org#4648
Force-pushed: 3339539 to 0efdaee
Notice: the branch changed across the force-push!
~ Your Friendly Jira-GitHub PR Checker Bot
- spotless
@macchiati PTAL. This is basically what you saw Friday, plus fixed the bad
CLDR-17851
Fixes original root cause of CLDR-18464 Fix IndexOutOfBoundsException in ExampleGenerator.handleEras #4629
This PR completes the ticket.
ALLOW_MANY_COMMITS=true