-
Notifications
You must be signed in to change notification settings - Fork 244
Avoid claiming that HijriSimulated corresponds to islamic-rgsa #7311
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
robertbastian
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We shouldn't remove the docs if we still have the logic
icu4x/components/calendar/src/any_calendar.rs
Line 865 in c46d47f
| Hijri(Some(HijriCalendarAlgorithm::Rgsa)) => Ok(AnyCalendarKind::HijriSimulatedMecca), |
|
How can we best go from what API surface we have shipped to acknowledging that On the Web Platform side, the result of the December 2025 TG2 meeting was to treat Can we deprecate our icu4x/components/locale_core/src/preferences/extensions/unicode/keywords/calendar.rs Line 56 in 2caa5a1
(I'm pretty sure that I have tried to make the point about the history and implementation situation of |
|
The What we should remove/deprecate is the |
|
CC @sffc and @Manishearth to agree on the approach before writing a more complex PR. |
|
I think we should have further discussion about the desired end state first. |
sffc
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR removes a factual statement. If the user provides "islamic-rgsa" as their locale calendar, ICU4X returns the HijriSimulatedMecca calendar. We can discuss separately how we want to behave here, which involves agreeing on what we want the end state to be, but let's not delete valid documentation for what is the current status quo.
The statement is a statement about CLDR correspondence, and (I believe) the claim of CLDR correspondence is incorrect.
When I added CCs, I meant to have that conversation. Do we agree that the following statements are true?
If we can agree that the above statement are true, can we agree on the following conclusions?
|
|
I think the core disagreement between people here and your points is whether I wanted this documentation because as our enums have, over time, diverged from the BCP47/CLDR names, and this makes things quite confusing when debugging. |
As I said in last week's meeting, I think it's bad for a library that is positioned and regarded as embodying i18n expertise to have things whose status is "whatever" but it's not apparent to the library user. |
I consider https://cldr.unicode.org/development/development-process/design-proposals/islamic-calendar-types to be the source of truth for the CLDR calendar definitions. It defines The Reingold simulations are, I believe, close enough to the real sightings that it's possible to encode sighting data with an adjustment of exactly 0 or 1 days, whereas Tabular Islamic seems to require -2, -1, 0, 1, or 2 days. The above document was apparently published in 2013, which is not really all that long ago. We should be able to find the people involved in that decision. |
Where does it say that?
How many days does UAQ need? |
|
The ICU-TC has long considered, and still considers, Reingold to be the gold standard for astronomical simulations. So, if the goal is to match the moment when a new moon actually occurs, then Reingold should be the basis over any of the other algorithmic approaches. Umm al-Qura involves a proprietary/heuristic "Saudi criterion". In practice, Umm al-Qura is usually ahead of the human observation, but it could be behind. I'll try to explain my understanding in a table:
Let me explain the third column a bit more. Suppose Tabular says that Ramadan starts on Wednesday. Your sighting-adjustment code needs to be prepared to adjust it to Monday (if the table is really late with its crescent), Tuesday, Wednesday, Thursday, or Friday (if the table is really early with its crescent). If Reingold says that Ramadan starts on Wednesday, though, then Ramadan will either start on Wednesday or Thursday (if there was cloud cover or the crescent wasn't visible before sunset, etc). I think but am not certain that Ramadan wouldn't be starting on a Friday. If Umm al-Qura says it starts on Wednesday, then the result is probably also Wednesday or Thursday, but it's possible depending on the version of Umm al-Qura being used that it could be late and Ramadan actually started on Tuesday. This post is based on my understanding of the situation and could change based on additional facts/data. |
Is there any attested use case indicating that users need an unofficial approximation that's expected to yield results somewhere between the two Saudi Arabia-based authoritative sources (KACST's simulation, which we do have the data for for a date range, and sighting, which we don't have the data for and cannot have future data for)? Oracle requested an identifier with semantics that we don't have the data for and OpenJDK didn't end up shipping it. ICU4X shouldn't do something approximal in order to have some behavior for all the minted identifiers. ICU4X should be able to conclude that ICU4X doesn't have an implementation for what the identifier is meant for. |
|
Discussion about future of this calendar:
|
|
Additional background: https://unicode-org.atlassian.net/browse/CLDR-5525 I talked with @macchiati who said:
|
|
My personal position:
|
This makes these goals incompatible as stated, since ICU4X's HijriSimulated does not match what ICU4C does.
This is reasonable, but we should document this as not matching any ground truth and instead being a source for adjustments.
Thanks for clarifying. My own position is that it should not be in AnyCalendar (regardless of what AnyCalendar ends up being).
This means we can continue to have a type for this with the utility you propose, without having any costs on other users.
How does this sound to you? Footnotes
|
|
ICU4C new Intl.DateTimeFormat("en", { calendar: "islamic", dateStyle: "long" }).format(new Date(2000, 0, 1))
'Ramadan 25, 1420 AH'
new Intl.DateTimeFormat("en", { calendar: "islamic-civil", dateStyle: "long" }).format(new Date(2000, 0, 1))
'Ramadan 24, 1420 AH'
new Intl.DateTimeFormat("en", { calendar: "islamic-rgsa", dateStyle: "long" }).format(new Date(2000, 0, 1))
'Ramadan 25, 1420 AH'
new Intl.DateTimeFormat("en", { calendar: "islamic-tbla", dateStyle: "long" }).format(new Date(2000, 0, 1))
'Ramadan 25, 1420 AH'
new Intl.DateTimeFormat("en", { calendar: "islamic-umalqura", dateStyle: "long" }).format(new Date(2000, 0, 1))
'Ramadan 24, 1420 AH'
new Intl.DateTimeFormat("en", { calendar: "islamic", dateStyle: "long" }).format(new Date(2001, 0, 1))
'Shawwal 7, 1421 AH'
new Intl.DateTimeFormat("en", { calendar: "islamic-civil", dateStyle: "long" }).format(new Date(2001, 0, 1))
'Shawwal 5, 1421 AH'
new Intl.DateTimeFormat("en", { calendar: "islamic-rgsa", dateStyle: "long" }).format(new Date(2001, 0, 1))
'Shawwal 7, 1421 AH'
new Intl.DateTimeFormat("en", { calendar: "islamic-tbla", dateStyle: "long" }).format(new Date(2001, 0, 1))
'Shawwal 6, 1421 AH'
new Intl.DateTimeFormat("en", { calendar: "islamic-umalqura", dateStyle: "long" }).format(new Date(2001, 0, 1))
'Shawwal 6, 1421 AH'In the first set, |
|
From looking deeper, I might have been misled by this pair of ICU4C comments: One says that IslamicCalendar is IslamicCivilCalendar, the other says that IslamicRgsaCalendar is IslamicCalendar. Perhaps that comment is talking about different ctors? Looking through the code I can see this:
So, documentation aside, IslamicCivilCalendar is not the same as IslamicRgsaCalendar. |
|
Redoing my proposal from my previous comment in light of new info: @sffc Most of your arguments are about why we ought to have such a type. Most of my concerns are about it showing up in AnyCalendar, how does this sound as a solution:
This means we can continue to have a type for this with the utility you propose, without having any costs on other users. |
Can you elaborate or post a link to where you've previously elaborated on these concerns?
I no longer consider HijriSimulated to be costly to other users since #7301 was merged. I consider the documentation/reputational* cost of HijriSimulated not being HijriSimulated for the remainder of 2.x to be much greater than a few kB saved from a function that is much larger than that. I don't see a compelling reason not to kick the AnyCalendar decision down the road to 3.0. * By "reputational" I mean that we've already landed too much deprecated stuff in |
Partly, in #7320 (comment) I want AnyCalendar to contain "i18n-useful, production-quality calendars". I think HijriSimulated is not that useful on its own, though I think you've convinced me that it's something we should expose for people to build on top of. I think there's an argument to be made that it is useful on its own since its errors are restricted to a single month at a time (unlike Chinese, where the entire year gets messed up), but I'd want to see how often those errors occur. I worry about implicitly encouraging people to use an incomplete calendar by exposing it this way, even if we document it. I don't think it's a great look to say "here is a calendar in our default set, but beware, it's very often wrong". I think a fresh-slate approach to RGSA might be better. Codesize also was a concern but it's probably resolved. I still don't enjoy carrying around the ~500B of data for the simulated history, but it's fine. I recognize that with codesize resolved these concerns are not as strong.
I think that's a good argument. Note that the calendar types are already deprecated, it's just the variants and the Rules implementor that would be deprecated. So it's not much. But yes, it's a risk. I don't think most of our users are directly using these APIs so I am not convinced of the reputational cost. Overall my experience in Rust is that deprecation is not in and of itself viewed as a bad thing (it shows a commitment to stability). Churn is, though, especially if APIs people are using get moved around a bunch, but I do not expect there to be many direct users of these types. |
I don't see what makes HijriSimulated special here, and why UAQ or tabular cannot be used as a basis. A previous argument that it's at most off by 1 day does not seem to be true (#7311 (comment)).
I agree, the packed data is 500B, code size shouldn't be much bigger either. In the grand scheme of The only argument I really see is aligning with ICU4C. I'm not sure how good of an argument that is:
The question is what to do with the
|
|
Interesting: the official Saudi sightings announced by the High Judiciary Council of Saudi Arabia might yield 31-day months, which is not compatible with our data model. |
It's substantially different from the other calendars in ICU4X (except possibly The other calendars exist in the world independently of ICU4X, ICU4C, ICU4J, CLDR, and Reingold, and at least for near-present-day dates the ICU4X implementations are exact matches for the other existence and not approximations. (To be clear, Saudi Arabia sighting exists independently of ICU4X, ICU4C, ICU4J, CLDR, and Reingold; I mean
Are we trying to fit use cases to the situation being that we have the code, or are there use cases that make it necessary to have the code?
A calendar as a basis to apply sighting adjustments on top of does not appear to be a thing that ICU4X has been designed for. Have users asked for this, or is this about working from having code backwards towards a use case? Microsoft's non-Umm al-Qura Hijri calendar (the one that corresponds to It seems to me that this means that even for the purpose of displaying the current sighting-based date in the system UI, this has the issue of not being able to display day 30 when the tabular rule doesn't allow it. Furthermore, it seems that an adjustment that's made for the purpose of the current month also affecting calculations concerning past and future dates outside the current month could cause problems. Do we know of user opinions of how well this Windows feature meets user needs?
In ICU4C (but not in ICU4J last I looked) The sequence of events that resulted in
Indeed it's not the same. It appears that the first implementation in ICU4J had civil (tabular) and astronomical simulation in the same class with civil as the default and a civil flag that the caller could set to false to activate the astronomical simulation. Somewhere along the way the default has flipped the other way round so that ICU4C
ICU4C has a reputation of representing i18n expertise and what it does has been assumed to be appropriate to expose to the Web. Dealing with the realization that ICU4C's I think as a reputational matter, we should avoid a situation where someone assumes that you get Saudi Arabia sighting from ICU4X when asking for
A key part of how the ECMA side ended up in the current state was that it didn't seem useful to clone ICU4C's
I think it's better to clearly not claim implementation than to be knowingly wrong.
I think the caller should not get a Hijri calendar to make it clear that we don't have actual The TG2 answer from the December meeting has the outcome of "use the region's default calendar" for JavaScript |
Furthermore, our model isn't compatible with the same (era, era-year, month, day) tuple being used for more than one rata die within one calendar. We should not be treating Oracle's request for an identifier that OpenJDK didn't even ship with as a reason for ICU4X to ship a calendar. |
|
The "request from Oracle" is completely irrelevant to the fact that islamic-rgsa has been shipping with this behavior for 12 years. It doesn't matter who requested it or asked for it or why it landed that way. |
|
FWIW, just looking at the code, the ICU4C implementation directly uses ICU4X's HijriSimulated was computed via icu4x/utils/calendrical_calculations/src/islamic.rs Lines 41 to 56 in 775be2f
This includes location data for calculating phasis. As noted previously, due to the nature of the Islamic calendar, this can at most cause one or two days to be labeled as being in the wrong month, but it's unclear to me how common that is. Quick back of the envelope calculation: The Shaukat criterion looks for the new moon when the sun is 4.5 degrees below the horizon, and the moon is at least 4.1 degrees above. The moon-sun angle shifts by around 13 degrees a day. This means that the Shaukat criterion is looking for situations where the moon-sun angle is between 8.6 and ~13 degrees on the day of the new moon. On the other hand, the ICU4C criterion is looking for a new moon, which is anywhere between 0 and 13 degrees. So I'd imagine that ICU4C and ICU4X would diverge on month starts around a third of the time. This is quick and dirty math and it's likely I've made a mistake. We should just compare the two. |
|
When we implemented the calendrical calculations, we made the decision to implement from Reingold instead of porting ICU4C, so I wouldn't be surprised if the crescent criterion is different; hopefully the ICU4X criterion is better. I don't want my position to differ based on a choice that was made to improve the quality of this calendar. |
|
That's fine, I just wanted to highlight that we are not implementing exactly what ICU4C does. It is specifically clarifying the facts presented in #7311 (comment) |
When there's an actual software compatibility constraint, it may be justified to do something that's knowingly not what it says on the label. (To be clear, for reasons upthread, I'm not suggesting making it do what it says on the label, I'm suggesting not pretending to have an implementation of
I find the argumentation that ICU4X should continue to have a mapping that isn't known to serve a software compatibility need, isn't know to address user needs, and is known not to be what it says on the label very frustrating. The reason why I care is that ICU4C giving the appearance that it supports |
I really really disagree with your characterisation that calendars are on a quality scale. They are not, they are either correct or not correct. ICU4C's implementation does not match what has been and will be observed in Mecca, and neither does ours. It doesn't matter that we tried to do the maths differently. In fact, it's bad that we created yet another Hijri calendar that doesn't actually exist. |
I really (really?) disagree with your characterization that these calendars are correct or not correct. There is no such thing as a generally correct Hijri calendar, since it is based on human observations. Umm al-Qura plasters over the problem in controverial ways, and Tabular makes no claim of matching ground truth. Developers working with Hijri calendars have multiple options to approximate the calendar, all with their own tradeoffs. HijriSimulated is one such option. |
There is a correct calendar for |
You are right that there is no generically correct Hijri calendar, but we're not trying to ship The way I see "correctness" of individual calendars is: Does it match what people use? Then it's correct, provided it is defined in a way that is sufficiently unambiguous about which calendar it is so that the people who use it can use it and the people who don't aren't misled. The UAQ calendar is in use as a civil calendar. The Dawoodi Bohra calendar (which is one of the two Tabular ones, I can't remember which1), is in use: multiple Bohri friends of mine have it on their walls, and when I've asked about observations they've said "oh yeah we don't do that, other Muslim groups do that". The goal of this library is not to provide religiously accurate calendars based on scriptural interpretation. The goal is to provide calendars that people can actually use, and in that sense, at least some of these calendars are in use. Footnotes
|
If someone reaches for If someone reaches for We can't implement the thing on the CLDR documentation pages, but we can implement the thing from ICU4C.
Thanks, this is helpful background. Is there evidence of |
Using my "developer preference vs user preference" distinction in #7320 (comment), I'm not sure that user preferences would be based on "the behavior from ICU4C". At best, user preferences would be based on "oh, this calendar is close enough to what I actually use". My understanding is that UAQ should also be "close enough", in that sense. Actually, it might be closer than ICU4C, since UAQ incorporates the Mecca sighting location in its math. Developer preference could be based on ICU4C, but developers would likely be using the enum variants directly, not the codes, and they are best suited to be nudged by documentation.
I remember doing research on this ages ago and finding some stuff, but I'm not sure. Overall, the Friday epoch seems to be more rare. It would be nice to do this research properly at some point. |
|
@sffc yesterday you told us that ECMA is a VIP client. ECMA wants to treat
Can we please not relitigate all the other calendars. |
My position is that if ECMA needs a feature that is best implemented in the core library, then we should do so. ECMA has a lot of weird handling of locale identifiers, and ICU4X exports all the necessary building blocks for it.
The status quo is that ICU4X ships the four models that ICU4C ships, with improvements to match Reingold better. People seem to want to change the status quo to instead reflect things that are in real use. (I agree for 3.0 but disagree for 2.x) If that is the case, then the corollary is that we should consider removing not only islamic-rgsa but also islamic-civil. |
|
The usage in Temporal means that islamic-civil cannot be removed regardless of whether it is used, until Temporal makes a decision on removing it. This does not hold for rgsa or japanext. I think it's definitely worth investigating: I am not happy with having calendars in our code where we do not have full documentation on what actual in-use calendar they map to. I also think it's not a priority to fix that. The proposal is basically to switch from "ship what ICU4C ships" to "ship what Temporal wants". We have basically never actually shipped what ICU4C ships for most nontrivial calendars, aside from the basic code correspondence. |
That's an interesting question for docs, both ICU4X and MDN. However, we don't need to explore that as a prerequisite to proceeding here. The Hijri tabular concept as well as the parameters used for the particular instantiation exist independently of ICU4X/ICU4C/ICU4J/CLDR/Reingold, so we are not making up a calendar the way we are with |
Publicly-available information about how the islamic-rgsa identifier was minted for JDK purposes, how JDK acknowledges past-sighting-data-only calendar possibility, the context of minting the identifier together with islamic-umalqura (an SA simulation), and the CLDR display name for islamic-rgsa all indicate that islamic-rgsa is supposed to denote sighting
data (that can be past only), so we shouldn't be saying that a simulation corresponds to it.