Description
I ran into an implementation issue a few days ago in Chakra where calling formatToParts on a date from a non-Gregorian calendar can cause some confusion. Right now, V8 and Chakra agree that new Intl.DateTimeFormat("en-u-ca-chinese", { year: "numeric" }).format() produces the string "2018(wu-xu)", while SpiderMonkey returns the string "35". As a result, when formatting to parts, SpiderMonkey returns a single part, { type: "year", value: "35" }. However, what the expected result should be for V8 and Chakra is unclear to me. "2018(wu-xu)" is created, at least on our end, by passing "y" to udatpg_getBestPattern, which will return "r(U)" for the Chinese calendar, where "r" is the related Gregorian year and "U" is the year name.
Before continuing, if "2018(wu-xu)" is not the correct string to be producing here, then this may not be an issue. With that being said, it looks like all of the engines that I have access to on Windows (can't test JSC+Intl right now) generally use udatpg_getBestPattern or something vaguely similar, so even if this string is wrong, this issue may come up again in the future. Additionally, if the ability to ask for a short or long date is added, then we will definitely run into issues where basically any LDML specifier could show up in the user's preferred short/long date format, so formatToParts would need to be able to map every LDML specifier to a part type.
As such, with the current part types, is it expected that there will be two "year" parts coming from this string? That goes against an unwritten rule that I have always assumed about this API, where there would only ever be 0 or 1 of each non-literal part, such that you could do ...formatToParts().filter((part) => part.type === "year")[0] to get the correct year. Perhaps that was a bad assumption on my part, but if there are multiple year fields, how will an application know which is the preferable one when it just wants to display "the year," whatever that may mean? I don't know enough about the Chinese calendar to say if 2018 or wu-xu (or neither) is the best answer, but I don't think the API is currently set up to handle this situation.
If we agree that 2018 and wu-xu should not both be type = year, I would propose adding a new part type, "relatedYear," to line up with the LDML's "r" field.
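To make the ambiguity concrete, here is a minimal sketch of how application code might pick "the year" if a "relatedYear" part type existed. The parts arrays are hand-written to mirror the strings discussed above and are not produced by a real engine. pickYear is a hypothetical helper, and "yearName" is only an assumed type name for the LDML "U" field.

```javascript
// Hypothetical helper: choose the part an application should show as "the year".
// Prefers a "relatedYear" part (the proposed type for LDML "r") and falls back
// to the first "year" part if no related year is present.
function pickYear(parts) {
  const related = parts.find((part) => part.type === "relatedYear");
  if (related !== undefined) {
    return related.value;
  }
  const year = parts.find((part) => part.type === "year");
  return year === undefined ? undefined : year.value;
}

// Hand-written parts for the string "35" (a single "year" part).
const singleYear = [{ type: "year", value: "35" }];

// Hand-written parts for "2018(wu-xu)" under the proposal.
// "yearName" is an assumed type name for the LDML "U" field.
const withRelatedYear = [
  { type: "relatedYear", value: "2018" },
  { type: "literal", value: "(" },
  { type: "yearName", value: "wu-xu" },
  { type: "literal", value: ")" },
];

console.log(pickYear(singleYear)); // 35
console.log(pickYear(withRelatedYear)); // 2018
```

With distinct types, the filter-by-type idiom keeps working because each non-literal type appears at most once.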
/cc @zbraniecki, @jefgen
Activity
anba commented on Mar 29, 2018
I guess you create the UDateTimePatternGenerator for udatpg_getBestPattern with udatpg_open("en-u-ca-chinese") for the example case from above? Because that seems to be incorrect and instead udatpg_open("en") should be used.
https://tc39.github.io/ecma402/#sec-lookupmatcher and https://tc39.github.io/ecma402/#sec-bestfitmatcher are both required to return an element from their availableLocales parameter. The availableLocales parameter is always the [[AvailableLocales]] list of a specific Intl constructor.
https://tc39.github.io/ecma402/#sec-resolvelocale saves the return value of LookupMatcher/BestFitMatcher into result.[[dataLocale]]. The data-locale in this case is "en".
https://tc39.github.io/ecma402/#sec-initializedatetimeformat uses [[dataLocale]] to find a matching pattern.
That means the data-locale "en" needs to be passed to udatpg_open and not the resolved locale "en-u-ca-chinese".
At least that's how I interpret the spec and it's what we've implemented in SpiderMonkey. 😄
JSC agrees with SpiderMonkey and also returns "35".
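The distinction between the resolved locale and the data locale can be illustrated with a small sketch, assuming a runtime that provides Intl.Locale. splitLocale is a hypothetical helper. It only separates the base name from the "ca" keyword and does not run LookupMatcher or BestFitMatcher, so it approximates [[dataLocale]] rather than reproducing the spec algorithm.

```javascript
// Split a BCP 47 tag into its base name and its "ca" (calendar) keyword.
// This only strips the Unicode extension; it does not match the tag against
// [[AvailableLocales]], so it is an approximation of [[dataLocale]].
function splitLocale(tag) {
  const locale = new Intl.Locale(tag);
  return { base: locale.baseName, calendar: locale.calendar };
}

const { base, calendar } = splitLocale("en-u-ca-chinese");
console.log(base); // en
console.log(calendar); // chinese
```

Under the reading above, the base name is what would be handed to the pattern generator, while the calendar keyword still controls how the date fields are computed.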
jackhorton commented on Mar 29, 2018
That makes sense. I can update Chakra to match.
With that being said, I am not sure I agree with the idea that the format for a date should be picked irrespective of the calendar being used.
EDIT: Also, what if a language-script-region has a default calendar that is the Chinese calendar?
Use the base locale when getting the best match pattern
anba commented on Apr 12, 2018
Thankfully https://github.com/unicode-cldr/cldr-core/blob/master/supplemental/calendarPreferenceData.json doesn't show any region which defaults to the Chinese calendar. So it's not (yet) an issue, at least when only considering CLDR data. 😄
jackhorton commented on Apr 12, 2018
Isn't it theoretically possible for a user to say on their system that they prefer a specific calendar? I know ICU has had something between trouble with and disregard for picking up that preference, at least on Windows, but in general, a user should be able to say that they prefer the Chinese calendar and have new Intl.DateTimeFormat().format() print something reasonable for them.
zbraniecki commented on Apr 12, 2018
Historically ICU just didn't look into OS Regional Prefs (I did try to champion an idea for ECMA402 to allow implementers to do that), but a user can still do that by setting their locale to en-US-u-ca-chinese - that literally means "I want en-US with the Chinese calendar", which is the same as if we looked into OS Regional Prefs.
But the crux is what happens next. With no data for the Chinese calendar, we'll just negotiate it and fall back on Gregorian as the only calendar we have data for.
So it's not about the API (at least not primarily) but about data availability. It's no different from trying to format a 2-letter month if we only have numeric, or a 23h hourCycle if we only have 12/24. We'll fall back on the next best thing.
jackhorton commented on Apr 12, 2018
In this case, if I'm not mistaken, we do have data for the Chinese calendar, though? Perhaps I am misinterpreting what you're saying, but when I ask ICU for something with the Chinese calendar, it knows how to use the Chinese calendar.
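One way to check whether an engine kept the requested calendar or fell back to another one is to inspect resolvedOptions(). This is a minimal sketch. resolvedCalendar is a hypothetical helper, and the values printed depend on the locale data the engine ships with, so no expected output is shown.

```javascript
// Ask the engine which calendar it resolved for a requested locale tag.
// The result depends on the calendar data available to the engine.
function resolvedCalendar(tag) {
  return new Intl.DateTimeFormat(tag).resolvedOptions().calendar;
}

console.log(resolvedCalendar("en-US-u-ca-chinese"));
console.log(resolvedCalendar("en-US"));
```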
littledan commented on Apr 13, 2018
@jackhorton We've talked about exposing the UA calendar settings in navigator.locales, see whatwg/html#3046
jungshik commented on Apr 14, 2018
@littledan: aside from that, the current spec about 'u-ca-' is a bit unexpected from the API user's point of view.
CLDR does have separate formatting data for non-Gregorian calendars, but when picking the best match pattern for formatting, the current spec dictates that the base locale without 'u-ca-' be used, while the actual date is calculated using the calendar specified in 'u-ca-'.
V8 switched to using the base locale (per spec) for now, but we need to revisit whether using the base locale is a good idea.
( https://chromium-review.googlesource.com/c/v8/v8/+/1006915 ).
littledan commented on Apr 16, 2018
Overall, the idea in the original post sounds good to me. I imagine the change would be in two parts:
If this sort of fix sounds reasonable to everyone, we could whip up the spec text and have a PR pretty soon. Would this solve the problem?
[Merge to 6.7] Use the base locale when getting the best match pattern
Normative: Improve handling of non-Gregorian calendars
Normative: Allow calendar to determine choice of pattern
Normative: Permit relatedYear and yearName in output
srl295 commented on Nov 19, 2019
2018(wu-xu) is the correct (linguistically) behavior. Using nodejs HEAD (V8 version 7.9.317.20-node.20) with a harmony flag:
srl295 commented on Nov 19, 2019
http://demo.icu-project.org/icu4jweb/flexTest.jsp?pat=y&_=th Thai defaults to buddhist.
Normative: Permit relatedYear and yearName in output (#351)
Normative: Allow calendar to determine choice of pattern (#349)
sffc commented on Jun 5, 2020
Is this issue fixed now that #349 is checked in? Can you split out remaining work into a new ticket?