
Clarifying the expected behavior for non-gregorian calendars in DateTimeFormat.formatToParts #225

Open · @jackhorton

Description

I ran into an implementation issue in Chakra a few days ago: calling formatToParts on a date in a non-Gregorian calendar can produce confusing results. Right now, V8 and Chakra agree that new Intl.DateTimeFormat("en-u-ca-chinese", { year: "numeric" }).format() produces the string "2018(wu-xu)", while SpiderMonkey returns "35". As a result, SpiderMonkey's formatToParts returns a single part, { type: "year", value: "35" }, but it is unclear to me what V8 and Chakra should return. At least on our end, "2018(wu-xu)" comes from passing "y" to udatpg_getBestPattern, which returns the pattern "r(U)" for the Chinese calendar, where "r" is the related Gregorian year and "U" is the cyclic year name.

Before going further: if "2018(wu-xu)" is not the correct string to produce here, then this may not be an issue. That said, all of the engines I have access to on Windows (I can't test JSC+Intl right now) use udatpg_getBestPattern or something similar, so even if this string is wrong, the issue may come up again. Additionally, if the ability to ask for a short or long date is added, we will definitely run into cases where essentially any LDML specifier could appear in the user's preferred short or long date format, so formatToParts would need to be able to map every LDML specifier to a part type.

So, with the current part types, is this string expected to produce two "year" parts? That goes against an unwritten rule I have always assumed about this API: there is only ever zero or one of each non-literal part type, so you could write ...formatToParts().filter((part) => part.type === "year")[0] to get the year. Perhaps that was a bad assumption on my part, but if there are multiple year fields, how will an application know which one to prefer when it just wants to display "the year," whatever that may mean? I don't know enough about the Chinese calendar to say whether 2018 or wu-xu (or neither) is the best answer, but I don't think the API is currently set up to handle this situation.
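The filter(...)[0] idiom silently assumes a single "year" part. A minimal sketch of a more defensive lookup, assuming hypothetical helpers yearLikeParts and pickDisplayYear and a hypothetical preference order (related Gregorian year first, then "year", then the cyclic year name). "relatedYear" is the type proposed in this issue; "yearName" is an assumed name for the "U" field:

```javascript
// Hypothetical preference order for "the year" when several year-like parts exist.
// "relatedYear" follows the proposal in this issue; "yearName" is an assumed type name.
const YEAR_PREFERENCE = ["relatedYear", "year", "yearName"];

// Returns every year-like part instead of assuming there is exactly one.
function yearLikeParts(parts) {
  return parts.filter((part) => YEAR_PREFERENCE.includes(part.type));
}

// Picks one year part by preference order, or undefined if none is present.
function pickDisplayYear(parts) {
  const candidates = yearLikeParts(parts);
  for (const type of YEAR_PREFERENCE) {
    const match = candidates.find((part) => part.type === type);
    if (match) return match;
  }
  return undefined;
}

// Example: parts that "2018(wu-xu)" might produce under the proposed types.
const chineseParts = [
  { type: "relatedYear", value: "2018" },
  { type: "literal", value: "(" },
  { type: "yearName", value: "wu-xu" },
  { type: "literal", value: ")" },
];
console.log(pickDisplayYear(chineseParts).value); // 2018
console.log(yearLikeParts(chineseParts).length); // 2
```

Whether the related year or the year name is the better default is exactly the open question above; the preference order here is only one possible policy.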

If we agree that 2018 and wu-xu should not both have type "year", I would propose adding a new part type, "relatedYear", to line up with LDML's "r" field.
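To show where the proposed type would fit, here is a minimal sketch of splitting an LDML pattern such as "r(U)" into typed fields. The tokenizeLdmlPattern helper and the FIELD_TYPES table are hypothetical and cover only a handful of LDML letters; mapping "U" to "yearName" is an assumption, since this issue only proposes "relatedYear":

```javascript
// Hypothetical mapping from a few LDML field letters to formatToParts types.
const FIELD_TYPES = {
  G: "era",
  y: "year",
  r: "relatedYear", // the type proposed in this issue
  U: "yearName", // assumed name for the cyclic year name
  M: "month",
  d: "day",
  E: "weekday",
  h: "hour",
  H: "hour",
  m: "minute",
  s: "second",
  a: "dayPeriod",
};

// Splits an LDML date pattern into { type, pattern } tokens.
// Runs of the same ASCII letter form one field; quoted text and non-letters are literals.
function tokenizeLdmlPattern(pattern) {
  const tokens = [];
  let literal = "";
  const flushLiteral = () => {
    if (literal) tokens.push({ type: "literal", pattern: literal });
    literal = "";
  };
  for (let i = 0; i < pattern.length; ) {
    const ch = pattern[i];
    if (ch === "'") {
      // '' outside quotes is an escaped apostrophe.
      if (pattern[i + 1] === "'") {
        literal += "'";
        i += 2;
        continue;
      }
      // Read quoted text up to the closing quote ('' inside quotes is an apostrophe).
      let j = i + 1;
      while (j < pattern.length) {
        if (pattern[j] === "'" && pattern[j + 1] === "'") {
          literal += "'";
          j += 2;
        } else if (pattern[j] === "'") {
          break;
        } else {
          literal += pattern[j];
          j += 1;
        }
      }
      i = j + 1; // skip the closing quote
    } else if (/[A-Za-z]/.test(ch)) {
      flushLiteral();
      let j = i;
      while (pattern[j] === ch) j += 1;
      tokens.push({ type: FIELD_TYPES[ch] ?? "unknown", pattern: pattern.slice(i, j) });
      i = j;
    } else {
      literal += ch;
      i += 1;
    }
  }
  flushLiteral();
  return tokens;
}

console.log(tokenizeLdmlPattern("r(U)"));
```

With a dedicated type for "r", the "r(U)" pattern yields one related-year field and one year-name field rather than two "year" fields.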

/cc @zbraniecki, @jefgen

Activity

anba (Contributor) commented on Mar 29, 2018

I assume you create the UDateTimePatternGenerator for udatpg_getBestPattern with udatpg_open("en-u-ca-chinese") in the example above? That seems to be incorrect; udatpg_open("en") should be used instead.

[[AvailableLocales]] is a List [...]. Language tags on the list must not have a Unicode locale extension sequence.

That means the data-locale "en" needs to be passed to udatpg_open and not the resolved locale "en-u-ca-chinese".

At least that's how I interpret the spec, and it's what we've implemented in SpiderMonkey. 😄
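The split between the resolved locale and the data locale can be illustrated with Intl.Locale (an API that postdates this thread), whose baseName drops the Unicode extension. A minimal sketch; splitRequestedLocale is a hypothetical helper, and its dataLocale field stands in for the value that would be passed to udatpg_open:

```javascript
// Hypothetical helper: separates the data locale (no Unicode extension)
// from the calendar requested through the "-u-ca-" keyword.
function splitRequestedLocale(tag) {
  const locale = new Intl.Locale(tag);
  return {
    dataLocale: locale.baseName, // "en" for "en-u-ca-chinese"
    calendar: locale.calendar, // undefined when no "ca" keyword is present
  };
}

console.log(splitRequestedLocale("en-u-ca-chinese")); // { dataLocale: 'en', calendar: 'chinese' }
```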

(can't test JSC+Intl right now)

JSC agrees with SpiderMonkey and also returns "35".

jackhorton (Contributor, Author) commented on Mar 29, 2018

That makes sense. I can update Chakra to match.

That said, I'm not sure I agree that the format for a date should be picked irrespective of the calendar being used.

EDIT: Also, what if a language-script-region has a default calendar that is the Chinese calendar?

anba (Contributor) commented on Apr 12, 2018

EDIT: Also, what if a language-script-region has a default calendar that is the Chinese calendar?

Thankfully https://github.com/unicode-cldr/cldr-core/blob/master/supplemental/calendarPreferenceData.json doesn't show any region which defaults to the Chinese calendar. So it's not (yet) an issue, at least when only considering CLDR data. 😄

jackhorton (Contributor, Author) commented on Apr 12, 2018

Isn't it theoretically possible for a user to set a preferred calendar on their system? I know ICU has either struggled with or disregarded picking up that preference, at least on Windows, but in general, a user should be able to say they prefer the Chinese calendar and have new Intl.DateTimeFormat().format() print something reasonable for them.

zbraniecki (Member) commented on Apr 12, 2018

but in general, a user should be able to say that they prefer the Chinese calendar and have new Intl.DateTimeFormat().format() print something reasonable for them.

Historically, ICU just didn't look at the OS regional preferences (I did try to champion an idea for ECMA-402 to allow implementers to do that), but the user can still do it by setting their locale to en-US-u-ca-chinese. That literally means "I want en-US with the Chinese calendar", which is the same as if we had looked at the OS regional preferences.

But the crux is what happens next. With no data for the Chinese calendar, we'll just negotiate it and fall back to Gregorian, the only calendar we have data for.

So it's not about the API (at least not primarily) but about data availability. It's no different from trying to format a 2-letter month when we only have numeric, or a 23h hourCycle when we only have 12/24. We'll fall back to the next best thing.
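In code, the opt-in zbraniecki describes is just a locale tag with a "ca" keyword, and the fallback can be detected by comparing the requested calendar with the one the formatter resolved. A minimal sketch using Intl.Locale (an API that postdates this discussion); calendarFellBack is a hypothetical helper, and whether a given calendar resolves as requested depends on the engine's ICU data:

```javascript
// Build "en-US-u-ca-chinese" from parts instead of concatenating strings.
const chineseTag = new Intl.Locale("en-US", { calendar: "chinese" }).toString();

// Hypothetical helper: compares the calendar requested via "-u-ca-" with the
// calendar the formatter actually resolved to.
function calendarFellBack(tag) {
  const requested = new Intl.Locale(tag).calendar; // undefined if no "ca" keyword
  const resolved = new Intl.DateTimeFormat(tag).resolvedOptions().calendar;
  return { requested, resolved, fellBack: requested !== undefined && requested !== resolved };
}

console.log(chineseTag); // en-US-u-ca-chinese
console.log(calendarFellBack(chineseTag));
console.log(calendarFellBack("en-u-ca-gregory")); // fellBack: false
```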

jackhorton (Contributor, Author) commented on Apr 12, 2018

In this case, if I'm not mistaken, we do have data for the Chinese calendar, though? Perhaps I'm misinterpreting what you're saying, but when I ask ICU for something with the Chinese calendar, it knows how to use the Chinese calendar.

littledan (Member) commented on Apr 13, 2018

@jackhorton We've talked about exposing the UA calendar settings in navigator.locales, see whatwg/html#3046

jungshik commented on Apr 14, 2018

@littledan: aside from that, the current spec about 'u-ca-' is a bit unexpected from the API user's point of view.

d8> new Intl.DateTimeFormat("en-u-ca-chinese").format()
"2/28/35"   <==  year 35 of 60-year cycle, 2nd month, 28th day 
d8> new Intl.DateTimeFormat("en").format()
"4/13/2018"

CLDR does have separate formatting data for non-Gregorian calendars, but when picking the best-match pattern for formatting, the current spec dictates that the base locale without 'u-ca-' be used, while the actual date is calculated using the calendar specified in 'u-ca'.

V8 switched to using the base locale (per spec) for now, but we need to revisit whether using the base locale is a good idea
( https://chromium-review.googlesource.com/c/v8/v8/+/1006915 ).
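Which pattern and calendar an engine picked can be inspected with formatToParts and resolvedOptions. A minimal sketch that formats one fixed instant in UTC (so the output doesn't depend on the host time zone) with and without "-u-ca-chinese". The exact strings vary with the engine and ICU version, but by spec the concatenated part values always reproduce format():

```javascript
// A fixed instant: 13 April 2018, noon UTC (months are zero-based in Date.UTC).
const instant = new Date(Date.UTC(2018, 3, 13, 12));

for (const locale of ["en", "en-u-ca-chinese"]) {
  const dtf = new Intl.DateTimeFormat(locale, { timeZone: "UTC" });
  console.log(locale, dtf.resolvedOptions().calendar, dtf.format(instant));
  // Lists the part types the engine chose for this pattern.
  const parts = dtf.formatToParts(instant);
  console.log(parts.map((part) => `${part.type}=${part.value}`).join(" "));
}
```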

littledan (Member) commented on Apr 16, 2018

Overall, the idea in the original post sounds good to me. I imagine the change would be in two parts:

  • Key the pattern sets off both the base locale and the calendar (but still not all Unicode extension keys)
  • For formatToParts, add the "related year" type.
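The first bullet amounts to a pattern cache keyed on the base locale plus the calendar, ignoring other extension keys such as "nu". A minimal sketch; patternKey and getPatterns are hypothetical, and defaulting to "gregory" when no calendar is requested is a simplifying assumption (a real implementation would use the locale's own default calendar):

```javascript
// Hypothetical cache key: base locale + calendar, ignoring other -u- keywords.
// Assumes "gregory" when no calendar is requested, which is a simplification.
function patternKey(tag, defaultCalendar = "gregory") {
  const locale = new Intl.Locale(tag);
  return `${locale.baseName}|${locale.calendar ?? defaultCalendar}`;
}

const patternCache = new Map();

// Returns the cached pattern set for a tag, computing it once per key with `build`.
function getPatterns(tag, build) {
  const key = patternKey(tag);
  if (!patternCache.has(key)) patternCache.set(key, build(key));
  return patternCache.get(key);
}

console.log(patternKey("en-u-ca-chinese-nu-arab")); // en|chinese
console.log(patternKey("en")); // en|gregory
```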

If this sort of fix sounds reasonable to everyone, we could whip up the spec text and have a PR pretty soon. Would this solve the problem?

A commit referencing this issue was added on Apr 17, 2018 (aaf6a7c).

(4 remaining items not shown)

A commit referencing this issue was added on Apr 23, 2019 (fd46391).

srl295 (Member) commented on Nov 19, 2019

2018(wu-xu) is the linguistically correct behavior. Using Node.js HEAD (V8 version 7.9.317.20-node.20) with a harmony flag:

v8 = 7.9.317.20-node.20
$  ./out/Release/node --harmony-intl-other-calendars -p 'new Date().toLocaleString("en-u-ca-chinese", {year: "numeric", era: "short"})'
2019(ji-hai)
$ ./out/Release/node --harmony-intl-other-calendars -p 'new Date().toLocaleString("en-u-ca-chinese", {year: "numeric"})'
2019(ji-hai)
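The same output can be broken into parts to see how an engine types the related year and the cyclic year name. A minimal sketch; nonLiteralParts is a hypothetical helper, and the part types reported for the Chinese calendar depend on the engine and ICU version, so it lists every non-literal part rather than assuming a single "year":

```javascript
// Fixed instant (1 June 2019 UTC) so the output doesn't depend on "now".
const when = new Date(Date.UTC(2019, 5, 1));

// Hypothetical helper: all non-literal parts for a locale and options.
function nonLiteralParts(locale, options) {
  const dtf = new Intl.DateTimeFormat(locale, { timeZone: "UTC", ...options });
  return dtf.formatToParts(when).filter((part) => part.type !== "literal");
}

console.log(nonLiteralParts("en-u-ca-chinese", { year: "numeric" }));
console.log(nonLiteralParts("en", { year: "numeric" })); // [ { type: 'year', value: '2019' } ]
```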
srl295 (Member) commented on Nov 19, 2019

EDIT: Also, what if a language-script-region has a default calendar that is the Chinese calendar?

Thankfully https://github.com/unicode-cldr/cldr-core/blob/master/supplemental/calendarPreferenceData.json doesn't show any region which defaults to the Chinese calendar. So it's not (yet) an issue, at least when only considering CLDR data. 😄

Thai defaults to the Buddhist calendar: http://demo.icu-project.org/icu4jweb/flexTest.jsp?pat=y&_=th
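An engine's default calendar for a locale shows up in resolvedOptions(). A minimal sketch; defaultCalendar is a hypothetical helper. With full ICU data Thai is expected to resolve to "buddhist", while a build without Thai locale data may fall back to another locale and its calendar:

```javascript
// Hypothetical helper: the calendar a locale resolves to when none is requested.
function defaultCalendar(locale) {
  return new Intl.DateTimeFormat(locale).resolvedOptions().calendar;
}

for (const locale of ["th", "th-TH", "en-US"]) {
  console.log(locale, defaultCalendar(locale));
}
```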

sffc (Contributor) commented on Jun 5, 2020

Is this issue fixed now that #349 is checked in? Can you split out remaining work into a new ticket?


Metadata

Labels: c: datetime (Component: dates, times, timezones); s: in progress (Status: the issue has an active proposal)

Participants: @littledan, @zbraniecki, @srl295, @sffc, @anba

Clarifying the expected behavior for non-gregorian calendars in DateTimeFormat.formatToParts · Issue #225 · tc39/ecma402