-
Notifications
You must be signed in to change notification settings - Fork 244
Optimise the Japanese calendar
#7323
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
0c70d86 to
578eda7
Compare
81161a7 to
076477a
Compare
|
We intentionally kept Japanese data-driven since the new eras get announced with short notice and we want people to be able to pull them in if they use dynamic data without having to ship a library update. |
|
And? This is still data-driven. |
6d26de8 to
3a25082
Compare
3a25082 to
2d248bb
Compare
sffc
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's an interesting proposal. It means that at most 1 post-reiwa era will be available until the library is updated. The advantage is that the type is a bit smaller and faster.
My concern would be, what if there are two eras in fast succession? Like, if there is a new emperor who leaves power very quickly after ascending to power. In that case, we'd format dates correctly for the latest emperor, which is good, but we would "forget" about the previous emperor until the library is updated. Maybe that's fine?
|
We could store two eras. Each era costs 11 bytes. All other "good" calendars are ZSTs, so this basically determines the size of any future enum calendar. 23 bytes might be fine, 34 bytes is probably too big. We have to weigh the risks; I personally don't expect Japan to have any kind of succession drama in the next 30 years - by then ICU4X will probably look different anyway. |
|
I think we should support either 1 or many data-driven eras. I don't see a really good reason to support 2 or 3. I'm not comfortable making a unilateral decision in a code review to support just 1. |
|
I've been considering a change like this for a while, especially if we get rid of JapaneseExtended (then AnyCalendar can be cheaply cloned without an Arc). However, this design was a deliberate choice because we wanted to use ICU4X's data architecture for this. In retrospect: I'm not sure if we needed to, data that updates once every few decades does not need our data architecture. Our hardcoded calendar data is more likely to change than this. Overall I think we should work on this with care and discussion. Optimizing datetimeformat is a good motivator, and fixing the Arc is a good motivator, but there are other ways to fix the Arc, and there may be other ways to optimize datetime format. An early implementation of this type had the data being "empty" in cases where there were just five eras. Worth considering optimizations of that shape. But yes, it would be nice to get rid of all AnyCalendar data loading completely. |
|
This still does data loading, I think, it just picks off one value from the data struct instead of storing the whole data struct in order to reduce stack size. |
|
Yep, I'm idly wondering about designs that remove data loading entirely. I think the "load data but only store the last era" is a clever fix. |
With the Japanese calendars split (#7334 ), there is some optimisation potential:
Japanese, we don't need to store theDataPayloadwith the full map anymore. The five known eras are hardcoded in code anyway, so all we have to store in the calendar object is the potential next era. The start date can be packed, because the year does not need to bei32, we can store it relative to the year 2000. This makes bothJapaneseandAnyCalendarsignificantly smaller andCopy. However, as we only store one era, it requires clients to update ICU4X at least once per era. Given that the last eras were 30 and 62 years long, I think that is acceptable.Japanese, we don't need to use a lookup map in DTF anymore, we can now assignicu4x_era_indexes and use linear storage. This cuts down era name size by 20kB.