Conversation

@robertbastian
Member

@robertbastian robertbastian commented Dec 16, 2025

With the Japanese calendars split (#7334), there is some optimisation potential:

  • For Japanese, we no longer need to store the DataPayload with the full map. The five known eras are hardcoded anyway, so all the calendar object has to store is the potential next era. The start date can be packed: the year does not need to be an i32, because we can store it relative to the year 2000. This makes both Japanese and AnyCalendar significantly smaller and Copy. However, because we store only one era, clients must update ICU4X at least once per era. Given that the last two eras were 30 and 62 years long, I think that is acceptable.
  • For Japanese, we no longer need a lookup map in DTF; we can now assign icu4x_era_indexes and use linear storage. This cuts down era name size by 20 kB.
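
As a rough illustration of the packing idea in the first bullet (not the code in this PR), the start date of a future era could be held in three bytes by storing the year as an offset from 2000. The type `PackedEraStart` and its methods are hypothetical names made up for this sketch, and the 2000..=2255 range is simply what a `u8` offset allows.

```rust
/// Hypothetical packed start date for a not-yet-known era.
/// The year is stored relative to 2000, so one byte covers 2000..=2255.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct PackedEraStart {
    years_since_2000: u8,
    month: u8,
    day: u8,
}

impl PackedEraStart {
    /// Packs a date, returning `None` if the year does not fit the offset.
    /// (Month and day are not validated in this sketch.)
    fn pack(year: i32, month: u8, day: u8) -> Option<Self> {
        if !(2000..=2255).contains(&year) {
            return None;
        }
        Some(Self {
            years_since_2000: (year - 2000) as u8,
            month,
            day,
        })
    }

    /// Recovers the full year from the stored offset.
    fn year(self) -> i32 {
        2000 + i32::from(self.years_since_2000)
    }
}
```

Because every field is a plain integer, the type is `Copy` without any reference counting, which is the property the bullet is after for the calendar object.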

@sffc
Member

sffc commented Dec 18, 2025

We intentionally kept Japanese data-driven since the new eras get announced with short notice and we want people to be able to pull them in if they use dynamic data without having to ship a library update.

@robertbastian
Member Author

And? This is still data-driven.

@robertbastian robertbastian force-pushed the japanext2 branch 5 times, most recently from 6d26de8 to 3a25082 Compare December 19, 2025 14:15
@robertbastian robertbastian marked this pull request as ready for review December 19, 2025 19:31
@robertbastian robertbastian requested review from a team, Manishearth and sffc as code owners December 19, 2025 19:31
Member

@sffc sffc left a comment

It's an interesting proposal. It means that at most 1 post-reiwa era will be available until the library is updated. The advantage is that the type is a bit smaller and faster.

My concern would be, what if there are two eras in fast succession? Like, if there is a new emperor who leaves power very quickly after ascending to power. In that case, we'd format dates correctly for the latest emperor, which is good, but we would "forget" about the previous emperor until the library is updated. Maybe that's fine?

@robertbastian
Member Author

We could store two eras. Each era costs 11 bytes. All other "good" calendars are ZSTs, so this basically determines the size of any future enum calendar. 23 bytes might be fine, 34 bytes is probably too big. We have to weigh the risks; I personally don't expect Japan to have any kind of succession drama in the next 30 years, and by then ICU4X will probably look different anyway.
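
One way to read the 23 and 34 figures is as two or three 11-byte eras plus a one-byte enum tag. That reading is an assumption, as are all names in the sketch below: `EraSlot`, `GregorianSketch`, and `AnyCalendarSketch` are hypothetical stand-ins, and `#[repr(u8)]` is used here only to make the layout well-defined, not because the real AnyCalendar is known to use it.

```rust
/// Hypothetical 11-byte era slot; the actual fields are not spelled out here.
type EraSlot = [u8; 11];

/// A zero-sized calendar, standing in for the calendars that carry no data.
#[derive(Clone, Copy)]
struct GregorianSketch;

/// With `#[repr(u8)]`, each variant is laid out as a one-byte tag followed
/// by its fields, so the largest variant determines the size of the enum.
#[repr(u8)]
#[derive(Clone, Copy)]
enum AnyCalendarSketch<const N: usize> {
    Gregorian(GregorianSketch),
    Japanese([EraSlot; N]),
}

/// Size in bytes of the enum when the Japanese variant stores `N` eras.
fn enum_size<const N: usize>() -> usize {
    core::mem::size_of::<AnyCalendarSketch<N>>()
}
```

Under these assumptions `enum_size::<2>()` and `enum_size::<3>()` come out as 23 and 34, while the zero-sized variants contribute nothing beyond the tag.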

@robertbastian robertbastian requested a review from sffc December 19, 2025 19:54
@sffc
Member

sffc commented Dec 19, 2025

I think we should support either 1 or many data-driven eras. I don't see a really good reason to support 2 or 3. I'm not comfortable making a unilateral decision in a code review to support just 1.

@Manishearth
Member

Manishearth commented Dec 19, 2025

I've been considering a change like this for a while, especially if we get rid of JapaneseExtended (then AnyCalendar can be cheaply cloned without an Arc). However, this design was a deliberate choice because we wanted to use ICU4X's data architecture for this. In retrospect, I'm not sure we needed to: data that updates once every few decades does not need our data architecture. Our hardcoded calendar data is more likely to change than this.

Overall I think we should work on this with care and discussion. Optimizing datetimeformat is a good motivator, and fixing the Arc is a good motivator, but there are other ways to fix the Arc, and there may be other ways to optimize datetime format.

An early implementation of this type had the data being "empty" in cases where there were just five eras. Worth considering optimizations of that shape.

But yes, it would be nice to get rid of all AnyCalendar data loading completely.

@sffc
Member

sffc commented Dec 19, 2025

I think this still does data loading; it just picks off one value from the data struct instead of storing the whole data struct, in order to reduce stack size.
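
A minimal sketch of that "load everything, keep one value" shape, and of the two-eras-in-quick-succession case raised earlier, might look like the following. All names (`YearMonthDay`, `LoadedEras`, `SingleNextEra`) are hypothetical, the era labels are placeholders, and any dates after Reiwa used with it are invented for illustration.

```rust
/// Simple date triple; the derived ordering compares year, then month, then day.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
struct YearMonthDay(i32, u8, u8);

/// Reiwa began on 2019-05-01; it is the last of the hardcoded eras here.
const REIWA_START: YearMonthDay = YearMonthDay(2019, 5, 1);

/// Hypothetical stand-in for the loaded data struct: all era start dates.
struct LoadedEras<'a> {
    starts: &'a [YearMonthDay],
}

/// Calendar that keeps only the latest post-Reiwa era from the data.
#[derive(Clone, Copy, Debug)]
struct SingleNextEra {
    next_era_start: Option<YearMonthDay>,
}

impl SingleNextEra {
    /// Reads the full data once, then stores a single value from it.
    fn from_data(data: &LoadedEras<'_>) -> Self {
        let next_era_start = data
            .starts
            .iter()
            .copied()
            .filter(|start| *start > REIWA_START)
            .max();
        Self { next_era_start }
    }

    /// Resolves a date to a placeholder era label.
    fn era_for(self, date: YearMonthDay) -> &'static str {
        match self.next_era_start {
            Some(start) if date >= start => "post-reiwa",
            _ if date >= REIWA_START => "reiwa",
            _ => "pre-reiwa",
        }
    }
}
```

If the data contained two post-Reiwa eras, this sketch would keep only the later one, so dates falling in the earlier of the two would still resolve to Reiwa until the library itself is updated.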

@Manishearth
Member

Yep, I'm idly wondering about designs that remove data loading entirely.

I think the "load data but only store the last era" is a clever fix.

3 participants