
Correctly check the RD range for Date::try_from_fields and Date::try_add* #7676

Merged
Manishearth merged 18 commits into unicode-org:main from
Manishearth:range-checks
Feb 24, 2026
Conversation

Member

@Manishearth Manishearth commented Feb 20, 2026

Progress on #7076

This does a bunch of things:

  • Adds GENEROUS_YEAR_RANGE, which is a year range that is slightly wider than VALID_RD_RANGE for all calendars, "good enough" for early checks.
  • Uses GENEROUS_YEAR_RANGE as a precondition in try_from_fields, and also checks VALID_RD_RANGE as a postcondition.
  • Tests for try_from_fields.
  • Adds (internal) YearOverflowError, used for internal functions which can only error when out of year range.
  • Changes (internal) DateFieldsResolver::year_info_from_era to extended_year_from_era_year. This splits its responsibilities more cleanly: it doesn't have to bother with generating a YearInfo (which is risky), it just adjusts the era year.
  • Adds an (internal) DateFieldsResolver::year_info_from_extended_checked which checks the generous range and uses it in from_fields and arithmetic code.
  • Adds DateAddError based on consensus in "Consider narrower error type between try_new_from_codes and try_from_fields" (#7010).
  • Updates the fuzzer to remove guards around addition.

If there are issues in DateAddError or the test, I would prefer to work on them separately, since this PR is getting pretty large.
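
The precondition/postcondition split described in the list above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the toy calendar (365 days per year), the numeric bounds, and the `FromFieldsError` type are assumptions made up for the example. Only the names `GENEROUS_YEAR_RANGE` and `VALID_RD_RANGE` come from the description.

```rust
use core::ops::RangeInclusive;

// Hypothetical bounds for illustration only; the real constants differ.
// The year range is deliberately a little wider than the RD range.
const VALID_RD_RANGE: RangeInclusive<i64> = -3_650_000..=3_650_000;
const GENEROUS_YEAR_RANGE: RangeInclusive<i32> = -10_100..=10_100;

// Hypothetical error type standing in for the real one.
#[derive(Debug, PartialEq)]
enum FromFieldsError {
    Overflow,
}

/// Toy calendar: every year has exactly 365 days.
fn to_rata_die(year: i32, day_of_year: u16) -> i64 {
    i64::from(year) * 365 + i64::from(day_of_year)
}

fn try_from_fields(year: i32, day_of_year: u16) -> Result<i64, FromFieldsError> {
    // Precondition: a cheap, coarse check so later arithmetic cannot overflow.
    if !GENEROUS_YEAR_RANGE.contains(&year) {
        return Err(FromFieldsError::Overflow);
    }
    let rd = to_rata_die(year, day_of_year);
    // Postcondition: the exact check against the core invariant.
    if !VALID_RD_RANGE.contains(&rd) {
        return Err(FromFieldsError::Overflow);
    }
    Ok(rd)
}
```

In this sketch, year 10_050 passes the early check but is rejected by the final one, which is why both checks are needed.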

) -> Result<Self, DateAddError> {
// We preemptively protect ourselves from overly-large values
//
// This is mostly needed for avoiding overflows on Duration arithmetic,
Member Author

@Manishearth Manishearth Feb 20, 2026

Open to different approaches here. It feels a bit strange to have all of this nice infrastructure range-checking years and we still need a coarse preemptive check because the duration math overflows.

One thing we could do is have surpasses_inner returning a YearOverflowError, and surpasses() returning a bool where the error is turned into true.
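
A rough sketch of that idea, with assumptions stated up front: the signatures, the year-only comparison, and the bound on `GENEROUS_YEAR_RANGE` are hypothetical simplifications. Only the names `surpasses_inner`, `surpasses`, and `YearOverflowError` come from the comment above.

```rust
use core::ops::RangeInclusive;

// Hypothetical bound for illustration only.
const GENEROUS_YEAR_RANGE: RangeInclusive<i32> = -10_100..=10_100;

#[derive(Debug, PartialEq)]
struct YearOverflowError;

/// Simplified to years only: does `base_year + years` move past `target_year`
/// in the direction of travel? Fails if the candidate year leaves the
/// generous range (or overflows `i32`).
fn surpasses_inner(
    base_year: i32,
    years: i32,
    target_year: i32,
) -> Result<bool, YearOverflowError> {
    let candidate = base_year.checked_add(years).ok_or(YearOverflowError)?;
    if !GENEROUS_YEAR_RANGE.contains(&candidate) {
        return Err(YearOverflowError);
    }
    Ok(if years >= 0 {
        candidate > target_year
    } else {
        candidate < target_year
    })
}

/// Public-facing wrapper: an out-of-range candidate counts as "surpassed".
fn surpasses(base_year: i32, years: i32, target_year: i32) -> bool {
    surpasses_inner(base_year, years, target_year).unwrap_or(true)
}
```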

Member

I don't know this code enough to comment on that. However, if you're establishing an invariant here, propagate it in documentation to the relevant functions (I guess add_year_to, month_from_ordinal?)

Member

I think the "coarse preemptive check" should be the only check, until the final RD check at the end.

base_month,
DateFromFieldsOptions::from_add_options(options),
)
.map_err(|e| {
Member

prefer this as well

Member Author

Why? In this case the errors map cleanly. If the errors stop mapping cleanly we can introduce a new match the way we did before.

Member

this is less readable if you're trying to figure out how an error was produced. rust-analyzer cannot tell you which ?s implicitly use the From implementation

Member Author

That's just a general problem with Rust errors: that's an argument against using ? period. These errors are going to go through ? a half dozen more times before hitting the user. Changing that here won't do anything.

There are Rust error libraries that will attach context to ? for you but they're heavier so we don't use them.

Member

> That's just a general problem with Rust errors: that's an argument against using ? period.

yep. doesn't mean it's not an argument here.

> These errors are going to go through ? a half dozen more times before hitting the user.

that's not our problem. they don't go through another ? before being returned from our public API, and appearing in our unit tests where it's often needed to figure out where they're from

Member Author

Wait, hold on, I already fixed this instance of this pattern because the errors do not map cleanly. I thought you were talking about the other instance of this pattern.

But still, I disagree that we should avoid ?. It's annoying in unit tests, but it's also annoying to wade through extra match statements.

Member

well it's not an extra match statement, the MonthError to LunisolarDateError conversion only happens once, let's just keep it there and not move it to another file

Member Author

Okay, so you are talking about the other instance of this pattern.

It's an extra match statement within complicated code. Moving the match statement elsewhere is better. All of the calendar arithmetic/construction code is already messy.

I could introduce a .map_err(MonthError::into)? if you really want but I still think that is kind of silly. The Rust community has pretty universally settled on the style of using ? wherever it works and I don't plan on changing how I use ? unless you argue for and land a style rule for it.
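
For readers following the thread, the two styles under discussion look roughly like this. It is a sketch under assumptions: the enum variants and the `parse_month` helper are hypothetical, and only the type names `MonthError` and `LunisolarDateError` appear in the conversation.

```rust
#[derive(Debug, PartialEq)]
enum MonthError {
    NotInCalendar, // hypothetical variant
}

#[derive(Debug, PartialEq)]
enum LunisolarDateError {
    MonthNotInCalendar, // hypothetical variant
}

// The conversion lives in one place; `?` picks it up implicitly.
impl From<MonthError> for LunisolarDateError {
    fn from(e: MonthError) -> Self {
        match e {
            MonthError::NotInCalendar => LunisolarDateError::MonthNotInCalendar,
        }
    }
}

// Hypothetical helper that can fail with the narrower error.
fn parse_month(ordinal: u8) -> Result<u8, MonthError> {
    if (1..=13).contains(&ordinal) {
        Ok(ordinal)
    } else {
        Err(MonthError::NotInCalendar)
    }
}

/// Style A: `?` converts through the `From` impl; short, but the conversion
/// is not visible at the call site.
fn resolve_implicit(ordinal: u8) -> Result<u8, LunisolarDateError> {
    let month = parse_month(ordinal)?;
    Ok(month)
}

/// Style B: the mapping is spelled out where the error is produced.
fn resolve_explicit(ordinal: u8) -> Result<u8, LunisolarDateError> {
    let month = parse_month(ordinal).map_err(|e| match e {
        MonthError::NotInCalendar => LunisolarDateError::MonthNotInCalendar,
    })?;
    Ok(month)
}
```

Both functions return the same results; the disagreement is only about where the mapping should be readable.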

cal,
))
)?;
// We early checked for a generous range of years, now we must check
Member

ah, here we go. move this check into new_balanced as that calls ArithmeticDate::new_unchecked and violates its invariant

Member Author

We cannot, new_balanced must be able to produce slightly-out-of-range years.

I can't see a way to make the code work without it. I documented this new invariant on new_balanced, and I'm about to push a commit that documents it more. I also updated the documentation on new_unchecked with the new invariant.

When I wrote the tests my core desired property was that arithmetic should work as long as the input and output were in RD range. The arithmetic algorithm needs to be able to temporarily "peek" out of range to do so, see the end_of_month calculation here (and basically all of the surpasses stuff used in until). So new_balanced needs to work for near-edge dates on either side of the range boundary, which means we cannot have it follow the strict RD range.

If we want to give up that property we need to more carefully figure out what the desired behavior is here. I find that property to be rather sensible and nice to have.

We could add new_unchecked_with_assertion that debug asserts in all of the other call sites that it is within RD range. But I don't really think it's necessary.
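
The "peek out of range" property can be illustrated with a toy model. Everything here is an assumption made for the example (dates as plain day counts, 30-day months, the bounds, the `Overflow` type, and the `check_final` helper); it only mirrors the shape of the argument, not the actual `new_balanced` implementation.

```rust
use core::ops::RangeInclusive;

// Hypothetical toy model: a date is a day count, every month has 30 days,
// and the valid range covers exactly months 0..=29 (days 0..=899).
const DAYS_PER_MONTH: i64 = 30;
const VALID_RD_RANGE: RangeInclusive<i64> = 0..=899;

#[derive(Debug, PartialEq)]
struct Overflow;

/// Builds a day count from (month, day) without enforcing VALID_RD_RANGE,
/// so callers may temporarily hold a slightly out-of-range value.
fn new_balanced(month: i64, day: i64) -> i64 {
    month * DAYS_PER_MONTH + day
}

/// The strict check, applied only to values that leave the algorithm.
fn check_final(rd: i64) -> Result<i64, Overflow> {
    if VALID_RD_RANGE.contains(&rd) {
        Ok(rd)
    } else {
        Err(Overflow)
    }
}

/// Last day of the month containing `rd`.
fn end_of_month(rd: i64) -> Result<i64, Overflow> {
    let month = rd.div_euclid(DAYS_PER_MONTH);
    // For the last valid month this is day 900, one past the valid range.
    let next_month_start = new_balanced(month + 1, 0);
    check_final(next_month_start - 1)
}
```

If `new_balanced` enforced the strict range itself, `end_of_month(880)` would fail even though both its input and its result (899) are valid.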

/// use icu::calendar::options::{DateAddOptions, Overflow};
///
/// // Hebrew year 5784 is a leap year, 5785 is not.
/// // Adar I (the leap month) is month 5 in a leap year.
Member

isn't it month 6?

#[inline]
fn year_info_from_extended(&self, extended_year: i32) -> Self::YearInfo {
debug_assert!(crate::calendar_arithmetic::VALID_YEAR_RANGE.contains(&extended_year));
debug_assert!(crate::calendar_arithmetic::GENEROUS_YEAR_RANGE.contains(&extended_year));
Member

Thought: This check doesn't really achieve anything since GENEROUS_YEAR_RANGE is bigger than the RD range. I guess it's harmless

Member Author

Yeah, this was added deliberately because previously it was quite haphazard, it's nice to have a check right before we call the tricky calendrical APIs.

let extended_year = if let Some(era) = era {
calendar.extended_year_from_era_year(
era.as_bytes(),
range_check(year, "era_year", CONSTRUCTOR_YEAR_RANGE)?,
Member

Issue: I think this one should check GENEROUS_YEAR_RANGE because it seems really weird for era year and extended year ranges to be double-checked. It results in a smaller effective range. This is especially noticeable in a calendar where the era and extended years are not the same, like Ethiopian with input era "aa"; only ~15000 dates can be constructed instead of ~20000 dates when the era and extended year are the same.

Alternatively, don't check the extended year range if the input was an era year.
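
The shrinking effect of checking both the era year and the extended year can be shown with a small count. The bounds and the 5,500-year offset below are assumptions chosen to match the rough figures in the comment, not values taken from the codebase.

```rust
use core::ops::RangeInclusive;

// Assumed bound for illustration only.
const CONSTRUCTOR_YEAR_RANGE: RangeInclusive<i32> = -9_999..=9_999;

/// An era year is accepted only if both it and the derived extended year
/// (here modelled as `era_year - offset`) fall inside the same range.
fn accepts(era_year: i32, offset: i32) -> bool {
    CONSTRUCTOR_YEAR_RANGE.contains(&era_year)
        && CONSTRUCTOR_YEAR_RANGE.contains(&(era_year - offset))
}

/// Number of era years that survive the double check for a given offset.
fn count_accepted(offset: i32) -> usize {
    (-20_000..=20_000).filter(|&y| accepts(y, offset)).count()
}
```

With an offset of 0 the sketch accepts 19,999 era years; with a hypothetical offset of 5,500 it accepts only 14,499, because the two ranges no longer line up.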

Member Author

This is existing documented behavior (on from_codes). Separate discussion to change if desired.

Comment on lines +500 to +506
let rd = C::to_rata_die_inner(year, month, day);

// We early checked for a generous range of years, now we must check
// to ensure we are actually in range for our core invariant.
if !VALID_RD_RANGE.contains(&rd) {
return Err(DateFromFieldsError::Overflow);
}
Member

Issue: I don't love converting to RD just to check the RD range. I would rather have a calendar-specific check. This function is already generic in the calendar. (ok for follow-up)

Member Author

I don't really wish to maintain calendar-specific range checking code, it's too easy to get wrong.

Member

Fine for follow-up. to_rata_die is slow and we otherwise don't need it. Maintaining a const calendar-specific min and max date where we use Ord to check if in-range is fine and better.
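
The suggested follow-up might look something like the sketch below. The `Ymd` struct and the `MIN_DATE`/`MAX_DATE` values are hypothetical; a real calendar would need its own constants that correspond exactly to the RD bounds.

```rust
// Field order matters: the derived `Ord` compares year, then month, then day.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
struct Ymd {
    year: i32,
    month: u8,
    day: u8,
}

// Hypothetical per-calendar bounds for illustration only.
const MIN_DATE: Ymd = Ymd { year: -9_999, month: 1, day: 1 };
const MAX_DATE: Ymd = Ymd { year: 9_999, month: 12, day: 31 };

/// Range check using only field comparisons, with no conversion to RD.
fn in_range(date: Ymd) -> bool {
    MIN_DATE <= date && date <= MAX_DATE
}
```

The trade-off raised in the thread still applies: this avoids the RD conversion, but each calendar's constants have to be kept correct by hand.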

year, // == year_info.to_extended_year() + offset
"year",
(VALID_YEAR_RANGE.start() + offset)..=(VALID_YEAR_RANGE.end() + offset),
(CONSTRUCTOR_YEAR_RANGE.start() + offset)..=(CONSTRUCTOR_YEAR_RANGE.end() + offset),
Member

Issue (here and elsewhere, similar to above): I really don't like checking the year against CONSTRUCTOR_YEAR_RANGE multiple times

Member Author

Preexisting behavior, often (but not always?) documented. @robertbastian introduced this I believe.

Comment on lines 580 to 581
resolved_year =
cal.year_info_from_extended_checked(resolved_year.to_extended_year() - 1)?;
Member

Issue: I thought we were going to make sure the input to this function (new_balanced) was such that we never cause calendar-specific integer overflow. A precondition should be that year is in the generous range, ordinal_month does not exceed the generous range converted to months, and day does not exceed the generous range converted to days.
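
One way to read that precondition is sketched here. The per-year upper bounds, the range values, and the `check_balanced_inputs` helper are all hypothetical; the point is only that each input is bounded by the generous range expressed in its own unit.

```rust
use core::ops::RangeInclusive;

// Hypothetical values for illustration only.
const GENEROUS_YEAR_RANGE: RangeInclusive<i32> = -10_100..=10_100;
const MAX_MONTHS_PER_YEAR: i64 = 14;
const MAX_DAYS_PER_YEAR: i64 = 390;

#[derive(Debug, PartialEq)]
struct YearOverflowError;

/// Rejects inputs that could push calendar-specific arithmetic out of the
/// generous range: the year directly, months and days via a coarse bound.
fn check_balanced_inputs(
    year: i32,
    ordinal_month: i64,
    day: i64,
) -> Result<(), YearOverflowError> {
    let span_years =
        i64::from(*GENEROUS_YEAR_RANGE.end()) - i64::from(*GENEROUS_YEAR_RANGE.start());
    let max_months = span_years * MAX_MONTHS_PER_YEAR;
    let max_days = span_years * MAX_DAYS_PER_YEAR;
    // Ranges are used instead of `abs()` so that `i64::MIN` cannot overflow.
    let ok = GENEROUS_YEAR_RANGE.contains(&year)
        && (-max_months..=max_months).contains(&ordinal_month)
        && (-max_days..=max_days).contains(&day);
    if ok {
        Ok(())
    } else {
        Err(YearOverflowError)
    }
}
```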

@Manishearth Manishearth requested a review from sffc February 23, 2026 23:17
Member Author

Manishearth commented Feb 23, 2026

Regarding the coarse preemptive checks and the other checks:

My main worry is that YearInfo calculations should never get a year that is too large for them to handle, and should be able to preemptively assert that they get a good year. I don't want calendar-specific code to need to get into the business of worrying too much about range.

My first commit here introduced SAFE_YEAR_RANGE, which was "the year range that is safe for all internal arithmetic". I got rid of it, though, since it would need to be much larger to work in the arithmetic space. I don't like expanding the safe year range too much (we have previously seen fuzzer issues from years in the millions, though this might have since been fixed).

I also want to limit overly-slow APIs. #7077 is the issue for fixing that, and I'm hoping to work on it, but it's not a release blocker.

I'm happy to make a new PR that reintroduces that change. It ought to simplify the code significantly. But I'd like to land this first.

I made the change. It's a separate commit. If people later disagree about this architecture, please open a PR with a revert of that commit.

@Manishearth Manishearth merged commit 7054d89 into unicode-org:main Feb 24, 2026
32 checks passed
@Manishearth Manishearth deleted the range-checks branch February 24, 2026 01:02

3 participants