Add expect_number_as_string_or_null in aws-smithy-json to preserve arbitrary precision for BigInteger/BigDecimal#4444
Conversation
…Extracts JSON numbers as strings without converting to u64/i64/f64, preventing precision loss for BigInteger/BigDecimal support
| ) -> Result<Option<&'a str>, Error> { | ||
| match token.transpose()? { | ||
| Some(Token::ValueNull { .. }) => Ok(None), | ||
| Some(Token::ValueNumber { offset, .. }) => { |
There was a problem hiding this comment.
Note to reviewers
My reference for adding this is that the JSON validator has already validated that this is a number - https://github.com/smithy-lang/smithy-rs/blob/main/rust-runtime/aws-smithy-json/src/deserialize.rs#L285
| [package] | ||
| name = "aws-smithy-json" | ||
| version = "0.61.8" | ||
| version = "0.61.9" |
There was a problem hiding this comment.
Question for reviewers for my understanding:
- How does this version get published? Is there a scheduled release cycle or does this automatically get published when the PR is merged?
- How does it get published to the Rust registry? - https://crates.io/crates/aws-smithy-json
- Will the version bump in
aws-smithy-jsonautomatically trigger version bumps in crates that depend on it (like generated SDK crates), or is that handled separately?
There was a problem hiding this comment.
a manual action is required by the SDK team but its basically automated. you can use this code in this repository (e.g. we can work on / merge your other PR)
we don't have to wait for it to be merged
aws-smithy-json to preserve arbitrary precision for BigInteger/BigDecimal
aws-smithy-json to preserve arbitrary precision for BigInteger/BigDecimal
rcoh
left a comment
There was a problem hiding this comment.
Nice! Lets add some property/fuzz testing to validate your work more thoroughly
| ); | ||
| } | ||
|
|
||
| #[test] |
There was a problem hiding this comment.
lets also do property / fuzz testing to validate this against a the value that you wrote into the input
There was a problem hiding this comment.
Added more tests and found that the parser itself rejected the numbers before creating a token. The expect_number_as_string_or_null function then panicked. Looking at expect_number to see if we can modify it to make it more lenient.
…e wrapping_neg() with explicit range checks for clarity. Add test_integer_boundaries and property-based tests for full u64/i64 ranges and overflow cases.
| match u64::from_str(&number_str[1..]) { | ||
| Ok(positive) => { | ||
| // Check if the positive value fits in i64's negative range | ||
| if positive <= i64::MAX as u64 { | ||
| Number::NegInt(-(positive as i64)) | ||
| } else if positive == (i64::MAX as u64) + 1 { | ||
| // Special case: i64::MIN | ||
| Number::NegInt(i64::MIN) | ||
| } else { | ||
| // Too large for i64, use f64 | ||
| Number::Float(-(positive as f64)) | ||
| } | ||
| } | ||
| Err(_) => { | ||
| // Number too large for u64, parse as f64 (may be infinity) | ||
| Number::Float( | ||
| f64::from_str(number_str) | ||
| .map_err(|_| self.error_at(start, InvalidNumber))?, | ||
| ) | ||
| } |
There was a problem hiding this comment.
This allows overflow to f64, enabling expect_number_as_string_or_null() to extract the original string for BigInteger/BigDecimal. Not a breaking change: more permissive behavior, public API unchanged, existing code paths unaffected
| fn large_integers_overflow_to_float( | ||
| // u64::MAX = 18_446_744_073_709_551_615 (20 digits) | ||
| // Generate numbers with 21+ digits to guarantee overflow | ||
| num_str in "1[0-9]{20,49}" | ||
| ) { | ||
| let input_bytes = num_str.as_bytes(); | ||
| let mut iter = json_token_iter(input_bytes); | ||
|
|
||
| match iter.next() { | ||
| Some(Ok(Token::ValueNumber { value: Number::Float(f), .. })) => { | ||
| prop_assert!(f.is_finite()); | ||
| prop_assert!(f > 0.0); | ||
| } | ||
| other => { | ||
| return Err(proptest::test_runner::TestCaseError::fail( | ||
| format!("Expected Float for large number, got {:?}", other) | ||
| )); | ||
| } | ||
| } |
There was a problem hiding this comment.
this test should also validate that the large int is actually correctly extracted by your expect_number_as_string code
| } | ||
|
|
||
| let number_slice = &input[start..end]; | ||
| let number_str = unsafe { std::str::from_utf8_unchecked(number_slice) }; |
There was a problem hiding this comment.
there's no reason to use unchecked here, since this isn't a performance sensitive API. just use from_utf8
rcoh
left a comment
There was a problem hiding this comment.
only a couple of questions, basicaly LGTM but since we are altering the happy path parsing, we need to be very careful
… validation to large_integers_overflow_to_float test
This PR merges the latest main branch into feature/http-1.x and resolves all CI failures that resulted from the merge.
Changes from Main
This merge brings in recent work from main, including:
Token bucket success rewards and time-based refill (
Add token bucket success rewards, time based refill, fractional tokens #4450)
Added expect_number_as_string_or_null for BigInteger/BigDecimal support (
Add expect_number_as_string_or_null in aws-smithy-json to preserve arbitrary precision for BigInteger/BigDecimal #4444)
Lockfile synchronization (
Run cargo update on the runtime lockfiles and the SDK lockfile #4445)
Free disk space action for CI (
Add free-disk-space to acquire base images that were missing it #4448)
Layer issue fixes (
Kicking the can down the road on fixing our layer issue #4456)
Release 1.x.y merge (
Merge smithy-rs-release-1.x.y into main #4455)
Fixes Applied (Post-Merge)
After merging main, the following changes were needed to restore CI:
Dependency Version Updates
http-1x made non-optional in aws-inlineable, aws-types, and aws-smithy-types (commit: 6709ac8)
Changed from { version = "1", optional = true } to { version = "1.3.1" } (non-optional)
Pinned exact versions for minimal-versions check (commit: f4ca6fb)
http-1x: 1 → 1.3.1
http-body-1x: 1 → 1.0.1
http-body-util: 0.1 → 0.1.3
hyper: 1.8 → 1.6
tokio: 1.40 → 1.46
This ensures the minimal-versions CI check passes with compatible minimum versions
Version bumps due to cargo minimal-check requirements (commits: 1dad092, 6580c42)
Server & Example Updates
Upgraded examples to use http/hyper@1 (commit: a8706b7)
Python server requires legacy http server (commit: 7d79f12)
Created legacy integration tests (commit: 744483c)
Updated legacy-http with required changes (commit: d513709)
Codegen Changes
Added request/response 1x types in RuntimeType.kt (commit: eb11df8)
Marked SDK with -http0x suffix to make errors obvious (commit: ececd4a)
Fixed includeFluentClient server configuration (commit: 54c16a5)
Housekeeping
Fixed all clippy warnings (commits: ee82703, e80902c, cf2257a)
Fixed all cargo fmt issues (commits: 0449c32, daa8aa2)
Updated lockfiles (commits: 427293e, 30d6700, a7f9de8)
Removed duplicate code (commit: d6ec3f5)
Added sync_wrapper to false-positives (commit: 4cdc7a6)
Motivation and Context
Currently,
expect_number_or_null()parses JSON numbers through this flow:"9007199254740993"→ parsed tou64→ stored inNumber::PosInt(9007199254740993)f64for certain operations → precision lost (f64 has only 53 bits of precision)"9007199254740992"(wrong value!)expect_number_or_null()converts JSON numbers tou64/i64/f64, which causes precision loss for numbers larger than these types can represent. This defeats the purpose of BigInteger/BigDecimal support which are meant to handle arbitrarily large numbers without precision loss.This commit addresses comments #4418
Description
Adds
expect_number_as_string_or_null()function toaws-smithy-jsonthat:offsetfromToken::ValueNumberto extract the raw number string from the original JSON inputTesting
Checklist
.changelogdirectory, specifying "client," "server," or both in theapplies_tokey.