Add support for Smithy bigInteger and bigDecimal types as string wrappers in aws-smithy-types, allowing users to parse with their preferred big number library.#4418
Conversation
rcoh
left a comment
There was a problem hiding this comment.
Nice! We need to clean up those representations in aws-smithy-types so that they are forward compatible with eventual improvements.
| /// Returns the string representation. | ||
| pub fn as_str(&self) -> &str { | ||
| &self.0 | ||
| } |
There was a problem hiding this comment.
we shouldn't expose methods like this since they will probably be impossible to implement if we eventually switch to using a real internal representation that's not a string
There was a problem hiding this comment.
- Remove as_str() method - it's redundant and limiting
- Keep AsRef trait - works with any internal representation
- Update codegen to use .as_ref() instead of .as_str()
^^ Does this work?
There was a problem hiding this comment.
Addressed this in the latest revision of PR
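To illustrate why `AsRef<str>` is the more future-proof surface, here is a minimal sketch. The `BigInteger` type below is a hypothetical stand-in rather than the code in this PR, `from_wire` is an invented constructor for the example, and `u128` stands in for whatever big number library a user prefers.

```rust
use std::str::FromStr;

/// Hypothetical stand-in for the string-backed runtime type.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct BigInteger(String);

impl BigInteger {
    /// Invented constructor for this sketch only.
    pub fn from_wire(raw: &str) -> Self {
        Self(raw.to_string())
    }
}

impl AsRef<str> for BigInteger {
    fn as_ref(&self) -> &str {
        &self.0
    }
}

/// Caller-side parsing: works with anything that can be viewed as `&str`,
/// so it does not depend on how the wrapper stores its value.
pub fn to_u128(value: impl AsRef<str>) -> Option<u128> {
    u128::from_str(value.as_ref()).ok()
}
```

Because `to_u128` only relies on the trait, the caller's code would not need to change if the wrapper later stored something other than a `String`, as long as the `AsRef<str>` impl can still be provided.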
|
|
||
| impl BigInteger { | ||
| /// Creates a new `BigInteger` from a string. | ||
| pub fn new(value: impl Into<String>) -> Self { |
There was a problem hiding this comment.
should implement FromStr instead
There was a problem hiding this comment.
Ok. Will use FromStr instead of new in next revision.
There was a problem hiding this comment.
Addressed this in the latest revision of PR
| when (val target = model.expectShape(memberShape.target)) { | ||
| is StringShape -> deserializeString(target) | ||
| is BooleanShape -> rustTemplate("#{expect_bool_or_null}(tokens.next())?", *codegenScope) | ||
| is BigIntegerShape -> deserializeBigInteger() |
There was a problem hiding this comment.
we probably need to support this for more than just json protocols. also need protocol tests. does smithy have any protocol tests for these yet?
There was a problem hiding this comment.
Ok. I will add serialization and deserialization code for XML, CBOR protocols in the next revision.
There was a problem hiding this comment.
does smithy have any protocol tests for these yet?
Protocol tests exist in misc.smithy but BigInteger/BigDecimal are commented out - https://github.com/smithy-lang/smithy-rs/blob/main/codegen-core/common-test-models/misc.smithy#L100. I will uncomment them now that the implementation is complete. However, it seems like misc.smithy only tests JSON. I will look at references and add protocol tests.
There was a problem hiding this comment.
Addressed this in the latest revision of PR
|
|
||
| private fun RustWriter.deserializeBigInteger() { | ||
| rustTemplate( | ||
| "#{expect_string_or_null}(tokens.next())?.map(|s| s.to_unescaped().map(|u| #{BigInteger}::new(u.into_owned()))).transpose()?", |
There was a problem hiding this comment.
Also many (all?) of the supported JSON protocols represent these as regular JSON numbers: https://smithy.io/2.0/aws/protocols/aws-json-1_0-protocol.html#shape-serialization
There was a problem hiding this comment.
Thanks for the reference. I will use expect_number_or_null instead of expect_string_or_null
There was a problem hiding this comment.
Addressed this in the latest revision of PR
| /// Consumes the `BigInteger` and returns the inner string. | ||
| pub fn into_inner(self) -> String { | ||
| self.0 | ||
| } |
There was a problem hiding this comment.
For the same reason as
if we eventually switch to using a real internal representation that's not a string
We could consider delaying adding this to leave an option for the future, unless this conversion is required right now.
There was a problem hiding this comment.
Ok. Will remove this.
There was a problem hiding this comment.
Addressed this in the latest revision of PR
343e932 to
019c154
Compare
|
|
||
| is TimestampShape -> rust("decoder.timestamp()") | ||
|
|
||
| is BigIntegerShape -> |
There was a problem hiding this comment.
Note to reviewers:
I have added serialization and parsing logic for the following protocols:
- JSON
- CBOR
- XML
- AWS Query
- AWS EC2
Let me know if there are any other protocols.
|
|
||
| impl Default for BigInteger { | ||
| fn default() -> Self { | ||
| Self("0".to_string()) |
There was a problem hiding this comment.
Question for reviewers: Default Values for BigInteger/BigDecimal
I've implemented Default trait for both types to support error correction in client codegen:
Context:
ErrorCorrection.kt line 67 generates Some(Default::default()) for all NumberShape types, including BigInteger/BigDecimal, when required fields are missing during deserialization.
Are "0" and "0.0" appropriate defaults for error correction scenarios?
There was a problem hiding this comment.
we should match whatever we do for normal integers
There was a problem hiding this comment.
My thought process:
BigInteger/BigDecimal Default implementations match primitive number behavior:
i32::default() = 0
i64::default() = 0
f32::default() = 0.0
f64::default() = 0.0
u32::default() = 0
u64::default() = 0
i8::default() = 0
i16::default() = 0
BigInteger::default() returns BigInteger("0") (string "0" representing zero)
BigDecimal::default() returns BigDecimal("0.0") (string "0.0" representing zero)
All number types default to their zero representation. BigInteger/BigDecimal use string storage for arbitrary precision, but semantically represent the same zero value as primitive numbers.
rcoh
left a comment
There was a problem hiding this comment.
Overall looks pretty good. We can decide how we want to handle large numbers in JSON. As currently implemented, there will be a loss of precision, but there is no inherent need for that since we control aws-smithy-json and can have it parse a number as a string directly.
| bodyMediaType: "application/xml", | ||
| headers: {"Content-Type": "application/xml"}, | ||
| params: { | ||
| bigInt: 987654321, |
There was a problem hiding this comment.
shouldn't we actually use numbers that don't fit into int/decimals?
There was a problem hiding this comment.
The test values are limited by Smithy's Java-based model parser, which converts numeric literals to Java Number types. When these are serialized back to strings, Java uses scientific notation for large values.
Example of the parser limitation:
params: {
bigDec: 123456789012345.123456789
}
After Smithy parses this, the codegen sees: 1.2345678901234512E14 (scientific notation with precision loss)
However, this is only a test limitation - the actual runtime code handles arbitrary precision correctly:
- XML/JSON input is parsed as strings from the wire
- BigDecimal/BigInteger use FromStr to parse directly from those strings
- Serialization writes the string back via .as_ref()
- No precision loss occurs in production code
There was a problem hiding this comment.
gotcha. then we should write an integration test for this against the generated code manually. You can do this by writing a test in kotlin that generates the service, then utilizes the serializers
There was a problem hiding this comment.
I have done this as part of https://github.com/smithy-lang/smithy-rs/pull/4418/files#diff-ae00c010398af1c58c9129aef4393407a833d72c73b2063ad28e7754d3ae4eedR385
let input = crate::test_input::BigNumberOpInput::builder().payload(
crate::test_model::BigNumberData::builder()
.big_int("12345678901234567890".parse().unwrap())
.big_dec("3.141592653589793238".parse().unwrap())
.build()
).build().unwrap();
let serialized = ${format(operationSerializer)}(&input.payload.unwrap()).unwrap();
let output = std::str::from_utf8(&serialized).unwrap();
assert!(output.contains("<bigInt>12345678901234567890</bigInt>"));
assert!(output.contains("<bigDec>3.141592653589793238</bigDec>"));
There was a problem hiding this comment.
I am going to remove this unit test, which has numbers that don't fit into int/decimals, in the next revision of the PR.
|
|
||
| is BigIntegerShape -> | ||
| rustTemplate( | ||
| "<#{BigInteger} as ::std::str::FromStr>::from_str(decoder.str()?.as_ref()).map_err(|_| #{Error}::custom(\"infallible\", decoder.position()))", |
There was a problem hiding this comment.
I assume this is what the spec said for CBOR?
There was a problem hiding this comment.
The current implementation uses decoder.str() / encoder.str() (CBOR text strings, Major Type 3) as a temporary approach.
According to the Smithy RPC v2 CBOR spec, BigInteger/BigDecimal should use:
- BigInteger: Major Type 6, tags 2 (unsigned bignum) or 3 (negative bignum)
- BigDecimal: Major Type 6, tag 4 (decimal fraction)
However, aws-smithy-cbor doesn't currently expose methods for these CBOR tags. The underlying minicbor library supports tags (used internally for timestamps), but we'd need to add public methods like encoder.bignum() and encoder.decimal() to properly implement the spec.
Any suggestions on how to address this? How do you recommend I proceed here?
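For reference, the tag 2 layout described above can be sketched by hand. This is an illustration of the wire format from RFC 8949 only; `encode_unsigned_bignum` is a hypothetical name, not an `aws-smithy-cbor` or `minicbor` API, and `u128` stands in for an arbitrary-precision magnitude.

```rust
/// Encodes a non-negative integer as a CBOR tag 2 (unsigned bignum) item:
/// the tag byte, then a byte string holding the big-endian magnitude.
/// Sketch only: a `u128` magnitude is at most 16 bytes, so the byte-string
/// length always fits in the short (< 24) header form.
fn encode_unsigned_bignum(value: u128) -> Vec<u8> {
    let bytes = value.to_be_bytes();
    // Strip leading zero bytes from the big-endian magnitude.
    let first = bytes.iter().position(|b| *b != 0).unwrap_or(bytes.len());
    let magnitude = &bytes[first..];

    let mut out = Vec::with_capacity(2 + magnitude.len());
    out.push(0xC2); // major type 6 (tag), tag number 2
    out.push(0x40 | magnitude.len() as u8); // major type 2 (byte string), short length
    out.extend_from_slice(magnitude);
    out
}
```

A real implementation would also need tag 3 for negative values and tag 4 for decimal fractions, and would have to convert between the decimal string held by the wrapper and a binary magnitude, which is where a bignum dependency comes in.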
There was a problem hiding this comment.
modify aws-smithy-cbor — you can find the source of it in this repo
There was a problem hiding this comment.
Need some direction:
The CBOR spec requires binary encoding (tags 2/3/4), but BigInteger/BigDecimal are string wrappers to avoid choosing a bignum library.
Should we:
1. Keep current text string encoding (non-compliant but simple)
2. Document that BigInteger/BigDecimal don't work with CBOR
3. Add num-bigint dependency to aws-smithy-cbor for spec compliance
There was a problem hiding this comment.
Adding a dependency to aws-smithy-cbor seems OK.
Option 1 is a non-starter — we can't have non-compliant code in smithy-rs.
I would prefer 3, but for simplicity, we could have 2 (but it must FAIL to codegen at runtime, it can't be only a documented feature).
There was a problem hiding this comment.
Going ahead with Option 2 - failing codegen at runtime
| ) | ||
| is BigDecimalShape -> | ||
| rustTemplate( | ||
| "<#{BigDecimal} as ::std::str::FromStr>::from_str(decoder.str()?.as_ref()).map_err(|_| #{Error}::custom(\"infallible\", decoder.position()))", |
There was a problem hiding this comment.
I don't think this error is infallible is it? wouldn't this happen if the string wasn't a valid big decimal? this error seems worth preserving?
There was a problem hiding this comment.
Should we validate the string is a valid number format?
Options:
- Keep infallible - Accept any string, let users validate when they parse it
- Add validation - Check the string is a valid number format, return error if not
What do you recommend?
There was a problem hiding this comment.
ah I see...a bit of a can of worms. I forgot we were basically doing nothing with the numeric values. We can punt this for now.
|
|
||
| rustTemplate( | ||
| """ | ||
| #{expect_number_or_null}(tokens.next())? |
There was a problem hiding this comment.
I think to actually do this properly you need to add some additional code to aws-smithy-json to parse a number as a string? Not sure how hard that would be.
As it is, this isn't terrible, but it's not ideal since it defeats the point.
|
|
||
| impl Default for BigInteger { | ||
| fn default() -> Self { | ||
| Self("0".to_string()) |
There was a problem hiding this comment.
we should match whatever we do for normal integers
aacd195 to
0f0fecf
Compare
| let s = format!("{f}"); | ||
| // f64 formatting drops ".0" for whole numbers (0.0 -> "0") | ||
| // Restore it to preserve that the original JSON had decimal notation | ||
| if !s.contains('.') && !s.contains('e') && !s.contains('E') { | ||
| format!("{s}.0") | ||
| } else { | ||
| s | ||
| } |
There was a problem hiding this comment.
is this correct? I definitely want to see some tests for this.
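A small sketch of the kind of test being asked for, wrapping the snippet above in a hypothetical function named `format_preserving_decimal` (the name is an assumption; the body mirrors the diff).

```rust
/// Formats an f64 and restores a trailing ".0" for whole numbers.
fn format_preserving_decimal(f: f64) -> String {
    let s = format!("{f}");
    // `Display` for f64 prints whole numbers without a fractional part,
    // so "0" becomes "0.0" here.
    if !s.contains('.') && !s.contains('e') && !s.contains('E') {
        format!("{s}.0")
    } else {
        s
    }
}
```

As far as I know, `Display` for `f64` does not emit exponent notation, so the `e`/`E` checks look defensive. Non-finite values are worth a dedicated test too, since their text contains none of the checked characters and would also get ".0" appended.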
| let xml = br##"<BigNumberData> | ||
| <bigInt>12345678901234567890</bigInt> | ||
| <bigDec>3.141592653589793238</bigDec> | ||
| </BigNumberData> | ||
| "##; | ||
| let output = ${format(operationParser)}(xml, test_output::BigNumberOpOutput::builder()).unwrap().build(); | ||
| assert_eq!(output.big_int.as_ref().map(|v| v.as_ref()), Some("12345678901234567890")); | ||
| assert_eq!(output.big_dec.as_ref().map(|v| v.as_ref()), Some("3.141592653589793238")); | ||
| """, |
There was a problem hiding this comment.
nice test! please add something similar for JSON
There was a problem hiding this comment.
bump on this test — we need a test that actually tests that we are preserving E2E precision with JSON.
There was a problem hiding this comment.
Agreed. Adding a full end to end Kotlin test for XML protocol that actually serializes and deserializes big numbers in the next commit.
| operations: [ProcessBigNumbers] | ||
| } | ||
|
|
||
| @http(uri: "/process", method: "POST") |
There was a problem hiding this comment.
there appears to be some code that handles E / scientific notation but I don't see any tests of that here
There was a problem hiding this comment.
Adding tests for these in the next commit.
…bitrary precision for BigInteger/BigDecimal (#4444)

## Motivation and Context
Currently, `expect_number_or_null()` parses JSON numbers through this flow:
1. JSON string `"9007199254740993"` → parsed to `u64` → stored in `Number::PosInt(9007199254740993)`
2. Later converted to `f64` for certain operations → **precision lost** (f64 has only 53 bits of precision)
3. Converted back to string → `"9007199254740992"` (wrong value!)

`expect_number_or_null()` converts JSON numbers to `u64`/`i64`/`f64`, which causes precision loss for numbers larger than these types can represent. This defeats the purpose of BigInteger/BigDecimal support which are meant to handle arbitrarily large numbers without precision loss.

This commit addresses comments #4418

## Description
Adds `expect_number_as_string_or_null()` function to `aws-smithy-json` that:
- Extracts JSON numbers as strings without intermediate numeric conversion
- Uses the `offset` from `Token::ValueNumber` to extract the raw number string from the original JSON input
- Preserves arbitrary precision for BigInteger and BigDecimal

## Testing
- Added comprehensive tests for various number formats (large integers, decimals, scientific notation)
- Added error case tests (string, boolean, object, array tokens)
- All tests pass

## Checklist
- [x] For changes to the smithy-rs codegen or runtime crates, I have created a changelog entry Markdown file in the `.changelog` directory, specifying "client," "server," or both in the `applies_to` key.

---------

Co-authored-by: Amit Kulkarni <kulami@amazon.com>
Co-authored-by: Landon James <lnj@amazon.com>
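The precision loss and the offset-based extraction described in this commit message can be sketched as follows. Both functions are hypothetical illustrations, not the `aws-smithy-json` implementation: `via_f64` mimics a lossy numeric path, and `raw_number_at` assumes the caller already knows the byte offset at which a number token starts.

```rust
/// Round-trips an integer through f64, as a lossy numeric path would.
fn via_f64(n: u64) -> u64 {
    (n as f64) as u64
}

/// Returns the raw text of the number starting at `offset`, without
/// converting it to any numeric type. Sketch only: it collects the
/// characters that can appear in a number and does not validate the grammar.
fn raw_number_at(input: &[u8], offset: usize) -> Option<&str> {
    let rest = input.get(offset..)?;
    let len = rest
        .iter()
        .take_while(|b| matches!(**b, b'0'..=b'9' | b'-' | b'+' | b'.' | b'e' | b'E'))
        .count();
    if len == 0 {
        return None;
    }
    std::str::from_utf8(&rest[..len]).ok()
}
```

For the input `{"n":9007199254740993}`, the number starts at byte offset 5. Slicing the original input keeps every digit, while the `f64` path rounds to the nearest representable value.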
…pers in aws-smithy-types, allowing users to parse with their preferred big number library.
…eger/BigDecimal precision
0f0fecf to
20e566b
Compare
rcoh
left a comment
There was a problem hiding this comment.
Must fix:
- Must validate inputs to BigInteger / BigDecimal because we are raw-writing them into JSON. We may also want to improve that API to safeguard that we are only writing "safe" characters?
- Must add a Kotlin test that validates we successfully round trip large values through the serializers (since the protocol tests do not). Can the protocol tests use a number that is actually out of range?
- Few other more minor inline comments
Thanks for your continued hard work on this!
| bodyMediaType: "application/json", | ||
| headers: {"Content-Type": "application/json"}, | ||
| params: { | ||
| bigInt: 9007199254740991, |
There was a problem hiding this comment.
this isn't larger than u64::max — I guess that's a protocol test limitation? This is 2^53-1 (max safe integer in Javascript), so we aren't really testing that big integers work (e.g. this code would pass even without your changes right?)
| rustBlockTemplate( | ||
| "pub(crate) fn $fnName(value: &[u8], ${unusedMut}mut builder: #{Builder}) -> #{Result}<#{Builder}, #{Error}>", | ||
| """ | ||
| ##[allow(unused)] |
There was a problem hiding this comment.
why allow(unused)? parsers should only be generated when they are actually used.
There was a problem hiding this comment.
When I was running one of the build commands, compilation was failing. Rust was complaining that some of the variables were unused, so I added this exception. I will try to reproduce this and add more details here.
| rustTemplate( | ||
| """ | ||
| // Alias for nested parsers that expect `input` parameter name | ||
| let input = value; |
There was a problem hiding this comment.
can you just change your parser to match all the other ones and use value?
There was a problem hiding this comment.
The nested parsers are called with input as the second parameter. Since the top-level function has value as its parameter, I created the alias let input = value; so we could pass input to the nested parsers. I will try to incorporate this review comment.
| rustTemplate( | ||
| """ | ||
| #{expect_number_as_string_or_null}(tokens.next(), input)? | ||
| .map(|s| <#{BigInteger} as ::std::str::FromStr>::from_str(s).expect("infallible")) |
There was a problem hiding this comment.
We need to make this fallible — see other comments.
There was a problem hiding this comment.
Agreed. Here is my plan for the next commit:
1. Add validation function is_valid_number_string() that only allows valid JSON number characters:
   - Digits: 0-9
   - Signs: -, +
   - Decimal: .
   - Scientific notation: e, E
   - Rejects JSON special characters: quotes, commas, braces, brackets, etc.
2. Create proper error type:
   #[derive(Debug, Clone, PartialEq, Eq)]
   #[non_exhaustive]
   pub enum BigNumberError {
       InvalidFormat(String),
   }
| // Infallible because any string is valid - we just store it without validation | ||
| type Err = std::convert::Infallible; |
There was a problem hiding this comment.
this is a semver hazard — it should return an enum marked with #[non_exhaustive] so an error could be added in the future
There was a problem hiding this comment.
Understood. Adding the below in the next revision/commit
#[derive(Debug, Clone, PartialEq, Eq)]
#[non_exhaustive]
pub enum BigNumberError {
InvalidFormat(String),
}
| impl From<String> for BigInteger { | ||
| fn from(value: String) -> Self { | ||
| Self(value) | ||
| } | ||
| } |
There was a problem hiding this comment.
this impl is probably too hazardous to keep
There was a problem hiding this comment.
Here is my understanding of what this comment means:
- From trait must always succeed (infallible)
- With validation, invalid strings should return errors, not panic
- Panicking in From is unexpected and dangerous
- Users should use FromStr instead, which is properly fallible
^^ Assuming that this is right, I am going to remove From<String> implementations for BigInteger and BigDecimal. Users must now use FromStr::from_str() which properly returns Result<T, BigNumberError>.
| #[derive(Debug, Clone, PartialEq, Eq, Hash)] | ||
| pub struct BigInteger(String); | ||
|
|
||
| impl BigInteger {} |
There was a problem hiding this comment.
empty impl block does nothing
| impl BigInteger {} |
There was a problem hiding this comment.
Agreed. Removing this in next revision/commit
| "$writer.write_raw_value(${value.name}.as_ref());", | ||
| *codegenScope, | ||
| ) | ||
| is BigDecimalShape -> | ||
| rustTemplate( | ||
| "$writer.write_raw_value(${value.name}.as_ref());", | ||
| *codegenScope, |
There was a problem hiding this comment.
hmm...there is actually a vulnerability here or at least the possibility to introduce invalid JSON — we are not validating the input to BigInteger and BigDecimal and then we're writing them untrusted directly into the JSON.
We need to validate that they are valid before storing them.
There was a problem hiding this comment.
Understood.
Introducing a simple validation function here:
fn is_valid_number_string(s: &str) -> bool {
if s.is_empty() {
return false;
}
s.chars().all(|c| matches!(c, '0'..='9' | '-' | '+' | '.' | 'e' | 'E'))
}
^^ These are all the valid characters that I could think of in any BigNumber
And using it as:
impl std::str::FromStr for BigInteger {
type Err = BigNumberError;
fn from_str(s: &str) -> Result<Self, Self::Err> {
if !is_valid_number_string(s) {
return Err(BigNumberError::InvalidFormat(s.to_string()));
}
Ok(Self(s.to_string()))
}
}
There was a problem hiding this comment.
big integer is only numbers — the larger set is only for BigDecimal
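One way the tighter split could look, as a sketch under assumptions: the integer check allows only an optional leading `-` followed by digits, and the decimal check follows the JSON number grammar (no leading `+`, digits required on both sides of the `.`, optional exponent). Leading zeros are tolerated here for simplicity. The helper names are hypothetical, and whether `BigDecimal` should accept exponents is a decision for the PR, not something this sketch settles.

```rust
use std::str::FromStr;

#[derive(Debug, Clone, PartialEq, Eq)]
#[non_exhaustive]
pub enum BigNumberError {
    InvalidFormat(String),
}

/// Optional leading '-', then one or more ASCII digits.
fn is_valid_integer(s: &str) -> bool {
    let digits = s.strip_prefix('-').unwrap_or(s);
    !digits.is_empty() && digits.bytes().all(|b| b.is_ascii_digit())
}

/// Returns the index of the first non-digit byte at or after `i`.
fn skip_digits(b: &[u8], mut i: usize) -> usize {
    while i < b.len() && b[i].is_ascii_digit() {
        i += 1;
    }
    i
}

/// JSON-style number: -? digits (. digits)? ([eE] [+-]? digits)?
fn is_valid_decimal(s: &str) -> bool {
    let b = s.as_bytes();
    let mut i = usize::from(b.first() == Some(&b'-'));
    let int_end = skip_digits(b, i);
    if int_end == i {
        return false; // no integer digits
    }
    i = int_end;
    if b.get(i) == Some(&b'.') {
        let frac_end = skip_digits(b, i + 1);
        if frac_end == i + 1 {
            return false; // '.' with no digits after it
        }
        i = frac_end;
    }
    if matches!(b.get(i).copied(), Some(b'e' | b'E')) {
        i += 1;
        if matches!(b.get(i).copied(), Some(b'+' | b'-')) {
            i += 1;
        }
        let exp_end = skip_digits(b, i);
        if exp_end == i {
            return false; // exponent with no digits
        }
        i = exp_end;
    }
    i == b.len() // anything left over (quotes, commas, braces) is rejected
}

#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct BigInteger(String);

#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct BigDecimal(String);

impl FromStr for BigInteger {
    type Err = BigNumberError;
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        if is_valid_integer(s) {
            Ok(Self(s.to_string()))
        } else {
            Err(BigNumberError::InvalidFormat(s.to_string()))
        }
    }
}

impl FromStr for BigDecimal {
    type Err = BigNumberError;
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        if is_valid_decimal(s) {
            Ok(Self(s.to_string()))
        } else {
            Err(BigNumberError::InvalidFormat(s.to_string()))
        }
    }
}
```

Checking structure rather than only the character set also rejects strings such as `1.2.3` or `--5`, which a character allowlist would let through into the raw JSON output.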
| let xml = br##"<BigNumberData> | ||
| <bigInt>12345678901234567890</bigInt> | ||
| <bigDec>3.141592653589793238</bigDec> | ||
| </BigNumberData> | ||
| "##; | ||
| let output = ${format(operationParser)}(xml, test_output::BigNumberOpOutput::builder()).unwrap().build(); | ||
| assert_eq!(output.big_int.as_ref().map(|v| v.as_ref()), Some("12345678901234567890")); | ||
| assert_eq!(output.big_dec.as_ref().map(|v| v.as_ref()), Some("3.141592653589793238")); | ||
| """, |
There was a problem hiding this comment.
bump on this test — we need a test that actually tests that we are preserving E2E precision with JSON.
| rustBlockTemplate( | ||
| """ | ||
| pub(crate) fn $fnName<'a, I>(tokens: &mut #{Peekable}<I>) -> #{Result}<Option<#{ReturnType}>, #{Error}> | ||
| ##[allow(unused)] |
…nerability; Make FromStr fallible with non-exhaustive error enum; Remove hazardous From<String> implementations; Use _value parameter consistently and remove unnecessary #[allow(unused)] attributes; Add integration tests for E2E precision preservation; Implement NaN saturation for values > f64::MAX
rcoh
left a comment
There was a problem hiding this comment.
I think we're really close here!
| is BigDecimalShape -> { | ||
| val value = data.toString() | ||
| rustTemplate( | ||
| "<#{BigDecimal} as ::std::str::FromStr>::from_str(${value.dq()}).unwrap()", |
There was a problem hiding this comment.
| "<#{BigDecimal} as ::std::str::FromStr>::from_str(${value.dq()}).unwrap()", | |
| "<#{BigDecimal} as ::std::str::FromStr>::from_str(${value.dq()}).expect(\"invalid string for BigDecimal\")", |
| is BigIntegerShape -> { | ||
| val value = data.toString() | ||
| rustTemplate( | ||
| "<#{BigInteger} as ::std::str::FromStr>::from_str(${value.dq()}).unwrap()", |
There was a problem hiding this comment.
| "<#{BigInteger} as ::std::str::FromStr>::from_str(${value.dq()}).unwrap()", | |
| "<#{BigInteger} as ::std::str::FromStr>::from_str(${value.dq()}).expect(\"invalid string for BigInteger\")", |
| // (binary bignum representation), but aws-smithy-cbor doesn't implement these tags yet. | ||
| is BigIntegerShape -> | ||
| throw CodegenException( | ||
| "BigInteger is not supported with Concise Binary Object Representation (CBOR) protocol", |
There was a problem hiding this comment.
is there an open ticket for this? If not, please open one and then link to the ticket in the error
There was a problem hiding this comment.
#4473 - Linking this in the error as well
| rustBlockTemplate( | ||
| "pub(crate) fn $fnName(value: &[u8], ${unusedMut}mut builder: #{Builder}) -> #{Result}<#{Builder}, #{Error}>", | ||
| """ | ||
| pub(crate) fn $fnName(_value: &[u8], ${unusedMut}mut builder: #{Builder}) -> #{Result}<#{Builder}, #{Error}> |
There was a problem hiding this comment.
I'm really confused — this was value before...why did you need to make it _value?
There was a problem hiding this comment.
ah—I see the issue. Nested functions need access to it now that previously didn't have access to it at all.
| "$writer.write_raw_value(${value.name}.as_ref());", | ||
| *codegenScope, | ||
| ) | ||
| is BigDecimalShape -> | ||
| rustTemplate( | ||
| "$writer.write_raw_value(${value.name}.as_ref());", | ||
| *codegenScope, |
There was a problem hiding this comment.
big integer is only numbers — the larger set is only for BigDecimal
| .and_then(|f| { | ||
| must_be_finite(f).map_err(|_| self.error_at(start, InvalidNumber)) | ||
| })?, | ||
| .map(|f| if f.is_finite() { f } else { f64::NAN })?, |
There was a problem hiding this comment.
you need to change expect_number to move the check for finite-ness there I think?
|
|
||
| /// Validates that a string contains only valid JSON number characters. | ||
| /// Prevents JSON injection by rejecting strings with quotes, commas, braces, etc. | ||
| fn is_valid_number_string(s: &str) -> bool { |
There was a problem hiding this comment.
integers have just 0..9 — probably want a tighter restriction there
There was a problem hiding this comment.
Agreed. Miss on my part. Changing this in next revision/commit.
…idation; Move finite check to token consumer; Use expect() in Instantiator
|
|
||
| // Check first character (can be sign or digit) | ||
| match chars.next() { | ||
| Some('-') | Some('+') | Some('0'..='9') => {} |
There was a problem hiding this comment.
can you really precede a number with +?
| // Values exceeding f64::MAX should tokenize successfully with NaN | ||
| // to support BigInteger/BigDecimal arbitrary precision types | ||
| let expect_nan = |input| { | ||
| fn out_of_range_floats_produce_infinity() { |
There was a problem hiding this comment.
did this change to infinity? I thought we chose nan?
There was a problem hiding this comment.
ah, I see. the behavior of the parser is Infinity, that's fine.
There was a problem hiding this comment.
Is +/- infinity (e.g. infinity or -infinity) a valid value for Smithy bigInteger or bigDecimal shapes?
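The behavior discussed in this thread can be sketched with the standard library alone. Both function names are hypothetical and this is not the `aws-smithy-json` tokenizer; it only shows a lenient parse step that lets overflow become infinity, paired with a strict check applied by consumers that actually want an `f64`.

```rust
/// Lenient step: Rust's f64 parser returns infinity for values whose
/// magnitude exceeds f64::MAX instead of reporting an error.
fn tokenize_number(raw: &str) -> Option<f64> {
    raw.parse::<f64>().ok()
}

/// Strict step for consumers that need a usable f64.
fn expect_finite(f: f64) -> Result<f64, &'static str> {
    if f.is_finite() {
        Ok(f)
    } else {
        Err("number out of range for f64")
    }
}
```

With this split, a big-number consumer can ignore the `f64` and read the raw text instead, while ordinary number consumers still reject out-of-range input.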
|
3b26323 to
df1dd7e
Compare
…lkarni23/smithy-rs into add-biginteger-bigdecimal-support
…lkarni23/smithy-rs into add-biginteger-bigdecimal-support
Motivation and Context
Fixes #312
Smithy defines bigInteger and bigDecimal types for arbitrary-precision numbers, but smithy-rs had TODO placeholders instead of implementations. This prevented users from working with services that use these types.
Description
- BigInteger and BigDecimal runtime types in aws-smithy-types as string wrappers
- … num-bigint, rust_decimal)
Testing
- SymbolVisitorTest.kt
- big-numbers.smithy with protocol tests
Checklist
- … .changelog directory, specifying "client," "server," or both in the applies_to key.
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.