feat(edecimal): extend parse to decimal-point and scientific notation#865
Conversation
Previously edecimal::parse only accepted integer literals matching
[+-]*[0-9]+ via std::regex_match. Anything with a decimal point or
an exponent suffix was rejected, even when the value was exactly
representable as an integer ("3.14e2" = 314).
Switch the tokenizer to sw::universal::string_parse::scan_decimal_float
(the foundation from #838) which yields int_part, frac_part, and a
signed exp10. The effective decimal exponent is exp10 - frac.size():
when non-negative, the value is an exact integer and we accept it; when
negative the value has fractional digits that edecimal cannot represent
without precision loss, so we reject ("3.14", "1.5e-100", "0.001").
This matches the issue spec's "preserve those digits exactly" rule.
Accepted forms now include:
"42", "-1000" -- integer (unchanged)
"3.14e2", "-2.5e1" -- decimal point with shift to integer
"1e10", "1.5e10" -- pure exponent or compatible
"3.14e+200" -- 201-digit exact integer
"5.", ".5e1" -- edge syntax that scan_decimal_float allows
Side effects:
- Call unpad() after parsing so "0042" / "0.0042e4" no longer carry
leading-zero limbs.
- Collapse "-0" / "-0.0e5" to +0 (no negative zero).
operator>> hygiene (failbit + extraction guard) was already shipped in
#858 (Phase E of #835); the test file now also pins it.
Test (elastic/decimal/conversion/string_parse.cpp) extended to 9 groups:
- integer parse (canonical + large)
- scientific exact (with the 201-digit "3.14e+200" reference)
- decimal-point form that produces an integer
- fractional input rejected (3.14, -0.5, 0.001, 1.5e-100, 3.0, 10.50e0)
- malformed reject (empty, alpha, "1e", ".", "1.2.3", "1e3.5", "42x", "0x1F")
- negative-zero collapse
- operator>> failbit on bad token
- operator>> success on scientific token in whitespace
- operator>> empty stream
Resolves #854
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (2)
🚧 Files skipped from review as they are similar to previous changes (2)
📝 WalkthroughWalkthroughExtended edecimal::parse() to accept decimal-point and scientific-notation literals using scan_decimal_float, reject inputs with negative effective exponent, cap expansion size, normalize digits, and expand regression tests and operator>> edge-case checks. ChangesExtended edecimal string parsing for decimal and scientific notation
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related issues
Possibly related PRs
Suggested labels
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@include/sw/universal/number/edecimal/edecimal_impl.hpp`:
- Around line 393-405: The exponent expansion can grow eff_exp (computed from
scan.exp10 - scan.frac_part.size()) to an unbounded value and then push_back
zeros in a loop; guard this by validating eff_exp before any allocations: in the
code around eff_exp, add a check that eff_exp is non-negative and below a
configured safe limit (e.g. MAX_ALLOWED_EXPANSION or derived from a
MAX_TOTAL_DIGITS), and also ensure (scan.int_part.size() + scan.frac_part.size()
+ eff_exp) does not exceed a MAX_TOTAL_DIGITS cap; if the check fails, return
false (as done for negative) to avoid the clear()/push_back loops in push_back
and the trailing-zero loop, and consider using reserve/resize instead of
repeated push_back when within limits (referencing symbols eff_exp, scan.exp10,
scan.frac_part.size(), clear(), push_back).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: feb64424-b290-4e02-b85e-69a5846843e5
📒 Files selected for processing (3)
elastic/decimal/conversion/string_parse.cppinclude/sw/universal/number/edecimal/edecimal.hppinclude/sw/universal/number/edecimal/edecimal_impl.hpp
CodeRabbit round 1 on #865 flagged a DoS surface: scan_decimal_float returns an int32 exponent, so an input like "1e2000000000" would loop push_back ~2 billion trailing zeros inside parse, allocating ~2 GiB. Cap the post-expansion digit count at 1 MiB (1,048,576 digits) and use vector::reserve + vector::insert in place of repeated push_back so the accepted path also stays O(N) without amortized growth. Test additions: - "1e2000000000", "1e2147483647" (INT32_MAX), "1e10000000", "1e1048576" (cap+1) -- all rejected. - The cap is exclusive: 1e1048575 (1 MiB significand) still parses. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Coverage Report for CI Build 26009791606Warning Build has drifted: This PR's base is out of sync with its target branch, so coverage data may include unrelated changes. Coverage remained the same at 84.166%Details
Uncovered ChangesNo uncovered changes found. Coverage Regressions10 previously-covered lines in 1 file lost coverage.
Coverage Stats
💛 - Coveralls |
Summary
`edecimal::parse` previously matched only `[+-]*[0-9]+` via
`std::regex_match`, rejecting every form with a decimal point or an
exponent suffix. This PR routes parse through
`sw::universal::string_parse::scan_decimal_float` (the foundation from
#838) and now accepts decimal-point and scientific-notation inputs that
yield an exact integer.
Design
`scan_decimal_float` returns `int_part`, `frac_part`, and a signed
`exp10`. The value's effective decimal exponent is
`exp10 - frac_part.size()`.
`int_part || frac_part`, append `eff_exp` trailing zeros, store
the digits in edecimal's LSB-first vector. Examples:
`"3.14e2" -> 314`, `"1.5e10" -> 15000000000`, `"3.14e+200" ->
314 followed by 198 zeros`.
(an integer-only number system) cannot represent without precision
loss. Reject. Examples: `"3.14"`, `"-0.5"`, `"1.5e-100"`,
`"3.0"` (eff_exp = -1), `"10.50e0"` (eff_exp = -2).
The strict rejection matches the issue spec's "preserve those digits
exactly" rule.
Changes
`utility/string_parse.hpp` (provides `scan_decimal_float`).
`parse()` against `scan_decimal_float`; call `unpad()` so leading
zeros do not survive in the digit vector; collapse negative zero to
+0 across all accepted forms.
suite to 9 groups covering the new grammar, the strict rejection,
and the pre-existing operator>> hygiene (shipped in feat: operator>> hygiene + ereal nan/inf for decimal/elastic family (Phase E of #835) #858).
Test Results
Test plan
Resolves #854
Generated with Claude Code
Summary by CodeRabbit
New Features
Tests