feat(edecimal): extend parse to decimal-point and scientific notation #865

Merged: Ravenwater merged 2 commits into main from feat/issue-854-edecimal-parse-decimal-and-sci on May 18, 2026

@Ravenwater Ravenwater commented May 18, 2026

Summary

`edecimal::parse` previously matched only `[+-]*[0-9]+` via
`std::regex_match`, rejecting every form with a decimal point or an
exponent suffix. This PR routes parse through
`sw::universal::string_parse::scan_decimal_float` (the foundation from
#838) and now accepts decimal-point and scientific-notation inputs that
yield an exact integer.
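
As a rough illustration of the old behaviour, the sketch below applies the quoted pattern with `std::regex_match`. The function name `old_integer_only_match` is invented for this example and is not taken from the library; only the pattern itself comes from the description above.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical stand-in for the previous acceptance test.
// The pattern is the one quoted in the summary; the function name is an
// assumption made for this sketch.
inline bool old_integer_only_match(const std::string& text) {
    static const std::regex integer_literal("[+-]*[0-9]+");
    // regex_match requires the whole string to match, so any '.', 'e',
    // or trailing garbage causes a rejection.
    return std::regex_match(text, integer_literal);
}
```

Under this pattern `"42"` and `"-1000"` match, while `"3.14e2"`, `"1e10"`, and `"5."` do not, even though the first two denote exact integers.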

Design

`scan_decimal_float` returns `int_part`, `frac_part`, and a signed
`exp10`. The value's effective decimal exponent is
`eff_exp = exp10 - frac_part.size()`.

  • eff_exp >= 0: the value is an exact integer. Concatenate
    `int_part || frac_part`, append `eff_exp` trailing zeros, store
    the digits in edecimal's LSB-first vector. Examples:
    `"3.14e2" -> 314`, `"1.5e10" -> 15000000000`, `"3.14e+200" ->
    314 followed by 198 zeros`.
  • eff_exp < 0: the value carries fractional digits that edecimal
    (an integer-only number system) cannot represent without precision
    loss. Reject. Examples: `"3.14"`, `"-0.5"`, `"1.5e-100"`,
    `"3.0"` (eff_exp = -1), `"10.50e0"` (eff_exp = -2).

The strict rejection matches the issue spec's "preserve those digits
exactly" rule.
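
The accept/reject rule above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in `edecimal_impl.hpp`: the `ScannedDecimal` struct and `to_integer_digits` function are hypothetical names that only mirror the three fields described, and the sketch omits sign handling, the size cap, and normalization.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Hypothetical mirror of the fields the scanner is described as returning.
struct ScannedDecimal {
    std::string int_part;   // digits before the decimal point
    std::string frac_part;  // digits after the decimal point
    int exp10;              // signed exponent from the 'e' suffix
};

// Returns the digits LSB-first (index 0 holds the 10^0 digit), or
// std::nullopt when the value has a fractional part (eff_exp < 0).
inline std::optional<std::vector<std::uint8_t>>
to_integer_digits(const ScannedDecimal& s) {
    const long long eff_exp = static_cast<long long>(s.exp10)
                            - static_cast<long long>(s.frac_part.size());
    if (eff_exp < 0) return std::nullopt;  // would lose fractional digits

    // Most-significant-first text: int_part || frac_part || eff_exp zeros.
    std::string msb_first = s.int_part + s.frac_part;
    msb_first.append(static_cast<std::size_t>(eff_exp), '0');

    // Reverse into LSB-first storage.
    std::vector<std::uint8_t> digits;
    digits.reserve(msb_first.size());
    for (auto it = msb_first.rbegin(); it != msb_first.rend(); ++it)
        digits.push_back(static_cast<std::uint8_t>(*it - '0'));
    return digits;
}
```

For `"3.14e2"` the fields would be `int_part = "3"`, `frac_part = "14"`, `exp10 = 2`, so `eff_exp = 0` and the digits are 4, 1, 3 in LSB-first order. For `"3.0"` the effective exponent is -1 and the sketch returns `std::nullopt`, matching the strict rejection.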

Changes

  • `include/sw/universal/number/edecimal/edecimal.hpp` -- include
    `utility/string_parse.hpp` (provides `scan_decimal_float`).
  • `include/sw/universal/number/edecimal/edecimal_impl.hpp` -- rewrite
    `parse()` against `scan_decimal_float`; call `unpad()` so leading
    zeros do not survive in the digit vector; collapse negative zero to
    +0 across all accepted forms.
  • `elastic/decimal/conversion/string_parse.cpp` -- extend the test
    suite to 9 groups covering the new grammar, the strict rejection,
    and the pre-existing operator>> hygiene (shipped in #858, Phase E of #835).
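
The normalization mentioned in the `edecimal_impl.hpp` entry (dropping leading zeros and collapsing negative zero) might look roughly like the sketch below. `SignedDigits` and `normalize` are hypothetical names, and the choice to store zero as a single `0` digit is an assumption of this example; the PR does not say how edecimal represents zero internally.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical miniature of a sign plus LSB-first digit vector.
struct SignedDigits {
    bool negative = false;
    std::vector<std::uint8_t> digits;  // digits[0] is the 10^0 digit
};

// In LSB-first order the leading (most significant) zeros sit at the back
// of the vector, so they can be popped off cheaply.
inline void normalize(SignedDigits& v) {
    while (v.digits.size() > 1 && v.digits.back() == 0) v.digits.pop_back();
    if (v.digits.empty()) v.digits.push_back(0);  // assumption: zero is one digit
    // Zero never keeps a negative sign.
    if (v.digits.size() == 1 && v.digits[0] == 0) v.negative = false;
}
```

With this sketch, the digits of `"0042"` (stored as 2, 4, 0, 0) shrink to 2, 4, and a negative all-zero vector such as the one produced by `"-0.0e5"` becomes a single non-negative zero.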

Test Results

| Target | gcc build | gcc test | clang build | clang test |
|---|---|---|---|---|
| edec_string_parse | OK | PASS (9/9 groups) | OK | PASS (9/9 groups) |
| edec_api | OK | exit 0 | OK | exit 0 |
| edec_assignment | OK | exit 0 | OK | exit 0 |
| edec_comparison | OK | exit 0 | OK | exit 0 |
| edec_constexpr | OK | exit 0 | OK | exit 0 |
| edec_addition / subtraction / multiplication / division | OK | exit 0 | OK | exit 0 |

Test plan

  • Fast CI (gcc + clang CI_LITE) passes
  • Promote when satisfied: `gh pr ready`

Resolves #854

Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Enhanced decimal parsing: supports scientific notation, decimal points with fractional parts, and signed literals; enforces limits to prevent overly large expansions and rejects inputs that would require fractional loss.
  • Tests

    • Expanded regression suite with structured per-case reporting and helpers to validate successful parses and expected rejection/fail behavior.

Previously edecimal::parse only accepted integer literals matching
[+-]*[0-9]+ via std::regex_match.  Anything with a decimal point or
an exponent suffix was rejected, even when the value was exactly
representable as an integer ("3.14e2" = 314).

Switch the tokenizer to sw::universal::string_parse::scan_decimal_float
(the foundation from #838) which yields int_part, frac_part, and a
signed exp10.  The effective decimal exponent is exp10 - frac_part.size():
when non-negative, the value is an exact integer and we accept it; when
negative the value has fractional digits that edecimal cannot represent
without precision loss, so we reject ("3.14", "1.5e-100", "0.001").
This matches the issue spec's "preserve those digits exactly" rule.

Accepted forms now include:
  "42", "-1000"              -- integer (unchanged)
  "3.14e2",   "-2.5e1"       -- decimal point with shift to integer
  "1e10",     "1.5e10"       -- pure exponent or compatible
  "3.14e+200"                -- 201-digit exact integer
  "5.", ".5e1"               -- edge syntax that scan_decimal_float allows

Side effects:
- Call unpad() after parsing so "0042" / "0.0042e4" no longer carry
  leading-zero limbs.
- Collapse "-0" / "-0.0e5" to +0 (no negative zero).

operator>> hygiene (failbit + extraction guard) was already shipped in
#858 (Phase E of #835); the test file now also pins it.
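
For readers unfamiliar with the failbit pattern, here is a minimal sketch of what "failbit + extraction guard" usually means for a stream extractor. `ToyInteger` is a hypothetical toy type with a digits-only parser; it is not edecimal, and the actual operator>> shipped in #858 may differ in detail.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical toy type used only to demonstrate the stream behaviour.
struct ToyInteger {
    long long value = 0;

    // Accepts an optional sign followed by 1 to 18 decimal digits.
    bool parse(const std::string& token) {
        std::size_t i = 0;
        bool neg = false;
        if (i < token.size() && (token[i] == '+' || token[i] == '-')) {
            neg = (token[i] == '-');
            ++i;
        }
        if (i == token.size() || token.size() - i > 18) return false;
        long long acc = 0;
        for (; i < token.size(); ++i) {
            if (!std::isdigit(static_cast<unsigned char>(token[i]))) return false;
            acc = acc * 10 + (token[i] - '0');
        }
        value = neg ? -acc : acc;  // only written on success
        return true;
    }
};

inline std::istream& operator>>(std::istream& in, ToyInteger& out) {
    std::string token;
    if (!(in >> token)) return in;  // extraction guard: nothing was read
    ToyInteger candidate;
    if (candidate.parse(token)) {
        out = candidate;
    } else {
        in.setstate(std::ios_base::failbit);  // report a bad token to the caller
    }
    return in;
}
```

The target object is left untouched when the token is rejected, and an empty stream fails through the standard string extraction rather than through the parser.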

Test (elastic/decimal/conversion/string_parse.cpp) extended to 9 groups:
- integer parse (canonical + large)
- scientific exact (with the 201-digit "3.14e+200" reference)
- decimal-point form that produces an integer
- fractional input rejected (3.14, -0.5, 0.001, 1.5e-100, 3.0, 10.50e0)
- malformed reject (empty, alpha, "1e", ".", "1.2.3", "1e3.5", "42x", "0x1F")
- negative-zero collapse
- operator>> failbit on bad token
- operator>> success on scientific token in whitespace
- operator>> empty stream

Resolves #854

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

coderabbitai Bot commented May 18, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 9afb4d5a-a6b3-438f-8a3c-07cac4152104

📥 Commits

Reviewing files that changed from the base of the PR and between 4bd7897 and 2370c77.

📒 Files selected for processing (2)
  • elastic/decimal/conversion/string_parse.cpp
  • include/sw/universal/number/edecimal/edecimal_impl.hpp
🚧 Files skipped from review as they are similar to previous changes (2)
  • include/sw/universal/number/edecimal/edecimal_impl.hpp
  • elastic/decimal/conversion/string_parse.cpp

Walkthrough

Extended edecimal::parse() to accept decimal-point and scientific-notation literals using scan_decimal_float, reject inputs with negative effective exponent, cap expansion size, normalize digits, and expand regression tests and operator>> edge-case checks.

Changes

Extended edecimal string parsing for decimal and scientific notation

| Layer / File(s) | Summary |
|---|---|
| Parse implementation with decimal and scientific notation support<br>`include/sw/universal/number/edecimal/edecimal_impl.hpp` | parse() algorithm rewritten to use scan_decimal_float for tokenization, compute effective exponent (base-10 exponent minus fractional-digit count), reject negative effective exponent cases to avoid fractional loss, enforce a maximum expanded digit count, rebuild internal digit vector with significand plus exponent-derived trailing zeros, reverse to [0]==10^0, normalize via unpad(), and force zero to be positive. |
| Header dependency and wiring<br>`include/sw/universal/number/edecimal/edecimal.hpp` | Adds #include <universal/utility/string_parse.hpp> so edecimal can use scan_decimal_float. |
| Test helpers and regression suite<br>`elastic/decimal/conversion/string_parse.cpp` | Adds CheckParse/CheckReject helpers, reorganizes main into labeled test suites covering canonical integers, scientific-notation integers, allowed decimal-point forms (non-negative effective exponent), rejection of fractional/negative-effective-exponent inputs, malformed-token rejection, overly-large-expansion guard, negative-zero normalization, and operator>> behavior tests (failbit on bad token/empty stream, successful scientific parsing and consumption). Also standardizes exception reporting and final test reporting. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Suggested labels

enhancement

Poem

"I nibble digits, hop through dots and e,
From integers to exponents, parse sets me free.
No tiny fractions lost on my trail,
Zero sheds its sign without fail—🐰
Hooray for precise decimal tea!"

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 42.86% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately and concisely summarizes the main change: extending edecimal's parse function to handle decimal-point and scientific notation forms. |
| Linked Issues check | ✅ Passed | All primary objectives from issue #854 are met: parse extended to accept integers, decimals, and scientific notation with effective exponent >= 0; scan_decimal_float reused for tokenization; negative fractional exponents rejected; operator>> hygiene added with failbit on parse failure; comprehensive regression tests added covering all specified cases. |
| Out of Scope Changes check | ✅ Passed | All code changes are directly within scope: parse rewrite uses scan_decimal_float as specified, string_parse.hpp header included as required, test suite expanded to cover all acceptance criteria with no unrelated changes. |

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@include/sw/universal/number/edecimal/edecimal_impl.hpp`:
- Around line 393-405: The exponent expansion can grow eff_exp (computed from
scan.exp10 - scan.frac_part.size()) to an unbounded value and then push_back
zeros in a loop; guard this by validating eff_exp before any allocations: in the
code around eff_exp, add a check that eff_exp is non-negative and below a
configured safe limit (e.g. MAX_ALLOWED_EXPANSION or derived from a
MAX_TOTAL_DIGITS), and also ensure (scan.int_part.size() + scan.frac_part.size()
+ eff_exp) does not exceed a MAX_TOTAL_DIGITS cap; if the check fails, return
false (as done for negative) to avoid the clear()/push_back loops in push_back
and the trailing-zero loop, and consider using reserve/resize instead of
repeated push_back when within limits (referencing symbols eff_exp, scan.exp10,
scan.frac_part.size(), clear(), push_back).
ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: feb64424-b290-4e02-b85e-69a5846843e5

📥 Commits

Reviewing files that changed from the base of the PR and between 842e381 and 4bd7897.

📒 Files selected for processing (3)
  • elastic/decimal/conversion/string_parse.cpp
  • include/sw/universal/number/edecimal/edecimal.hpp
  • include/sw/universal/number/edecimal/edecimal_impl.hpp

@Ravenwater Ravenwater self-assigned this May 18, 2026
@Ravenwater Ravenwater added this to the V4 milestone May 18, 2026
@Ravenwater Ravenwater moved this to In progress in Universal Number Library May 18, 2026
CodeRabbit round 1 on #865 flagged a DoS surface: scan_decimal_float
returns an int32 exponent, so an input like "1e2000000000" would make
parse push_back ~2 billion trailing zeros one at a time, allocating
~2 GiB.

Cap the post-expansion digit count at 1,048,576 digits (2^20) and use
vector::reserve + vector::insert in place of repeated push_back, so the
accepted path allocates once and stays O(N) without repeated regrowth.

Test additions:
- "1e2000000000", "1e2147483647" (INT32_MAX), "1e10000000",
  "1e1048576" (cap+1) -- all rejected.
- The cap itself is still accepted: 1e1048575 (exactly 1,048,576 digits
  after expansion) parses.
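
A guarded expansion along the lines described above could be sketched as follows. The constant and function names (`kMaxExpandedDigits`, `expand_capped`) are hypothetical, and the exact place where the real code measures the digit count is an assumption; only the cap value and the reserve + insert approach come from this commit message.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Cap value taken from the commit message; the name is an assumption.
constexpr std::size_t kMaxExpandedDigits = 1048576;

// Builds LSB-first digits for significand * 10^eff_exp.
// Returns false, leaving `out` untouched, when eff_exp is negative or the
// expanded digit count would exceed the cap.
inline bool expand_capped(const std::string& significand, long long eff_exp,
                          std::vector<std::uint8_t>& out) {
    if (eff_exp < 0) return false;
    if (significand.size() > kMaxExpandedDigits) return false;
    // Check before allocating anything, and without overflowing the sum.
    if (static_cast<unsigned long long>(eff_exp) >
        kMaxExpandedDigits - significand.size()) return false;

    const std::size_t zeros = static_cast<std::size_t>(eff_exp);
    out.clear();
    out.reserve(significand.size() + zeros);  // one allocation up front
    // Trailing zeros are the least significant digits, so they come first.
    out.insert(out.end(), zeros, std::uint8_t{0});
    for (auto it = significand.rbegin(); it != significand.rend(); ++it)
        out.push_back(static_cast<std::uint8_t>(*it - '0'));
    return true;
}
```

In this sketch an exponent of 1048575 with a one-digit significand lands exactly on the cap and is accepted, while 1048576 and anything larger is rejected before any memory is reserved.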

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@Ravenwater Ravenwater marked this pull request as ready for review May 18, 2026 02:09
@Ravenwater Ravenwater merged commit 861e6e6 into main May 18, 2026
32 checks passed
@github-project-automation github-project-automation Bot moved this from In progress to Done in Universal Number Library May 18, 2026
@Ravenwater Ravenwater deleted the feat/issue-854-edecimal-parse-decimal-and-sci branch May 18, 2026 02:34
@coveralls

Coverage Report for CI Build 26009791606

Warning

Build has drifted: This PR's base is out of sync with its target branch, so coverage data may include unrelated changes.
Quick fix: rebase this PR.

Coverage remained the same at 84.166%

Details

  • Coverage remained the same as the base build.
  • Patch coverage: 22 of 22 lines across 1 file are fully covered (100%).
  • 10 coverage regressions across 1 file.

Uncovered Changes

No uncovered changes found.

Coverage Regressions

10 previously-covered lines in 1 file lost coverage.

| File | Lines Losing Coverage | Coverage |
|---|---|---|
| include/sw/universal/verification/test_suite_randoms.hpp | 10 | 31.43% |

Coverage Stats

Coverage Status
Relevant Lines: 55519
Covered Lines: 46728
Line Coverage: 84.17%
Coverage Strength: 6419966.99 hits per line

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

feat(edecimal): extend parse to decimal floating-point and scientific notation

2 participants