Background
Phase C of #835 (PR #847) brought the extended-precision FP family up to API parity for decimal string parsing -- nan/inf tokens, `operator>>` failbit, etc. The digit-accumulation core was deliberately left unchanged because it already uses the type's own high-precision arithmetic, qualitatively better than the stod funnel that posit/cfloat had.
The one residual precision gap is the decimal-exponent step. Each `parse()` ends with roughly:
```cpp
floatcascade ten(10.0);
if (exp > 0) {
r *= pown(ten, exp);
} else if (exp < 0) {
r /= pown(ten, -exp);
}
```
`pown(ten, exp)` is computed iteratively (~|exp| multiplications), and each multiplication adds ULP-level rounding error. For `|exp|` ~ 10-20 the accumulated error is negligible; for `|exp|` ~ 100-300 (toward the limit of the IEEE double decimal exponent range) it becomes visible.
So the parsed value carries:
- exact accumulation of decimal digits (no loss)
- ~|exp| ULP error in `pown`
- 1 ULP from the final multiplication / division
That is strictly better than the legacy stod path posit/cfloat used, but not bit-exact.
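The accumulation effect can be illustrated in plain `double`, as a minimal sketch rather than the library's actual `pown`. The names `pow10_iterative` and `relative_error` are hypothetical and exist only for this illustration; the cascade types behave analogously, just at a much smaller ULP.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical stand-in for an iterative pown(ten, exp): exp successive
// multiplications by 10, each rounded to double precision.
inline double pow10_iterative(int exp) {
    assert(exp >= 0);
    double p = 1.0;
    for (int i = 0; i < exp; ++i) {
        p *= 10.0;  // one rounding per step once 10^i is no longer exact
    }
    return p;
}

// Relative difference between a computed value and a reference value.
inline double relative_error(double computed, double reference) {
    return std::fabs(computed - reference) / std::fabs(reference);
}
```

In double precision every power of ten up to 10^22 is exactly representable, so `pow10_iterative(22)` equals the literal `1e22`. Beyond that each step can round, so `relative_error(pow10_iterative(300), 1e300)` is bounded by roughly 300 half-ULPs rather than being guaranteed to be zero.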
Proposed fix
Replace the `pown(10, e)` step with a route through `decimal_to_binary` (the Phase B2a utility, see PR #841):
- Call `decimal_to_binary::convert(str, target_mantissa_bits)` with `target_mantissa_bits` sized to the type's full precision (53 * N + headroom).
- Get back an exact rational represented as a normalized mantissa + binary scale + guard/sticky.
- Split the mantissa into N IEEE-754 double components:
  - hi = round-to-nearest-double of the top 53 bits at the value's exponent
  - mi / mi2 / ... = round-to-nearest-double of subsequent 53-bit windows
  - lo = residual at the lowest exponent

  Each component carries an exponent ~53 bits below the previous one. The standard "distillation" algorithm from the Bailey / Hida quad-double paper applies.
- Pack into the cascade in canonical form (each component's magnitude is at most half an ULP of the previous component).
This gives bit-exact correctly-rounded conversion regardless of input magnitude.
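The split-and-pack steps can be sketched for the two-component case as follows. This is a sketch under stated assumptions, not the `decimal_to_binary` API: `DoublePair` and `split_mantissa` are hypothetical names, the mantissa is assumed to be already rounded to 106 bits (guard/sticky handling is omitted), and the exponents are assumed to stay within the normal double range.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical two-component cascade; not a type from the library.
struct DoublePair {
    double hi;
    double lo;
};

// Split a 106-bit mantissa, given as two 53-bit windows, into a canonical
// (hi, lo) pair representing (top53 * 2^53 + low53) * 2^scale.
inline DoublePair split_mantissa(std::uint64_t top53, std::uint64_t low53, int scale) {
    constexpr std::uint64_t limit = std::uint64_t{1} << 53;
    assert(top53 < limit && low53 < limit);

    // Both conversions are exact: each window fits a 53-bit significand.
    double hi = std::ldexp(static_cast<double>(top53), scale + 53);
    double lo = std::ldexp(static_cast<double>(low53), scale);

    // Fast two-sum: s is the double nearest to hi + lo, e is the exact
    // remainder. Valid here because hi is zero or |hi| > |lo|.
    double s = hi + lo;
    double e = lo - (s - hi);
    return {s, e};
}
```

When the low window exceeds half an ULP of the high window, the two-sum step rounds `hi` up and leaves a negative `lo`, which is what puts the pair in canonical form. A three- or four-component version would repeat the same renormalization across successive windows.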
Affected types
| Type | Components | Precision |
|---|---|---|
| dd | 2 | ~106 bits |
| qd | 4 | ~212 bits |
| dd_cascade | 2 | ~106 bits |
| td_cascade | 3 | ~159 bits |
| qd_cascade | 4 | ~212 bits |
dd_cascade / td_cascade / qd_cascade all delegate to `floatcascade::parse` in `include/sw/universal/internal/floatcascade/floatcascade.hpp`, so the centralized fix is in `floatcascade::parse`. dd and qd have their own `parse()` bodies.
Why deferred
- The (hi, lo) / quad-component splitter is non-trivial new code per topology. Estimated 2-3x the LoC of Phase B2c (cfloat).
- The current accumulator is already more accurate than what posit/cfloat had before B2b/B2c -- the parity-only Phase C closed the API gap without introducing new conversion algorithms.
- A useful test oracle requires bigint comparison or a known-correct reference (e.g., MPFR) to validate the splitter; that infrastructure lands more naturally with a follow-up phase.
Acceptance criteria
Related