perf(elreal): Phase L.1 -- depth-1 refinement for division#917
Conversation
… + IEEE residual Phase L.1 of follow-up epic #903 (#906). Lifts the "elreal / is depth-0 only" caveat that has held since Phase G. Algorithm --------- Let r = a/b be the true value. At the leading doubles: r = c0 + Δa/b0 - (a0/b0^2) Δb + (a - b*c0)/b0 + O(eps^2) where the IEEE residual `a - b*c0` is recoverable exactly from the leading doubles via EFTs: two_prod(b0, c0) -> (prod_hi, prod_err) with b0*c0 = prod_hi + prod_err two_diff(a0, prod_hi) -> (diff_hi, diff_err) ieee_residual = (diff_hi + diff_err) - prod_err The depth-1 component is then c1 = (ieee_residual + a.at(1) - c0 * b.at(1)) / b0 which fits the existing gen_binary_linear variant alternative with: constant = ieee_residual / b0 ca = 1/b0 cb = -c0/b0 No new generator shape needed; just populate gen_binary_linear in operator/. The PR is small (~ 40 lines changed in elreal_impl.hpp). Validation ---------- - 1/3 produces c0 = 0.333...331 (IEEE rounded) and c1 = 1.85e-17, the exact positive residual. Sanity check: c0 + c1 ~= 1/3 to ~ 32 digits. - All 30 elreal regression tests PASS under gcc 13.3 and clang 18.1. - Phase J oracle sweep PASS under both compilers. - benchmark_elreal_performance: division now in the same 13 Mops/s range as +/-/* (it was an artifact ~ 1 Gops/s pre-L.1 from gcc inlining the entire depth-0-only operator away). Doc updates ----------- - docs/number-systems/elreal.md: known-limitations entry on "depth-1 ceiling" updated to mention `/` is now included alongside the other operators. - docs/algorithmic-details/lazy-real-arithmetic.md: Section 6 depth-1 generator table updated with the / row's actual formula. - docs/algorithmic-details/elreal-performance-baseline.md: division cost-shape section rewritten to describe the post-L.1 reality (1 Gops/s artifact is gone; throughput now ~ 13 Mops/s, the same range as the other binary operators). - docs/algorithmic-details/multi-component-arithmetic.md: section 7.1 picker table updated -- `elreal /` now beats `ereal<N> /` by ~ 19x at matched precision (was apples-to-oranges pre-L.1). - docs/multi-component/exact-lazy-arithmetic.md: design narrative no longer says "depth-0-only for /". What this does NOT do --------------------- - Newton iteration for depth 2+ -- that's Phase L.2. - Depth-2+ for sqrt -- also Phase L.2 (sqrt already has depth 1). Part of #906 (Phase L of follow-up epic #903). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
📝 WalkthroughWalkthroughThis PR implements depth-1 refinement for ChangesPhase L.1 Depth-1 Division Refinement
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related issues
Possibly related PRs
Suggested labels
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@include/sw/universal/number/elreal/elreal_impl.hpp`:
- Around line 949-977: Compute the depth-1 coefficients (ieee_residual/b0,
1.0/b0, -c0/b0) and verify each is finite before assigning result._generator =
gen_binary_linear; if any coefficient is not finite (e.g. due to tiny non-zero
b0 producing inf/NaN) do not install the generator and simply return result
as-is. Update the operator/… division code around variables c0, b0, prod_err,
ieee_residual and the gen_binary_linear assignment to perform std::isfinite
checks on the three coefficients (or on ca/cb/constant) and only set
result._generator when all three pass. Ensure this also prevents
evaluate_generator(gen_binary_linear) from receiving inf/NaN multipliers
(affecting elreal::at(1) usage).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 9e9a647c-7ef1-467e-b9c1-9b9070ee1268
📒 Files selected for processing (6)
docs/algorithmic-details/elreal-performance-baseline.mddocs/algorithmic-details/lazy-real-arithmetic.mddocs/algorithmic-details/multi-component-arithmetic.mddocs/multi-component/exact-lazy-arithmetic.mddocs/number-systems/elreal.mdinclude/sw/universal/number/elreal/elreal_impl.hpp
Add finite-check guard on the depth-1 coefficients of operator/. CodeRabbit caught the edge case: if b0 is a denormal whose reciprocal overflows to inf, the computed ca = 1/b0, cb = -c0/b0, and constant = ieee_residual/b0 can each become non-finite even though c0 = a0/b0 itself was finite. Without a guard, installing the generator would propagate inf/NaN into every depth-1 walk that touches at(1). Fix: precompute ca, cb, cconst into named locals; std::isfinite check all three; bail out to depth-0-only if any is non-finite. The bail-out preserves the leading double (which is correct per IEEE-754) and returns 0.0 for at(k >= 1). Verified: - denorm / denorm: at(0) = 1.0 (correct), at(1) = 0.0 (would have been inf without the guard). - Normal case 2.0/3.0: at(0) = 0.666...663, at(1) = +3.7e-17 (correct positive residual). Behavior unchanged for finite-coefficient cases. - All sampled elreal tests + Phase J oracle sweep PASS under gcc 13.3 and clang 18.1. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Addressed the CodeRabbit nitpick in 8ed3622. Edge case caught: if Fix: precompute Verified:
|
Coverage Report for CI Build 26266561859Warning Build has drifted: This PR's base is out of sync with its target branch, so coverage data may include unrelated changes. Coverage decreased (-0.01%) to 84.233%Details
Uncovered ChangesNo uncovered changes found. Coverage Regressions10 previously-covered lines in 1 file lost coverage.
Coverage Stats
💛 - Coveralls |
Summary
Phase L.1 of follow-up epic #903. Lifts the "elreal / is depth-0 only" caveat that has held since Phase G.
Algorithm
Let `r = a/b` be the true value. At the leading doubles:
```
r = c0 + Δa/b0 - (a0/b0^2) Δb + (a - b*c0)/b0 + O(eps^2)
```
where the IEEE residual `a - b*c0` is recoverable exactly from the leading doubles via EFTs:
```
two_prod(b0, c0) -> (prod_hi, prod_err) with b0*c0 = prod_hi + prod_err
two_diff(a0, prod_hi) -> (diff_hi, diff_err)
ieee_residual = (diff_hi + diff_err) - prod_err
```
The depth-1 component is then `c1 = (ieee_residual + a.at(1) - c0 * b.at(1)) / b0`, which fits the existing `gen_binary_linear` variant alternative with `constant = ieee_residual / b0`, `ca = 1/b0`, `cb = -c0/b0`. No new generator shape needed; just populate `gen_binary_linear` in `operator/`.
Files
Validation
Picker shift
elrealpost-L.1ereal<2>/With L.1 in place, `elreal` matches `ereal<2>` at depth 1 across the four elementary arithmetic operators, AND beats it on division by ~ 19x. `ereal /` runs the iterative `expansion_quotient` algorithm; `elreal /` produces depth-1 in a single generator-emplace.
What this PR does NOT do
Part of #906 (Phase L of follow-up epic #903).
🤖 Generated with Claude Code
Summary by CodeRabbit
New Features
Documentation