
perf(elreal): Phase L.1 -- depth-1 refinement for division #917

Merged
Ravenwater merged 2 commits into main from feat/elreal-phase-l1
May 22, 2026

Conversation

@Ravenwater

@Ravenwater Ravenwater commented May 22, 2026

Contributor

Summary

Phase L.1 of follow-up epic #903. Lifts the "elreal / is depth-0 only" caveat that has held since Phase G.

Algorithm

Let `r = a/b` be the true value. At the leading doubles:

```
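r = c0 + Δa/b0 - (a0/b0^2) Δb + (a0 - b0*c0)/b0 + O(eps^2)
```

where `c0` is the IEEE-rounded quotient `a0/b0`, `Δa` and `Δb` are the parts of `a` and `b` beyond their leading doubles, and the IEEE residual `a0 - b0*c0` is recoverable exactly from the leading doubles via EFTs: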

```
two_prod(b0, c0) -> (prod_hi, prod_err) with b0*c0 = prod_hi + prod_err
two_diff(a0, prod_hi) -> (diff_hi, diff_err)
ieee_residual = (diff_hi + diff_err) - prod_err
```

The depth-1 component is then `c1 = (ieee_residual + a.at(1) - c0 * b.at(1)) / b0`, which fits the existing `gen_binary_linear` variant alternative with `constant = ieee_residual / b0`, `ca = 1/b0`, `cb = -c0/b0`. No new generator shape needed; just populate `gen_binary_linear` in `operator/`.

Files

  • `include/sw/universal/number/elreal/elreal_impl.hpp` -- depth-1 generator added to `operator/` (~40 lines changed).
  • `docs/number-systems/elreal.md` -- known-limitations entry on "depth-1 ceiling" updated.
  • `docs/algorithmic-details/lazy-real-arithmetic.md` -- Section 6 depth-1 generator table updated with the `/` row's actual formula.
  • `docs/algorithmic-details/elreal-performance-baseline.md` -- division cost-shape section rewritten for post-L.1.
  • `docs/algorithmic-details/multi-component-arithmetic.md` -- section 7.1 picker table updated: `elreal /` now beats `ereal /` by ~ 19x at matched precision.
  • `docs/multi-component/exact-lazy-arithmetic.md` -- "depth-0-only for /" caveat removed.

Validation

  • 1/3 sanity: c0 = 0.333...331 (IEEE rounded), c1 = 1.85e-17 (the exact positive residual). Sum c0+c1 ≈ 1/3 to ~ 32 digits.
  • All 30 elreal regression tests PASS under gcc 13.3 and clang 18.1.
  • Phase J oracle sweep PASS under both compilers.
  • `benchmark_elreal_performance`: division now runs at ~ 13 Mops/s, the same range as +/-/* (the ~ 1 Gops/s measured pre-L.1 was an artifact of gcc inlining the entire depth-0-only operator away).

Picker shift

| Op | elreal post-L.1 | ereal<2> | Winner |
|----|-----------------|----------|--------|
| / | 13 Mops/s | 680 Kops/s | `elreal` (~ 19x) |

With L.1 in place, `elreal` matches `ereal<2>` at depth 1 across the four elementary arithmetic operators, AND beats it on division by ~ 19x. `ereal /` runs the iterative `expansion_quotient` algorithm; `elreal /` produces depth-1 in a single generator-emplace.

What this PR does NOT do

  • Newton iteration for depth 2+. That's Phase L.2.
  • Depth-2+ for sqrt. Also Phase L.2 (sqrt already has depth 1 via gen_sqrt).

Part of #906 (Phase L of follow-up epic #903).

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • elreal division now uses depth‑1 refinement, improving division precision and yielding ~13–16 Mops/s; elreal remains the only provider for sqrt/exp/log at ~36–43 Mops/s. Comparison clarifies other implementations’ iterative division behavior (~680 Kops/s).
  • Documentation

    • Updated algorithmic and performance docs to reflect Phase L.1 depth‑1 behavior and defer deeper (depth‑2+) refinement to Phase L.2.

Review Change Stack

… + IEEE residual

Phase L.1 of follow-up epic #903 (#906). Lifts the "elreal / is depth-0
only" caveat that has held since Phase G.

Algorithm
---------

Let r = a/b be the true value. At the leading doubles:

  r = c0 + Δa/b0 - (a0/b0^2) Δb + (a0 - b0*c0)/b0 + O(eps^2)

where the IEEE residual `a0 - b0*c0` is recoverable exactly from the
leading doubles via EFTs:

  two_prod(b0, c0)  -> (prod_hi, prod_err) with b0*c0 = prod_hi + prod_err
  two_diff(a0, prod_hi) -> (diff_hi, diff_err)
  ieee_residual = (diff_hi + diff_err) - prod_err

The depth-1 component is then

  c1 = (ieee_residual + a.at(1) - c0 * b.at(1)) / b0

which fits the existing gen_binary_linear variant alternative with:
  constant = ieee_residual / b0
  ca       = 1/b0
  cb       = -c0/b0

No new generator shape needed; just populate gen_binary_linear in
operator/. The PR is small (~ 40 lines changed in elreal_impl.hpp).

Validation
----------

- 1/3 produces c0 = 0.333...331 (IEEE rounded) and c1 = 1.85e-17, the
  exact positive residual. Sanity check: c0 + c1 ~= 1/3 to ~ 32 digits.
- All 30 elreal regression tests PASS under gcc 13.3 and clang 18.1.
- Phase J oracle sweep PASS under both compilers.
- benchmark_elreal_performance: division now in the same 13 Mops/s
  range as +/-/* (the ~ 1 Gops/s measured pre-L.1 was an artifact of
  gcc inlining the entire depth-0-only operator away).

Doc updates
-----------

- docs/number-systems/elreal.md: known-limitations entry on "depth-1
  ceiling" updated to mention `/` is now included alongside the
  other operators.
- docs/algorithmic-details/lazy-real-arithmetic.md: Section 6 depth-1
  generator table updated with the / row's actual formula.
- docs/algorithmic-details/elreal-performance-baseline.md: division
  cost-shape section rewritten to describe the post-L.1 reality
  (1 Gops/s artifact is gone; throughput now ~ 13 Mops/s, the same
  range as the other binary operators).
- docs/algorithmic-details/multi-component-arithmetic.md: section
  7.1 picker table updated -- `elreal /` now beats `ereal<N> /` by
  ~ 19x at matched precision (was apples-to-oranges pre-L.1).
- docs/multi-component/exact-lazy-arithmetic.md: design narrative
  no longer says "depth-0-only for /".

What this does NOT do
---------------------

- Newton iteration for depth 2+ -- that's Phase L.2.
- Depth-2+ for sqrt -- also Phase L.2 (sqrt already has depth 1).

Part of #906 (Phase L of follow-up epic #903).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@coderabbitai

coderabbitai Bot commented May 22, 2026

Contributor

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: c9ea929a-3d23-4451-a76c-98329cfce62c

📥 Commits

Reviewing files that changed from the base of the PR and between 35e8d5d and 8ed3622.

📒 Files selected for processing (1)
  • include/sw/universal/number/elreal/elreal_impl.hpp

📝 Walkthrough

Walkthrough

This PR implements depth-1 refinement for elreal division and updates documentation across the codebase. The operator/ now computes an IEEE residual via EFT primitives and installs a gen_binary_linear generator for depth-1 correction, replacing the prior single-double limitation. Documentation updates reflect the Phase L.1 completion, new throughput metrics (~13–16 Mops/s), and deferral of deeper Newton refinement to Phase L.2.

Changes

Phase L.1 Depth-1 Division Refinement

| Layer / File(s) | Summary |
|-----------------|---------|
| Depth-1 division generator implementation<br>`include/sw/universal/number/elreal/elreal_impl.hpp` | `operator/` now computes leading quotient and conditionally installs a `gen_binary_linear` depth-1 correction generator using EFT-derived residuals and operand depth-1 Taylor partials, replacing the prior leading-only behavior. |
| Algorithmic specification for depth-1 division<br>`docs/algorithmic-details/lazy-real-arithmetic.md` | The depth-1 generator formula is documented as combining IEEE division residual with Taylor-partial operand corrections. Phase L.1 covers depth-1 for arithmetic operators; Phase L.2 defers deeper Newton refinement and lazy-division walking. |
| Performance metrics and design narrative updates<br>`docs/algorithmic-details/elreal-performance-baseline.md`, `docs/algorithmic-details/multi-component-arithmetic.md`, `docs/number-systems/elreal.md`, `docs/multi-component/exact-lazy-arithmetic.md` | Performance tables and narratives updated to show `elreal /` at depth-1 post-L.1 (~13–16 Mops/s) and expanded comparison rows; roadmap reference changed to Phase L.2 / Phase M epic (#903) for depth-2+ refinement. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related issues

Possibly related PRs

Suggested labels

enhancement

Poem

🐰 A rabbit refines the division with care,
Installing generators, layer by layer,
EFT residuals and Taylor dreams,
Depth-1 precision in elegant schemes,
Phase L.1 hops toward the next frontier! 🌙

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|------------|--------|-------------|------------|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|------------|--------|-------------|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately reflects the main change: introducing depth-1 refinement for the division operator in elreal as part of Phase L.1, which is the primary implementation focus of this PR. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


@Ravenwater Ravenwater self-assigned this May 22, 2026
@Ravenwater Ravenwater added this to the V4 milestone May 22, 2026

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@include/sw/universal/number/elreal/elreal_impl.hpp`:
- Around line 949-977: Compute the depth-1 coefficients (ieee_residual/b0,
1.0/b0, -c0/b0) and verify each is finite before assigning result._generator =
gen_binary_linear; if any coefficient is not finite (e.g. due to tiny non-zero
b0 producing inf/NaN) do not install the generator and simply return result
as-is. Update the operator/… division code around variables c0, b0, prod_err,
ieee_residual and the gen_binary_linear assignment to perform std::isfinite
checks on the three coefficients (or on ca/cb/constant) and only set
result._generator when all three pass. Ensure this also prevents
evaluate_generator(gen_binary_linear) from receiving inf/NaN multipliers
(affecting elreal::at(1) usage).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 9e9a647c-7ef1-467e-b9c1-9b9070ee1268

📥 Commits

Reviewing files that changed from the base of the PR and between e666ed5 and 35e8d5d.

📒 Files selected for processing (6)
  • docs/algorithmic-details/elreal-performance-baseline.md
  • docs/algorithmic-details/lazy-real-arithmetic.md
  • docs/algorithmic-details/multi-component-arithmetic.md
  • docs/multi-component/exact-lazy-arithmetic.md
  • docs/number-systems/elreal.md
  • include/sw/universal/number/elreal/elreal_impl.hpp

Comment thread include/sw/universal/number/elreal/elreal_impl.hpp
Add finite-check guard on the depth-1 coefficients of operator/.

CodeRabbit caught the edge case: if b0 is a denormal whose reciprocal
overflows to inf, the computed ca = 1/b0, cb = -c0/b0, and constant
= ieee_residual/b0 can each become non-finite even though c0 = a0/b0
itself was finite. Without a guard, installing the generator would
propagate inf/NaN into every depth-1 walk that touches at(1).

Fix: precompute ca, cb, cconst into named locals; std::isfinite check
all three; bail out to depth-0-only if any is non-finite. The bail-out
preserves the leading double (which is correct per IEEE-754) and
returns 0.0 for at(k >= 1).

Verified:
- denorm / denorm:  at(0) = 1.0 (correct), at(1) = 0.0 (would have been
  inf without the guard).
- Normal case 2.0/3.0: at(0) = 0.666...663, at(1) = +3.7e-17 (correct
  positive residual). Behavior unchanged for finite-coefficient cases.
- All sampled elreal tests + Phase J oracle sweep PASS under gcc 13.3
  and clang 18.1.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@Ravenwater

Contributor Author

Addressed the CodeRabbit nitpick in 8ed3622.

Edge case caught: if b0 is a denormal whose reciprocal overflows to inf, the computed ca = 1/b0, cb = -c0/b0, and cconst = ieee_residual/b0 can each become non-finite even though c0 = a0/b0 itself was finite. Without a guard, installing the generator would propagate inf/NaN into every depth-1 walk that touches at(1).

Fix: precompute ca, cb, cconst into named locals; std::isfinite check all three; bail out to depth-0-only if any is non-finite. The bail-out preserves the leading double (which is correct per IEEE-754) and returns 0.0 for at(k >= 1).

Verified:

  • denorm / denorm: at(0) = 1.0 (correct), at(1) = 0.0 (would have been inf without the guard)
  • Normal case 2.0/3.0: at(0) = 0.666...663, at(1) = +3.7e-17 (correct positive residual). Behavior unchanged for finite-coefficient cases.
  • All sampled elreal tests + Phase J oracle sweep PASS under gcc 13.3 and clang 18.1.

@Ravenwater Ravenwater marked this pull request as ready for review May 22, 2026 03:22
@Ravenwater Ravenwater merged commit e673df8 into main May 22, 2026
32 checks passed
@Ravenwater Ravenwater deleted the feat/elreal-phase-l1 branch May 22, 2026 03:36
@coveralls


Coverage Report for CI Build 26266561859

Warning

Build has drifted: This PR's base is out of sync with its target branch, so coverage data may include unrelated changes.
Quick fix: rebase this PR. Learn more →

Coverage decreased (-0.01%) to 84.233%

Details

  • Coverage decreased (-0.01%) from the base build.
  • Patch coverage: No coverable lines changed in this PR.
  • 10 coverage regressions across 1 file.

Uncovered Changes

No uncovered changes found.

Coverage Regressions

10 previously-covered lines in 1 file lost coverage.

| File | Lines Losing Coverage | Coverage |
|------|-----------------------|----------|
| `include/sw/universal/verification/test_suite_randoms.hpp` | 10 | 31.07% |

Coverage Stats

Coverage Status
Relevant Lines: 55729
Covered Lines: 46942
Line Coverage: 84.23%
Coverage Strength: 5288105.08 hits per line

💛 - Coveralls

