perf(elreal): Phase K.1 -- small-buffer optimisation on _components by Ravenwater · Pull Request #912 · stillwater-sc/universal

Ravenwater · 2026-05-21T22:25:56Z

Summary

Phase K.1 of follow-up epic #903. First sub-phase of Phase K (#905): allocator hot-path optimisation. The Phase I baseline identified per-operator vector allocation as the dominant cost; this PR closes that part of the gap.

K is a multi-PR phase. K.1 is the first and largest payoff item per the Phase I baseline doc: replace `std::vector` storage with a small-buffer-optimised type so the common case (depth 1-4 components) pays no heap allocation.

What landed

`include/sw/universal/number/elreal/lazy_component_buffer.hpp` -- new small-buffer storage type: 4-double inline array + spill via `std::vector`. 64 bytes total (one cache line). API surface is the minimum elreal needs: `push_back`, `operator[]`, `size`, `clear`, `reserve`.
`include/sw/universal/number/elreal/elreal_impl.hpp` -- migrated `_components` from `std::vector` to `lazy_component_buffer`. The `components()` accessor now returns `const lazy_component_buffer&`; the single external user in `manipulators.hpp` uses `.size()` and `operator[]` which are unchanged.
Two range-for loops rewritten as indexed loops (the buffer does not provide iterators -- intentional, see the design header).
`docs/algorithmic-details/elreal-performance-baseline.md` -- new headline-numbers table for the post-K.1 measurements. Phase I baseline numbers retained for the before/after comparison.
`docs/algorithmic-details/multi-component-arithmetic.md` -- section 7.1 picker rule updated: multiplication now favors elreal over `ereal<2>` at matched precision (was 1.2x ereal-favored, now 1.9x elreal-favored).
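The storage scheme described above can be illustrated with a minimal sketch. This is not the contents of `lazy_component_buffer.hpp`: the class name `component_buffer`, the member names, and the `negate` helper are hypothetical, chosen only to show the inline-then-spill idea and the indexed traversal that replaces range-for. On a typical 64-bit standard library where `std::vector` occupies 24 bytes, a layout like this comes to 32 + 24 + 8 = 64 bytes, which is consistent with the figure quoted above, but that is an assumption about the platform.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

namespace sketch {

// Illustrative small-buffer container (hypothetical names): the first four
// doubles live inline; anything beyond that spills into a std::vector.
class component_buffer {
public:
    static constexpr std::size_t inline_capacity = 4;

    void push_back(double v) {
        if (size_ < inline_capacity) {
            inl_buf_[size_] = v;   // common case: no heap allocation
        } else {
            spill_.push_back(v);   // heap is touched only past 4 components
        }
        ++size_;
    }

    double operator[](std::size_t i) const noexcept {
        // Debug-only bounds check; compiled out when NDEBUG is defined.
        assert(i < size_ && "component_buffer index out of range");
        return (i < inline_capacity) ? inl_buf_[i] : spill_[i - inline_capacity];
    }

    std::size_t size() const noexcept { return size_; }
    bool empty() const noexcept { return size_ == 0; }

    void clear() noexcept {
        spill_.clear();
        size_ = 0;
    }

    void reserve(std::size_t n) {
        // Only the spill region ever needs reserving.
        if (n > inline_capacity) spill_.reserve(n - inline_capacity);
    }

private:
    std::array<double, inline_capacity> inl_buf_{};
    std::vector<double> spill_;
    std::size_t size_ = 0;
};

// Indexed traversal: with no iterators, loops use size() and operator[].
inline component_buffer negate(const component_buffer& in) {
    component_buffer out;
    out.reserve(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        out.push_back(-in[i]);
    }
    return out;
}

} // namespace sketch
```

Leaving iterators out keeps the surface small, at the cost of rewriting range-for loops as indexed loops, which is the trade-off the PR describes.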

Headline numbers (gcc 13.3 on 12th Gen Intel i7-12700K)

| Op | Phase I | Phase K.1 | Speedup |
|----------------|---------:|----------:|---------|
| elreal + d1 | 9 Mops/s | 12 Mops/s | 1.3x |
| elreal - d1 | 9 Mops/s | 14 Mops/s | 1.6x |
| elreal * d1 | 8 Mops/s | 19 Mops/s | 2.4x |
| elreal / d0 | 36 Mops/s | ~1 Gops/s | dominated by inlining once heap alloc is gone |
| elreal sqrt d1 | 14 Mops/s | 30 Mops/s | 2.1x |
| elreal exp d1 | 14 Mops/s | 31 Mops/s | 2.2x |
| elreal log d1 | 14 Mops/s | 24 Mops/s | 1.7x |

clang 18.1 shows even larger gains -- the Phase I clang-vs-gcc gap on multiplication (4 vs 8 Mops/s) is closed; both compilers now land at ~ 20 Mops/s.

Crossover with `ereal<2>` at matched precision

Addition and subtraction: ereal<2> still wins by 1.4-2x (was 2-2.7x). The remaining gap is the `std::function` generator capture, which Phase K.2 targets.
Multiplication: elreal now beats ereal<2> by 1.9x (was 1.2x ereal-favored). `ereal` multiplication is O(N) in the eager expansion product; `elreal *` is essentially a single `two_prod` plus the now-inline result envelope.
Division: not apples-to-apples (elreal is depth-0 only until Phase L, #906: Depth-2+ arithmetic refinement -- Newton for / and sqrt).
sqrt / exp / log / pow / trig / hyperbolic: elreal-only (ereal has no math functions today).
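The multiplication entry above refers to `two_prod`, an error-free transformation (EFT). The PR does not show the library's implementation; the sketch below is the standard FMA-based formulation, with a hypothetical `product` struct, and is offered only to make clear why a single `two_prod` is cheap: one multiply and one fused multiply-add.

```cpp
#include <cmath>

namespace sketch {

// Hypothetical result type: hi is the rounded product, lo the rounding error.
struct product {
    double hi;
    double lo;
};

// Error-free product: hi + lo equals a * b exactly, barring overflow or
// underflow. std::fma evaluates a * b - hi with a single rounding, and the
// exact residual is representable, so lo captures it without loss.
inline product two_prod(double a, double b) {
    const double hi = a * b;
    const double lo = std::fma(a, b, -hi);
    return {hi, lo};
}

} // namespace sketch
```

For example, squaring `1 + 2^-30` gives `hi = 1 + 2^-29` and `lo = 2^-60`: the low term is exactly the part of the true product that does not fit in one double.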

Test plan

Builds clean under gcc 13.3 (`build_elreal/`)
Builds clean under clang 18.1 (`build_clang_elreal/`)
All 30 elreal regression tests PASS under both compilers (api, conversion, logic, arithmetic, math, geometry, oracle)
Phase J oracle sweep across dd / qd / dd_cascade / td_cascade / qd_cascade / ereal PASS under both compilers
`benchmark_elreal_performance` shows the documented gains under both compilers
CI fast tier green (gcc + clang lite)
CodeRabbit review

What this PR does NOT do

Eliminate the `std::function` generator capture. That's Phase K.2 -- the natural next target now that K.1 has shrunk the component allocation.
Eliminate the operand copies into the lambda capture. That's Phase K.3 (refcounted operand sharing).
Add SIMD/FMA batching to the EFTs. That's Phase K.4.

Part of #905 (Phase K of follow-up epic #903).

🤖 Generated with Claude Code

Summary by CodeRabbit

Documentation
- Updated performance baseline with Phase K.1 post-migration results, new post-K.1 tables, per-operator throughput comparisons, and clarified winner profiles and notes on compiler behaviour
- Revised arithmetic benchmark docs: new throughput table and narrative showing multiplication now favors elreal at matched precision while addition/subtraction still favor ereal<2>
Refactor
- Optimized component storage with a small-buffer optimization, improving throughput in key cases

Phase K.1 of follow-up epic #903. First sub-phase of Phase K (#905): allocator hot-path optimisation. The Phase I baseline identified per-operator vector allocation as the dominant cost; this PR closes that part of the gap.

What landed
-----------

- include/sw/universal/number/elreal/lazy_component_buffer.hpp -- new small-buffer storage type: 4-double inline array + spill via std::vector. 64 bytes total (one cache line). API surface is the minimum elreal needs: push_back, operator[], size, clear, reserve.
- include/sw/universal/number/elreal/elreal_impl.hpp -- migrated _components from std::vector<double> to lazy_component_buffer. The components() accessor now returns const lazy_component_buffer&; the single external user in manipulators.hpp uses .size() and operator[] which are unchanged.
- docs/algorithmic-details/elreal-performance-baseline.md -- new headline-numbers table for the post-K.1 measurements. Phase I baseline numbers retained for the before/after comparison.
- docs/algorithmic-details/multi-component-arithmetic.md -- section 7.1 picker rule updated: multiplication now favors elreal over ereal<2> at matched precision (was 1.2x ereal-favored, now 1.9x elreal-favored).

Headline numbers (gcc 13.3 on 12th Gen Intel i7-12700K)
--------------------------------------------------------

| Op | Phase I | Phase K.1 | Speedup |
|-----------------|--------:|----------:|--------:|
| elreal + d1 | 9 Mops | 12 Mops | 1.3x |
| elreal - d1 | 9 Mops | 14 Mops | 1.6x |
| elreal * d1 | 8 Mops | 19 Mops | 2.4x |
| elreal / d0 | 36 Mops | (gops) | dominated by inlining once heap alloc is gone |
| elreal sqrt d1 | 14 Mops | 30 Mops | 2.1x |
| elreal exp d1 | 14 Mops | 31 Mops | 2.2x |
| elreal log d1 | 14 Mops | 24 Mops | 1.7x |

clang 18.1 shows even larger gains (the Phase I clang-vs-gcc gap on multiplication, 4 vs 8 Mops/s, is closed -- both now land at ~ 20 Mops/s).

The crossover with ereal<2> at matched precision:

- Addition and subtraction: ereal<2> still wins by 1.4-2x (was 2-2.7x).
- Multiplication: elreal now beats ereal<2> by 1.9x.
- Division: not apples-to-apples (elreal depth-0 only until Phase L).
- sqrt / exp / log / pow / trig / hyperbolic: elreal-only (ereal has no math functions).

Validation
----------

All 30 elreal regression tests PASS under both gcc 13.3 and clang 18.1. Phase J oracle sweep across dd / qd / dd_cascade / td_cascade / qd_cascade / ereal<N> PASS under both compilers.

What this does NOT do
---------------------

- Eliminate the std::function generator capture. That's Phase K.2.
- Eliminate the operand copies into the lambda capture. That's Phase K.3 (refcounted operand sharing).
- Add SIMD/FMA batching to the EFTs. That's Phase K.4.

Part of #905 (Phase K of the elreal follow-up epic #903).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

coderabbitai · 2026-05-21T22:26:12Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 705a2a69-e0bd-40a3-b1f4-f7bc4aa73ff5

📥 Commits

Reviewing files that changed from the base of the PR and between 6cb2071 and 329c989.

📒 Files selected for processing (1)

include/sw/universal/number/elreal/lazy_component_buffer.hpp

Walkthrough

This PR implements Phase K.1: introduces a small-buffer-optimized lazy_component_buffer, migrates elreal's _components from std::vector to that buffer (updating code and APIs), and updates performance documentation with post-K.1 benchmark results and revised operator crossover narratives.

Changes

Phase K.1 _components Buffer Optimization

| Layer / File(s) | Summary |
|---|---|
| Small-buffer container implementation `include/sw/universal/number/elreal/lazy_component_buffer.hpp` | Introduces `lazy_component_buffer`: hybrid inline (4 doubles) / spill (`std::vector<double>`) storage with constructors, `push_back`, `clear`, indexed `operator[]`, `size`/`empty`, and `reserve`. |
| elreal type migration to lazy_component_buffer `include/sw/universal/number/elreal/elreal_impl.hpp` | Updates elreal to use `lazy_component_buffer` for `_components`: adds include, changes member and `components()` accessor types, updates documentation comments, and rewrites `operator-()` to iterate by `size()`/`operator[]`. |
| Post-K.1 performance documentation `docs/algorithmic-details/elreal-performance-baseline.md`, `docs/algorithmic-details/multi-component-arithmetic.md` | Refreshes baseline and Phase I docs to include post-K.1 measurements and per-operator throughput deltas, revises winner profile (elreal now wins on multiplication at matched precision), and marks the `_components` small-buffer optimisation as DONE while reordering remaining allocation-phase items. |

Sequence Diagram(s)

sequenceDiagram
  participant Caller
  participant elreal
  participant lazy_component_buffer as buffer
  Caller->>elreal: components()
  elreal->>buffer: return const ref
  Caller->>buffer: size()
  buffer-->>Caller: size
  Caller->>buffer: operator[](i)
  buffer-->>Caller: double value (inline or spill)

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related issues

Epic: elreal follow-up work -- depth-2+ refinement, range reduction, perf, oracle sweep #903: Directly related to the Phase K allocator hot-path optimisation implemented here.
elreal Phase K: Allocator hot-path optimisation -- close the elreal-vs-ereal gap #905: Also targets the Phase K.1 small-buffer optimisation and the elreal _components migration.

Possibly related PRs

stillwater-sc/universal#883: Earlier elreal skeleton changes touching _components representation; related to this migration.
stillwater-sc/universal#893: Adds elreal::from_expansion(...) that populates components; relevant to the new buffer representation.
stillwater-sc/universal#885: Implements unary negation and related ring ops in elreal_impl.hpp, interacting with the updated component iteration.

Poem

🐰 A clever buffer snug and small,
Four doubles inline, then vector for all,
Phase K.1 hops through components with glee,
Multiplication speeds up — hooray for me!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 41.67% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit's high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately and specifically describes the main change: implementing Phase K.1 of elreal optimization by replacing std::vector with a small-buffer-optimized lazy_component_buffer for _components storage. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


coderabbitai

🧹 Nitpick comments (1)

include/sw/universal/number/elreal/lazy_component_buffer.hpp (1)

96-98: ⚡ Quick win

Add a debug bounds check in operator[].

This accessor currently allows silent UB on stale/out-of-range indexes. A debug-only assert keeps zero-cost release behavior while catching misuse quickly.

Proposed patch

+#include <cassert>
 ...
 	double operator[](std::size_t i) const noexcept {
+		assert(i < _size && "lazy_component_buffer index out of range");
 		return (i < inline_capacity) ? _inline[i] : _spill[i - inline_capacity];
 	}

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@include/sw/universal/number/elreal/lazy_component_buffer.hpp` around lines 96
- 98, Add a debug-only bounds check to the const subscript operator[] to prevent
silent UB: inside double operator[](std::size_t i) const noexcept (the function
using inline_capacity, _inline and _spill) assert that i is less than the
current element count (e.g., assert(i < size()) or assert(i < _size) depending
on the class member that tracks length); include <cassert> and keep the function
noexcept so this only affects debug builds and preserves zero-cost release
behavior.


ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 5913b625-0ed2-46c0-940f-fbea0e2d795d

📥 Commits

Reviewing files that changed from the base of the PR and between 0571b64 and 6cb2071.

📒 Files selected for processing (4)

docs/algorithmic-details/elreal-performance-baseline.md
docs/algorithmic-details/multi-component-arithmetic.md
include/sw/universal/number/elreal/elreal_impl.hpp
include/sw/universal/number/elreal/lazy_component_buffer.hpp

Add debug-only bounds check in lazy_component_buffer::operator[].

CodeRabbit flagged the missing assert as a quick win: zero-cost in release (NDEBUG strips the assert), catches misuse early in debug builds. Applied directly per the proposed patch.

Validation: all sampled elreal tests PASS under gcc 13.3 and clang 18.1 with the assert added (the invariant `i < size()` already held across all elreal indexing sites; the assert just makes it explicit).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Ravenwater · 2026-05-21T22:33:54Z

Addressed the single CodeRabbit nitpick in 329c989: added assert(i < _size && "lazy_component_buffer index out of range"); to operator[] along with #include <cassert>. Zero-cost in release (NDEBUG strips it), catches misuse in debug builds.

The invariant i < _components.size() already held across all elreal indexing sites (every _components[i] call is guarded by a check against _computed_depth, which is kept equal to _components.size()). The assert just makes it explicit so future contributors get an early signal if they break the invariant.

Validation: all sampled elreal tests (api, arithmetic, math, constants, oracle sweep) PASS under both gcc 13.3 and clang 18.1.

coveralls · 2026-05-21T23:09:38Z

Coverage Report for CI Build 26257273704

Coverage increased (+0.01%) to 84.245%

Details

Coverage increased (+0.01%) from the base build.
Patch coverage: No coverable lines changed in this PR.
2 coverage regressions across 1 file.

Uncovered Changes

No uncovered changes found.

Coverage Regressions

2 previously-covered lines in 1 file lost coverage.

| File | Lines Losing Coverage | Coverage |
|---|---:|---:|
| include/sw/universal/number/posito/posito_impl.hpp | 2 | 89.78% |

Coverage Stats


Relevant Lines:	55729
Covered Lines:	46949
Line Coverage:	84.25%
Coverage Strength:	5277739.23 hits per line

💛 - Coveralls

…std::function (#916)

* fix(cmake): support FetchContent in dependent repo workflows

* perf(elreal): Phase K.2 -- tagged-union generator (variant) replaces std::function

Phase K.2 of follow-up epic #903. The Phase I baseline identified the per-op std::function heap allocation (~216-byte capture per binary op) as a major cost. K.1 (#912) closed the _components vector alloc; this PR closes the _generator function-object alloc.

Design (#905 K.2 + K.3 combined)
--------------------------------

elreal's _generator field migrates from `std::function<double(std::size_t)>` to a `std::variant` of small POD shapes:

- gen_unary_linear (1 handle + 1 double = 24 bytes): coeff * a.at(1) at depth 1; covers all 19 math functions
- gen_binary_linear (2 handles + 3 doubles = 56 bytes): c0 + ca*a.at(1) + cb*b.at(1); covers +, -, *, pow, atan2
- gen_sqrt (1 handle + 1 double + 1 size_t = 32 bytes): sqrt-specific EFT residual
- gen_unary_neg (1 handle = 16 bytes): trampoline through wrapped.at(k)
- gen_rational_residual (3 doubles = 24 bytes): elreal(p, q) constructor residual
- std::monostate (0 bytes): default; depth-0-only result

Operand captures are std::shared_ptr<const elreal> (16 bytes each) so the variant fits comfortably inline. No std::function, no heap allocation per generator state.

Files
-----

- include/sw/universal/number/elreal/elreal_data.hpp -- NEW. Variant alternative types + lazy_generator alias + evaluate_generator forward declaration.
- include/sw/universal/number/elreal/elreal_impl.hpp -- _generator migrated. All ~25 operator/math function generators rewritten as variant emplacements. at() walks via evaluate_generator. The evaluator is defined after the class so each alternative can call elreal::at() on its operand handle.
- docs/algorithmic-details/elreal-performance-baseline.md -- K.2 numbers added. K.1 numbers retained for the before/after comparison.
- docs/algorithmic-details/multi-component-arithmetic.md -- section 7.1 picker rule updated: arithmetic gap with ereal<2> has nearly closed.

Results
-------

12th Gen Intel i7-12700K, gcc 13.3, -O3:

| Op | Phase I | K.1 | K.2 | vs Phase I |
|---------------|--------:|--------:|--------:|-----------:|
| elreal + d1 | 9 Mops | 12 Mops | 17 Mops | 1.9x |
| elreal - d1 | 9 Mops | 14 Mops | 17 Mops | 1.9x |
| elreal * d1 | 8 Mops | 19 Mops | 16 Mops | 2.0x |
| elreal sqrt | 14 Mops | 30 Mops | 36 Mops | 2.6x |
| elreal exp | 14 Mops | 31 Mops | 43 Mops | 3.1x |
| elreal log | 14 Mops | 24 Mops | 38 Mops | 2.7x |

Versus ereal<2> at matched precision: + (17 vs 19, ereal +12%), - (17 vs 20, ereal +18%), * (16 vs 11, elreal +45%).

The arithmetic gap with ereal<2> on +/- has shrunk from 2x at K.1 to within ~20% at K.2. Multiplication still favors elreal.

Validation
----------

All 30 elreal regression tests PASS under gcc 13.3 and clang 18.1. Phase J oracle sweep across dd / qd / dd_cascade / td_cascade / qd_cascade / ereal<N> PASS under both compilers. No public API changes (elreal looks the same to existing callers).

What this does NOT do
---------------------

- Convert elreal itself to a shared-ptr handle (would shrink sizeof(elreal) from 104 to 16 bytes at the cost of an additional indirection on every access). Trade-off not clearly worth it given the current numbers; deferred.
- Address per-op make_shared<const elreal>(a) operand wrapping. Each binary op still pays 2 small heap allocs (one per operand). On modern allocators with small-object pools this is typically cheap; measured throughput is excellent.
- Phase K.4 (SIMD batching on EFT primitives).

Part of #905 (Phase K of the elreal follow-up epic #903).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* perf(elreal): address CodeRabbit review on PR #916

Three findings, all valid:

1. elreal-performance-baseline.md mixed pre-K.1 narrative with K.2 current numbers. Restructured the "Reading the table" and "When is elreal faster than ereal" sections to reflect K.2-current state. Moved the pre-K.1 analysis into a new "Historical: Phase I baseline analysis (pre-K.1)" section for archival reference.
2. multi-component-arithmetic.md section 7.1 heading said "Phase I baseline" but the table held post-K.2 numbers. Updated heading to "post-Phase-K.2" and rewrote the body paragraph that referenced "Phase II of epic #873 targets the allocation hot path" -- K.1 and K.2 have shipped, so the picker rule conclusion is now "elreal is throughput-competitive AND exposes correctness features ereal<N> lacks".
3. fetch-content-architecture.md contained em-dashes (U+2014) that violate the project's ASCII-only rule. Replaced with -- per the convention used elsewhere in the docs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
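The tagged-union design in the K.2 commit message can be sketched as follows (C++17). This is an illustration under stated assumptions, not the contents of `elreal_data.hpp`: `operand` is a hypothetical stand-in for `elreal`, `handle` and `overloaded` are helper names introduced here, only three of the listed alternatives are modelled, and the behaviour chosen for `std::monostate` (return 0.0) and for `gen_unary_neg` (negate `wrapped.at(k)`) is assumed rather than taken from the source.

```cpp
#include <cstddef>
#include <memory>
#include <variant>
#include <vector>

namespace sketch {

// Hypothetical operand type standing in for elreal: a fixed list of components.
struct operand {
    std::vector<double> comps;
    double at(std::size_t k) const { return k < comps.size() ? comps[k] : 0.0; }
};
using handle = std::shared_ptr<const operand>;

// Small POD-like generator shapes, modelled on the names in the commit message.
struct gen_unary_linear  { handle a; double coeff; };          // coeff * a.at(1)
struct gen_binary_linear { handle a, b; double c0, ca, cb; };  // c0 + ca*a.at(1) + cb*b.at(1)
struct gen_unary_neg     { handle wrapped; };                  // forwards to wrapped.at(k)

using lazy_generator =
    std::variant<std::monostate, gen_unary_linear, gen_binary_linear, gen_unary_neg>;

// Standard C++17 helper for building a visitor out of lambdas.
template <class... Ts> struct overloaded : Ts... { using Ts::operator()...; };
template <class... Ts> overloaded(Ts...) -> overloaded<Ts...>;

// Dispatch on the active alternative; the generator state lives inside the
// variant itself, so no std::function object is constructed.
inline double evaluate_generator(const lazy_generator& g, std::size_t k) {
    return std::visit(overloaded{
        [](std::monostate) { return 0.0; },  // assumed: nothing beyond depth 0
        [](const gen_unary_linear& u) { return u.coeff * u.a->at(1); },
        [](const gen_binary_linear& b) {
            return b.c0 + b.ca * b.a->at(1) + b.cb * b.b->at(1);
        },
        [k](const gen_unary_neg& n) { return -n.wrapped->at(k); },  // assumed semantics
    }, g);
}

} // namespace sketch
```

In this sketch the largest alternative holds two shared-pointer handles and three doubles, which on a typical 64-bit platform is the 56 bytes quoted for `gen_binary_linear`. The operand handles are still created with `std::make_shared`, which matches the commit's note that each binary op continues to pay two small allocations.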

…#919)

Two related changes:

1. The lazy_component_buffer member `_inline` collides with the MSVC-specific `_inline` reserved keyword (a legacy alias for `inline` that MSVC still treats specially in some parses). Symptom: MSVC's parser sees `_inline{}` in the constructor's member-initializer list and interprets it as `inline {}`, cascading into a string of C2059 syntax errors. gcc and clang accept the original name. Renamed to `_inl_buf` (the inline buffer) -- non-reserved, descriptive, mechanical search/replace.
2. UNIVERSAL_BUILD_CI_LITE (the lite CI tier used by the Windows MSVC CI runner) did not enable UNIVERSAL_BUILD_NUMBER_ELREALS. As a result the K.1 bug above shipped in PR #912 and stayed undetected on main. Added ELREALS to CI_LITE so subsequent MSVC-specific issues in elreal land at PR-review time, not after merge.

Background
----------

The CI matrix in `.github/workflows/cmake.yml` separates:

- Linux gcc + clang: UNIVERSAL_BUILD_CI=ON (full)
- Windows MSVC, macOS, cross-compilers: UNIVERSAL_BUILD_CI_LITE=ON

The lite tier exists to keep the Windows runner fast (~ 8 min vs ~ 25 min for full). It includes one representative type from each major category: integer, fixpnt, cfloat, posit, lns, one cascade, the MX block formats, one elastic (einteger). It did NOT include the lazy elreal -- the recent multi-PR follow-up epic #903 grew elreal but the CI matrix wasn't updated to track it.

MinGW cross-compilation, also in the CI matrix, uses gcc as the compiler (just targeting Windows) and so doesn't surface the MSVC `_inline` keyword issue. Only the actual MSVC build catches it.

Validation
----------

Local gcc 13.3 and clang 18.1 builds pass after the rename (behavior unchanged; just a member-name swap). The CI matrix change adds elreal coverage to the Windows MSVC runner, which would have caught this bug at the K.1 (#912) PR-review stage.

Fixes the MSVC build break observed at HEAD of main as of 2026-05-22.

Ravenwater self-assigned this May 21, 2026

Ravenwater added this to Universal Number Library May 21, 2026

Ravenwater added this to the V4 milestone May 21, 2026

Ravenwater added the enhancement label May 21, 2026

coderabbitai Bot reviewed May 21, 2026

Ravenwater marked this pull request as ready for review May 21, 2026 22:40

Ravenwater merged commit 9eddae1 into main May 21, 2026
33 checks passed

github-project-automation Bot moved this to Done in Universal Number Library May 21, 2026

Ravenwater deleted the feat/elreal-phase-k1 branch May 21, 2026 23:12

Ravenwater mentioned this pull request May 22, 2026

perf(elreal): Phase K.2 -- tagged-union generator (variant) replaces std::function #916

Merged

7 tasks

Ravenwater mentioned this pull request May 22, 2026

fix(elreal): rename _inline member to _inl_buf for MSVC compatibility #919

Merged

5 tasks

Ravenwater merged 2 commits into main from feat/elreal-phase-k1
