Zubax Kulibin Float (ZKF) -- a compact IEEE 754-like simple floating point library#4
Draft
pavel-kirienko wants to merge 242 commits into
Draft
Zubax Kulibin Float (ZKF) -- a compact IEEE 754-like simple floating point library#4pavel-kirienko wants to merge 242 commits into
pavel-kirienko wants to merge 242 commits into
Conversation
The shared _zkf_cordic_m* table exposes inv_tau/kinv_mag/kinv for atan2's use; zkf_sincos (MODE=0) does not consume them. Leaving them unconnected is fine for Icarus but verilator treats the missing pins as fatal (Warning-PINMISSING), failing the sincos-verilator PR matrix. Tie them off with explicit empty connections, matching the existing .busy() style. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
zkf_atan2: the generic angle unmap emitted the out-of-range -1/2 turn (sign bit set) for finite x<0, |y|->0, but the documented theta range is the half-open (-0.5, +0.5]. Fold -1/2 to the canonical +1/2 (mirrors turn8's k==4 / _atan2_turn frac==1/2 clamp). Timing-neutral: an exponent-only detect + sign-bit-only fold keeps the wide magnitude datapath a plain mux into the registered output (atan2_w8m36 stays at ~111 MHz). Mirrored in atan2_reference/atan2_true, which shared the defect via a false "no endpoint clamp needed" assumption. Remaining items from the float HDL audit: - Harden exp2/log2/sincos/atan2 _true mpmath oracles to high working precision; the default mis-rounds the last bit at WMAN >= 32. - Enforce the <=1 ULP faithful-rounding contract in CI: new verify-float-accuracy target (zkf_transcendental/zkf_trig --check, reference vs _true) as the float-accuracy deep job. The per-PR cocotb suite only checks RTL==reference (a shared oracle). - Coverage: log2 STAGE_NORMALIZE_OUTPUT, from_int @WMAN53 + STAGE_NORMALIZE=2, div/resize/round @WMAN48, directed atan2 negative-x-axis (both y signs) and div inf/0. - Docs: hypot overflow-boundary wording, to_int knob/guard note, zkf_div_eq pipeline-depth prose, zkf_matrix sno comment; drop the vestigial lod_reference model helper. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
CI verify-deep on 55722d0 surfaced two issues in the just-added atan2 work: - float-accuracy (zkf_trig --check) reported theta_ulp=28671 at 5/11 for a near-(-0.5)-axis pair: the reference rounds the magnitude to exactly 1/2 and canonicalizes to +0.5 while the mpmath oracle keeps -0.5+1ULP. The two are 1 ULP apart on the angle circle, but the linear signed-magnitude metric maps +0.5 and -0.5 to opposite ends. Add _theta_ulp_diff, a circular metric (min(d, ring-d), ring = 2*index(+0.5)) used for atan2 THETA only; sin/cos and hypot magnitude keep the linear metric. A real >1 ULP boundary error still measures its true (small) distance, so only the spurious wrap collapses. Verified: the 5/11 gate now passes (theta_ulp=1 over 1,032,313 pairs). - The new constant half_pos net added uncovered toggle points to the deep verilator coverage gate; wrap it in // verilator coverage_off like the module's other constant cones. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The CI toolchain image is upgraded to 2026-06-14 (Verilator 5.048), which instruments generate-block and wide-datapath toggle/branch coverage the prior 5.032 build skipped. Close the resulting deep coverage-float-gate-full gap by preferring bona-fide coverage over suppression. Bona-fide coverage (deep matrix, float/tb/zkf_matrix.py): - exp2/log2 w8m36 across STAGE_PRODUCT=2/3/4 fill the shared _zkf_pmul DSP-grid reduction trees (csum/r_p full-width product sums) and widen the significands so the hidden-bit MSBs become ordinary toggling bits. - exp2 STAGE_REDUCE=1 at the wide WEXP drives the range-reduction lost-sticky path (r0_lost_sticky). - atan2 w8m24 turns the divider/shamt/significand/magnitude bits into ordinary mid-bits instead of leaving them above the w5m11 format ceiling. - sincos w8m24 plus a PARALLEL=1 row cover the wide CORDIC datapath, the decoupled sigma-replay engine, and the phi_seen fast-skip branch. This adds ~12.8k bona-fide-covered toggle points. Honest structural suppression (residual only), each per-net justified: - per-row/column partial-product accumulators bounded below WP at any format (the full-sum csum/r_p are covered, not suppressed); - sign/hidden bits at the widest covered format, engine constants, the phase FSM top state bit, accumulator/divider/CORDIC datapath headroom; - the back-pressure output-hold path (the streaming matrix holds out_ready high); - config-fixed phi/MODE branches and a few constant-sigma replay slots. CI: verify.yml / verify-deep.yml -> ghcr image 2026-06-14. #ci-float Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…lost_sticky Two deep-tier follow-ups to the 5.048 coverage closure (the prior commit's float-extended-verilator coverage gate already passes; these fix float-accuracy and the per-PR line gate). float-accuracy: the atan2 faithful-rounding check reported theta_ulp=1024 on a large-ratio input -- a tiny angle that one side rounds to +-MIN_NORMAL and the other to 0. _ordered_index counted the no-subnormal gap between 0 and the smallest normal as a full binade (1<<wfrac) rather than one representable step, so a genuine <=1 ULP straddle of the zero/underflow boundary measured 1024. Make _ordered_index a dense rank (adjacent representable values one apart); the +-0.5 circular wrap and genuine >1-ULP straddles are unchanged. per-PR coverage-float-gate: zkf_exp2.v r0_lost_sticky's line is uncovered under Verilator 5.048 -- at the only STAGE_REDUCE=1 coverage format (w2m16, WEXP<=4) lost_sticky is structurally 0, so its registered store folds to a constant the line gate cannot reach. Re-suppress the registered copy (eval_lost_sticky stays covered and the logic is verified alive by a directed sim) and drop the now purposeless wide STAGE_REDUCE=1 grid rows. #ci-float Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…#ci-float
The hard toggle-coverage gate proved unsustainable: keeping it green required
262 `// verilator coverage_off`/`coverage_on` suppressions littering float/hdl/.
Line and branch coverage stay mandatory exactly as before; toggle coverage is
demoted to advisory -- reported during RTL development but never fatal.
- zkf_coverage.py: --full now gates on v_line + v_branch only; v_toggle is
reported as [advisory] and never returns nonzero. Docstrings/help updated.
- Makefile: deep-gate comment block updated to line+branch mandatory / toggle
advisory.
- zkf_trig.py, zkf_transcendental.py: stop emitting the blanket coverage_off
wrapper around generated tables; regenerated all 25 _tables/*.v.
- float/hdl/*.v: removed all 262 toggle-driven coverage_off pragmas and their
justification comments (net -744 lines). Re-introduced 14 narrow blocks for
the genuinely-unreachable line/branch points only:
* 8 deep line/branch breakers (dead default arms, MODE-fixed ternaries,
elaboration-only constant functions, small-W overshoot, r0_lost_sticky).
* 6 per-PR-only line breakers (unused sideband tie-offs, backpressure-hold
registers, deep-only force_zero path).
- zkf_matrix.py: simplified a stale sincos comment that referenced removed
pragma line numbers.
RTL logic is byte-identical modulo comments (one benign cd_xn/cd_yn
declaration re-merge). Lint clean; Verilator-parsed all touched modules
(no BADVLTPRAGMA).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…#ci-float The per-PR `--gate` was failing on `_zkf_fixed_to_float.v:45` (the `force_zero` control input) even though every per-PR line was covered. Root cause: `--gate` gated on the merged LCOV info, which is ptype-blind -- `verilator_coverage --write-info` collapses line/branch/toggle into one DA record per source line. force_zero is never asserted in the per-PR matrix, so its isolated *toggle* point (line 45 has no line point of its own) showed as a 0-hit DA record and failed the line gate. This ptype-blindness is exactly what forced the old `// verilator coverage_off` sprawl: toggle points had to be suppressed to keep the "line" gate green. Fix: gate the per-PR tier on ptype-aware v_line points only (the same merged_points data the --full tier already uses), consistent with the demotion of toggle to advisory. Branch stays mandatory at the deep tier; toggle is advisory everywhere and never gated. With the gate no longer ptype-blind, the 6 per-PR `coverage_off` wraps added in the previous commit are unnecessary -- all 6 suppressed toggle-only points (unused sideband tie-offs and backpressure-hold register declarations, none of which carry a v_line point, verified by building each module with --coverage-line). Removed them; the 8 deep line/branch breakers remain. Removed the now-dead ptype-blind `uncovered_lines()` helper. RTL logic is byte-identical (comment-only deltas). Verified: both gates RC=0 against the 163-config deep coverage set (0 line, 0 branch, 4462 toggle advisory); lint clean; no BADVLTPRAGMA. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Follow-up review caught three stale doc spots that still asserted the old toggle-mandatory / ptype-blind behavior after the gate semantics changed: - zkf_coverage.py module docstring: --gate is per-PR line-only (not "historical"), --full is line+branch mandatory with toggle advisory (not "line, branch, OR toggle"). - zkf_coverage.py HTML report empty-state: "every line and branch covered (toggle advisory)" instead of implying toggle is gated. - Makefile: the verify-float-extended block said "line+branch+toggle coverage closure"; toggle is advisory, so "line+branch coverage closure (toggle advisory)". Docs/comments only; no logic or RTL change. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The verify-deep workflow took ~2h41m, gated by two near-equal poles: float-extended-icarus (161m, 789 configs at 2-way xdist on a 2-core runner) and verify-base (155m), which ran in parallel. - Shard float-extended-icarus across 4 runners via strategy.matrix. The Icarus sweep is pure pass/fail correctness with no coverage artifacts, so it shards cleanly with no cross-job merge. New FLOAT_SHARD='k/N' env in test_float_matrix.py keeps configs whose stable md5(run.id)%N == k-1; unset keeps everything (bare `make verify` unchanged). Stable hash so every shard process partitions identically -- shards exactly tile the matrix (verified 196+208+172+213 = 789). ~161m -> ~40m; bump to 6 shards to reach the ~30m float-accuracy floor. - Remove the verify-base job. It re-ran the exact `make lint && make verify` that verify.yml already runs on the same push, so it was pure duplication (155m). The per-PR baseline (lint + sims + line-coverage gate) now lives only in verify.yml. Note: a manual `workflow_dispatch` of verify-deep no longer runs that baseline; normal #ci-float/main pushes still get it from verify.yml. Expected deep wall-clock: ~2h41m -> ~40m (next pole float-accuracy at ~30m). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
https://zubax.github.io/fpga-floating-point-eval/report/