Re-land 'Refactor float8 conversion logic.' #349
Re-land 'Refactor float8 conversion logic.'
Original message:
This change completely rewrites the float8 conversion template to a more structured approach. It now explicitly handles special values (Inf/NaN), zero, unpacks and normalizes the input, calculates target parameters, applies rounding and shifting, renormalizes, and packs the result, including overflow and sign handling. The logic for handling subnormals and different exponent ranges between From and To types is integrated into this new flow.
Roll-forward explanation:
This CL fixes a spurious `RuntimeWarning: invalid value encountered in cast` observed in NumPy during int32 to float8 conversions.

The warning was caused by LLVM auto-vectorizing the integer shift logic in `RoundBitsToNearestEven`, specifically the `(Bits{1} << (roundoff - 1))` calculation. The compiler generated a sequence using the floating-point unit (e.g., `vcvttps2dq`) to perform this shift. Crucially, this logic was hoisted out of a branch (that ensured `roundoff - 1` would be a valid shift amount) and executed speculatively.

For inputs where `roundoff` would be large, the intermediate floating-point representation of `2**(roundoff-1)` exceeded the representable integer range. When `vcvttps2dq` attempted to convert this back to an integer, it raised the `FE_INVALID` floating-point exception.

Because `#pragma STDC FENV_ACCESS` is not enabled, the compiler is permitted to perform such optimizations, unaware that the side effects (sticky flags) would be caught by NumPy's use of `fetestexcept`.

The fix caps the `alignment_shift` (and thus `roundoff`) to a safe upper bound, ensuring that even if the shift logic is hoisted, the operands remain within a range that does not trigger floating-point exceptions.
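A rough sketch of the capping idea, fixed to `uint32_t` for brevity. The names `kMaxRoundoff` and `RoundAndShift` and the bound of 31 are assumptions made for illustration; the actual bound and call site in this CL are not shown here. The sketch also assumes the input has at most 24 significant bits (a float32 significand), so shifting by 31 already yields zero and a larger shift could not change the result.

```cpp
#include <algorithm>
#include <cstdint>

// Assumed bound for this sketch; the value chosen by the actual fix is not
// stated in the description above.
constexpr int kMaxRoundoff = 31;

// Adds a bias so that dropping the low `roundoff` bits afterwards rounds to
// nearest, with ties going to the even result. Requires 0 <= roundoff <= 31.
inline uint32_t RoundBitsToNearestEven(uint32_t bits, int roundoff) {
  uint32_t bias =
      roundoff == 0
          ? 0u
          : ((bits >> roundoff) & 1u) + (uint32_t{1} << (roundoff - 1)) - 1u;
  return bits + bias;
}

// Hypothetical caller: caps the shift before it reaches the rounding helper,
// so the shift operand stays small even if the compiler evaluates the shift
// speculatively outside its guarding branch.
inline uint32_t RoundAndShift(uint32_t bits, int alignment_shift) {
  const int roundoff = std::clamp(alignment_shift, 0, kMaxRoundoff);
  return RoundBitsToNearestEven(bits, roundoff) >> roundoff;
}
```

For example, `RoundAndShift(0b1010, 2)` is a tie (2.5) and rounds to the even value `2`, while `RoundAndShift(0b1110, 2)` (3.5) rounds to `4`. A very large `alignment_shift` such as `1000` is clamped and returns `0` for any 24-bit input, matching what an unbounded shift would produce mathematically.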