@copybara-service
Re-land 'Refactor float8 conversion logic.'

Original message:
This change completely rewrites the float8 conversion template to a more structured approach. It now explicitly handles special values (Inf/NaN), zero, unpacks and normalizes the input, calculates target parameters, applies rounding and shifting, renormalizes, and packs the result, including overflow and sign handling. The logic for handling subnormals and different exponent ranges between From and To types is integrated into this new flow.

Roll-forward explanation:
This CL fixes a spurious `RuntimeWarning: invalid value encountered in cast` observed in NumPy during int32 to float8 conversions.

The warning was caused by LLVM auto-vectorizing the integer shift logic in `RoundBitsToNearestEven`, specifically the `(Bits{1} << (roundoff - 1))` calculation. The compiler generated a sequence that uses the floating-point unit (e.g., `vcvttps2dq`) to perform this shift. Crucially, this logic was hoisted out of the branch that ensured `roundoff - 1` was a valid shift amount, and was executed speculatively.
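
For readers unfamiliar with the expression in question, the following is a minimal sketch of bias-based round-to-nearest-even on raw bits. It is not the code from this CL: the names `RoundBitsToNearestEvenSketch` and `RoundAndShiftSketch` are hypothetical, and only the `(Bits{1} << (roundoff - 1))` term and the guarding branch are taken from the description above.

```cpp
#include <cstdint>

// Hypothetical sketch, not the CL's implementation.
// Adds a bias so that a later right shift by `roundoff` rounds to nearest,
// with ties going to the value whose kept least significant bit is 0.
template <typename Bits>
constexpr Bits RoundBitsToNearestEvenSketch(Bits bits, int roundoff) {
  // This branch is what keeps `roundoff - 1` a valid shift amount.
  if (roundoff == 0) return bits;
  // Least significant bit that survives the shift.
  const Bits lsb = (bits >> roundoff) & Bits{1};
  // Half of the discarded range, minus one, plus the surviving LSB.
  const Bits bias = lsb + (Bits{1} << (roundoff - 1)) - Bits{1};
  return bits + bias;
}

// Hypothetical helper: round, then drop the low `roundoff` bits.
template <typename Bits>
constexpr Bits RoundAndShiftSketch(Bits bits, int roundoff) {
  return RoundBitsToNearestEvenSketch(bits, roundoff) >> roundoff;
}
```

In this sketch, `RoundAndShiftSketch<std::uint32_t>(10u, 2)` yields 2 (2.5 rounds to the even neighbor) and `RoundAndShiftSketch<std::uint32_t>(14u, 2)` yields 4 (3.5 rounds to the even neighbor). If a vectorizer evaluates the shift before the `roundoff == 0` check, the shift operand is computed for inputs the branch was meant to exclude.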

For inputs where `roundoff` would be large, the intermediate floating-point representation of `2**(roundoff-1)` exceeded the representable integer range. When `vcvttps2dq` attempted to convert this back to an integer, it raised the `FE_INVALID` floating-point exception.

Because `#pragma STDC FENV_ACCESS` is not enabled, the compiler is permitted to perform such optimizations, unaware that the side effects (sticky flags) would be caught by NumPy's use of `fetestexcept`.
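
The sticky-flag mechanism can be illustrated with a small sketch using the standard `<cfenv>` functions. This is an assumption about the general pattern of checking flags after an operation, not NumPy's actual code; `LeavesInvalidFlag`, `DoNothing`, and `RaiseInvalid` are hypothetical names, and `std::feraiseexcept` stands in for the speculative conversion that set the flag.

```cpp
#include <cfenv>

// Hypothetical stand-in for a cast loop that touches no FP flags.
inline void DoNothing() {}

// Hypothetical stand-in for code that sets FE_INVALID as a side effect,
// even though its computed result is never used.
inline void RaiseInvalid() { std::feraiseexcept(FE_INVALID); }

// Clears FE_INVALID, runs `fn`, and reports whether the flag is set
// afterwards. The flag is sticky: it stays set until explicitly cleared.
template <typename Fn>
bool LeavesInvalidFlag(Fn&& fn) {
  std::feclearexcept(FE_INVALID);
  fn();
  return std::fetestexcept(FE_INVALID) != 0;
}
```

Here `LeavesInvalidFlag(DoNothing)` returns false and `LeavesInvalidFlag(RaiseInvalid)` returns true. A caller that checks flags this way cannot distinguish a flag raised by a discarded speculative computation from one raised by a genuinely invalid cast.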

The fix caps the `alignment_shift` (and thus `roundoff`) to a safe upper bound, ensuring that even if the shift logic is hoisted, the operands remain within a range that does not trigger floating-point exceptions.
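
A minimal sketch of the capping idea follows. The actual bound used in this CL is not stated in the description, so `kMaxSafeShiftSketch` (bit width minus one) and `CapAlignmentShiftSketch` are hypothetical and chosen only for illustration.

```cpp
#include <algorithm>
#include <cstdint>
#include <limits>

// Hypothetical bound: with roundoff <= width - 1, the value
// 2**(roundoff - 1) is at most 2**(width - 2), which fits in a signed
// integer of the same width.
template <typename Bits>
constexpr int kMaxSafeShiftSketch = std::numeric_limits<Bits>::digits - 1;

// Hypothetical helper: caps the shift so that any shift derived from it
// stays in range even if the compiler evaluates it speculatively.
template <typename Bits>
constexpr int CapAlignmentShiftSketch(int alignment_shift) {
  return std::min(alignment_shift, kMaxSafeShiftSketch<Bits>);
}
```

For `std::uint32_t`, a requested shift of 100 is capped to 31 while a shift of 5 passes through unchanged. Whether a given cap preserves the rounded result depends on the cap being large enough to shift out every significant bit of the input, which the real fix would need to guarantee.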

PiperOrigin-RevId: 847923342