Skip to content

[IR][AsmParser] Revamp how floating-point literals work in LLVM IR. #121838

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 6 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
71 changes: 41 additions & 30 deletions llvm/docs/LangRef.rst
Original file line number Diff line number Diff line change
Expand Up @@ -4595,11 +4595,13 @@ Simple Constants
zeros. So '``s0x0001``' of type '``i16``' will be -1, not 1.
**Floating-point constants**
Floating-point constants use standard decimal notation (e.g.
123.421), exponential notation (e.g. 1.23421e+2), or a more precise
hexadecimal notation (see below). The assembler requires the exact
decimal value of a floating-point constant. For example, the
assembler accepts 1.25 but rejects 1.3 because 1.3 is a repeating
decimal in binary. Floating-point constants must have a
123.421), exponential notation (e.g. 1.23421e+2), standard hexadecimal
notation (e.g., 0x1.3effp-43), one of several special values, or a
precise bitstring for the underlying value. When converting decimal and
hexadecimal literals to the floating-point type, the value is converted
using the default rounding mode (round to nearest, half to even). String
conversions that underflow to 0 or overflow to infinity are not permitted.
Comment on lines +4602 to +4603
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Allowing rounding seems nice. Any particular reason not to allow overflow/underflow? Just being conservative?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My main motivation was to keep the parsing code as strict as possible, so that if you saw a constant in the code, you could be certain you knew what it was. Despite the existing documentation, we already allow inexact conversions from decimal strings to double (we check for exactness on conversion of the resulting double to the actual type, though).

There's an argument to be made that allowing 0.1 as a constant, even if we didn't already allow it; I don't see a strong argument for allowing 1e99999 or 1e-99999 when we have easy syntax for infinity already.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I always thought the parser behavior was ensuring the constant was exact with no rounding. If it wasn't doing that, I'd consider it to be a bug and we shouldn't be more lax

Floating-point constants must have a
:ref:`floating-point <t_floating>` type.
**Null pointer constants**
The identifier '``null``' is recognized as a null pointer constant
Expand All @@ -4608,31 +4610,40 @@ Simple Constants
The identifier '``none``' is recognized as an empty token constant
and must be of :ref:`token type <t_token>`.

The one non-intuitive notation for constants is the hexadecimal form of
floating-point constants. For example, the form
'``double 0x432ff973cafa8000``' is equivalent to (but harder to read
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you removing this old double 0x432ff973cafa8000 syntax? So is this change not backwards compatible?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The original parsing code is still being kept, which is why this patch can go in without a few hundred test changes.

The documentation I removed partially because it's deprecated, partially because it's flat out wrong, and partially because describing the correct behavior is annoying (e.g., 0xM and 0xL doesn't work like the documentation suggests).

than) '``double 4.5e+15``'. The only time hexadecimal floating-point
constants are required (and the only time that they are generated by the
disassembler) is when a floating-point constant must be emitted but it
cannot be represented as a decimal floating-point number in a reasonable
number of digits. For example, NaN's, infinities, and other special
values are represented in their IEEE hexadecimal format so that assembly
and disassembly do not cause any bits to change in the constants.

When using the hexadecimal form, constants of types bfloat, half, float, and
double are represented using the 16-digit form shown above (which matches the
IEEE754 representation for double); bfloat, half and float values must, however,
be exactly representable as bfloat, IEEE 754 half, and IEEE 754 single
precision respectively. Hexadecimal format is always used for long double, and
there are three forms of long double. The 80-bit format used by x86 is
represented as ``0xK`` followed by 20 hexadecimal digits. The 128-bit format
used by PowerPC (two adjacent doubles) is represented by ``0xM`` followed by 32
hexadecimal digits. The IEEE 128-bit format is represented by ``0xL`` followed
by 32 hexadecimal digits. Long doubles will only work if they match the long
double format on your target. The IEEE 16-bit format (half precision) is
represented by ``0xH`` followed by 4 hexadecimal digits. The bfloat 16-bit
format is represented by ``0xR`` followed by 4 hexadecimal digits. All
hexadecimal formats are big-endian (sign bit at the left).
Floating-point constants support the following kinds of strings:

+----------------+---------------------------------------------------+
| Syntax | Description |
+================+===================================================+
| ``+4.5e-13`` | Common decimal literal. Signs are optional, as is |
| | the exponent portion. The decimal point is |
| | required, as is one or more leading digits before |
| | the decimal point. |
+----------------+---------------------------------------------------+
| ``-0x1.fp13`` | Common hexadecimal literal. Signs are optional. |
| | The decimal point is required, as is the exponent |
| | portion of the literal (after the ``p``). |
+----------------+---------------------------------------------------+
| ``+inf``, | Positive or negative infinity. The sign is |
| ``-inf`` | required. |
+----------------+---------------------------------------------------+
| ``+qnan``, | Positive or negative preferred quiet NaN, i.e., |
| ``-qnan`` | the quiet bit is set, and all other payload bits |
| | are 0. The sign is required. |
+----------------+---------------------------------------------------+
| ``+nan(0x1)`` | qNaN value with a particular payload, specified |
| | as hexadecimal (not including the quiet bit as |
| | part of the payload). The sign is required. |
+----------------+---------------------------------------------------+
| ``+snan(0x1)`` | sNaN value with a particular payload, specified |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do nan and snan take an explicit payload but qnan does not?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The original proposal was qnan for the preferred qNaN, and nan(...) for every other NaN value. I discovered last night that APFloat::convertFromString didn't allow nan(...) to produce an sNaN value, and after staring at the IEEE 754 and C23 specifications for a bit to look at what they wanted for string->NaN conversions, I concluded that it was better to explicitly call out an snan(...) string than to make nan(...) produce a qNaN value.

There's not much keeping qnan from having a payload parameter, except that the APFloat::convertFromString doesn't support it. That's changeable, but the IEEE 754 specification I noticed doesn't ever use qnan for a qNaN string, so it doesn't entirely feel right to me to change APFloat::convertFromString to allow it.

FWIW, I also expect that virtually every NaN in practice ends up being +qnan or -qnan anyways.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we spell out this assumes the 2008 snan quiet bit pattern?

| | as hexadecimal (not including the quiet bit as |
| | part of the payload). The sign is required. |
+----------------+---------------------------------------------------+
| ``f0x3c00`` | Value of the floating-point number if bitcast to |
| | an integer. The number must have exactly as many |
| | hexadecimal digits as is necessary for the size |
| | of the floating-point number. |
+----------------+---------------------------------------------------+

There are no constants of type x86_amx.

Expand Down
19 changes: 19 additions & 0 deletions llvm/include/llvm/ADT/APFloat.h
Original file line number Diff line number Diff line change
Expand Up @@ -1347,6 +1347,25 @@ class APFloat : public APFloatBase {
APFLOAT_DISPATCH_ON_SEMANTICS(
convertFromZeroExtendedInteger(Input, InputSize, IsSigned, RM));
}

/// Fill this APFloat with the result of a string conversion.
///
/// The following strings are accepted for conversion purposes:
/// * Decimal floating-point literals (e.g., `0.1e-5`)
/// * Hexadecimal floating-point literals (e.g., `0x1.0p-5`)
/// * Positive infinity via "inf", "INFINITY", "Inf", "+Inf", or "+inf".
/// * Negative infinity via "-inf", "-INFINITY", or "-Inf".
/// * Quiet NaNs via "nan", "NaN", "nan(...)", or "NaN(...)", where the
/// "..." is either a decimal or hexadecimal integer representing the
/// payload. A negative sign may be optionally provided.
/// * Signaling NaNs via "snan", "sNaN", "snan(...)", or "sNaN(...)", where
/// the "..." is either a decimal or hexadecimal integer representing the
/// payload. A negative sign may be optionally provided.
///
/// If the input string is none of these forms, then an error is returned.
///
/// If a floating-point exception occurs during conversion, then no error is
/// returned, and the exception is indicated via opStatus.
Expected<opStatus> convertFromString(StringRef, roundingMode);
APInt bitcastToAPInt() const {
APFLOAT_DISPATCH_ON_SEMANTICS(bitcastToAPInt());
Expand Down
1 change: 1 addition & 0 deletions llvm/include/llvm/AsmParser/LLLexer.h
Original file line number Diff line number Diff line change
Expand Up @@ -112,6 +112,7 @@ namespace llvm {
lltok::Kind Lex0x();
lltok::Kind LexHash();
lltok::Kind LexCaret();
lltok::Kind LexFloatStr();

uint64_t atoull(const char *Buffer, const char *End);
uint64_t HexIntToVal(const char *Buffer, const char *End);
Expand Down
6 changes: 4 additions & 2 deletions llvm/include/llvm/AsmParser/LLToken.h
Original file line number Diff line number Diff line change
Expand Up @@ -497,12 +497,14 @@ enum Kind {
DwarfMacinfo, // DW_MACINFO_foo
ChecksumKind, // CSK_foo
DbgRecordType, // dbg_foo
FloatLiteral, // Unparsed float literal

// Type valued tokens (TyVal).
Type,

APFloat, // APFloatVal
APSInt // APSInt
FloatHexLiteral, // f0x..., stored as APSInt
APFloat, // APFloatVal
APSInt // APSInt
};
} // end namespace lltok
} // end namespace llvm
Expand Down
Loading
Loading