Add predefined datatypes for bfloat16 data #5402

Draft: wants to merge 1 commit into develop

jhendersonHDF
Collaborator

Adds predefined datatypes for little- and big-endian bfloat16 data

Does not add support for any native bfloat16 types; datatype conversions are performed in software

Also adds the missing float16 predefined types to Fortran

jhendersonHDF added the Priority - 1. High, Component - C Library, Component - Tools, Component - High-Level Library, Component - Documentation, Component - Wrappers and Component - Testing labels Mar 21, 2025
@jhendersonHDF
Collaborator Author

Essentially finished, pending CI results and some potential testing on a big-endian system

In doxygen/dox/DDLBNF200.dox:

```diff
  H5T_IEEE_F32BE | H5T_IEEE_F32LE |
  H5T_IEEE_F64BE | H5T_IEEE_F64LE |
- H5T_NATIVE_FLOAT16 | H5T_NATIVE_FLOAT |
+ H5T_FLT_BF16BE | H5T_FLT_BF16LE |
```
Collaborator Author

Open to any alternative names that seem more fitting, but this type is distinct from the IEEE standard, so I basically created a new category for alternative floating-point formats.

Contributor

Perhaps "FLOAT" instead of "FLT"?

Collaborator Author

Seems reasonable to me

Collaborator

Agree - I like "FLOAT" better also

Collaborator

Actually, maybe "NONSTD", since it's a contrast to the "IEEE" label?

Contributor

So... H5T_NONSTD_BF16 or H5T_NONSTD_BFLOAT16? I like "FLOAT" explicitly.

Collaborator Author

The issue I see with that is that it's not unlikely that these types get adopted into some standard in the future, whether it's IEEE or not.

```c
if (NULL == (bf16_be_dt = H5I_object(H5T_FLT_BF16BE)))
    HGOTO_ERROR(H5E_ARGS, H5E_BADTYPE, NULL, "not a data type");

/* Promote bfloat16 to float instead of float16, as it
```
Collaborator Author

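bfloat16 types should be promoted to float instead of float16: although bfloat16 is the same size as float16, it has a different format (an 8-bit exponent like float, versus float16's 5-bit exponent), so many bfloat16 values fall outside float16's range. Converting between bfloat16 and float is also very simple (by design).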

```diff
-if (size == 2)
-    p_type = H5Tcopy(H5T_IEEE_F16LE);
+if (size == 2) {
+    if (true == H5Tequal(tid, H5T_IEEE_F16LE) || true == H5Tequal(tid, H5T_IEEE_F16BE))
```
Collaborator Author

Had to make sure the correct type is picked between float16 and bfloat16, which are both 2 bytes, so that the data comes out correctly.

```c
hsize_t dims[2], adims[1];

/*
 * bfloat16 keeps approximately the same range as the IEEE 32-bit
```
Collaborator Author

Eventually better tests can be written, but for now this PR just adds support for predefined types and doesn't add support for a native type. GCC and Clang both have support for a __bf16 type at this point though, so it should be doable in the future.

@qkoziol
Collaborator

qkoziol commented Mar 22, 2025

I'll try to review tomorrow and get you some feedback

@epourmal

epourmal commented Mar 24, 2025 via email

@qkoziol
Collaborator

qkoziol commented Mar 24, 2025

> It was my understanding that the middle part ("IEEE" or similar) represents an architecture or a standard. Should we follow the same rule here?

> On Mon, Mar 24, 2025 at 12:11 PM Quincey Koziol wrote, on the H5T_FLT_BF16BE | H5T_FLT_BF16LE line of doxygen/dox/DDLBNF200.dox: Agree - I like "FLOAT" better also

Right - that was why I thought "NONSTD" might be better

@qkoziol
Collaborator

qkoziol commented Mar 24, 2025

Maybe "NONSTD_BFLOAT16" ?

@jhendersonHDF
Collaborator Author

> It was my understanding that the middle part ("IEEE" or similar) represents an architecture or a standard. Should we follow the same rule here?

I'm not sure that this is necessarily the case and I'd argue we should consider abandoning that. For example, H5T_STD_B32BE doesn't really mean anything, because AFAIK there's really no standard type for this. C23 does appear to introduce those, but we don't target C23 yet.

@jhendersonHDF
Collaborator Author

> Maybe "NONSTD_BFLOAT16" ?

That name is a bit weird to me because bfloat16 is the standard. There's no other standard for the type, it's just that the standard isn't an IEEE standard.

@jhendersonHDF
Collaborator Author

Also consider I plan to add support for FP8, FP6 and FP4 following this PR, and in each case the formats essentially are the standard until they have wider adoption. https://arxiv.org/abs/2209.05433 for example.

@qkoziol
Collaborator

qkoziol commented Mar 24, 2025

Since Google created it, maybe "GOOGLE_BFLOAT16" ?

@ajelenak
Contributor

I was thinking the same...

@epourmal

epourmal commented Mar 24, 2025

> Also consider I plan to add support for FP8, FP6 and FP4 following this PR, and in each case the formats essentially are the standard until they have wider adoption. https://arxiv.org/abs/2209.05433 for example.

Frankly speaking, I would drop the middle part for these types and document their implementation. I would vote against introducing GOOGLE.

@jhendersonHDF
Collaborator Author

jhendersonHDF commented Mar 24, 2025

> Since Google created it, maybe "GOOGLE_BFLOAT16" ?

Perhaps. The reason I used BF16 on the end part was just to match our existing conventions like H5T_IEEE_F32LE, especially because there will be types like F8E4M3 (FP8 with 4 exponent bits and 3 mantissa bits). That said, I also like FLOAT being spelled out explicitly, but I still think it's much more natural to just use H5T_FLOAT_ so that we don't tie the names to any particular standard, architecture or company unless it makes sense to. I see this as "one datatype of class H5T_FLOAT is called H5T_FLOAT_BF16". I'd definitely like to avoid the STD vs. NONSTD naming if possible, because I'm fairly certain these types will be adopted into some standard in the future.

@ajelenak
Contributor

I was ok with H5T_FLOAT_BF16 to begin with but enjoy discussing alternative naming options.

@jhendersonHDF
Collaborator Author

For the time being, I'm going to proceed on the other datatypes using H5T_FLOAT_ names until we come to a resolution.

I propose that we use a convention of H5T_<TYPE_CLASS>_<OPTIONAL(?)_ARCH_OR_STANDARD>_<SPECIFICS> going forward and welcome any feedback if others think differently. This leads to names like H5T_FLOAT_INTEL_F32, H5T_FLOAT_IEEE_F32LE, H5T_INTEGER_I32BE, etc. The question mark on optional is because I'm thinking it might be worth it to always mention a standard or architecture to be consistent, leading to, for example, H5T_INTEGER_C_I32BE for int32_t. Note that this is already the convention I used for predefined complex datatypes (H5T_COMPLEX_IEEE_F32LE, for example).

I believe this form does a better job of actually distinguishing what a particular datatype is. For example, H5T_IEEE_F32LE tells you that the datatype comes from or relates to an IEEE standard, but you rely on the "F32" part to tell you that it's a floating-point type. If additional non-floating-point types are added to IEEE standards, this may get confusing. For example, if we wanted to add support for the decimal floating-point types, I imagine we'd currently use something like H5T_IEEE_D32LE (maybe H5T_IEEE_DF32LE). However, that "D" would be somewhat in conflict with the current H5T_UNIX_D32LE and related types, where "D" apparently represents time data. If an IEEE standard included time types (for whatever reason), this would just complicate things more.

A different letter than "D" could of course be picked to represent decimal floating-point datatypes, but I feel adding the type class gives context to anything in the "specifics" part of the name and helps deal with any conflict between lettering. It may also be worth revisiting what exactly should be in the "specifics" part of the name.

@epourmal

Makes sense, but I would leave old types as they are and use new standard only for the new types.

@jhendersonHDF
Collaborator Author

> Makes sense, but I would leave old types as they are and use new standard only for the new types.

I certainly wouldn't want to go changing old type names as it would be nothing but a source of annoyance (though we can always introduce new macros that just point to the old names if we wanted).

@fortnern
Member

I don't think the C standard specifies the exact integer format, or even that it's two's complement, so we shouldn't name types "C". I'm not sure there's a better name for "normal" integer types than STD, but I might be wrong. Adding new names for consistency is fine, and something like H5T_INTEGER_U32LE is probably fine, though H5T_INTEGER_I32LE seems redundant. Maybe _S32LE or just _32LE?

@jhendersonHDF
Collaborator Author

jhendersonHDF commented Mar 25, 2025

> or even that it's two's complement

Note that this is the case in C23 at least.

> so we shouldn't name types "C"

I tend to agree, it's just that STD doesn't really mean anything in particular.

> though H5T_INTEGER_I32LE seems redundant.

Agree. Also why I mentioned we may want to revisit what should go in the "specifics" part of the name if we use the convention. FLOAT_XXX_F32LE is just as redundant, so maybe the single letter on the end of these isn't needed and we should consider something else?

@brtnfld
Collaborator

brtnfld commented Apr 2, 2025

Jordan, is this a summary of what you are proposing:


H5T_<TYPE_CLASS>_<OPTIONAL_QUALIFIER>_<SPECIFICS>

    H5T_<TYPE_CLASS>: This explicitly states the data type (e.g., FLOAT, INTEGER, COMPLEX).
    _<OPTIONAL_QUALIFIER>: This qualifier provides additional context, such as:
        Architecture: (e.g., INTEL)
        Specific Organization: (e.g. GOOGLE)
        Standard: (e.g. IEEE)
        This section can be omitted if the datatype is very common or there is no specific qualifier.
    _<SPECIFICS>: This details the specific format (e.g., BF16, F32LE, I32BE, F8E4M3).

@jhendersonHDF
Collaborator Author

> Jordan, is this a summary of what you are proposing:
>
> H5T_<TYPE_CLASS>_<OPTIONAL_QUALIFIER>_<SPECIFICS>

Yes, this looks like a good summary. Thanks!

7 participants