Make Fp16x2 atomics its own capability. #11083

@csyonghe

Problem Description

Right now InterlockedAddF16 maps directly to a single fp16 atomic add in SPIR-V, which is supported by some vendors but not by NVIDIA. On NVIDIA, the user needs to call InterlockedAddF16Emulated instead, which uses an fp16x2 atomic under the hood.

This means that a user writing cross-platform, cross-vendor code almost always needs to call InterlockedAddF16Emulated, which could mean slower execution on non-NVIDIA hardware.
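To illustrate what "fp16x2 atomic under the hood" can look like, here is a minimal C++ sketch of the address arithmetic such an emulation would plausibly need. It is an assumption about the general technique, not Slang's actual implementation: `HalfSlot` and `locateHalf` are hypothetical names, the byte address is assumed to be 2-byte aligned, and the lower address is assumed to map to lane 0.

```cpp
#include <cstdint>

// Hypothetical helper, not part of Slang: finds the 32-bit word that an
// fp16x2 atomic would operate on for a given fp16 byte address.
struct HalfSlot {
    std::uint32_t wordAddress; // 4-byte-aligned address of the containing fp16x2 word
    std::uint32_t lane;        // 0 = half at the lower address, 1 = half at the higher address
};

// Assumes byteAddress is a multiple of 2 (fp16 alignment).
constexpr HalfSlot locateHalf(std::uint32_t byteAddress) {
    return HalfSlot{
        byteAddress & ~std::uint32_t{3}, // round down to the containing 4-byte word
        (byteAddress >> 1) & 1u          // which of the two halves inside that word
    };
}
```

Under these assumptions, an emulated add would issue one fp16x2 atomic add on `wordAddress` with the operand `(value, 0)` for lane 0 or `(0, value)` for lane 1, so the neighbouring half only has zero added to it.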

Preferred Solution

Make fp16x2 atomics their own capability, separate from fp16 atomics, so that the implementation of InterlockedAddF16 can target_switch on the availability of the capabilities and fall back to emulation when fp16 atomics are not available. This allows the user to select which code path to use through the -capability compiler option.
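The selection logic proposed here can be modeled in a few lines of C++. This is only a sketch of the intended decision order, not Slang code: `Capabilities`, `AddPath`, and `selectAddPath` are hypothetical names, and in the compiler the choice would be made by target_switch over capabilities rather than by a runtime function.

```cpp
// Hypothetical model of the two separate capabilities proposed in this issue.
struct Capabilities {
    bool fp16Atomics;   // native single-fp16 atomic add is available
    bool fp16x2Atomics; // packed fp16x2 atomic add is available
};

enum class AddPath { NativeF16, EmulatedViaF16x2, Unsupported };

// Prefer the native fp16 atomic; fall back to fp16x2 emulation otherwise.
constexpr AddPath selectAddPath(Capabilities caps) {
    if (caps.fp16Atomics)   return AddPath::NativeF16;
    if (caps.fp16x2Atomics) return AddPath::EmulatedViaF16x2;
    return AddPath::Unsupported; // neither capability was declared
}
```

With this ordering, a target that declares both capabilities takes the native path, while a target that declares only fp16x2 atomics takes the emulated path without the user having to call a differently named function.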
