Make Fp16x2 atomics its own capability. #11083

# Problem Description

Right now, `InterlockedAddF16` maps directly to a single fp16 atomic add in SPIR-V, which some vendors support but NVIDIA does not. On NVIDIA, the user must call `InterlockedAddF16Emulated` instead, which uses an fp16x2 atomic under the hood.

This means that a user writing cross-platform or cross-vendor code almost always needs to call `InterlockedAddF16Emulated`, which could mean slower execution on non-NVIDIA hardware.

# Preferred Solution

Make Fp16x2 atomics its own capability, separate from fp16 atomics, so that the implementation of `InterlockedAddF16` can `target_switch` on the availability of the capabilities and fall back to emulation when fp16 atomics are not available. This allows the user to select which code path to use through the `-capability` compiler option.
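To illustrate the intended dispatch, here is a minimal CPU-side model in Python. It is not Slang code and not the actual implementation. The capability labels `fp16_atomics` and `fp16x2_atomics`, and all function names, are hypothetical and chosen only for this sketch. A lock stands in for hardware atomicity. The emulated path assumes the usual approach of adding the value to one lane of a packed fp16x2 word and adding zero to the other lane.

```python
import struct
import threading

# Stands in for hardware atomicity in this CPU-side model.
_lock = threading.Lock()


def pack_half2(lo, hi):
    """Pack two fp16 values into one 32-bit word (lo occupies bits 0-15)."""
    return struct.unpack("<I", struct.pack("<2e", lo, hi))[0]


def unpack_half2(word):
    """Unpack a 32-bit word into its (lo, hi) fp16 lanes."""
    return struct.unpack("<2e", struct.pack("<I", word))


def atomic_add_f16x2(words, word_index, lo, hi):
    """Model of a packed fp16x2 atomic add: both lanes update in one step."""
    with _lock:
        cur_lo, cur_hi = unpack_half2(words[word_index])
        words[word_index] = pack_half2(cur_lo + lo, cur_hi + hi)


def atomic_add_f16_native(words, half_index, value):
    """Model of a single fp16 atomic add that touches only one lane."""
    word_index, lane = divmod(half_index, 2)
    with _lock:
        lanes = list(unpack_half2(words[word_index]))
        lanes[lane] += value
        words[word_index] = pack_half2(lanes[0], lanes[1])


def atomic_add_f16_emulated(words, half_index, value):
    """Emulate a single fp16 add with an fp16x2 add (zero in the other lane)."""
    word_index, lane = divmod(half_index, 2)
    if lane == 0:
        atomic_add_f16x2(words, word_index, value, 0.0)
    else:
        atomic_add_f16x2(words, word_index, 0.0, value)


def interlocked_add_f16(words, half_index, value, capabilities):
    """Pick a code path from the available capabilities (hypothetical labels)."""
    if "fp16_atomics" in capabilities:
        atomic_add_f16_native(words, half_index, value)
    elif "fp16x2_atomics" in capabilities:
        atomic_add_f16_emulated(words, half_index, value)
    else:
        raise NotImplementedError("no fp16 atomic capability available")


# Usage: one 32-bit word holding the fp16 pair (1.0, 2.0).
buffer_words = [pack_half2(1.0, 2.0)]
interlocked_add_f16(buffer_words, 1, 0.5, {"fp16x2_atomics"})  # emulated path
interlocked_add_f16(buffer_words, 0, 0.25, {"fp16_atomics"})   # native path
```

With the two capabilities separated, the caller uses a single entry point and the path is chosen by what the target supports, which is the behavior this issue proposes for `InterlockedAddF16`.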

