shader-slang · jkwak-work · May 26, 2026 · May 27, 2026 · May 28, 2026
@@ -1314,6 +1314,7 @@ A capability describes an optional feature that a target may or may not support.
 * `SPV_EXT_descriptor_indexing` : enables the SPV_EXT_descriptor_indexing extension 
 * `SPV_EXT_shader_atomic_float_add` : enables the SPV_EXT_shader_atomic_float_add extension 
 * `SPV_EXT_shader_atomic_float16_add` : enables the SPV_EXT_shader_atomic_float16_add extension 
+* `SPV_NV_shader_atomic_fp16_vector` : enables the SPV_NV_shader_atomic_fp16_vector extension 
 * `SPV_EXT_shader_atomic_float_min_max` : enables the SPV_EXT_shader_atomic_float_min_max extension 
 * `SPV_EXT_mesh_shader` : enables the SPV_EXT_mesh_shader extension 
 * `SPV_EXT_demote_to_helper_invocation` : enables the SPV_EXT_demote_to_helper_invocation extension 
@@ -1351,6 +1352,7 @@ A capability describes an optional feature that a target may or may not support.
 * `spvDeviceGroup` 
 * `spvAtomicFloat32AddEXT` 
 * `spvAtomicFloat16AddEXT` 
+* `spvAtomicFloat16VectorNV` 
 * `spvAtomicFloat64AddEXT` 
 * `spvInt64Atomics` 
 * `spvAtomicFloat32MinMaxEXT` 

@@ -170,13 +170,14 @@ GLSL 4.6 with [GLSL_EXT_shader_atomic_float](https://github.com/KhronosGroup/GLS
 GLSL 4.6 with [GLSL_EXT_shader_atomic_float2](https://github.com/KhronosGroup/GLSL/blob/main/extensions/ext/GLSL_EXT_shader_atomic_float2.txt) can use atomic operations for 16-bit float type.
 
 SPIR-V 1.5 with [SPV_EXT_shader_atomic_float_add](https://github.com/KhronosGroup/SPIRV-Registry/blob/main/extensions/EXT/SPV_EXT_shader_atomic_float_add.asciidoc) and [SPV_EXT_shader_atomic_float_min_max](https://github.com/KhronosGroup/SPIRV-Registry/blob/main/extensions/EXT/SPV_EXT_shader_atomic_float_min_max.asciidoc) can use atomic operations for 32-bit float type and 64-bit float type.
-SPIR-V 1.5 with [SPV_EXT_shader_atomic_float16_add](https://github.com/KhronosGroup/SPIRV-Registry/blob/main/extensions/EXT/SPV_EXT_shader_atomic_float16_add.asciidoc) can use atomic operations for 16-bit float type
-
-|        | 32-bit integer | 64-bit integer  | 32-bit float          | 64-bit float     | 16-bit float     |
-| ------ | -------------- | --------------- | --------------------- | ---------------- | ---------------- |
-| HLSL   | Yes (SM5.0)    | Yes (SM6.6)     | Only bit-wise (SM6.6) | No               | No               |
-| GLSL   | Yes (GL4.3)    | Yes (GL4.4+ext) | Yes (GL4.6+ext)       | Yes (GL4.6+ext)  | Yes (GL4.6+ext)  |
-| SPIR-V | Yes            | Yes             | Yes (SPV1.5+ext)      | Yes (SPV1.5+ext) | Yes (SPV1.5+ext) |
+SPIR-V 1.5 with [SPV_EXT_shader_atomic_float16_add](https://github.com/KhronosGroup/SPIRV-Registry/blob/main/extensions/EXT/SPV_EXT_shader_atomic_float16_add.asciidoc) can use atomic operations for 16-bit float type.
+SPIR-V 1.5 with [SPV_NV_shader_atomic_fp16_vector](https://github.com/KhronosGroup/SPIRV-Registry/blob/main/extensions/NV/SPV_NV_shader_atomic_fp16_vector.asciidoc) can use vector atomic add, min, max, and exchange operations on 16-bit float vectors with 2 or 4 components (`half2`, `half4`). Vector atomic sub is emitted as a vector atomic add of the negated operand.
+
+|        | 32-bit integer | 64-bit integer  | 32-bit float          | 64-bit float     | 16-bit float     | 16-bit float vector     |
+| ------ | -------------- | --------------- | --------------------- | ---------------- | ---------------- | ----------------------- |
+| HLSL   | Yes (SM5.0)    | Yes (SM6.6)     | Only bit-wise (SM6.6) | No               | No               | No                      |
+| GLSL   | Yes (GL4.3)    | Yes (GL4.4+ext) | Yes (GL4.6+ext)       | Yes (GL4.6+ext)  | Yes (GL4.6+ext)  | Yes (GL_NV ext)         |
+| SPIR-V | Yes            | Yes             | Yes (SPV1.5+ext)      | Yes (SPV1.5+ext) | Yes (SPV1.5+ext) | Yes (SPV_NV ext)        |
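A minimal C++ sketch of the lowering described above, where a vector atomic sub becomes a vector atomic add of the negated operand. It is an illustration, not Slang's implementation: `AtomicVec`, `atomicAddLane`, `atomicAddVec`, and `atomicSubVec` are hypothetical names, `float` stands in for `half`, and only per-lane atomicity is modelled. The rewrite is exact because, in IEEE-754 arithmetic, `x - y` yields the same result as `x + (-y)` (NaN payloads aside).

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>

// Hypothetical model: each lane of a half2/half4 is a std::atomic<float>
// (float stands in for half). Only per-lane atomicity is modelled.
template <std::size_t N>
using AtomicVec = std::array<std::atomic<float>, N>;

// Atomic float add on a single lane via a compare-exchange loop.
// Returns the value the lane held before the add.
inline float atomicAddLane(std::atomic<float>& lane, float v)
{
    float expected = lane.load();
    while (!lane.compare_exchange_weak(expected, expected + v))
    {
        // On failure, `expected` now holds the current value; retry with it.
    }
    return expected;
}

// Vector atomic add: adds v[i] to every lane and returns the original vector.
template <std::size_t N>
std::array<float, N> atomicAddVec(AtomicVec<N>& dst, const std::array<float, N>& v)
{
    std::array<float, N> original{};
    for (std::size_t i = 0; i < N; ++i)
        original[i] = atomicAddLane(dst[i], v[i]);
    return original;
}

// Vector atomic sub, lowered as a vector atomic add of the negated operand.
template <std::size_t N>
std::array<float, N> atomicSubVec(AtomicVec<N>& dst, const std::array<float, N>& v)
{
    std::array<float, N> negated{};
    for (std::size_t i = 0; i < N; ++i)
        negated[i] = -v[i];
    return atomicAddVec(dst, negated);
}
```

For example, subtracting `{2.5, 1.0}` from a vector holding `{10.0, 4.0}` returns `{10.0, 4.0}` and leaves `{7.5, 3.0}` in place.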
 
 ## ConstantBuffer, StructuredBuffer and ByteAddressBuffer
 

@@ -701,6 +701,10 @@ Extensions
 `SPV_NV_ray_tracing_motion_blur`
 > Represents the SPIR-V extension for ray tracing motion blur.
 
+`SPV_NV_shader_atomic_fp16_vector`
+> Represents the SPIR-V extension for vector atomic float 16 add/min/max/exchange operations.
+> Vector atomic sub is emitted as a negated vector atomic add.
+
 `SPV_NV_shader_image_footprint`
 > Represents the SPIR-V extension for shader image footprint.
 
@@ -723,6 +727,11 @@ Extensions
 `spvAtomicFloat16MinMaxEXT`
 > Represents the SPIR-V capability for atomic float 16 min/max operations.
 
+`spvAtomicFloat16VectorNV`
+> Represents the SPIR-V capability for vector atomic float 16 add/min/max/exchange operations.
+> Vector atomic sub is emitted as a negated vector atomic add.
+> Implies scalar atomic float 16 add support.
+
 `spvAtomicFloat32AddEXT`
 > Represents the SPIR-V capability for atomic float 32 add operations.
 

@@ -6491,9 +6491,13 @@ $}
     /// @param byteAddress The address at which to perform the atomic add operation.
     /// @param fp16x2Value Two 16-bit floating point values are packed into a 32-bit unsigned integer.
     /// @return The 2 16-bit floating point values packed into a 32-bit unsigned integer.
+    /// @remarks For SPIR-V, this helper requires `SPV_NV_shader_atomic_fp16_vector`
+    /// and emits `OpAtomicFAddEXT` on a `half2`. The packed fp16x2 `uint` interface
+    /// matches the NVAPI HLSL ABI, but the underlying operation is a vector atomic.
     [__requiresNVAPI]
     [ForceInline]
-    [require(cuda_hlsl_spirv)]
+    [require(cuda_hlsl, sm_5_0)]
+    [require(spirv, spvAtomicFloat16VectorNV)]
     uint _NvInterlockedAddFp16x2(uint byteAddress, uint fp16x2Value)
     {
         __target_switch
@@ -6511,14 +6515,17 @@ $}
     /// @param byteAddress The address at which to perform the atomic add operation.
     /// @param value The value to add to the value at `byteAddress`.
     /// @param originalValue The original value at `byteAddress` before the add operation.
-    /// @remarks For SPIR-V, this function maps to `OpAtomicFAdd` and requires `SPV_EXT_shader_atomic_float16_add` extension.
+    /// @remarks For SPIR-V, this function requires `SPV_EXT_shader_atomic_float16_add`
+    /// and maps to `OpAtomicFAddEXT` on a `half`. When `SPV_NV_shader_atomic_fp16_vector`
+    /// is available, it instead issues a `half2` atomic add on the 32-bit word that
+    /// contains `byteAddress`.
     ///
     /// For HLSL, this function translates to an NVAPI call
     /// due to lack of native HLSL intrinsic for floating point atomic add. For CUDA, this function
     /// maps to `atomicAdd`.
     [__requiresNVAPI]
     [ForceInline]
-    [require(cuda_hlsl_spirv, sm_5_0)]
+    [require(cuda_hlsl, sm_5_0)]
+    [require(spirv, spvAtomicFloat16AddEXT)]
     void InterlockedAddF16(uint byteAddress, half value, out half originalValue)
     {
         __target_switch
@@ -6536,6 +6543,20 @@ $}
                 originalValue = asfloat16((uint16_t)(_NvInterlockedAddFp16x2(byteAddress, packedInput) >> 16));
             }
             return;
+        case spvAtomicFloat16VectorNV:
+            {
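+                // View the buffer as half2 so the 16-bit value is updated by a vector
+                // atomic on its containing 32-bit word. Bit 1 of the byte address
+                // selects the lane. The other lane receives -0.0, the IEEE-754 additive
+                // identity, so its contents (including a stored -0.0) are unchanged.
+                let buf = __getEquivalentStructuredBuffer<half2>(this);
+                if ((byteAddress & 2) == 0)
+                {
+                    originalValue = __atomic_add(buf[byteAddress/4], half2(value, half(-0.0))).x;
+                }
+                else
+                {
+                    originalValue = __atomic_add(buf[byteAddress/4], half2(half(-0.0), value)).y;
+                }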
+                return;
+            }
+        case spvAtomicFloat16AddEXT:
         default:
             {
                 let buf = __getEquivalentStructuredBuffer<half>(this);

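The `spvAtomicFloat16VectorNV` path in `InterlockedAddF16` updates one 16-bit value by issuing a `half2` atomic add on the 32-bit word that contains it. The hypothetical C++ helpers below (`Fp16Slot`, `locateFp16`, `laneBits`, `paddingPreservesNeighbor`) sketch that mapping under an assumed little-endian layout, with `float` standing in for `half` when checking the padded lane. They also show why the padding value matters: under IEEE-754 round-to-nearest, `(-0.0) + (+0.0)` is `+0.0`, so only `-0.0` leaves every neighboring value, including `-0.0`, unchanged.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical model (not Slang code): where a 16-bit value at `byteAddress`
// lives inside a half2 (32-bit) view of the buffer. Little-endian layout assumed.
struct Fp16Slot
{
    std::uint32_t wordIndex; // index into the half2 view, i.e. byteAddress / 4
    int lane;                // 0 -> .x (low 16 bits), 1 -> .y (high 16 bits)
};

inline Fp16Slot locateFp16(std::uint32_t byteAddress)
{
    // Bit 1 of the byte address selects the half within its 32-bit word.
    return Fp16Slot{byteAddress / 4, (byteAddress & 2u) == 0 ? 0 : 1};
}

// Raw fp16 bits of one lane of a packed fp16x2 word.
inline std::uint16_t laneBits(std::uint32_t packed, int lane)
{
    return static_cast<std::uint16_t>(lane == 0 ? (packed & 0xFFFFu) : (packed >> 16));
}

// Does adding `padding` to the untouched lane leave it unchanged, including
// the sign of zero? `float` stands in for `half`; the signed-zero rules match.
inline bool paddingPreservesNeighbor(float neighbor, float padding)
{
    const float result = neighbor + padding;
    return result == neighbor && std::signbit(result) == std::signbit(neighbor);
}
```

For example, byte address 6 maps to word 1, lane `.y`, matching the `(byteAddress & 2) == 0` test and `byteAddress/4` index in the Slang code.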
@@ -543,6 +543,11 @@ def SPV_EXT_shader_atomic_float_add : _spirv_1_0;
 /// [EXT]
 def SPV_EXT_shader_atomic_float16_add : SPV_EXT_shader_atomic_float_add;
 
+/// Represents the SPIR-V extension for vector atomic float 16 add/min/max/exchange operations.
+/// Vector atomic sub is emitted as a negated vector atomic add.
+/// [EXT]
+def SPV_NV_shader_atomic_fp16_vector : _spirv_1_0;
+
 /// Represents the SPIR-V extension for atomic float min/max operations.
 /// [EXT]
 def SPV_EXT_shader_atomic_float_min_max : _spirv_1_0;
@@ -700,6 +705,12 @@ def spvAtomicFloat32AddEXT : SPV_EXT_shader_atomic_float_add;
 /// [EXT]
 def spvAtomicFloat16AddEXT : SPV_EXT_shader_atomic_float16_add;
 
+/// Represents the SPIR-V capability for vector atomic float 16 add/min/max/exchange operations.
+/// Vector atomic sub is emitted as a negated vector atomic add.
+/// Implies scalar atomic float 16 add support.
+/// [EXT]
+def spvAtomicFloat16VectorNV : SPV_NV_shader_atomic_fp16_vector + spvAtomicFloat16AddEXT;
+
 /// Represents the SPIR-V capability for atomic float 64 add operations.
 /// [EXT]
 def spvAtomicFloat64AddEXT : SPV_EXT_shader_atomic_float_add;
@@ -1261,7 +1272,7 @@ alias GL_NV_ray_tracing_motion_blur = _GL_NV_ray_tracing_motion_blur | spvRayTra
 
 /// Represents the GL_NV_shader_atomic_fp16_vector extension.
 /// [EXT]
-alias GL_NV_shader_atomic_fp16_vector = _GL_NV_shader_atomic_fp16_vector + _GL_NV_gpu_shader5 | _spirv_1_0;
+alias GL_NV_shader_atomic_fp16_vector = _GL_NV_shader_atomic_fp16_vector + _GL_NV_gpu_shader5 | spvAtomicFloat16VectorNV;
 
 /// Represents the GL_NV_shader_invocation_reorder extension (NVIDIA-specific).
 /// [EXT]

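`spvAtomicFloat16VectorNV` is defined as `SPV_NV_shader_atomic_fp16_vector + spvAtomicFloat16AddEXT`, so a target that has it also satisfies a scalar `spvAtomicFloat16AddEXT` requirement. The sketch below models only that conjunctive inheritance as a graph walk. It is not Slang's `CapabilitySet` implementation, it ignores `|` alternatives in aliases, and `CapGraph`, `impliedClosure`, `satisfies`, and `makeFp16AtomicGraph` are hypothetical names; the graph contents are copied from the `def` lines in this diff.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical, simplified model: each capability lists what it builds on
// (the right-hand side of a `def`, where `+` means "and").
using CapGraph = std::map<std::string, std::vector<std::string>>;

// Collect a capability together with everything it transitively implies.
inline std::set<std::string> impliedClosure(const CapGraph& graph, const std::string& cap)
{
    std::set<std::string> seen;
    std::vector<std::string> stack{cap};
    while (!stack.empty())
    {
        std::string current = stack.back();
        stack.pop_back();
        if (!seen.insert(current).second)
            continue; // already visited
        auto it = graph.find(current);
        if (it != graph.end())
            for (const auto& parent : it->second)
                stack.push_back(parent);
    }
    return seen;
}

// A target capability satisfies a requirement if the requirement is implied by it.
inline bool satisfies(const CapGraph& graph, const std::string& targetCap, const std::string& required)
{
    return impliedClosure(graph, targetCap).count(required) != 0;
}

// The relevant `def` lines from this diff, expressed as graph edges.
inline CapGraph makeFp16AtomicGraph()
{
    return CapGraph{
        {"SPV_EXT_shader_atomic_float_add", {"_spirv_1_0"}},
        {"SPV_EXT_shader_atomic_float16_add", {"SPV_EXT_shader_atomic_float_add"}},
        {"SPV_NV_shader_atomic_fp16_vector", {"_spirv_1_0"}},
        {"spvAtomicFloat16AddEXT", {"SPV_EXT_shader_atomic_float16_add"}},
        {"spvAtomicFloat16VectorNV", {"SPV_NV_shader_atomic_fp16_vector", "spvAtomicFloat16AddEXT"}},
    };
}
```

With this graph, `satisfies(g, "spvAtomicFloat16VectorNV", "spvAtomicFloat16AddEXT")` holds while the reverse query does not, which is why the vector capability can serve callers that only require scalar fp16 add.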
@@ -1956,28 +1956,6 @@ void validateEntryPoint(EntryPoint* entryPoint, DiagnosticSink* sink)
         else
         {
             auto& targetOptionSet = target->getOptionSet();
-            bool specificProfileRequested =
-                targetOptionSet.hasOption(CompilerOptionName::Profile) &&
-                (targetOptionSet.getIntOption(CompilerOptionName::Profile) !=
-                 SLANG_PROFILE_UNKNOWN);
-            bool specificCapabilityRequested = false;
-            for (auto atomVal : targetOptionSet.getArray(CompilerOptionName::Capability))
-            {
-                switch (atomVal.kind)
-                {
-                case CompilerOptionValueKind::Int:
-                    if (atomVal.intValue != SLANG_CAPABILITY_UNKNOWN)
-                        specificCapabilityRequested = true;
-                    break;
-                case CompilerOptionValueKind::String:
-                    // User made a specific capability request
-                    specificCapabilityRequested = true;
-                    break;
-                }
-                if (specificCapabilityRequested)
-                    break;
-            }
-
             if (auto declaredCapsMod =
                     entryPointFuncDecl->findModifier<ExplicitlyDeclaredCapabilityModifier>())
             {
@@ -1988,7 +1966,7 @@ void validateEntryPoint(EntryPoint* entryPoint, DiagnosticSink* sink)
             }
 
             // Only attempt to error if a specific profile or capability is requested
-            if ((specificCapabilityRequested || specificProfileRequested) &&
+            if (isSpecificProfileOrCapabilityRequested(targetOptionSet) &&
                 targetCaps.atLeastOneSetImpliedInOther(
                     CapabilitySet{entryPointFuncDecl->inferredCapabilityRequirements}) ==
                     CapabilitySet::ImpliesReturnFlags::NotImplied)

@@ -213,6 +213,32 @@ enum class DiagnosticCategory
     None = 0,
     Capability = 1 << 0,
 };
+
+inline bool isSpecificProfileRequested(CompilerOptionSet& optionSet)
+{
+    return optionSet.hasOption(CompilerOptionName::Profile) &&
+           (optionSet.getIntOption(CompilerOptionName::Profile) != SLANG_PROFILE_UNKNOWN);
+}
+
+inline bool isSpecificCapabilityRequested(CompilerOptionSet& optionSet)
+{
+    for (auto atomVal : optionSet.getArray(CompilerOptionName::Capability))
+    {
+        if ((atomVal.kind == CompilerOptionValueKind::Int &&
+             atomVal.intValue != SLANG_CAPABILITY_UNKNOWN) ||
+            atomVal.kind == CompilerOptionValueKind::String)
+        {
+            return true;
+        }
+    }
+    return false;
+}
+
+inline bool isSpecificProfileOrCapabilityRequested(CompilerOptionSet& optionSet)
+{
+    return isSpecificProfileRequested(optionSet) || isSpecificCapabilityRequested(optionSet);
+}
+
 template<typename P, typename... Args>
 bool maybeDiagnose(
     DiagnosticSink* sink,

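The extracted helpers keep the behavior of the removed inline loop in `validateEntryPoint`: a profile counts as specific when it is set and not `SLANG_PROFILE_UNKNOWN`, and a capability counts when it is a known integer atom or any string. A self-contained C++ sketch of the same predicate, using hypothetical stand-ins (`OptionKind`, `OptionValue`, `kProfileUnknown`, `kCapabilityUnknown`, with both unknown values assumed to be 0) rather than Slang's `CompilerOptionSet` types:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-ins for Slang's option types; not the real API.
enum class OptionKind { Int, String };

struct OptionValue
{
    OptionKind kind;
    int intValue;            // meaningful when kind == OptionKind::Int
    std::string stringValue; // meaningful when kind == OptionKind::String
};

// Assumed placeholder values for SLANG_PROFILE_UNKNOWN / SLANG_CAPABILITY_UNKNOWN.
constexpr int kProfileUnknown = 0;
constexpr int kCapabilityUnknown = 0;

// Mirrors isSpecificProfileRequested: the option must be present and not "unknown".
inline bool specificProfile(bool hasProfileOption, int profile)
{
    return hasProfileOption && profile != kProfileUnknown;
}

// Mirrors isSpecificCapabilityRequested: a known integer atom or any string counts.
inline bool specificCapability(const std::vector<OptionValue>& capabilities)
{
    for (const OptionValue& v : capabilities)
    {
        if ((v.kind == OptionKind::Int && v.intValue != kCapabilityUnknown) ||
            v.kind == OptionKind::String)
        {
            return true;
        }
    }
    return false;
}

// Mirrors isSpecificProfileOrCapabilityRequested.
inline bool specificProfileOrCapability(
    bool hasProfileOption, int profile, const std::vector<OptionValue>& capabilities)
{
    return specificProfile(hasProfileOption, profile) || specificCapability(capabilities);
}
```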
@@ -4808,6 +4808,20 @@ warning(
     span { loc = "location", message = "Slang's SPIR-V backend only supports SPIR-V version 1.3 and later. Use `-emit-spirv-via-glsl` option to produce SPIR-V 1.0 through 1.2." }
 )
 
+err(
+    "spirv-fp16-vector-atomic-unsupported-width",
+    50013,
+    "invalid SPIR-V fp16 vector atomic width",
+    span { loc = "location", message = "SPIR-V fp16 vector atomics only support half2 and half4." }
+)
+
+err(
+    "spirv-fp16-vector-atomic-unsupported-operation",
+    50014,
+    "invalid SPIR-V fp16 vector atomic operation",
+    span { loc = "location", message = "SPIR-V fp16 vector atomics only support add, sub, min, max, and exchange operations." }
+)
+
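A hypothetical sketch of the validation these two diagnostics imply: the component count is checked against 2 and 4 (50013), and the operation against add, sub, min, max, and exchange (50014). The `AtomicOp` and `Fp16VectorAtomicCheck` enums, and the order of the two checks, are assumptions for illustration, not the SPIR-V emitter's actual code.

```cpp
#include <cassert>

// Hypothetical operation set for illustration.
enum class AtomicOp { Add, Sub, Min, Max, Exchange, And, Or, Xor, CompareExchange };

// Which of the two new diagnostics (if any) would apply.
enum class Fp16VectorAtomicCheck
{
    Ok,
    UnsupportedWidth,     // spirv-fp16-vector-atomic-unsupported-width (50013)
    UnsupportedOperation, // spirv-fp16-vector-atomic-unsupported-operation (50014)
};

inline Fp16VectorAtomicCheck checkFp16VectorAtomic(int componentCount, AtomicOp op)
{
    // Only half2 and half4 are accepted.
    if (componentCount != 2 && componentCount != 4)
        return Fp16VectorAtomicCheck::UnsupportedWidth;

    switch (op)
    {
    case AtomicOp::Add:
    case AtomicOp::Sub: // emitted as an add of the negated operand
    case AtomicOp::Min:
    case AtomicOp::Max:
    case AtomicOp::Exchange:
        return Fp16VectorAtomicCheck::Ok;
    default:
        return Fp16VectorAtomicCheck::UnsupportedOperation;
    }
}
```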
 err(
     "invalid-mesh-stage-output-topology",
     50060,