Skip to content

Commit bce5d76

Browse files
Validate ByteAddressBuffer alignment/location contract up front
Slice 1 of 4 decomposing #11545. Adds up-front validation of the `alignment`/`location` operands of the `(RW)ByteAddressBuffer` `LoadAligned` / `StoreAligned` / 3-argument `Store` accessors, emitted before the legal-type early-out so already-legal scalar accesses are covered too. No codegen change. - 41302: a non-zero `alignment` must be a compile-time constant (today a runtime alignment silently scalarizes with no diagnostic). - 41303: a compile-time-constant `location` must be a multiple of `alignment` (today a self-contradictory constant location is silently accepted). The existing aligned/array tests are re-expressed with explicit valid alignments because their single-argument constant-location calls become 41303 errors under this slice; the re-expressed forms produce identical codegen under the existing decision logic. Co-authored-by: Harsh Aggarwal <haaggarwal@nvidia.com>
1 parent db002db commit bce5d76

7 files changed

Lines changed: 159 additions & 48 deletions

docs/target-compatibility.md

Lines changed: 56 additions & 39 deletions
Original file line numberDiff line numberDiff line change
@@ -6,45 +6,46 @@ OpenGL compatibility is not listed here, because OpenGL isn't an officially supp
66
Items with a + means that the feature is anticipated to be added in the future.
77
Items with ^ means there is some discussion about support later in the document for this target.
88

9-
| Feature | D3D11 | D3D12 | VK | CUDA | Metal | CPU |
10-
| ---------------------------------------------------- | ----- | --------- | ------- | -------------- | ----- | --------- |
11-
| [Half Type](#half) | No | Yes ^ | Yes | Yes ^ | Yes | No + |
12-
| Double Type | Yes | Yes | Yes | Yes | No | Yes |
13-
| Double Intrinsics | No | Limited + | Limited | Most | No | Yes |
14-
| [u/int8_t Type](#int8_t) | No | No | Yes ^ | Yes | Yes | Yes |
15-
| [u/int16_t Type](#int16_t) | No | Yes ^ | Yes ^ | Yes | Yes | Yes |
16-
| [u/int64_t Type](#int64_t) | No | Yes ^ | Yes | Yes | Yes | Yes |
17-
| u/int64_t Intrinsics | No | No | Yes | Yes | Yes | Yes |
18-
| [int matrix](#int-matrix) | Yes | Yes | No + | Yes | No | Yes |
19-
| [tex.GetDimensions](#tex-get-dimensions) | Yes | Yes | Yes | No | Yes | Yes |
20-
| [SM6.0 Wave Intrinsics](#sm6-wave) | No | Yes | Partial | Yes ^ | No | No |
21-
| SM6.0 Quad Intrinsics | No | Yes | No + | No | No | No |
22-
| [SM6.5 Wave Intrinsics](#sm6.5-wave) | No | Yes ^ | No + | Yes ^ | No | No |
23-
| [WaveMask Intrinsics](#wave-mask) | Yes ^ | Yes ^ | Yes + | Yes | No | No |
24-
| [WaveShuffle](#wave-shuffle) | No | Limited ^ | Yes | Yes | No | No |
25-
| [Tesselation](#tesselation) | Yes ^ | Yes ^ | No + | No | No | No |
26-
| [Graphics Pipeline](#graphics-pipeline) | Yes | Yes | Yes | No | Yes | No |
27-
| [Ray Tracing DXR 1.0](#ray-tracing-1.0) | No | Yes ^ | Yes ^ | No | No | No |
28-
| Ray Tracing DXR 1.1 | No | Yes | No + | No | No | No |
29-
| [Native Bindless](#native-bindless) | No | No | No | Yes | No | Yes |
30-
| [Buffer bounds](#buffer-bounds) | Yes | Yes | Yes | Limited ^ | No ^ | Limited ^ |
31-
| [Resource bounds](#resource-bounds) | Yes | Yes | Yes | Yes (optional) | Yes | Yes |
32-
| Atomics | Yes | Yes | Yes | Yes | Yes | Yes |
33-
| Group shared mem/Barriers | Yes | Yes | Yes | Yes | Yes | No + |
34-
| [TextureArray.Sample float](#tex-array-sample-float) | Yes | Yes | Yes | No | Yes | Yes |
35-
| [Separate Sampler](#separate-sampler) | Yes | Yes | Yes | No | Yes | Yes |
36-
| [tex.Load](#tex-load) | Yes | Yes | Yes | Limited ^ | Yes | Yes |
37-
| [Full bool](#full-bool) | Yes | Yes | Yes | No | Yes | Yes ^ |
38-
| [Mesh Shader](#mesh-shader) | No | Yes | Yes | No | Yes | No |
39-
| [`[unroll]`](#unroll] | Yes | Yes | Yes ^ | Yes | No ^ | Limited + |
40-
| Atomics | Yes | Yes | Yes | Yes | Yes | No + |
41-
| [Atomics on RWBuffer](#rwbuffer-atomics) | Yes | Yes | Yes | No | Yes | No + |
42-
| [Sampler Feedback](#sampler-feedback) | No | Yes | No + | No | No | Yes ^ |
43-
| [RWByteAddressBuffer Atomic](#byte-address-atomic) | No | Yes ^ | Yes ^ | Yes | Yes | No + |
44-
| [Shader Execution Reordering](#ser) | No | Yes ^ | Yes ^ | No | No | No |
45-
| [debugBreak](#debug-break) | No | No | Yes | Yes | No | Yes |
46-
| [realtime clock](#realtime-clock) | No | Yes ^ | Yes | Yes | No | No |
47-
| [Switch Fall-Through](#switch-fallthrough) | No ^ | Yes | Yes | Yes | Yes | Yes |
9+
| Feature | D3D11 | D3D12 | VK | CUDA | Metal | CPU |
10+
| ------------------------------------------------------------- | ----- | --------- | ------- | -------------- | ----- | --------- |
11+
| [Half Type](#half) | No | Yes ^ | Yes | Yes ^ | Yes | No + |
12+
| Double Type | Yes | Yes | Yes | Yes | No | Yes |
13+
| Double Intrinsics | No | Limited + | Limited | Most | No | Yes |
14+
| [u/int8_t Type](#int8_t) | No | No | Yes ^ | Yes | Yes | Yes |
15+
| [u/int16_t Type](#int16_t) | No | Yes ^ | Yes ^ | Yes | Yes | Yes |
16+
| [u/int64_t Type](#int64_t) | No | Yes ^ | Yes | Yes | Yes | Yes |
17+
| u/int64_t Intrinsics | No | No | Yes | Yes | Yes | Yes |
18+
| [int matrix](#int-matrix) | Yes | Yes | No + | Yes | No | Yes |
19+
| [tex.GetDimensions](#tex-get-dimensions) | Yes | Yes | Yes | No | Yes | Yes |
20+
| [SM6.0 Wave Intrinsics](#sm6-wave) | No | Yes | Partial | Yes ^ | No | No |
21+
| SM6.0 Quad Intrinsics | No | Yes | No + | No | No | No |
22+
| [SM6.5 Wave Intrinsics](#sm6.5-wave) | No | Yes ^ | No + | Yes ^ | No | No |
23+
| [WaveMask Intrinsics](#wave-mask) | Yes ^ | Yes ^ | Yes + | Yes | No | No |
24+
| [WaveShuffle](#wave-shuffle) | No | Limited ^ | Yes | Yes | No | No |
25+
| [Tesselation](#tesselation) | Yes ^ | Yes ^ | No + | No | No | No |
26+
| [Graphics Pipeline](#graphics-pipeline) | Yes | Yes | Yes | No | Yes | No |
27+
| [Ray Tracing DXR 1.0](#ray-tracing-1.0) | No | Yes ^ | Yes ^ | No | No | No |
28+
| Ray Tracing DXR 1.1 | No | Yes | No + | No | No | No |
29+
| [Native Bindless](#native-bindless) | No | No | No | Yes | No | Yes |
30+
| [Buffer bounds](#buffer-bounds) | Yes | Yes | Yes | Limited ^ | No ^ | Limited ^ |
31+
| [Resource bounds](#resource-bounds) | Yes | Yes | Yes | Yes (optional) | Yes | Yes |
32+
| Atomics | Yes | Yes | Yes | Yes | Yes | Yes |
33+
| Group shared mem/Barriers | Yes | Yes | Yes | Yes | Yes | No + |
34+
| [TextureArray.Sample float](#tex-array-sample-float) | Yes | Yes | Yes | No | Yes | Yes |
35+
| [Separate Sampler](#separate-sampler) | Yes | Yes | Yes | No | Yes | Yes |
36+
| [tex.Load](#tex-load) | Yes | Yes | Yes | Limited ^ | Yes | Yes |
37+
| [Full bool](#full-bool) | Yes | Yes | Yes | No | Yes | Yes ^ |
38+
| [Mesh Shader](#mesh-shader) | No | Yes | Yes | No | Yes | No |
39+
| [`[unroll]`](#unroll] | Yes | Yes | Yes ^ | Yes | No ^ | Limited + |
40+
| Atomics | Yes | Yes | Yes | Yes | Yes | No + |
41+
| [Atomics on RWBuffer](#rwbuffer-atomics) | Yes | Yes | Yes | No | Yes | No + |
42+
| [Sampler Feedback](#sampler-feedback) | No | Yes | No + | No | No | Yes ^ |
43+
| [RWByteAddressBuffer Atomic](#byte-address-atomic) | No | Yes ^ | Yes ^ | Yes | Yes | No + |
44+
| [ByteAddressBuffer Load/Store Alignment](#byte-address-align) | Yes ^ | Yes ^ | Yes ^ | Yes | Yes | Yes |
45+
| [Shader Execution Reordering](#ser) | No | Yes ^ | Yes ^ | No | No | No |
46+
| [debugBreak](#debug-break) | No | No | Yes | Yes | No | Yes |
47+
| [realtime clock](#realtime-clock) | No | Yes ^ | Yes | Yes | No | No |
48+
| [Switch Fall-Through](#switch-fallthrough) | No ^ | Yes | Yes | Yes | Yes | Yes |
4849

4950
<a id="half"></a>
5051

@@ -280,6 +281,22 @@ On Vulkan, for float the [`GL_EXT_shader_atomic_float`](https://www.khronos.org/
280281

281282
CUDA requires SM6.0 or higher for int64 support.
282283

284+
<a id="byte-address-align"></a>
285+
286+
## ByteAddressBuffer Load/Store Alignment
287+
288+
The templated `(RW)ByteAddressBuffer` accessors that take an explicit `alignment`
289+
`LoadAligned<T>(location, alignment)` and the three-argument `Store<T>(address, value, alignment)`
290+
— treat `alignment` as a compile-time contract that is validated up front. (The plain
291+
`Load<T>`/`Store<T>` forms make no alignment promise and are unaffected.)
292+
293+
A non-zero `alignment` must satisfy all of the following, or a compile-time error is reported:
294+
295+
| Requirement | Diagnostic |
296+
| ------------------------------------------------------------- | ---------- |
297+
| Be a compile-time constant | 41302 |
298+
| (Constant `location`) `location` is a multiple of `alignment` | 41303 |
299+
283300
<a id="mesh-shader"></a>
284301

285302
## Mesh Shader

source/slang/hlsl.meta.slang

Lines changed: 6 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -463,8 +463,9 @@ struct ByteAddressBuffer
463463

464464
/// Load a value with type `T` from the buffer at the specified location with a known alignment.
465465
///@param T The type of the value to load from the buffer.
466-
///@param location The input address in bytes, which must be a multiple of 4.
467-
///@param alignment The known alignment of `location`, which must be a multiple of 4 and compatible with `T`.
466+
///@param location The input address in bytes, which must be a multiple of `alignment`.
467+
///@param alignment The known alignment of `location` in bytes. It must be a compile-time
468+
/// constant; otherwise a compile-time error is reported.
468469
///@return The value loaded from the buffer.
469470
///@remarks
470471
/// On HLSL, `alignment` is informational only and does not affect the emitted intrinsic.
@@ -6442,8 +6443,9 @@ struct $(item.name)
64426443

64436444
/// Load a value with type `T` from the buffer at the specified location with a known alignment.
64446445
///@param T The type of the value to load from the buffer.
6445-
///@param location The input address in bytes, which must be a multiple of 4.
6446-
///@param alignment The known alignment of `location`, which must be a multiple of 4 and compatible with `T`.
6446+
///@param location The input address in bytes, which must be a multiple of `alignment`.
6447+
///@param alignment The known alignment of `location` in bytes. It must be a compile-time
6448+
/// constant; otherwise a compile-time error is reported.
64476449
///@return The value loaded from the buffer.
64486450
///@remarks
64496451
/// On HLSL, `alignment` is informational only and does not affect the emitted intrinsic.

source/slang/slang-diagnostics.lua

Lines changed: 14 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -5676,6 +5676,20 @@ standalone_note(
56765676
span { loc = "location" }
56775677
)
56785678

5679+
err(
5680+
"byte-address-buffer-alignment-not-constant",
5681+
41302,
5682+
"byte address buffer alignment must be a compile-time constant",
5683+
span { loc = "location", message = "the alignment of a byte address buffer access must be a compile-time constant" }
5684+
)
5685+
5686+
err(
5687+
"byte-address-buffer-location-not-aligned",
5688+
41303,
5689+
"byte address buffer location is not a multiple of the specified alignment",
5690+
span { loc = "location", message = "the byte address buffer location `~offset:Int` is not a multiple of the specified alignment `~alignment:Int`" }
5691+
)
5692+
56795693

56805694
-- Process and validate all diagnostics
56815695
processed_diagnostics, validation_errors = helpers.process_diagnostics(helpers.diagnostics)

source/slang/slang-ir-byte-address-legalize.cpp

Lines changed: 39 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -129,6 +129,10 @@ struct ByteAddressBufferLegalizationContext
129129
//
130130
auto type = load->getDataType();
131131

132+
// Validate the alignment contract first, so it is checked even when the type needs no
133+
// legalization and the early-out below fires.
134+
validateExplicitAlignment(load->getOperand(1), load->getOperand(2));
135+
132136
// We start by looking at the type being loaded so
133137
// that we can opt out if it is legal.
134138
//
@@ -240,6 +244,37 @@ struct ByteAddressBufferLegalizationContext
240244
return false;
241245
}
242246

247+
// Validate the `alignment`/`location` operands of a single byte-address load/store against the
248+
// `LoadAligned`/`StoreAligned` contract (issue #11545): a non-zero `alignment` promise must be
249+
// a compile-time constant (else 41302), and a compile-time-constant `location` must be a
250+
// multiple of it (else 41303). Diagnostics do not abort legalization; `alignment == 0` is the
251+
// "no promise" sentinel (plain `Load`/`Store`) and carries no contract to validate.
252+
void validateExplicitAlignment(IRInst* baseOffset, IRInst* alignment)
253+
{
254+
auto alignLit = as<IRIntLit>(alignment);
255+
if (!alignLit)
256+
{
257+
m_sink->diagnose(Diagnostics::ByteAddressBufferAlignmentNotConstant{
258+
.location = baseOffset->sourceLoc,
259+
});
260+
return;
261+
}
262+
auto alignmentVal = alignLit->getValue();
263+
if (alignmentVal == 0)
264+
return;
265+
if (auto baseOffsetLit = as<IRIntLit>(baseOffset))
266+
{
267+
if ((baseOffsetLit->getValue() % alignmentVal) != 0)
268+
{
269+
m_sink->diagnose(Diagnostics::ByteAddressBufferLocationNotAligned{
270+
.offset = baseOffsetLit->getValue(),
271+
.alignment = alignmentVal,
272+
.location = baseOffset->sourceLoc,
273+
});
274+
}
275+
}
276+
}
277+
243278
// Returns true if a vectorized load or store of `alignmentVal` bytes at
244279
// `baseOffset + immediateOffset` is known to be sufficiently aligned.
245280
bool isAligned(
@@ -1316,6 +1351,10 @@ struct ByteAddressBufferLegalizationContext
13161351
auto value = store->getOperand(3);
13171352
auto type = value->getDataType();
13181353

1354+
// Validate the alignment contract first, so it is checked even when the type needs no
1355+
// legalization and the early-out below fires.
1356+
validateExplicitAlignment(store->getOperand(1), store->getOperand(2));
1357+
13191358
// Types that are already legal to use don't require any processing.
13201359
//
13211360
if (isTypeLegalForByteAddressLoadStore(type))

tests/compute/byte-address-buffer-aligned.slang

Lines changed: 9 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -112,10 +112,15 @@ void computeMain(uint3 threadId : SV_DispatchThreadID)
112112
// CHECK3-DAG: OpStore %[[V33]] %[[V28]]
113113
// CHECK3-DAG: %[[V34:[a-zA-Z0-9_]+]] = OpAccessChain %[[SBf]] %[[BUF00]]
114114
// CHECK3-DAG: OpStore %[[V34]] %[[V30]]
115-
buffer0.StoreAligned(32, buffer0.LoadAligned<float4>(32));
116-
buffer0.StoreAligned(32, buffer0.LoadAligned<float4>(8));
117-
buffer0.StoreAligned(8, buffer0.LoadAligned<float4>(32));
118-
buffer0.StoreAligned(8, buffer0.LoadAligned<float4>(8));
115+
// Per issue #11545 the alignment must be a power of two and a constant location must be a
116+
// multiple of it, so the wide-vs-scalarized choice is requested through an explicit alignment
117+
// rather than a (now self-contradictory) misaligned single-argument call. A `float4` whose
118+
// natural alignment (16) is satisfied loads/stores wide; an explicit alignment of 4 forces the
119+
// per-component path.
120+
buffer0.StoreAligned(32, buffer0.LoadAligned<float4>(32)); // wide load + wide store
121+
buffer0.StoreAligned(32, buffer0.LoadAligned<float4>(8, 4)); // scalarized load + wide store
122+
buffer0.Store<float4>(8, buffer0.LoadAligned<float4>(32), 4); // wide load + scalarized store
123+
buffer0.Store<float4>(8, buffer0.LoadAligned<float4>(8, 4), 4); // scalarized load + store
119124
}
120125

121126
// CHECK_CUDA: ).Load<float4{{.*}}>(32U);

tests/compute/byte-address-buffer-array.slang

Lines changed: 5 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -39,6 +39,10 @@ void computeMain(uint3 threadId : SV_DispatchThreadID)
3939
// CHECK3: OpLoad %v4float
4040
// CHECK3: OpStore
4141
buffer.Store(0, buffer.LoadAligned<Block>(0));
42-
buffer.StoreAligned(16, buffer.LoadAligned<Block>(4, 16));
42+
// Load at scalar (4-byte) alignment -> per-component loads; store at 16-byte alignment ->
43+
// the two `float4` rows are stored wide. (Per issue #11545 the alignment must be a power of
44+
// two and the constant location a multiple of it, so the previous `LoadAligned<Block>(4, 16)`
45+
// / `StoreAligned(16, ...)` self-contradictory calls are expressed with valid alignments.)
46+
buffer.Store<Block>(16, buffer.LoadAligned<Block>(4, 4), 16);
4347
}
4448

Lines changed: 30 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,30 @@
1+
// byte-address-buffer-contract-11590.slang
2+
3+
// Diagnostic test for https://github.com/shader-slang/slang/issues/11545 — Slice 1 (#11590).
4+
//
5+
// The `alignment` argument of `(RW)ByteAddressBuffer.LoadAligned`/`Store` is a compile-time
6+
// contract: it must be a compile-time constant, and a compile-time-constant `location` must be a
7+
// multiple of it. Each violation is reported as a distinct diagnostic.
8+
9+
//TEST:SIMPLE(filecheck=CHECK):-target hlsl -entry computeMain -stage compute
10+
//TEST:SIMPLE(filecheck=CHECK):-target spirv -entry computeMain -stage compute
11+
12+
RWByteAddressBuffer buffer;
13+
14+
[shader("compute")]
15+
[numthreads(1, 1, 1)]
16+
void computeMain(uint3 tid : SV_DispatchThreadID)
17+
{
18+
uint dynamicAlignment = tid.x;
19+
20+
// A non-constant alignment cannot be honored as a compile-time contract.
21+
// CHECK-DAG: error[E41302]
22+
float4 a = buffer.LoadAligned<float4>(0, dynamicAlignment);
23+
24+
// A compile-time-constant location that is not a multiple of the alignment is a
25+
// self-contradictory call.
26+
// CHECK-DAG: error[E41303]
27+
float4 c = buffer.LoadAligned<float4>(20, 16);
28+
29+
buffer.Store<float4>(0, a + c);
30+
}

0 commit comments

Comments
 (0)