[2/3] ByteAddressBuffer: decide wide-vs-scalarized + validate the alignment contract#11595
[2/3] ByteAddressBuffer: decide wide-vs-scalarized + validate the alignment contract#11595nv-slang-bot[bot] wants to merge 3 commits into
Conversation
Mark the `alignment` parameter of `LoadAligned`/`Store` and the underlying `__byteAddressBufferLoad`/`__byteAddressBufferStore` intrinsics `constexpr`, so a non-compile-time-constant alignment is rejected by the language's built-in constexpr enforcement (diagnostic 40012) instead of a bespoke check. Front-end enforcement, so it is target-agnostic. Part of the #11545 ByteAddressBuffer alignment-contract decomposition (slice 1 of 4). Co-authored-by: Harsh Aggarwal <haaggarwal@nvidia.com>
…e alone Replace the decide-and-diagnose `isAligned` with a pure, promise-first `isWideAccessAligned`: a non-zero `constexpr` alignment promise is the sole input to the wide-vs-scalarized decision, so a constant and a runtime `location` carrying the same promise legalize identically. A compile-time-constant base-offset fallback is retained only for the no-promise (`alignment == 0`) plain `Load`/`Store` forms (dropping it verbatim would wide-load every no-promise access at an unproven offset). Assert the alignment operand is an `IRIntLit`, guaranteed by slice 1's `constexpr`. Part of #11545 (slice 2 of 4). NOTE (pending boundary decision): removing the in-predicate 41300 diagnostic leaves tests/compute/byte-address-buffer-align-error.slang expecting an error that slice 3 reintroduces (relaxed, as 41301). That test is transiently red in this slice under a 4-way (Option X) split; a 2+3 merge (Option Y) avoids it. Co-authored-by: Harsh Aggarwal <haaggarwal@nvidia.com>
…gnment Add the byte-address alignment value checks to the legalizer in validateExplicitAlignment: a non-zero `alignment` promise must be a power of two (41301) and at least the access type's scalar-component alignment (41300), and a compile-time-constant `location` must be a multiple of an otherwise-valid promise (41303). Compile-time-constness itself is enforced by the `constexpr` `alignment` parameter (front-end diagnostic 40012), so no dedicated diagnostic is needed here. Introduce the `getNaturalAlignment` IR op (`__naturalAlignmentOf<T>`) — the largest power of two dividing `T`'s natural stride — and switch the implicit single-argument `*Aligned` accessors to forward it instead of the stride, so a `float3` (12-byte stride, not a power of two) forwards a valid alignment of 4 and is no longer spuriously rejected by 41303. This merges the consistency (point 3) and validation/natural-alignment (issue #11592) work into one change so the pre-existing alignment diagnostic transitions from E41300 to E41301 in a single coherent step. Co-authored-by: Harsh Aggarwal <haaggarwal@nvidia.com>
bce5d76 to
f7fe921
Compare
befc313 to
8adee03
Compare
|
@jkwak-work — a sequencing question on the #11545 decomposition before I rework this slice. This PR (Slice 2) currently carries diagnostic 41303 ( 41303 really couples with the Recommendation: defer 41303 to Slice 3 and make this PR a pure consistency change (constant vs. runtime Update — CI triage on the current diff (
Note (2026-06-22): the shared base 🤖 Generated by an automated Slang coworker — may be inaccurate. A human maintainer should verify. |
8adee03 to
eda63ff
Compare
|
@nv-slang-bot , the first slice of this set of PRs is merged to ToT. |
|
On it — rebasing #11595 onto current ToT ( 🤖 Generated by an automated Slang coworker — may be inaccurate. A human maintainer should verify. |
|
@nv-slang-bot , please resolve the merge conflict with master and rebase. |
|
On it, @jkwak-work — the rebase onto current ToT is staged (dropping the now-merged 1/3 commit, 🤖 Generated by an automated Slang coworker — may be inaccurate. A human maintainer should verify. |
|
@jkwak-work — the rebase is done and verified locally, but I hit a hard wall pushing it: this bot account is a GitHub App without What the rebase produced (verified against current ToT
Could you either rebase/push this branch yourself (you have 🤖 Generated by an automated Slang coworker — may be inaccurate. A human maintainer should verify. |
Part 2 of 3 decomposing the original #11564 for #11545. Stacked on #11594 ([1/3], the
constexprcontract); rebase down to master once [1/3] lands. Held as draft pending review.Closes #11591. Closes #11592.
Motivation
With the
alignmentargument established as a compile-time (constexpr) contract in [1/3], the legalizer can stop re-deriving alignment from the load/store offset and instead make wide-vs-scalarized chunking a pure function of the promise. Two gaps remain that this PR closes:Load/LoadAlignedbehave consistently.alignmentpromise must be validated (power of two; at leastT's scalar-component alignment), a constantlocationmust be a multiple of it, and the implicit single-argument*Alignedaccessors must forward a valid natural alignment — today afloat3(12-byte stride, not a power of two) is forwarded as its stride and spuriously rejected.Motivating case for the single-arg fix:
Proposed solution
Predicate refactor (ByteAddressBuffer alignment: decide wide-vs-scalarized from the alignment promise alone (consistency) #11591). Replace the offset-re-deriving
isAlignedwith a pureisWideAccessAligned(baseOffset, immediateOffset, promisedAlignment, accessSize). Per maintainer review, the promise is nowSLANG_RELEASE_ASSERT(alignLit)-guaranteed to be a literal after [1/3]'sconstexprfolding. ThebaseOffsetoperand is retained only for thealignment == 0(no-promise) case — dropping it there would let0 % accessSize == 0mark every plainLoad/Storewide-aligned at an unproven offset (a miscompile). For any non-zero promise, the decision ispromise % accessSize == 0 && immediateOffset % accessSize == 0andbaseOffsetis unused.Validation (ByteAddressBuffer alignment: power-of-two requirement, scalar-component relaxation, single-arg natural-alignment fix #11592) in
validateExplicitAlignment: a non-zero promise must be a power of two (E41301) and at least the access type's scalar-component alignment (E41300, relaxed wording); a compile-time-constantlocationmust be a multiple of an otherwise-valid promise (E41303). The custom 41302 diagnostic is removed — compile-time-constness is enforced by theconstexprparameter itself (front-end E40012need-compile-time-constant), so no dedicated diagnostic is needed.Natural-alignment forwarding (ByteAddressBuffer alignment: power-of-two requirement, scalar-component relaxation, single-arg natural-alignment fix #11592). Introduce the
getNaturalAlignmentIR op (__naturalAlignmentOf<T>— the largest power of two dividingT's natural stride) and switch the implicit single-argument*Alignedaccessors to forward it instead of the stride, sofloat3forwards a valid4.These two issues are merged into one PR so the pre-existing alignment diagnostic transitions E41300 → E41301 in a single coherent commit — splitting them would leave
byte-address-buffer-align-error.slangred in the window between the predicate refactor and the validation restore.Change summary
slang-ir-byte-address-legalize.cppisAligned→ pureisWideAccessAligned;validateExplicitAlignment(41301 / relaxed-41300 / 41303)slang-diagnostics.luaslang-ir-insts.lua,slang-ir-insts-stable-names.lua,slang-ir.h,slang-ir-peephole.cpp,slang-ir-use-uninitialized-values.cppgetNaturalAlignmentop + constant-foldhlsl.meta.slang,core.meta.slang*Alignedforwards__naturalAlignmentOf<T>byte-address-buffer-align-error.slang(E41300→E41301),byte-address-buffer-alignment-11545.slang(DIAGNOSTIC_TEST),byte-address-buffer-singlearg-float3-11592.slang(-cpuCOMPARE_COMPUTE),*-aligned,*-arrayConcepts and vocabulary
constexpr alignmentargument; after [1/3]'s folding it reaches the legalizer as anIRIntLit.0is the no-promise sentinel for plainLoad/Store.__naturalAlignmentOf<T>— largest power of two dividingT's natural stride (float4→16,float2/half4→8,float3→4). Distinct from__naturalStrideOf<T>, which can be a non-power-of-two (float3→12).Process report
isWideAccessAlignedis a pure predicate over(promise, immediateOffset, accessSize)— the input shape (a foldedIRIntLitpromise) is guaranteed by [1/3]'sconstexpr+ SCCP, so theSLANG_RELEASE_ASSERT(alignLit)is a fail-loud contract check, not a guard over malformed input. ThebaseOffsetretention is the one principled exception (thealignment == 0no-promise path has no literal promise to reason from), justified above.constexpr, E40012), not something the IR legalizer should re-diagnose.getNaturalAlignmentis a new canonical IR op rather than an open-coded computation so the single-arg forwarding and any future consumer share one definition and constant-fold identically.🤖 Generated by an automated Slang coworker — may be inaccurate. A human maintainer should verify.