Validate incompatible cache_modifier/eviction_policy combinations in NVIDIA backend #10003
swjng wants to merge 1 commit into triton-lang:main from
264e82f to a13cfd0
def load(self, ptr: TensorTy, mask: Optional[TensorTy], other: Optional[TensorTy], boundary_check: Tuple,
         padding_option: str, cache_modifier: str, eviction_policy: str, is_volatile: bool) -> TensorTy:
    # Validate PTX-incompatible combinations before emitting IR.
load is generic across different backends and not always lowered to PTX
Thanks for the feedback. Moved the validation out of semantic.py into LoadStoreOpToLLVM.cpp in the NVIDIA PTX codegen pass, right before the PTX instruction builder runs. The check now lives at the correct layer — it only fires for the NVIDIA backend and is expressed as op.emitOpError() returning failure(). Updated the tests to expect triton.CompilationError instead of ValueError.
ThomasRaoux left a comment
In triton we should not error in the compiler from a valid kernel
13ae582 to 8b4f25c
…NVIDIA backend
When tl.load/tl.store is called with a PTX-illegal combination of
cache_modifier and eviction_policy, Triton previously emitted PTX
containing both modifiers and let ptxas fail with an opaque assembler
error:
ptxas error: Modifier '.evict_first' cannot be combined with modifier '.cs'
Users saw a low-level message with no indication of which Python
arguments caused it.
Add validation in LoadStoreOpToLLVM.cpp (NVIDIA-specific PTX lowering)
that emits a clear compilation error before any PTX is generated.
Placing the check in the NVIDIA backend, not in backend-agnostic
semantic.py, keeps the frontend neutral to PTX ISA constraints.
PTX-illegal combinations covered:
| op | cache_modifier | eviction_policy |
|-------|----------------|------------------------------|
| store | .cs | evict_first, evict_last |
| store | .cg | evict_first |
| load | .ca | evict_first, evict_last |
| load | .cg | evict_first |
8b4f25c to 174ce21
@Jokeren — moved the test to @ThomasRaoux — on reflection the kernel is valid at the Triton level (both One alternative before closing: the NVIDIA backend could silently drop one of the two hints (e.g. keep
New contributor declaration
I am not making a trivial change, such as fixing a typo in a comment.
I have written a PR description following these rules.
I have run pre-commit run --from-ref origin/main --to-ref HEAD.
Select one of the following.
/test for lit tests
Select one of the following.
The lit tests I have added follow these best practices, including the "tests should be minimal" section.

Why
When tl.load/tl.store is called with a PTX-illegal combination of cache_modifier and eviction_policy, Triton emits PTX that ptxas then rejects with an opaque assembler error:

ptxas error: Modifier '.evict_first' cannot be combined with modifier '.cs'

Users see a low-level message with no indication of which Python arguments caused it. The PTX ISA forbids combining certain cache modifiers with L1 eviction hints:

- .cs (cache streaming) bypasses L1; L1 eviction hints are undefined.
- .ca (cache all, load) overrides L1 eviction policy; hints conflict.
- .cg (cache global) bypasses L1; evict_first is invalid.

What the fix does
Validate the combinations at PTX lowering time in third_party/nvidia/lib/TritonNVIDIAGPUToLLVM/LoadStoreOpToLLVM.cpp (LoadOpConversion::matchAndRewrite and StoreOpConversion::matchAndRewrite). On an illegal combination, emit a clear op.emitOpError before generating PTX.

The check lives in the NVIDIA backend, not the backend-agnostic semantic.py, so the Triton frontend remains neutral to PTX ISA constraints.

Combinations covered:
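| op    | cache_modifier | eviction_policy          |
|-------|----------------|--------------------------|
| store | .cs            | evict_first, evict_last  |
| store | .cg            | evict_first              |
| load  | .ca            | evict_first, evict_last  |
| load  | .cg            | evict_first              |

Example error after this fix: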
Test plan

Added lit test cases to test/Conversion/tritongpu_to_llvm.mlir alongside the existing store_with_cache_attr / load_with_l2_cache_hint / store_with_l2_cache_hint tests. --verify-diagnostics was added to the existing RUN line so each illegal combination is checked via expected-error. Existing valid-path tests continue to pass unchanged.
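
For readers who want to reason about the "Combinations covered" list without reading the C++ lowering, the rule can be sketched as a small lookup. This is a Python illustration only, not the code in LoadStoreOpToLLVM.cpp: the names ILLEGAL_EVICTION_POLICIES and is_illegal_combination are hypothetical, and the table contents are copied from the PR's own list rather than checked against the PTX ISA.

```python
# Hypothetical sketch: the real check in this PR is written in C++ inside the
# NVIDIA lowering pass. The names below are illustrative assumptions.

# (op, cache_modifier) -> eviction policies the PR lists as PTX-illegal.
ILLEGAL_EVICTION_POLICIES = {
    ("store", ".cs"): {"evict_first", "evict_last"},
    ("store", ".cg"): {"evict_first"},
    ("load", ".ca"): {"evict_first", "evict_last"},
    ("load", ".cg"): {"evict_first"},
}


def is_illegal_combination(op: str, cache_modifier: str, eviction_policy: str) -> bool:
    """Return True if the PR's table lists this combination as PTX-illegal."""
    illegal = ILLEGAL_EVICTION_POLICIES.get((op, cache_modifier), set())
    return eviction_policy in illegal


if __name__ == "__main__":
    # Walk every listed (op, modifier) pair against both eviction policies.
    for (op, modifier) in ILLEGAL_EVICTION_POLICIES:
        for policy in ("evict_first", "evict_last"):
            flagged = is_illegal_combination(op, modifier, policy)
            print(f"{op:5s} {modifier} {policy:11s} illegal={flagged}")
```

Keeping the rule as data keyed by (op, cache_modifier) makes it easy to compare against the table row by row; any pair not in the table is treated as allowed in this sketch.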