Commit cee62bb
authored
triton_kernels: use tl.clamp whenever possible (#8728)
In many cases `tl.clamp` can be compiled to a single instruction rather
than two instructions for `tl.minimum` + `tl.maximum`:
- Output saturation instruction modifier to clamp output to [0, 1]
- `min.xorsign.abs.f32` instructions on Hopper+
- `V_MED3_F32` instructions on AMD
# New contributor declaration
- [x] I am not making a trivial change, such as fixing a typo in a
comment.
- [x] I have written a PR description following these
[rules](https://cbea.ms/git-commit/#why-not-how).
- [x] I have run `pre-commit run --from-ref origin/main --to-ref HEAD`.
- Select one of the following.
- [ ] I have added tests.
- `/test` for `lit` tests
- `/unittest` for C++ tests
- `/python/test` for end-to-end tests
- [x] This PR does not need a test because `expected to be bit-exact`.
- Select one of the following.
- [x] I have not added any `lit` tests.
- [ ] The `lit` tests I have added follow these [best
practices](https://mlir.llvm.org/getting_started/TestingGuide/#filecheck-best-practices),
including the "tests should be minimal" section. (Usually running Python
code
and using the instructions it generates is not minimal.)1 parent 3ebfe54 commit cee62bb
2 files changed
Lines changed: 4 additions & 5 deletions
Lines changed: 1 addition & 3 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
115 | 115 | | |
116 | 116 | | |
117 | 117 | | |
118 | | - | |
119 | | - | |
120 | | - | |
| 118 | + | |
121 | 119 | | |
122 | 120 | | |
123 | 121 | | |
| |||
Lines changed: 3 additions & 2 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
5 | 5 | | |
6 | 6 | | |
7 | 7 | | |
8 | | - | |
9 | 8 | | |
10 | | - | |
| 9 | + | |
| 10 | + | |
| 11 | + | |
11 | 12 | | |
12 | 13 | | |
13 | 14 | | |
| |||
0 commit comments