Commit ec6e822
committed
Implement approx_tanh for ROCm using OCML tanh function
AMD CDNA3 (MI300X/gfx942) does not have a hardware tanh instruction like
NVIDIA's PTX tanh.approx. This implements approx_tanh for ROCm using:
- For f32 (and f16/bf16 via casting): Triton's __triton_hip_fast_tanhf
which uses a fast exp-based formula: tanh(x) = (exp(2x) - 1) / (exp(2x) + 1)
- For f64: OCML's __ocml_tanh_f64 (AMD's Open Compute Math Library)
Changes:
- Add f64 support to approx_tanh function
- Add ROCm platform detection in _elementwise_inline_asm_lowering
- Add _approx_tanh_rocm_lowering function for ROCm-specific lowering
- Add test_approx_tanh test with f16/bf16/f32/f64 support
See: triton-lang/triton#77801 parent 09e023e commit ec6e822
File tree
2 files changed
+133
-0
lines changed- jax/_src/pallas/triton
- tests/pallas
2 files changed
+133
-0
lines changed| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
48 | 48 | | |
49 | 49 | | |
50 | 50 | | |
| 51 | + | |
| 52 | + | |
| 53 | + | |
| 54 | + | |
| 55 | + | |
51 | 56 | | |
52 | 57 | | |
53 | 58 | | |
| |||
119 | 124 | | |
120 | 125 | | |
121 | 126 | | |
| 127 | + | |
| 128 | + | |
| 129 | + | |
| 130 | + | |
| 131 | + | |
| 132 | + | |
| 133 | + | |
122 | 134 | | |
123 | 135 | | |
124 | 136 | | |
| |||
129 | 141 | | |
130 | 142 | | |
131 | 143 | | |
| 144 | + | |
| 145 | + | |
| 146 | + | |
| 147 | + | |
| 148 | + | |
| 149 | + | |
| 150 | + | |
| 151 | + | |
| 152 | + | |
| 153 | + | |
| 154 | + | |
| 155 | + | |
| 156 | + | |
| 157 | + | |
| 158 | + | |
| 159 | + | |
| 160 | + | |
| 161 | + | |
| 162 | + | |
| 163 | + | |
| 164 | + | |
| 165 | + | |
| 166 | + | |
| 167 | + | |
| 168 | + | |
| 169 | + | |
| 170 | + | |
| 171 | + | |
| 172 | + | |
| 173 | + | |
| 174 | + | |
| 175 | + | |
| 176 | + | |
| 177 | + | |
| 178 | + | |
| 179 | + | |
| 180 | + | |
| 181 | + | |
| 182 | + | |
| 183 | + | |
| 184 | + | |
| 185 | + | |
| 186 | + | |
| 187 | + | |
| 188 | + | |
| 189 | + | |
| 190 | + | |
| 191 | + | |
| 192 | + | |
| 193 | + | |
| 194 | + | |
| 195 | + | |
| 196 | + | |
| 197 | + | |
| 198 | + | |
| 199 | + | |
| 200 | + | |
| 201 | + | |
| 202 | + | |
| 203 | + | |
| 204 | + | |
| 205 | + | |
| 206 | + | |
| 207 | + | |
| 208 | + | |
| 209 | + | |
| 210 | + | |
| 211 | + | |
| 212 | + | |
| 213 | + | |
| 214 | + | |
| 215 | + | |
| 216 | + | |
| 217 | + | |
| 218 | + | |
| 219 | + | |
| 220 | + | |
| 221 | + | |
| 222 | + | |
| 223 | + | |
132 | 224 | | |
133 | 225 | | |
134 | 226 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
1870 | 1870 | | |
1871 | 1871 | | |
1872 | 1872 | | |
| 1873 | + | |
| 1874 | + | |
| 1875 | + | |
| 1876 | + | |
| 1877 | + | |
| 1878 | + | |
| 1879 | + | |
| 1880 | + | |
| 1881 | + | |
| 1882 | + | |
| 1883 | + | |
| 1884 | + | |
| 1885 | + | |
| 1886 | + | |
| 1887 | + | |
| 1888 | + | |
| 1889 | + | |
| 1890 | + | |
| 1891 | + | |
| 1892 | + | |
| 1893 | + | |
| 1894 | + | |
| 1895 | + | |
| 1896 | + | |
| 1897 | + | |
| 1898 | + | |
| 1899 | + | |
| 1900 | + | |
| 1901 | + | |
| 1902 | + | |
| 1903 | + | |
| 1904 | + | |
| 1905 | + | |
| 1906 | + | |
| 1907 | + | |
| 1908 | + | |
| 1909 | + | |
| 1910 | + | |
| 1911 | + | |
| 1912 | + | |
| 1913 | + | |
1873 | 1914 | | |
1874 | 1915 | | |
1875 | 1916 | | |
| |||
0 commit comments