Commit e8106c0
Add div_approx and inverse_approx builtins for approximate PTX intrinsics (GH-1199)
Add standalone wp.div_approx() and wp.inverse_approx() builtins that
use fast GPU intrinsics (div.approx.f32, rcp.approx.ftz.f64) for
approximate division and matrix inverse. Only floating-point types are
supported; both builtins fall back to exact arithmetic on CPU.
Signed-off-by: Eric Shi <ershi@nvidia.com>

1 parent 2d78485 commit e8106c0
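
The commit message contrasts approximate and exact division. As a rough illustration of why the two can differ, the sketch below emulates float32 division as a multiply by a rounded reciprocal, using only the Python standard library. It does not call Warp and makes no claim about how `wp.div_approx()` or the PTX intrinsics are implemented; the helper names (`to_f32`, `div_exact_f32`, `div_via_reciprocal_f32`, `ulp_distance`, `max_ulp_gap`) are hypothetical and exist only for this example.

```python
import struct


def to_f32(x: float) -> float:
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def div_exact_f32(a: float, b: float) -> float:
    """Reference quotient: divide in binary64, then round once to binary32."""
    return to_f32(to_f32(a) / to_f32(b))


def div_via_reciprocal_f32(a: float, b: float) -> float:
    """Emulated approximate quotient: round 1/b to binary32, then multiply.

    Assumption for illustration only; real GPU intrinsics use their own
    hardware approximations and error bounds.
    """
    a32, b32 = to_f32(a), to_f32(b)
    rcp = to_f32(1.0 / b32)   # first rounding step
    return to_f32(a32 * rcp)  # second rounding step


def ulp_distance(x: float, y: float) -> int:
    """Count binary32 steps between two positive, finite binary32 values."""
    ix = struct.unpack("<I", struct.pack("<f", x))[0]
    iy = struct.unpack("<I", struct.pack("<f", y))[0]
    return abs(ix - iy)


def max_ulp_gap(pairs) -> int:
    """Largest ULP gap between the two quotients over (a, b) pairs."""
    return max(
        ulp_distance(div_exact_f32(a, b), div_via_reciprocal_f32(a, b))
        for a, b in pairs
    )
```

Calling `max_ulp_gap` on a grid of small positive operands shows how far the two-step result drifts from the singly rounded one. In this emulation the gap stays within a couple of ULPs, which is the kind of accuracy-for-speed trade an approximate builtin makes.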
File tree
9 files changed, +735 -7 lines changed
- docs/language_reference
- warp
- _src
- native
- tests