Extend LLVM 18 workaround to other float types. #3016
base: master
```julia
@eval begin
    @device_override @inline Base.FastMath.max_fast(x::$T, y::$T) = ifelse(y > x, y, x)
    @device_override @inline Base.FastMath.min_fast(x::$T, y::$T) = ifelse(y > x, x, y)
    @device_override @inline Base.FastMath.minmax_fast(x::$T, y::$T) = ifelse(y > x, (x, y), (y, x))
end
```
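The hunk above shows only the body of the loop. As a minimal sketch in plain Julia (no GPU needed), assuming the enclosing loop iterates over `Float16`, `Float32`, and `Float64` (the actual type list is not visible in this hunk), the pattern generates one concretely typed method per type. The names `my_max_fast`, `my_min_fast`, and `my_minmax_fast` are hypothetical stand-ins, and `@device_override` is left out so the snippet runs without CUDA.jl.

```julia
# Hypothetical stand-ins for the overridden FastMath functions; plain Julia,
# without CUDA.jl's @device_override.
for T in (Float16, Float32, Float64)  # assumed type list, not shown in the hunk
    @eval begin
        # One method per concrete type, so each signature is fully concrete.
        @inline my_max_fast(x::$T, y::$T) = ifelse(y > x, y, x)
        @inline my_min_fast(x::$T, y::$T) = ifelse(y > x, x, y)
        @inline my_minmax_fast(x::$T, y::$T) = ifelse(y > x, (x, y), (y, x))
    end
end

my_max_fast(1.0f0, 2.0f0)     # 2.0f0
my_minmax_fast(3.0, 1.0)      # (1.0, 3.0)
length(methods(my_max_fast))  # 3, one method per type in the loop
```

Because every generated signature is concrete, no other method on the same argument types can be more specific than these.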
Can you do something like `@device_override @inline Base.FastMath.max_fast(x::T, y::T) where {T<:Union{Float16, Float32, Float64}} = ifelse(y > x, y, x)`, just to avoid the loop?
I'm always wary of doing so, because the Base method may then end up being more specific (and we really want these to apply). In this case, Base doesn't use metaprogramming, so I guess it could work.
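The specificity concern can be illustrated with ordinary Julia dispatch. This is a sketch within a single method table, using a hypothetical function `pick`; it does not model how CUDA.jl's overlay method table is resolved against Base. It only shows that a method with a concrete signature wins over a `where`-clause method whose type parameter is bounded by a `Union`.

```julia
# Hypothetical stand-in for an existing method with a concrete signature.
pick(x::Float32, y::Float32) = :concrete

# A second definition that covers several float types with one where-clause.
pick(x::T, y::T) where {T<:Union{Float16, Float32, Float64}} = :union

pick(1.0f0, 2.0f0)  # :concrete, the concrete signature is more specific
pick(1.0, 2.0)      # :union, no concrete Float64 method exists
```

If the existing methods were generated per concrete type, a `Union`-bounded replacement would lose to them for those types, which is why the per-type loop is the conservative choice.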
Codecov Report

All modified and coverable lines are covered by tests.

Coverage Diff

| Metric | master | #3016 |
|---|---|---|
| Coverage | 89.31% | 89.31% |
| Files | 148 | 148 |
| Lines | 12995 | 12995 |
| Hits | 11606 | 11606 |
| Misses | 1389 | 1389 |
CUDA.jl Benchmarks
| Benchmark suite | Current: 97f7c1e | Previous: da38676 | Ratio |
|---|---|---|---|
| `latency/precompile` | 55584314156 ns | 55180480419 ns | 1.01 |
| `latency/ttfp` | 7717855921.5 ns | 7807018821.5 ns | 0.99 |
| `latency/import` | 4025438307 ns | 4140006046.5 ns | 0.97 |
| `integration/volumerhs` | 9623132 ns | 9623640.5 ns | 1.00 |
| `integration/byval/slices=1` | 146664 ns | 147119 ns | 1.00 |
| `integration/byval/slices=3` | 425554 ns | 426080.5 ns | 1.00 |
| `integration/byval/reference` | 144978 ns | 145207 ns | 1.00 |
| `integration/byval/slices=2` | 286179 ns | 286491 ns | 1.00 |
| `integration/cudadevrt` | 103398 ns | 103866 ns | 1.00 |
| `kernel/indexing` | 14096 ns | 14460 ns | 0.97 |
| `kernel/indexing_checked` | 14790 ns | 15250 ns | 0.97 |
| `kernel/occupancy` | 679.2064516129033 ns | 680.6470588235294 ns | 1.00 |
| `kernel/launch` | 2083.2 ns | 2197.4444444444443 ns | 0.95 |
| `kernel/rand` | 15013 ns | 18784 ns | 0.80 |
| `array/reverse/1d` | 19771 ns | 20177 ns | 0.98 |
| `array/reverse/2dL_inplace` | 66721 ns | 67023 ns | 1.00 |
| `array/reverse/1dL` | 69914 ns | 70421 ns | 0.99 |
| `array/reverse/2d` | 21925 ns | 22600 ns | 0.97 |
| `array/reverse/1d_inplace` | 11577 ns | 10092 ns | 1.15 |
| `array/reverse/2d_inplace` | 13285 ns | 13669 ns | 0.97 |
| `array/reverse/2dL` | 74024 ns | 74807 ns | 0.99 |
| `array/reverse/1dL_inplace` | 66907 ns | 67197 ns | 1.00 |
| `array/copy` | 20556 ns | 20909 ns | 0.98 |
| `array/iteration/findall/int` | 157600.5 ns | 159075 ns | 0.99 |
| `array/iteration/findall/bool` | 139621 ns | 140579 ns | 0.99 |
| `array/iteration/findfirst/int` | 160826 ns | 161050 ns | 1.00 |
| `array/iteration/findfirst/bool` | 161498 ns | 162329.5 ns | 0.99 |
| `array/iteration/scalar` | 73528 ns | 74588 ns | 0.99 |
| `array/iteration/logical` | 213816 ns | 216756.5 ns | 0.99 |
| `array/iteration/findmin/1d` | 91481 ns | 96345.5 ns | 0.95 |
| `array/iteration/findmin/2d` | 121664 ns | 122694 ns | 0.99 |
| `array/reductions/reduce/Int64/1d` | 42940 ns | 43621 ns | 0.98 |
| `array/reductions/reduce/Int64/dims=1` | 50581 ns | 44622.5 ns | 1.13 |
| `array/reductions/reduce/Int64/dims=2` | 61611 ns | 61757 ns | 1.00 |
| `array/reductions/reduce/Int64/dims=1L` | 89000 ns | 89064 ns | 1.00 |
| `array/reductions/reduce/Int64/dims=2L` | 88050 ns | 88156 ns | 1.00 |
| `array/reductions/reduce/Float32/1d` | 36523.5 ns | 38304 ns | 0.95 |
| `array/reductions/reduce/Float32/dims=1` | 42444 ns | 42098.5 ns | 1.01 |
| `array/reductions/reduce/Float32/dims=2` | 59795 ns | 60277 ns | 0.99 |
| `array/reductions/reduce/Float32/dims=1L` | 52444 ns | 52722 ns | 0.99 |
| `array/reductions/reduce/Float32/dims=2L` | 71866 ns | 72438 ns | 0.99 |
| `array/reductions/mapreduce/Int64/1d` | 43028 ns | 43667 ns | 0.99 |
| `array/reductions/mapreduce/Int64/dims=1` | 44303 ns | 45026 ns | 0.98 |
| `array/reductions/mapreduce/Int64/dims=2` | 61445 ns | 62167 ns | 0.99 |
| `array/reductions/mapreduce/Int64/dims=1L` | 89047 ns | 89081 ns | 1.00 |
| `array/reductions/mapreduce/Int64/dims=2L` | 87991 ns | 88471 ns | 0.99 |
| `array/reductions/mapreduce/Float32/1d` | 36026 ns | 38399 ns | 0.94 |
| `array/reductions/mapreduce/Float32/dims=1` | 51970 ns | 41819.5 ns | 1.24 |
| `array/reductions/mapreduce/Float32/dims=2` | 59669 ns | 60230.5 ns | 0.99 |
| `array/reductions/mapreduce/Float32/dims=1L` | 52620 ns | 52828 ns | 1.00 |
| `array/reductions/mapreduce/Float32/dims=2L` | 72070 ns | 72045.5 ns | 1.00 |
| `array/broadcast` | 19836 ns | 20443 ns | 0.97 |
| `array/copyto!/gpu_to_gpu` | 12834 ns | 13022 ns | 0.99 |
| `array/copyto!/cpu_to_gpu` | 214812 ns | 216732 ns | 0.99 |
| `array/copyto!/gpu_to_cpu` | 287929 ns | 283462 ns | 1.02 |
| `array/accumulate/Int64/1d` | 124865 ns | 124912 ns | 1.00 |
| `array/accumulate/Int64/dims=1` | 83675 ns | 84105 ns | 0.99 |
| `array/accumulate/Int64/dims=2` | 158830 ns | 158348 ns | 1.00 |
| `array/accumulate/Int64/dims=1L` | 1710434 ns | 1710807 ns | 1.00 |
| `array/accumulate/Int64/dims=2L` | 966856.5 ns | 966629 ns | 1.00 |
| `array/accumulate/Float32/1d` | 108995 ns | 109358 ns | 1.00 |
| `array/accumulate/Float32/dims=1` | 80096 ns | 80805 ns | 0.99 |
| `array/accumulate/Float32/dims=2` | 147591 ns | 148060.5 ns | 1.00 |
| `array/accumulate/Float32/dims=1L` | 1619034 ns | 1619572.5 ns | 1.00 |
| `array/accumulate/Float32/dims=2L` | 697841 ns | 698871 ns | 1.00 |
| `array/construct` | 1287.4 ns | 1243.1 ns | 1.04 |
| `array/random/randn/Float32` | 46521.5 ns | 48790 ns | 0.95 |
| `array/random/randn!/Float32` | 24942 ns | 25388 ns | 0.98 |
| `array/random/rand!/Int64` | 27269 ns | 27419 ns | 0.99 |
| `array/random/rand!/Float32` | 8736.333333333334 ns | 9072.666666666666 ns | 0.96 |
| `array/random/rand/Int64` | 29927 ns | 29902 ns | 1.00 |
| `array/random/rand/Float32` | 13355 ns | 13324 ns | 1.00 |
| `array/permutedims/4d` | 54940 ns | 57303.5 ns | 0.96 |
| `array/permutedims/2d` | 54110 ns | 54025.5 ns | 1.00 |
| `array/permutedims/3d` | 54939.5 ns | 55054.5 ns | 1.00 |
| `array/sorting/1d` | 2758522 ns | 2759246 ns | 1.00 |
| `array/sorting/by` | 3345836 ns | 3345927 ns | 1.00 |
| `array/sorting/2d` | 1080524 ns | 1082310 ns | 1.00 |
| `cuda/synchronization/stream/auto` | 1041.9 ns | 1045.9 ns | 1.00 |
| `cuda/synchronization/stream/nonblocking` | 7703.5 ns | 7338.6 ns | 1.05 |
| `cuda/synchronization/stream/blocking` | 836.9480519480519 ns | 812.9347826086956 ns | 1.03 |
| `cuda/synchronization/context/auto` | 1193.2 ns | 1170.7 ns | 1.02 |
| `cuda/synchronization/context/nonblocking` | 7080.9 ns | 7720.5 ns | 0.92 |
| `cuda/synchronization/context/blocking` | 933.8857142857142 ns | 910.6818181818181 ns | 1.03 |
This comment was automatically generated by workflow using github-action-benchmark.
Extends #2937, fixes #2946