Merged
[only benchmarks]
552b359 to
3f78272
CUDA.jl Benchmarks
| Benchmark suite | Current: 3f78272 | Previous: e2eab84 | Ratio |
|---|---|---|---|
| latency/precompile | 45051704463.5 ns | 55351775865 ns | 0.81 |
| latency/ttfp | 14298177097 ns | 7675959782 ns | 1.86 |
| latency/import | 4533035006 ns | 4025008043 ns | 1.13 |
| integration/volumerhs | 9439094.5 ns | 9625265.5 ns | 0.98 |
| integration/byval/slices=1 | 145633.5 ns | 147037 ns | 0.99 |
| integration/byval/slices=3 | 422923 ns | 426089 ns | 0.99 |
| integration/byval/reference | 143856 ns | 144983 ns | 0.99 |
| integration/byval/slices=2 | 284175 ns | 286492 ns | 0.99 |
| integration/cudadevrt | 102423 ns | 103531 ns | 0.99 |
| kernel/indexing | 13579 ns | 14161 ns | 0.96 |
| kernel/indexing_checked | 14005 ns | 15039 ns | 0.93 |
| kernel/occupancy | 692.6013513513514 ns | 792.6237623762377 ns | 0.87 |
| kernel/launch | 2197.6666666666665 ns | 2304.777777777778 ns | 0.95 |
| kernel/rand | 14320 ns | 18605.5 ns | 0.77 |
| array/reverse/1d | 18739 ns | 19692 ns | 0.95 |
| array/reverse/2dL_inplace | 66142 ns | 66886 ns | 0.99 |
| array/reverse/1dL | 68931 ns | 69816 ns | 0.99 |
| array/reverse/2d | 20818 ns | 21785 ns | 0.96 |
| array/reverse/1d_inplace | 8842.333333333334 ns | 9783 ns | 0.90 |
| array/reverse/2d_inplace | 12691 ns | 13411 ns | 0.95 |
| array/reverse/2dL | 72797.5 ns | 73680 ns | 0.99 |
| array/reverse/1dL_inplace | 66162 ns | 66877 ns | 0.99 |
| array/copy | 18055 ns | 20494 ns | 0.88 |
| array/iteration/findall/int | 145603.5 ns | 159432 ns | 0.91 |
| array/iteration/findall/bool | 130483 ns | 141132 ns | 0.92 |
| array/iteration/findfirst/int | 84143 ns | 161404 ns | 0.52 |
| array/iteration/findfirst/bool | 81456 ns | 162399 ns | 0.50 |
| array/iteration/scalar | 68124 ns | 73008 ns | 0.93 |
| array/iteration/logical | 195818.5 ns | 220542 ns | 0.89 |
| array/iteration/findmin/1d | 84052.5 ns | 94388 ns | 0.89 |
| array/iteration/findmin/2d | 116890 ns | 121456 ns | 0.96 |
| array/reductions/reduce/Int64/1d | 39246 ns | 43817 ns | 0.90 |
| array/reductions/reduce/Int64/dims=1 | 51371 ns | 44722 ns | 1.15 |
| array/reductions/reduce/Int64/dims=2 | 58829 ns | 61524.5 ns | 0.96 |
| array/reductions/reduce/Int64/dims=1L | 86990 ns | 88932 ns | 0.98 |
| array/reductions/reduce/Int64/dims=2L | 84622 ns | 88232 ns | 0.96 |
| array/reductions/reduce/Float32/1d | 34377 ns | 37401.5 ns | 0.92 |
| array/reductions/reduce/Float32/dims=1 | 40438 ns | 51945.5 ns | 0.78 |
| array/reductions/reduce/Float32/dims=2 | 56463 ns | 59868 ns | 0.94 |
| array/reductions/reduce/Float32/dims=1L | 51417 ns | 52569 ns | 0.98 |
| array/reductions/reduce/Float32/dims=2L | 70207 ns | 72250 ns | 0.97 |
| array/reductions/mapreduce/Int64/1d | 39306 ns | 43757 ns | 0.90 |
| array/reductions/mapreduce/Int64/dims=1 | 43505 ns | 51077 ns | 0.85 |
| array/reductions/mapreduce/Int64/dims=2 | 58972 ns | 61716 ns | 0.96 |
| array/reductions/mapreduce/Int64/dims=1L | 86971 ns | 89026 ns | 0.98 |
| array/reductions/mapreduce/Int64/dims=2L | 84890 ns | 88266 ns | 0.96 |
| array/reductions/mapreduce/Float32/1d | 33962 ns | 37130 ns | 0.91 |
| array/reductions/mapreduce/Float32/dims=1 | 39705.5 ns | 42011 ns | 0.95 |
| array/reductions/mapreduce/Float32/dims=2 | 56085 ns | 60008 ns | 0.93 |
| array/reductions/mapreduce/Float32/dims=1L | 51159 ns | 52656 ns | 0.97 |
| array/reductions/mapreduce/Float32/dims=2L | 69891 ns | 72333.5 ns | 0.97 |
| array/broadcast | 20412 ns | 20154 ns | 1.01 |
| array/copyto!/gpu_to_gpu | 10660.666666666666 ns | 12828 ns | 0.83 |
| array/copyto!/cpu_to_gpu | 213289 ns | 216999 ns | 0.98 |
| array/copyto!/gpu_to_cpu | 282857 ns | 282452 ns | 1.00 |
| array/accumulate/Int64/1d | 117824.5 ns | 125219 ns | 0.94 |
| array/accumulate/Int64/dims=1 | 79135 ns | 88138 ns | 0.90 |
| array/accumulate/Int64/dims=2 | 155937.5 ns | 162165 ns | 0.96 |
| array/accumulate/Int64/dims=1L | 1705343 ns | 1714037 ns | 0.99 |
| array/accumulate/Int64/dims=2L | 960402.5 ns | 971078.5 ns | 0.99 |
| array/accumulate/Float32/1d | 100373 ns | 109882.5 ns | 0.91 |
| array/accumulate/Float32/dims=1 | 76042 ns | 84424 ns | 0.90 |
| array/accumulate/Float32/dims=2 | 143997 ns | 151744.5 ns | 0.95 |
| array/accumulate/Float32/dims=1L | 1590640 ns | 1622762 ns | 0.98 |
| array/accumulate/Float32/dims=2L | 658982 ns | 702590.5 ns | 0.94 |
| array/construct | 1313.4 ns | 1267.85 ns | 1.04 |
| array/random/randn/Float32 | 42758.5 ns | 48207 ns | 0.89 |
| array/random/randn!/Float32 | 29806 ns | 25000 ns | 1.19 |
| array/random/rand!/Int64 | 34591 ns | 27295 ns | 1.27 |
| array/random/rand!/Float32 | 8171.333333333333 ns | 8737.333333333334 ns | 0.94 |
| array/random/rand/Int64 | 30153 ns | 30000 ns | 1.01 |
| array/random/rand/Float32 | 12165 ns | 13155 ns | 0.92 |
| array/permutedims/4d | 50924 ns | 55023 ns | 0.93 |
| array/permutedims/2d | 52358 ns | 53878 ns | 0.97 |
| array/permutedims/3d | 52707 ns | 54959 ns | 0.96 |
| array/sorting/1d | 2735396 ns | 2758315 ns | 0.99 |
| array/sorting/by | 3304406 ns | 3344753.5 ns | 0.99 |
| array/sorting/2d | 1067036 ns | 1081270 ns | 0.99 |
| cuda/synchronization/stream/auto | 1022.7 ns | 1017.0833333333334 ns | 1.01 |
| cuda/synchronization/stream/nonblocking | 7781.700000000001 ns | 7295.6 ns | 1.07 |
| cuda/synchronization/stream/blocking | 815.3333333333334 ns | 798.6601941747573 ns | 1.02 |
| cuda/synchronization/context/auto | 1192.6 ns | 1158.8 ns | 1.03 |
| cuda/synchronization/context/nonblocking | 6878.8 ns | 7658.9 ns | 0.90 |
| cuda/synchronization/context/blocking | 880.2037037037037 ns | 902.5217391304348 ns | 0.98 |
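
For readers scanning the table: the Ratio column appears to be the current timing divided by the previous one (for example, 14298177097 / 7675959782 ≈ 1.86 for `latency/ttfp`), so values above 1 mean the current commit is slower. A minimal sketch of that arithmetic follows, using a few rows copied from the table. The `classify` helper and its 10% thresholds are hypothetical choices for illustration, not something the benchmark workflow is known to use.

```python
# Sample rows copied from the benchmark table: (name, current_ns, previous_ns).
ROWS = [
    ("latency/ttfp", 14298177097, 7675959782),
    ("array/iteration/findfirst/bool", 81456, 162399),
    ("array/random/rand!/Int64", 34591, 27295),
    ("array/broadcast", 20412, 20154),
]


def ratio(current_ns, previous_ns):
    """Current time divided by previous time; > 1 means slower now."""
    return current_ns / previous_ns


def classify(rows, slower=1.10, faster=0.90):
    """Label each benchmark using hypothetical +/-10% thresholds (an assumption)."""
    labels = {}
    for name, current_ns, previous_ns in rows:
        r = ratio(current_ns, previous_ns)
        if r >= slower:
            labels[name] = "slower"
        elif r <= faster:
            labels[name] = "faster"
        else:
            labels[name] = "unchanged"
    return labels


if __name__ == "__main__":
    for name, label in classify(ROWS).items():
        print(f"{name}: {label}")
```

Single-run timings on shared CI hardware can be noisy, so a threshold like this is only a rough filter for deciding which rows deserve a closer look.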
This comment was automatically generated by workflow using github-action-benchmark.
maleadt approved these changes Feb 5, 2026
Just changing this so it shows up on the benchmark graphs and makes it easier to figure out the reason for a sudden jump/drop in some benchmarks.