SparseMatricesCSR Dispatch #2720
Codecov Report
All modified and coverable lines are covered by tests ✅

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #2720      +/-   ##
==========================================
+ Coverage   88.87%   89.00%   +0.12%
==========================================
  Files         153      153
  Lines       13154    13154
==========================================
+ Hits        11691    11708      +17
+ Misses       1463     1446      -17
Your PR no longer requires formatting changes. Thank you for your contribution!
CUDA.jl Benchmarks
| Benchmark suite | Current: 627c794 | Previous: af58f61 | Ratio |
|---|---|---|---|
| latency/precompile | 47219964749 ns | 47364047578 ns | 1.00 |
| latency/ttfp | 6800010126 ns | 6821264934 ns | 1.00 |
| latency/import | 3212836642.5 ns | 3225196597.5 ns | 1.00 |
| integration/volumerhs | 9610171.5 ns | 9614354 ns | 1.00 |
| integration/byval/slices=1 | 146882 ns | 146613 ns | 1.00 |
| integration/byval/slices=3 | 425144 ns | 425074 ns | 1.00 |
| integration/byval/reference | 144782 ns | 144823 ns | 1.00 |
| integration/byval/slices=2 | 286155 ns | 286027 ns | 1.00 |
| integration/cudadevrt | 103242 ns | 103297 ns | 1.00 |
| kernel/indexing | 14023 ns | 14019 ns | 1.00 |
| kernel/indexing_checked | 14532 ns | 14585 ns | 1.00 |
| kernel/occupancy | 687.8533333333334 ns | 721.3333333333334 ns | 0.95 |
| kernel/launch | 2110 ns | 2085.8 ns | 1.01 |
| kernel/rand | 17528 ns | 18047 ns | 0.97 |
| array/reverse/1d | 19719 ns | 19477 ns | 1.01 |
| array/reverse/2d | 25105 ns | 24090.5 ns | 1.04 |
| array/reverse/1d_inplace | 11120 ns | 10504 ns | 1.06 |
| array/reverse/2d_inplace | 13011 ns | 12197 ns | 1.07 |
| array/copy | 21291 ns | 21209 ns | 1.00 |
| array/iteration/findall/int | 157272 ns | 158069.5 ns | 0.99 |
| array/iteration/findall/bool | 138618.5 ns | 139342 ns | 0.99 |
| array/iteration/findfirst/int | 153006.5 ns | 153958 ns | 0.99 |
| array/iteration/findfirst/bool | 154403 ns | 154465 ns | 1.00 |
| array/iteration/scalar | 70522 ns | 72075 ns | 0.98 |
| array/iteration/logical | 213281.5 ns | 215491 ns | 0.99 |
| array/iteration/findmin/1d | 41124 ns | 41679 ns | 0.99 |
| array/iteration/findmin/2d | 94274 ns | 94340 ns | 1.00 |
| array/reductions/reduce/1d | 35396 ns | 35947 ns | 0.98 |
| array/reductions/reduce/2d | 41051 ns | 41316 ns | 0.99 |
| array/reductions/mapreduce/1d | 33102 ns | 33483 ns | 0.99 |
| array/reductions/mapreduce/2d | 40882.5 ns | 40997 ns | 1.00 |
| array/broadcast | 20761.5 ns | 20733 ns | 1.00 |
| array/copyto!/gpu_to_gpu | 13749 ns | 13434 ns | 1.02 |
| array/copyto!/cpu_to_gpu | 207750 ns | 208335 ns | 1.00 |
| array/copyto!/gpu_to_cpu | 243108.5 ns | 243244 ns | 1.00 |
| array/accumulate/1d | 109745.5 ns | 109403 ns | 1.00 |
| array/accumulate/2d | 80219 ns | 80261 ns | 1.00 |
| array/construct | 1247.5 ns | 1244.9 ns | 1.00 |
| array/random/randn/Float32 | 44314 ns | 43677.5 ns | 1.01 |
| array/random/randn!/Float32 | 26326 ns | 26376 ns | 1.00 |
| array/random/rand!/Int64 | 27062 ns | 27073 ns | 1.00 |
| array/random/rand!/Float32 | 8598.333333333334 ns | 8572 ns | 1.00 |
| array/random/rand/Int64 | 33597 ns | 29918 ns | 1.12 |
| array/random/rand/Float32 | 12971 ns | 12871 ns | 1.01 |
| array/permutedims/4d | 61325 ns | 61023 ns | 1.00 |
| array/permutedims/2d | 55343 ns | 55334 ns | 1.00 |
| array/permutedims/3d | 55951 ns | 55987.5 ns | 1.00 |
| array/sorting/1d | 2775342.5 ns | 2774813 ns | 1.00 |
| array/sorting/by | 3367068 ns | 3365701 ns | 1.00 |
| array/sorting/2d | 1084272 ns | 1084786 ns | 1.00 |
| cuda/synchronization/stream/auto | 1043.1 ns | 1038.1 ns | 1.00 |
| cuda/synchronization/stream/nonblocking | 6521.2 ns | 6575 ns | 0.99 |
| cuda/synchronization/stream/blocking | 833.1764705882352 ns | 798.2788461538462 ns | 1.04 |
| cuda/synchronization/context/auto | 1189.9 ns | 1153 ns | 1.03 |
| cuda/synchronization/context/nonblocking | 6686.2 ns | 6748.6 ns | 0.99 |
| cuda/synchronization/context/blocking | 932.2051282051282 ns | 888.7142857142857 ns | 1.05 |
This comment was automatically generated by workflow using github-action-benchmark.
This PR should be ready now.
This PR adds a new extension module, `SparseMatricesCSRExt`, that enables dispatching `SparseMatrixCSR` from SparseMatricesCSR.jl to `CuSparseMatrixCSR`.
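
For readers less familiar with the CSR layout that both matrix types are built around, here is a minimal, CPU-only sketch of its three arrays (row pointers, column indices, nonzero values). It uses only the SparseArrays standard library so it runs without a GPU or extra packages. `csr_arrays` is a hypothetical helper written for illustration; it is not part of this PR, SparseMatricesCSR.jl, or CUDA.jl, and it says nothing about how the extension itself performs the conversion.

```julia
using SparseArrays

# Hypothetical helper (illustration only): return 1-based CSR arrays for a
# CSC matrix. The CSC storage of the transpose of A has the same three
# arrays as the CSR storage of A, so we materialize the transpose.
function csr_arrays(A::SparseMatrixCSC)
    At = copy(transpose(A))            # SparseMatrixCSC holding the transpose
    return At.colptr, At.rowval, At.nzval
end

# A = [1 0 2;
#      0 3 0;
#      0 0 4]
A = sparse([1, 1, 2, 3], [1, 3, 2, 3], [1.0, 2.0, 3.0, 4.0], 3, 3)

rowptr, colval, nzval = csr_arrays(A)

# Row i owns the entries rowptr[i] : rowptr[i+1]-1 of colval and nzval.
for i in 1:size(A, 1)
    idx = rowptr[i]:(rowptr[i + 1] - 1)
    println("row ", i, ": cols = ", colval[idx], ", vals = ", nzval[idx])
end
```

A host-to-device conversion of a CSR matrix typically comes down to copying these three arrays plus the matrix dimensions, which is why a direct dispatch can avoid a detour through CSC.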