-
Notifications
You must be signed in to change notification settings - Fork 239
Fill in missing KA functionality (KA.functional + sparse matrices adaption from CUDAbackend) #2740
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
||
Adapt.adapt_storage(::CUDABackend, a::AbstractArray) = Adapt.adapt(CuArray, a) | ||
Adapt.adapt_storage(::CUDABackend, a::Union{CuArray,CUSPARSE.AbstractCuSparseArray}) = a | ||
Adapt.adapt_storage(::KA.CPU, a::Union{CuArray,CUSPARSE.AbstractCuSparseArray}) = Adapt.adapt(Array, a) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, one note here, last line will fallback to type Array
for matrices other than CuSparseMatrixCSC
unless some extension is loaded, for instance #2720. also some alternatives is to fallback to SparseMatrixCSC
or just throw not implemented error
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## master #2740 +/- ##
==========================================
- Coverage 88.99% 88.94% -0.05%
==========================================
Files 153 153
Lines 13153 13154 +1
==========================================
- Hits 11705 11700 -5
- Misses 1448 1454 +6 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Yeah. |
Your PR requires formatting changes to meet the project's style guidelines. Click here to view the suggested changes.diff --git a/src/CUDAKernels.jl b/src/CUDAKernels.jl
index 0e18eba61..9e87686a7 100644
--- a/src/CUDAKernels.jl
+++ b/src/CUDAKernels.jl
@@ -32,8 +32,8 @@ KA.synchronize(::CUDABackend) = synchronize()
KA.functional(::CUDABackend) = CUDA.functional()
Adapt.adapt_storage(::CUDABackend, a::AbstractArray) = Adapt.adapt(CuArray, a)
-Adapt.adapt_storage(::CUDABackend, a::Union{CuArray,CUSPARSE.AbstractCuSparseArray}) = a
-Adapt.adapt_storage(::KA.CPU, a::Union{CuArray,CUSPARSE.AbstractCuSparseArray}) = Adapt.adapt(Array, a)
+Adapt.adapt_storage(::CUDABackend, a::Union{CuArray, CUSPARSE.AbstractCuSparseArray}) = a
+Adapt.adapt_storage(::KA.CPU, a::Union{CuArray, CUSPARSE.AbstractCuSparseArray}) = Adapt.adapt(Array, a)
## memory operations
diff --git a/test/base/kernelabstractions.jl b/test/base/kernelabstractions.jl
index 4f4120032..61519e210 100644
--- a/test/base/kernelabstractions.jl
+++ b/test/base/kernelabstractions.jl
@@ -17,7 +17,7 @@ end
@testset "CUDA Backend Adapt Tests" begin
# CPU → GPU
A = sprand(Float32, 10, 10, 0.5) #CSC
- A_d = adapt(CUDABackend(), A)
+ A_d = adapt(CUDABackend(), A)
@test A_d isa CUSPARSE.CuSparseMatrixCSC
@test adapt(CUDABackend(), A_d) |> typeof == typeof(A_d)
@@ -25,5 +25,5 @@ end
B_d = A |> cu # CuCSC
B = adapt(KA.CPU(), A_d)
@test B isa SparseMatrixCSC
- @test adapt(KA.CPU(), B) |> typeof == typeof(B)
+ @test adapt(KA.CPU(), B) |> typeof == typeof(B)
end |
Test failures related. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CUDA.jl Benchmarks
Benchmark suite | Current: 6aa9b1e | Previous: b24af84 | Ratio |
---|---|---|---|
latency/precompile |
47227094040 ns |
47408635362.5 ns |
1.00 |
latency/ttfp |
6824720867 ns |
6756583810 ns |
1.01 |
latency/import |
3219728914.5 ns |
3201988386.5 ns |
1.01 |
integration/volumerhs |
9610143 ns |
9608856 ns |
1.00 |
integration/byval/slices=1 |
147044 ns |
146935 ns |
1.00 |
integration/byval/slices=3 |
425640.5 ns |
425262 ns |
1.00 |
integration/byval/reference |
145233 ns |
144977 ns |
1.00 |
integration/byval/slices=2 |
286533 ns |
286173 ns |
1.00 |
integration/cudadevrt |
103559 ns |
103373 ns |
1.00 |
kernel/indexing |
14361.5 ns |
14266 ns |
1.01 |
kernel/indexing_checked |
15137 ns |
14874.5 ns |
1.02 |
kernel/occupancy |
691.7651006711409 ns |
689.8741721854304 ns |
1.00 |
kernel/launch |
2506.5 ns |
2164.3 ns |
1.16 |
kernel/rand |
16558 ns |
18183.5 ns |
0.91 |
array/reverse/1d |
19851 ns |
20047 ns |
0.99 |
array/reverse/2d |
23852 ns |
25366 ns |
0.94 |
array/reverse/1d_inplace |
10978.5 ns |
11030.666666666666 ns |
1.00 |
array/reverse/2d_inplace |
12652 ns |
11508 ns |
1.10 |
array/copy |
21244.5 ns |
21080 ns |
1.01 |
array/iteration/findall/int |
160140 ns |
158388 ns |
1.01 |
array/iteration/findall/bool |
140519 ns |
139415 ns |
1.01 |
array/iteration/findfirst/int |
165650 ns |
154381 ns |
1.07 |
array/iteration/findfirst/bool |
168634.5 ns |
155031 ns |
1.09 |
array/iteration/scalar |
73635 ns |
71204.5 ns |
1.03 |
array/iteration/logical |
219015.5 ns |
210201.5 ns |
1.04 |
array/iteration/findmin/1d |
42078 ns |
41765 ns |
1.01 |
array/iteration/findmin/2d |
94351 ns |
95061.5 ns |
0.99 |
array/reductions/reduce/1d |
40507.5 ns |
35804 ns |
1.13 |
array/reductions/reduce/2d |
51737 ns |
42000.5 ns |
1.23 |
array/reductions/mapreduce/1d |
39119.5 ns |
33568 ns |
1.17 |
array/reductions/mapreduce/2d |
50881 ns |
41606.5 ns |
1.22 |
array/broadcast |
21490 ns |
21172 ns |
1.02 |
array/copyto!/gpu_to_gpu |
11970 ns |
13803 ns |
0.87 |
array/copyto!/cpu_to_gpu |
210872 ns |
210795 ns |
1.00 |
array/copyto!/gpu_to_cpu |
245606.5 ns |
245433 ns |
1.00 |
array/accumulate/1d |
110045.5 ns |
109840.5 ns |
1.00 |
array/accumulate/2d |
80742 ns |
80884 ns |
1.00 |
array/construct |
1290.9 ns |
1282.8 ns |
1.01 |
array/random/randn/Float32 |
49916.5 ns |
47662 ns |
1.05 |
array/random/randn!/Float32 |
26896 ns |
26788 ns |
1.00 |
array/random/rand!/Int64 |
27380 ns |
27194 ns |
1.01 |
array/random/rand!/Float32 |
8841.333333333334 ns |
8682.333333333334 ns |
1.02 |
array/random/rand/Int64 |
30085 ns |
30217 ns |
1.00 |
array/random/rand/Float32 |
13174 ns |
13197 ns |
1.00 |
array/permutedims/4d |
61281 ns |
62953 ns |
0.97 |
array/permutedims/2d |
55930 ns |
55536.5 ns |
1.01 |
array/permutedims/3d |
56595 ns |
56437 ns |
1.00 |
array/sorting/1d |
2778451 ns |
2776763 ns |
1.00 |
array/sorting/by |
3369102 ns |
3368018 ns |
1.00 |
array/sorting/2d |
1086558 ns |
1085390 ns |
1.00 |
cuda/synchronization/stream/auto |
1086.5 ns |
1028.5 ns |
1.06 |
cuda/synchronization/stream/nonblocking |
6519.2 ns |
6435 ns |
1.01 |
cuda/synchronization/stream/blocking |
835.8333333333334 ns |
810.6022727272727 ns |
1.03 |
cuda/synchronization/context/auto |
1213.4 ns |
1167.4 ns |
1.04 |
cuda/synchronization/context/nonblocking |
6796.8 ns |
6638.299999999999 ns |
1.02 |
cuda/synchronization/context/blocking |
961.5675675675676 ns |
898.7291666666666 ns |
1.07 |
This comment was automatically generated by workflow using github-action-benchmark.
I believe CI failure is not from my end. is it? if not, then this should be ready |
f7129e9
to
6aa9b1e
Compare
This PR should be able to solve two missing pieces:
I might need help on where I should write the tests, because
KA.jl
only tests forArray
type (https://github.com/JuliaGPU/KernelAbstractions.jl/blob/365bf2cd461b297cf723ee3f819220ecb73c1e1e/test/test.jl#L94), so tests for sparse matrices should be put intest/kernelabstractions.jl
, right?