
Fill in missing KA functionality (KA.functional + sparse matrix adaptation for CUDABackend) #2740


Merged: maleadt merged 4 commits into JuliaGPU:master from extend-ka-functional on Apr 16, 2025

Conversation

@Abdelrahman912 (Contributor)

This PR fills in two missing pieces: KA.functional is not implemented for CUDABackend, and adapting a sparse matrix to the backend currently leaves it on the CPU:

julia> KA.functional(CUDABackend())
missing
julia> adapt(CUDABackend(),sprand(10,10,0.2))
10×10 SparseMatrixCSC{Float64, Int64} with 14 stored entries:

I might need some help on where to write the tests: KA.jl itself only tests plain Array types (https://github.com/JuliaGPU/KernelAbstractions.jl/blob/365bf2cd461b297cf723ee3f819220ecb73c1e1e/test/test.jl#L94), so should the tests for sparse matrices go in test/kernelabstractions.jl?
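
For reference, a rough sketch of the behaviour this change aims for (the return types are inferred from the tests added below, not captured REPL output):

KA.functional(CUDABackend())               # forwards to CUDA.functional(), so true on a working GPU
adapt(CUDABackend(), sprand(10, 10, 0.2))  # returns a CUSPARSE.CuSparseMatrixCSC instead of leaving the data on the CPU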


Adapt.adapt_storage(::CUDABackend, a::AbstractArray) = Adapt.adapt(CuArray, a)
Adapt.adapt_storage(::CUDABackend, a::Union{CuArray,CUSPARSE.AbstractCuSparseArray}) = a
Adapt.adapt_storage(::KA.CPU, a::Union{CuArray,CUSPARSE.AbstractCuSparseArray}) = Adapt.adapt(Array, a)
@Abdelrahman912 (Contributor, Author) commented on this hunk, Apr 12, 2025

One note here: the last line will fall back to the Array type for sparse matrices other than CuSparseMatrixCSC unless some extension is loaded (for instance #2720). Alternatives would be to fall back to SparseMatrixCSC, or to simply throw a not-implemented error.
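
A rough sketch of those two alternatives (not part of this PR; only one would be defined at a time, and it assumes SparseArrays is in scope plus CUSPARSE's format conversions such as CuSparseMatrixCSC(::CuSparseMatrixCSR) and the SparseMatrixCSC(::CuSparseMatrixCSC) collector):

# Hypothetical alternative 1: convert formats without a dedicated CPU path
# (CSR shown as an example) through CSC so the CPU side still gets a sparse result.
Adapt.adapt_storage(::KA.CPU, a::CUSPARSE.CuSparseMatrixCSR) =
    SparseMatrixCSC(CUSPARSE.CuSparseMatrixCSC(a))

# Hypothetical alternative 2: reject such formats outright instead of densifying.
Adapt.adapt_storage(::KA.CPU, a::CUSPARSE.CuSparseMatrixCSR) =
    throw(ArgumentError("adapting $(typeof(a)) to the CPU is not implemented"))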

codecov bot commented Apr 12, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 88.94%. Comparing base (b24af84) to head (6aa9b1e).
Report is 1 commit behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #2740      +/-   ##
==========================================
- Coverage   88.99%   88.94%   -0.05%     
==========================================
  Files         153      153              
  Lines       13153    13154       +1     
==========================================
- Hits        11705    11700       -5     
- Misses       1448     1454       +6     


@maleadt (Member) commented Apr 13, 2025

> tests for sparse matrices should be put in test/kernelabstractions.jl

Yeah. test/base/kernelabstractions.jl currently simply launches the KA.jl test suite, but we could test additional functionality there too.
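
As a rough sketch (not part of this PR's diff), a check for the new KA.functional method could sit alongside the adapt tests in that file; it assumes the test runner has a working GPU, since KA.functional(::CUDABackend) just forwards to CUDA.functional():

@testset "CUDA Backend KA.functional" begin
    # KA.functional(::CUDABackend) forwards to CUDA.functional(),
    # so this should hold on any machine with a usable GPU.
    @test KA.functional(CUDABackend()) === true
end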

@maleadt maleadt requested a review from vchuravy April 13, 2025 07:40
@maleadt maleadt added the "enhancement" (New feature or request) label Apr 13, 2025
@Abdelrahman912 Abdelrahman912 marked this pull request as ready for review April 14, 2025 21:00

github-actions bot commented Apr 14, 2025

Your PR requires formatting changes to meet the project's style guidelines.
Please consider running Runic (git runic master) to apply these changes.

Suggested changes:
diff --git a/src/CUDAKernels.jl b/src/CUDAKernels.jl
index 0e18eba61..9e87686a7 100644
--- a/src/CUDAKernels.jl
+++ b/src/CUDAKernels.jl
@@ -32,8 +32,8 @@ KA.synchronize(::CUDABackend) = synchronize()
 KA.functional(::CUDABackend) = CUDA.functional()
 
 Adapt.adapt_storage(::CUDABackend, a::AbstractArray) = Adapt.adapt(CuArray, a)
-Adapt.adapt_storage(::CUDABackend, a::Union{CuArray,CUSPARSE.AbstractCuSparseArray}) = a
-Adapt.adapt_storage(::KA.CPU, a::Union{CuArray,CUSPARSE.AbstractCuSparseArray}) = Adapt.adapt(Array, a)
+Adapt.adapt_storage(::CUDABackend, a::Union{CuArray, CUSPARSE.AbstractCuSparseArray}) = a
+Adapt.adapt_storage(::KA.CPU, a::Union{CuArray, CUSPARSE.AbstractCuSparseArray}) = Adapt.adapt(Array, a)
 
 ## memory operations
 
diff --git a/test/base/kernelabstractions.jl b/test/base/kernelabstractions.jl
index 4f4120032..61519e210 100644
--- a/test/base/kernelabstractions.jl
+++ b/test/base/kernelabstractions.jl
@@ -17,7 +17,7 @@ end
 @testset "CUDA Backend Adapt Tests" begin
     # CPU → GPU
     A = sprand(Float32, 10, 10, 0.5) #CSC
-    A_d = adapt(CUDABackend(), A) 
+    A_d = adapt(CUDABackend(), A)
     @test A_d isa CUSPARSE.CuSparseMatrixCSC
     @test adapt(CUDABackend(), A_d) |> typeof == typeof(A_d)
 
@@ -25,5 +25,5 @@ end
     B_d = A |> cu # CuCSC
     B = adapt(KA.CPU(), A_d)
     @test B isa SparseMatrixCSC
-    @test adapt(KA.CPU(), B) |> typeof == typeof(B) 
+    @test adapt(KA.CPU(), B) |> typeof == typeof(B)
 end

@maleadt (Member) commented Apr 15, 2025

The test failures are related to this PR.

@github-actions bot left a comment

CUDA.jl Benchmarks

Benchmark suite Current: 6aa9b1e Previous: b24af84 Ratio
latency/precompile 47227094040 ns 47408635362.5 ns 1.00
latency/ttfp 6824720867 ns 6756583810 ns 1.01
latency/import 3219728914.5 ns 3201988386.5 ns 1.01
integration/volumerhs 9610143 ns 9608856 ns 1.00
integration/byval/slices=1 147044 ns 146935 ns 1.00
integration/byval/slices=3 425640.5 ns 425262 ns 1.00
integration/byval/reference 145233 ns 144977 ns 1.00
integration/byval/slices=2 286533 ns 286173 ns 1.00
integration/cudadevrt 103559 ns 103373 ns 1.00
kernel/indexing 14361.5 ns 14266 ns 1.01
kernel/indexing_checked 15137 ns 14874.5 ns 1.02
kernel/occupancy 691.7651006711409 ns 689.8741721854304 ns 1.00
kernel/launch 2506.5 ns 2164.3 ns 1.16
kernel/rand 16558 ns 18183.5 ns 0.91
array/reverse/1d 19851 ns 20047 ns 0.99
array/reverse/2d 23852 ns 25366 ns 0.94
array/reverse/1d_inplace 10978.5 ns 11030.666666666666 ns 1.00
array/reverse/2d_inplace 12652 ns 11508 ns 1.10
array/copy 21244.5 ns 21080 ns 1.01
array/iteration/findall/int 160140 ns 158388 ns 1.01
array/iteration/findall/bool 140519 ns 139415 ns 1.01
array/iteration/findfirst/int 165650 ns 154381 ns 1.07
array/iteration/findfirst/bool 168634.5 ns 155031 ns 1.09
array/iteration/scalar 73635 ns 71204.5 ns 1.03
array/iteration/logical 219015.5 ns 210201.5 ns 1.04
array/iteration/findmin/1d 42078 ns 41765 ns 1.01
array/iteration/findmin/2d 94351 ns 95061.5 ns 0.99
array/reductions/reduce/1d 40507.5 ns 35804 ns 1.13
array/reductions/reduce/2d 51737 ns 42000.5 ns 1.23
array/reductions/mapreduce/1d 39119.5 ns 33568 ns 1.17
array/reductions/mapreduce/2d 50881 ns 41606.5 ns 1.22
array/broadcast 21490 ns 21172 ns 1.02
array/copyto!/gpu_to_gpu 11970 ns 13803 ns 0.87
array/copyto!/cpu_to_gpu 210872 ns 210795 ns 1.00
array/copyto!/gpu_to_cpu 245606.5 ns 245433 ns 1.00
array/accumulate/1d 110045.5 ns 109840.5 ns 1.00
array/accumulate/2d 80742 ns 80884 ns 1.00
array/construct 1290.9 ns 1282.8 ns 1.01
array/random/randn/Float32 49916.5 ns 47662 ns 1.05
array/random/randn!/Float32 26896 ns 26788 ns 1.00
array/random/rand!/Int64 27380 ns 27194 ns 1.01
array/random/rand!/Float32 8841.333333333334 ns 8682.333333333334 ns 1.02
array/random/rand/Int64 30085 ns 30217 ns 1.00
array/random/rand/Float32 13174 ns 13197 ns 1.00
array/permutedims/4d 61281 ns 62953 ns 0.97
array/permutedims/2d 55930 ns 55536.5 ns 1.01
array/permutedims/3d 56595 ns 56437 ns 1.00
array/sorting/1d 2778451 ns 2776763 ns 1.00
array/sorting/by 3369102 ns 3368018 ns 1.00
array/sorting/2d 1086558 ns 1085390 ns 1.00
cuda/synchronization/stream/auto 1086.5 ns 1028.5 ns 1.06
cuda/synchronization/stream/nonblocking 6519.2 ns 6435 ns 1.01
cuda/synchronization/stream/blocking 835.8333333333334 ns 810.6022727272727 ns 1.03
cuda/synchronization/context/auto 1213.4 ns 1167.4 ns 1.04
cuda/synchronization/context/nonblocking 6796.8 ns 6638.299999999999 ns 1.02
cuda/synchronization/context/blocking 961.5675675675676 ns 898.7291666666666 ns 1.07

This comment was automatically generated by a workflow using github-action-benchmark.

@Abdelrahman912 (Contributor, Author)

I believe the CI failure is not from my end, is it? If not, this should be ready.

@maleadt maleadt force-pushed the extend-ka-functional branch from f7129e9 to 6aa9b1e Compare April 16, 2025 05:22
@maleadt maleadt merged commit af58f61 into JuliaGPU:master Apr 16, 2025
2 of 3 checks passed
Labels: enhancement (New feature or request)
3 participants