Description
Some of the existing CUSOLVER wrappers, and the ones added in #1806, directly implement LinearAlgebra.LAPACK methods for greater reuse of Base functionality. We used to do that in the past, but have since moved away from it, opting instead to implement/override higher-level APIs from LinearAlgebra on top of a private, LAPACK-style CUSOLVER interface. The main reasons were that NVIDIA's libraries generally do not exactly match the CPU LAPACK interface (that's not their intent; it is what libraries like NVBLAS are for), and that higher-level APIs often need to be overridden anyway because they introduce Array allocations or perform GPU-incompatible operations (like for loops).
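To make the second approach concrete, here is a rough sketch of what I mean. It is not the actual CUDA.jl code: `DeviceMatrix` and `solver_potrf!` are hypothetical names, and the "device" array just wraps a CPU `Matrix` so the example runs without a GPU. The point is only the layering: a private LAPACK-style function that does not have to match `LinearAlgebra.LAPACK`, and a high-level `cholesky!` method built on top of it.

```julia
using LinearAlgebra

# Hypothetical stand-in for a GPU array type; it wraps a CPU Matrix so the
# sketch runs anywhere. A real device array would own GPU memory instead.
struct DeviceMatrix{T} <: AbstractMatrix{T}
    data::Matrix{T}
end
Base.size(A::DeviceMatrix) = size(A.data)
Base.getindex(A::DeviceMatrix, i::Int, j::Int) = A.data[i, j]

# Private LAPACK-style layer (hypothetical name). It is free to deviate from
# the CPU LAPACK signature; here it forwards to CPU LAPACK purely as a
# placeholder for a vendor-library call.
function solver_potrf!(uplo::Char, A::DeviceMatrix)
    _, info = LinearAlgebra.LAPACK.potrf!(uplo, A.data)
    return A, info
end

# High-level override: cholesky! is specialized for the device type, so the
# generic LinearAlgebra code path (and its LAPACK.potrf! call) is never hit.
function LinearAlgebra.cholesky!(A::DeviceMatrix; check::Bool = true)
    _, info = solver_potrf!('U', A)
    check && info != 0 && throw(PosDefException(info))
    return Cholesky(A, 'U', info)
end

A = DeviceMatrix([4.0 2.0; 2.0 3.0])
F = cholesky!(A)   # dispatches to the override above
```

With this layering, nothing ever extends `LinearAlgebra.LAPACK.potrf!` for the device type, so any high-level entry point that lacks an override fails in generic code rather than silently taking a half-supported path.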
Simply removing the overloads obviously doesn't work because we've already started relying again on higher-level LinearAlgebra functionality:
ArgumentError: cannot take the CPU address of a CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}
Stacktrace:
[1] unsafe_convert(::Type{Ptr{Float32}}, x::CuArray{Float32, 2, CUDA.Mem.DeviceBuffer})
@ CUDA ~/Julia/pkg/CUDA/src/array.jl:484
[2] potrf!(uplo::Char, A::CuArray{Float32, 2, CUDA.Mem.DeviceBuffer})
@ LinearAlgebra.LAPACK ~/Julia/depot/juliaup/julia-1.10.2+0.x64.linux.gnu/share/julia/stdlib/v1.10/LinearAlgebra/src/lapack.jl:3226
[3] _chol!
@ ~/Julia/depot/juliaup/julia-1.10.2+0.x64.linux.gnu/share/julia/stdlib/v1.10/LinearAlgebra/src/cholesky.jl:187 [inlined]
[4] cholesky!(A::Hermitian{Float32, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}}, ::NoPivot; check::Bool)
@ LinearAlgebra ~/Julia/depot/juliaup/julia-1.10.2+0.x64.linux.gnu/share/julia/stdlib/v1.10/LinearAlgebra/src/cholesky.jl:268
[5] cholesky!
@ ~/Julia/depot/juliaup/julia-1.10.2+0.x64.linux.gnu/share/julia/stdlib/v1.10/LinearAlgebra/src/cholesky.jl:267 [inlined]
[6] cholesky!(A::CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, ::NoPivot; check::Bool)
@ LinearAlgebra ~/Julia/depot/juliaup/julia-1.10.2+0.x64.linux.gnu/share/julia/stdlib/v1.10/LinearAlgebra/src/cholesky.jl:301
[7] cholesky! (repeats 2 times)
@ ~/Julia/depot/juliaup/julia-1.10.2+0.x64.linux.gnu/share/julia/stdlib/v1.10/LinearAlgebra/src/cholesky.jl:295 [inlined]
[8] cholesky(A::CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, ::NoPivot; check::Bool)
@ LinearAlgebra ~/Julia/depot/juliaup/julia-1.10.2+0.x64.linux.gnu/share/julia/stdlib/v1.10/LinearAlgebra/src/cholesky.jl:401
ArgumentError: cannot take the CPU address of a CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}
Stacktrace:
[1] unsafe_convert(::Type{Ptr{Float32}}, x::CuArray{Float32, 2, CUDA.Mem.DeviceBuffer})
@ CUDA ~/Julia/pkg/CUDA/src/array.jl:484
[2] ormqr!(side::Char, trans::Char, A::CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, tau::CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, C::CuArray{Float32, 1, CUDA.Mem.DeviceBuffer})
@ LinearAlgebra.LAPACK ~/Julia/depot/juliaup/julia-1.10.2+0.x64.linux.gnu/share/julia/stdlib/v1.10/LinearAlgebra/src/lapack.jl:2926
[3] lmul!(adjQ::LinearAlgebra.AdjointQ{Float32, LinearAlgebra.QRPackedQ{Float32, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}}, B::CuArray{Float32, 1, CUDA.Mem.DeviceBuffer})
@ LinearAlgebra ~/Julia/depot/juliaup/julia-1.10.2+0.x64.linux.gnu/share/julia/stdlib/v1.10/LinearAlgebra/src/abstractq.jl:347
[4] mul!(C::CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, Q::LinearAlgebra.AdjointQ{Float32, LinearAlgebra.QRPackedQ{Float32, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}}, B::CuArray{Float32, 1, CUDA.Mem.DeviceBuffer})
@ LinearAlgebra ~/Julia/depot/juliaup/julia-1.10.2+0.x64.linux.gnu/share/julia/stdlib/v1.10/LinearAlgebra/src/abstractq.jl:212
[5] *(Q::LinearAlgebra.AdjointQ{Float32, LinearAlgebra.QRPackedQ{Float32, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}}, B::CuArray{Float32, 1, CUDA.Mem.DeviceBuffer})
@ LinearAlgebra ~/Julia/depot/juliaup/julia-1.10.2+0.x64.linux.gnu/share/julia/stdlib/v1.10/LinearAlgebra/src/abstractq.jl:171
[6] ldiv!(_qr::QR{Float32, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}, b::CuArray{Float32, 1, CUDA.Mem.DeviceBuffer})
@ CUDA.CUSOLVER ~/Julia/pkg/CUDA/lib/cusolver/linalg.jl:153
... and some more. That last one is actually a good example of why we probably don't want this: we already have several factorizations implemented at the higher API level, and it's confusing to have both those and some inherited from LinearAlgebra:
CUDA.jl/lib/cusolver/linalg.jl, lines 139 to 237 in a011e73
I don't feel particularly strongly about this because I'm not too familiar with the LinearAlgebra codebase, but it doesn't seem ideal from a maintainability perspective to mix both approaches.
cc @amontoison