
Commit da17d44

Merge pull request #600 from SciML/ChrisRackauckas-patch-3
Address comments of Discourse
2 parents 5129389 + b715d94

5 files changed: +190 -10

docs/pages.jl (+4 -2)

@@ -1,9 +1,11 @@
 # Put in a separate page so it can be used by SciMLDocs.jl
 
 pages = ["index.md",
-    "Tutorials" => Any["tutorials/linear.md",
+    "tutorials/linear.md",
+    "Tutorials" => Any[
         "tutorials/caching_interface.md",
-        "tutorials/accelerating_choices.md"],
+        "tutorials/accelerating_choices.md",
+        "tutorials/gpu.md"],
     "Basics" => Any["basics/LinearProblem.md",
         "basics/common_solver_opts.md",
         "basics/OperatorAssumptions.md",

docs/src/tutorials/accelerating_choices.md (+2 -1)

@@ -78,7 +78,8 @@ should be thought about:
 3. A Krylov subspace method with proper preconditioning will be better than direct solvers
    when the matrices get large enough. You could always precondition a sparse matrix with
    iLU as an easy choice, though the tolerance would need to be tuned in a problem-specific
-   way.
+   way. Please see the [preconditioners page](https://docs.sciml.ai/LinearSolve/stable/basics/Preconditioners/)
+   for more information on defining and using preconditioners.
 
 !!! note
docs/src/tutorials/gpu.md (new file, +135)

# GPU-Accelerated Linear Solving in Julia

LinearSolve.jl provides two ways to GPU accelerate linear solves:

* Offloading: offloading takes a CPU-based problem and automatically transforms it into a
  GPU-based problem in the background, and returns the solution on the CPU. Thus using
  offloading requires no change on the part of the user other than to choose an offloading
  solver.
* Array type interface: the array type interface requires that the user define the
  `LinearProblem` using an `AbstractGPUArray` type and choose an appropriate solver
  (or use the default solver). The solution will then be returned as a GPU array type.

The offloading approach has the advantage of being simpler and requiring no change to
existing CPU code, while having the disadvantage of more overhead. In the following
sections we will demonstrate how to use each of the approaches.
!!! warning

    GPUs are not always faster! Your matrices need to be sufficiently large in order for
    GPU acceleration to actually be faster. For offloading it's around 1,000 x 1,000 matrices,
    and for the array type interface it's around 100 x 100. For sparse matrices, it is highly
    dependent on the sparsity pattern and the amount of fill-in.
## GPU-Offloading

GPU offloading is simple, as it only requires changing the solver algorithm. Take the
example from the start of the documentation:

```julia
using LinearSolve

A = rand(4, 4)
b = rand(4)
prob = LinearProblem(A, b)
sol = solve(prob)
sol.u
```

This computation can be moved to the GPU by the following:

```julia
using CUDA # Add the GPU library
sol = solve(prob, CudaOffloadFactorization())
sol.u
```
## GPUArray Interface

For more manual control over the factorization setup, you can use the
[GPUArray interface](https://juliagpu.github.io/GPUArrays.jl/dev/), the most common
instantiation being [CuArray for CUDA-based arrays on NVIDIA GPUs](https://cuda.juliagpu.org/stable/usage/array/).
To use this, we simply send the matrix `A` and the vector `b` over to the GPU and solve:

```julia
using CUDA

A = rand(4, 4) |> cu
b = rand(4) |> cu
prob = LinearProblem(A, b)
sol = solve(prob)
sol.u
```

```
4-element CuArray{Float32, 1, CUDA.DeviceMemory}:
 -27.02665
  16.338171
 -77.650116
 106.335686
```
Notice that the solution is a `CuArray`, and thus you must use `Array(sol.u)` if you wish
to return it to the CPU. This setup does no automated memory transfers and will thus only
move things to the CPU on command.
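For example:

```julia
u_cpu = Array(sol.u)  # explicitly copy the GPU solution back into a CPU Array
```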
!!! warning

    Many GPU functionalities, such as `CUDA.cu`, have a built-in preference for `Float32`.
    Generally it is much faster to use 32-bit floating point operations on GPUs than 64-bit
    operations, and thus this is generally the right choice on such platforms. However, this
    change in numerical precision needs to be accounted for in your mathematics, as it could
    lead to instabilities. To disable it, use a constructor that is more specific about the
    bitsize, such as `CuArray{Float64}(A)`. Additionally, preferring more stable
    factorization methods, such as `QRFactorization()`, can improve the numerics in such
    cases.
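For example, a minimal sketch that keeps the computation in 64-bit precision (assuming a CUDA-capable GPU is available):

```julia
using LinearSolve, CUDA

A = rand(4, 4)  # Float64 on the CPU
b = rand(4)

# Explicit Float64 constructors avoid the Float32 downcast that `cu` performs
A_gpu = CuArray{Float64}(A)
b_gpu = CuArray{Float64}(b)

prob = LinearProblem(A_gpu, b_gpu)
sol = solve(prob, QRFactorization())  # QR for improved numerical stability
Array(sol.u)                          # return the solution to the CPU
```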
Similarly to other use cases, you can choose the solver, for example:

```julia
sol = solve(prob, QRFactorization())
```
## Sparse Matrices on GPUs

Currently, sparse matrix computations on GPUs are only supported for CUDA. This is done using
the `CUDA.CUSPARSE` sublibrary.

```julia
using LinearSolve, LinearAlgebra, SparseArrays, CUDA, CUDA.CUSPARSE

T = Float32
n = 100
A_cpu = sprand(T, n, n, 0.05) + I
b_cpu = rand(T, n)

# Move the system to the GPU in CSR format
A_gpu_csr = CuSparseMatrixCSR(A_cpu)
b_gpu = CuVector(b_cpu)
prob = LinearProblem(A_gpu_csr, b_gpu)
```
In order to solve such problems using a direct method, you must add
[CUDSS.jl](https://github.com/exanauts/CUDSS.jl). This looks like:

```julia
using CUDSS
sol = solve(prob, LUFactorization())
```

!!! note

    For now, CUDSS only supports `CuSparseMatrixCSR` type matrices.
Note that `KrylovJL` methods also work with sparse GPU arrays:

```julia
sol = solve(prob, KrylovJL_GMRES())
```

CUSPARSE also has some GPU-based preconditioners, such as a built-in `ilu`, which can be
set up like:

```julia
sol = solve(prob, KrylovJL_GMRES(precs = (A, p) -> (CUDA.CUSPARSE.ilu02!(A, 'O'), I)))
```

However, right now CUSPARSE is missing the right `ldiv!` implementation for this to work
in general. See [LinearSolve.jl#341](https://github.com/SciML/LinearSolve.jl/issues/341)
for details.

docs/src/tutorials/linear.md (+48 -6)

@@ -1,9 +1,7 @@
-# Solving Linear Systems in Julia
+# Getting Started with Solving Linear Systems in Julia
 
-A linear system $$Au=b$$ is specified by defining an `AbstractMatrix` `A`, or
-by providing a matrix-free operator for performing `A*x` operations via the
-function `A(u,p,t)` out-of-place and `A(du,u,p,t)` for in-place. For the sake
-of simplicity, this tutorial will only showcase concrete matrices.
+A linear system $$Au=b$$ is specified by defining an `AbstractMatrix` or `AbstractSciMLOperator`.
+For the sake of simplicity, this tutorial will start by only showcasing concrete matrices.
 
 The following defines a matrix and a `LinearProblem` which is subsequently solved
 by the default linear solver.
@@ -34,4 +32,48 @@ sol.u
 
 Thus, a package which uses LinearSolve.jl simply needs to allow the user to
 pass in an algorithm struct and all wrapped linear solvers are immediately
-available as tweaks to the general algorithm.
+available as tweaks to the general algorithm. For more information on the
+available solvers, see [the solvers page](@ref linearsystemsolvers).
+
+## Sparse and Structured Matrices
+
+There is no difference in the interface for using LinearSolve.jl on sparse
+and structured matrices. For example, the following uses Julia's
+built-in [SparseArrays.jl](https://docs.julialang.org/en/v1/stdlib/SparseArrays/)
+to define a sparse matrix (`SparseMatrixCSC`) and solve the system using LinearSolve.jl.
+Note that `sprand` is a shorthand for quickly creating a sparse random matrix
+(see SparseArrays.jl for more details on defining sparse matrices).
+
+```@example linsys1
+using LinearSolve, SparseArrays
+
+A = sprand(4, 4, 0.75)
+b = rand(4)
+prob = LinearProblem(A, b)
+sol = solve(prob)
+sol.u
+
+sol = solve(prob, KrylovJL_GMRES()) # Choosing algorithms is done the same way
+sol.u
+```
+
+Similarly, structured matrix types, like banded matrices, can be provided using special
+matrix types. While any `AbstractMatrix` type should be compatible via the general Julia
+interfaces, LinearSolve.jl specifically tests with the following cases:
+
+* [BandedMatrices.jl](https://github.com/JuliaLinearAlgebra/BandedMatrices.jl)
+* [BlockDiagonals.jl](https://github.com/JuliaArrays/BlockDiagonals.jl)
+* [CUDA.jl](https://cuda.juliagpu.org/stable/) (CUDA GPU-based dense and sparse matrices)
+* [FastAlmostBandedMatrices.jl](https://github.com/SciML/FastAlmostBandedMatrices.jl)
+* [Metal.jl](https://metal.juliagpu.org/stable/) (Apple M-series GPU-based dense matrices)
+
+## Using Matrix-Free Operators
+
+In many cases where a sparse matrix gets really large, even the sparse representation
+cannot be stored in memory. However, in many such cases, such as with PDE discretizations,
+you may be able to write down a function that directly computes `A*x`. These "matrix-free"
+operators allow the user to define the `Ax=b` problem to be solved by giving only the
+definition of `A*x`, allowing specific solvers (Krylov methods) to act without ever
+constructing the full matrix.
+
+**This will be documented in more detail in the near future**
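In the meantime, here is a minimal sketch of the idea, assuming the `FunctionOperator` interface from [SciMLOperators.jl](https://github.com/SciML/SciMLOperators.jl) (the diagonal operator is purely illustrative):

```julia
using LinearSolve, SciMLOperators

n = 100
d = collect(1.0:n)  # diagonal of the implicit matrix A, never materialized

# In-place matrix-vector product v .= A*u in the SciMLOperators (v, u, p, t) form
matvec!(v, u, p, t) = (v .= d .* u)

op = FunctionOperator(matvec!, zeros(n), zeros(n); islinear = true)
prob = LinearProblem(op, rand(n))
sol = solve(prob, KrylovJL_GMRES())  # Krylov solvers only need A*u products
```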

src/factorization.jl (+1 -1)

@@ -117,7 +117,7 @@ function SciMLBase.solve!(cache::LinearCache, alg::LUFactorization; kwargs...)
     end
     cache.cacheval = fact
 
-    if !LinearAlgebra.issuccess(fact)
+    if hasmethod(LinearAlgebra.issuccess, Tuple{typeof(fact)}) && !LinearAlgebra.issuccess(fact)
        return SciMLBase.build_linear_solution(
            alg, cache.u, nothing, cache; retcode = ReturnCode.Failure)
    end
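The `hasmethod` guard makes the success check conditional: factorization types that do not define `LinearAlgebra.issuccess` now skip the check rather than throwing a `MethodError`. A small illustration of the pattern, using a hypothetical factorization type:

```julia
using LinearAlgebra

struct MyFactorization end  # hypothetical type with no issuccess method

fact = MyFactorization()

# Without the hasmethod guard, issuccess(fact) would throw a MethodError here
if hasmethod(LinearAlgebra.issuccess, Tuple{typeof(fact)}) && !LinearAlgebra.issuccess(fact)
    println("factorization failed")
else
    println("success check skipped or passed")
end
```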
