L1 Gauss Seidel preconditioner #191 (JuliaHealth)
Draft
Abdelrahman912 wants to merge 55 commits into JuliaHealth:main from Abdelrahman912:l1-gs-smoother (base: main)
+862 −6

Commits (55):
2382a8f  init l1 smoother
e8991ba  minor fix
a127bc0  init cuda prec setup
d216e34  init working cuda
086ed78  fix partition limit indices
99b1ab6  add comment
64a247c  minor change
343c99d  minor adjustment for csc
ee66f73  add cuda csr
fbb3338  check symmetry for csc
2b2dc17  Merge branch 'main' into l1-gs-smoother
5e1203a  rm unnecessary code
bc1cec3  Merge branch 'main' into l1-gs-smoother
c8cc291  add cpu version
457c110  Merge branch 'add-multi-threading-l1-prec' into l1-gs-smoother
7f17845  Merge branch 'main' into l1-gs-smoother
7c9b474  Merge branch 'main' into l1-gs-smoother
42026cd  init ka, working but buggy.
1b3ce6f  Merge branch 'main' into l1-gs-smoother
8a5675b  Merge branch 'ka-porting' into l1-gs-smoother
476928b  Merge branch 'main' into l1-gs-smoother
8bcca25  fix ka buggy code
77f7148  add tests
aecbf67  minor fix
1aa8986  update manifest
552f8b8  Merge branch 'main' into l1-gs-smoother
dc8d4e4  merge cpu and gpu
24b72ab  Merge branch 'main' into l1-gs-smoother
a87c66f  minor fix
5778fe9  add Preconditioners submodule
b104cfb  remove unnecessary module reference
0e93f05  add cpu symmetric test
51fa896  add test path
1f56bad  minor fix
226f55a  set nparts to be ncores
292aacc  precompute blocks
36f5754  separate CPU GPU tests
33f1de7  fix ci
3b52869  minor fix
a056a67  add symmetric test
bf2cc96  rm dead code
a84ab45  comment out adapt
0794dce  rm direct solver
2532179  add doc string
8203cac  add gpu test examples
7ef9e15  minor fix
869d3db  elementwise operations refinement
9609d59  add reference
8e62678  add block partitioning to doc string + some comments for (CSC/CSR)Format
4a6454c  rm piratical code (only those which were merged into CUDA.jl) + add w…
cce6547  rm dead code
2ca65a6  init gs
1e01857  init forward_sweep
01faea3  minor fixes
df65f07  minor fixes (buggy test)
New file, +219 lines:
#########################################
## CUDA L1 Gauss Seidel Preconditioner ##
#########################################

struct CudaPartitioning{Ti} <: AbstractPartitioning
    threads::Ti # number of diagonals per partition
    blocks::Ti  # number of partitions
    size_A::Ti
end

function Thunderbolt.build_l1prec(::CudaL1PrecBuilder, A::AbstractSparseMatrix;
    n_threads::Union{Integer,Nothing}=nothing, n_blocks::Union{Integer,Nothing}=nothing)
    if CUDA.functional()
        # Raise an error if an invalid thread or block count is provided.
        if !isnothing(n_threads) && n_threads == 0
            error("n_threads must be greater than zero")
        end
        if !isnothing(n_blocks) && n_blocks == 0
            error("n_blocks must be greater than zero")
        end
        return _build_cuda_l1prec(A, n_threads, n_blocks)
    else
        error("CUDA is not functional, please check your GPU driver and CUDA installation")
    end
end

(builder::CudaL1PrecBuilder)(A::AbstractSparseMatrix;
    n_threads::Union{Integer,Nothing}=nothing, n_blocks::Union{Integer,Nothing}=nothing) =
    build_l1prec(builder, A; n_threads=n_threads, n_blocks=n_blocks)

function _build_cuda_l1prec(A::AbstractSparseMatrix, n_threads::Union{Integer,Nothing}, n_blocks::Union{Integer,Nothing})
    # Determine threads and blocks if not provided:
    #   blocks  -> number of partitions
    #   threads -> number of diagonals per partition
    size_A = convert(Int32, size(A, 1))
    threads = isnothing(n_threads) ? convert(Int32, min(size_A, 256)) : convert(Int32, n_threads)
    blocks = isnothing(n_blocks) ? _calculate_nblocks(threads, size_A) : convert(Int32, n_blocks)
    partitioning = CudaPartitioning(threads, blocks, size_A)
    cuda_A = _cuda_A(A)
    return L1Preconditioner(partitioning, cuda_A)
end

_cuda_A(A::SparseMatrixCSC) = CUSPARSE.CuSparseMatrixCSC(A)
_cuda_A(A::SparseMatrixCSR) = CUSPARSE.CuSparseMatrixCSR(A)
_cuda_A(A::CUSPARSE.CuSparseMatrixCSC) = A
_cuda_A(A::CUSPARSE.CuSparseMatrixCSR) = A

# TODO: should x & b be CuArrays, or stay AbstractVector?
function LinearSolve.ldiv!(y::VectorType, P::L1Preconditioner{CudaPartitioning{Ti}}, x::VectorType) where {Ti, VectorType <: AbstractVector}
    # x: residual
    # y: preconditioned residual
    y .= x # works either way, whether x is a CuArray or a Vector
    _ldiv!(y, P)
end

function _ldiv!(y::CuVector, P::L1Preconditioner{CudaPartitioning{Ti}}) where {Ti}
    @unpack partitioning, A = P
    @unpack threads, blocks, size_A = partitioning
    issym = isapprox(A, A', rtol=1e-12)
    CUDA.@sync CUDA.@cuda threads=threads blocks=blocks _l1prec_kernel!(y, A, issym)
    return nothing
end

function _ldiv!(y::Vector, P::L1Preconditioner{CudaPartitioning{Ti}}) where {Ti}
    @unpack partitioning, A = P
    @unpack threads, blocks, size_A = partitioning
    cy = y |> cu
    issym = isapprox(A, A', rtol=1e-12)
    CUDA.@sync CUDA.@cuda threads=threads blocks=blocks _l1prec_kernel!(cy, A, issym)
    copyto!(y, cy)
    return nothing
end

abstract type AbstractMatrixSymmetry end

struct SymmetricMatrix <: AbstractMatrixSymmetry end # important for the CSC format
struct NonSymmetricMatrix <: AbstractMatrixSymmetry end

# TODO: consider creating a unified iterator for both CPU and GPU.
struct DeviceDiagonalIterator{MatrixType, MatrixSymmetry <: AbstractMatrixSymmetry}
    A::MatrixType
end

struct DeviceDiagonalCache{Ti,Tv}
    k::Ti   # partition index
    idx::Ti # diagonal index
    b::Tv   # partition diagonal value
    d::Tv   # off-partition absolute sum
end

DiagonalIterator(::Type{SymT}, A::MatrixType) where {SymT <: AbstractMatrixSymmetry, MatrixType} =
    DeviceDiagonalIterator{MatrixType, SymT}(A)

function Base.iterate(iterator::DeviceDiagonalIterator)
    idx = (blockIdx().x - Int32(1)) * blockDim().x + threadIdx().x # diagonal index
    idx <= size(iterator.A, 1) || return nothing
    k = blockIdx().x # partition index
    return (_makecache(iterator, idx, k), (idx, k))
end

function Base.iterate(iterator::DeviceDiagonalIterator, state)
    n_blocks = gridDim().x
    n_threads = blockDim().x
    idx, k = state
    k += n_blocks # partition index
    stride = n_blocks * n_threads
    idx = idx + stride # diagonal index
    idx <= size(iterator.A, 1) || return nothing
    return (_makecache(iterator, idx, k), (idx, k))
end

function _makecache(iterator::DeviceDiagonalIterator{CUSPARSE.CuSparseDeviceMatrixCSC{Tv,Ti,1},NonSymmetricMatrix}, idx, k) where {Tv,Ti}
    # Ωⁱ  := {j ∈ Ωₖ : i ∈ Ωₖ}
    # Ωⁱₒ := {j ∉ Ωₖ : i ∈ Ωₖ} off-partition column values
    # bₖᵢ := Aᵢᵢ
    # dₖᵢ := ∑_{j ∈ Ωⁱₒ} |Aᵢⱼ|
    n_threads = blockDim().x
    @unpack A = iterator
    part_start_idx = (k - Int32(1)) * n_threads + Int32(1)
    part_end_idx = min(part_start_idx + n_threads - Int32(1), size(A, 2))

    b = zero(eltype(A))
    d = zero(eltype(A))

    # Specific to the CSC format: scan every column for entries in row `idx`.
    for col in 1:size(A, 2)
        col_start = A.colPtr[col]
        col_end = A.colPtr[col+1] - 1

        for i in col_start:col_end
            row = A.rowVal[i]
            if row == idx
                v = A.nzVal[i]

                if part_start_idx > col || col > part_end_idx
                    d += abs(v)
                end

                if col == idx
                    b = v
                end
            end
        end
    end

    return DeviceDiagonalCache(k, idx, b, d)
end

function _makecache(iterator::DeviceDiagonalIterator{CUSPARSE.CuSparseDeviceMatrixCSC{Tv,Ti,1},SymmetricMatrix}, idx, k) where {Tv,Ti}
    # Ωⁱ  := {j ∈ Ωₖ : i ∈ Ωₖ}
    # Ωⁱₒ := {j ∉ Ωₖ : i ∈ Ωₖ} off-partition column values
    # bₖᵢ := Aᵢᵢ
    # dₖᵢ := ∑_{j ∈ Ωⁱₒ} |Aᵢⱼ|
    n_threads = blockDim().x
    @unpack A = iterator
    part_start_idx = (k - Int32(1)) * n_threads + Int32(1)
    part_end_idx = min(part_start_idx + n_threads - Int32(1), size(A, 2))

    # Since the matrix is symmetric, CSC and CSR traversal are equivalent,
    # so the CSR row scan can run directly on the CSC arrays.
    b, d = _diag_offpart_csr(A.colPtr, A.rowVal, A.nzVal, idx, part_start_idx, part_end_idx)

    return DeviceDiagonalCache(k, idx, b, d)
end

function _makecache(iterator::DeviceDiagonalIterator{CUSPARSE.CuSparseDeviceMatrixCSR{Tv,Ti,1}}, idx, k) where {Tv,Ti}
    # Ωⁱ  := {j ∈ Ωₖ : i ∈ Ωₖ}
    # Ωⁱₒ := {j ∉ Ωₖ : i ∈ Ωₖ} off-partition column values
    # bₖᵢ := Aᵢᵢ
    # dₖᵢ := ∑_{j ∈ Ωⁱₒ} |Aᵢⱼ|
    n_threads = blockDim().x
    @unpack A = iterator # A is in CSR format
    part_start_idx = (k - Int32(1)) * n_threads + Int32(1)
    part_end_idx = min(part_start_idx + n_threads - Int32(1), size(A, 2))

    b, d = _diag_offpart_csr(A.rowPtr, A.colVal, A.nzVal, idx, part_start_idx, part_end_idx)

    return DeviceDiagonalCache(k, idx, b, d)
end

function _diag_offpart_csr(rowPtr, colVal, nzVal, idx::Integer, part_start::Integer, part_end::Integer)
    Tv = eltype(nzVal)
    b = zero(Tv)
    d = zero(Tv)

    row_start = rowPtr[idx]
    row_end = rowPtr[idx + 1] - 1

    for i in row_start:row_end
        col = colVal[i]
        v = nzVal[i]

        if col == idx
            b = v
        elseif col < part_start || col > part_end
            d += abs(v)
        end
    end

    return b, d
end
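Since `_diag_offpart_csr` only touches raw `rowPtr`/`colVal`/`nzVal` arrays, its logic can be exercised on the host without a GPU. A minimal sketch (the routine is reproduced from the diff; the 3x3 tridiagonal test matrix is illustrative, not from the PR):

```julia
# Host-side check of the _diag_offpart_csr logic from this PR.
function _diag_offpart_csr(rowPtr, colVal, nzVal, idx::Integer, part_start::Integer, part_end::Integer)
    b = zero(eltype(nzVal)) # diagonal entry A[idx, idx]
    d = zero(eltype(nzVal)) # sum of |A[idx, j]| over columns j outside the partition
    for i in rowPtr[idx]:(rowPtr[idx+1] - 1)
        col = colVal[i]
        v = nzVal[i]
        if col == idx
            b = v
        elseif col < part_start || col > part_end
            d += abs(v)
        end
    end
    return b, d
end

# CSR arrays for A = [2 -1 0; -1 2 -1; 0 -1 2]
rowPtr = [1, 3, 6, 8]
colVal = [1, 2, 1, 2, 3, 2, 3]
nzVal  = [2.0, -1.0, -1.0, 2.0, -1.0, -1.0, 2.0]

# Row 2 with partition {1, 2}: the diagonal is 2, and column 3 is off-partition.
b, d = _diag_offpart_csr(rowPtr, colVal, nzVal, 2, 1, 2)
# b == 2.0, d == 1.0
```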
function _l1prec_kernel!(y, A, issym)
    # This kernel loops over its assigned diagonals in a strided fashion.
    # e.g. if n_threads = 4, n_blocks = 2, A is 100 x 100, and the current global thread id is 5,
    # the kernel loops with strides:
    #   k   (partition index) = k + n_blocks             (i.e. 2, 4, 6, 8, 10, 12, 14)
    #   idx (diagonal index)  = idx + n_blocks * n_threads (i.e. 5, 13, 21, 29, 37, 45, 53)
    symT = issym ? SymmetricMatrix : NonSymmetricMatrix
    for diagonal in DiagonalIterator(symT, A)
        @unpack k, idx, b, d = diagonal
        y[idx] = y[idx] / (b + d)
    end
    return nothing
end
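For intuition, the scaling this kernel applies can be written as a host-side reference: with rows grouped into partitions of `t` consecutive indices, each entry is divided by bᵢ + dᵢ as defined in the `_makecache` comments. A sketch under those definitions (the helper name `l1_scaling_reference` is made up here, not part of the PR):

```julia
using SparseArrays

# Host reference for the diagonal scaling in _l1prec_kernel!:
# y[i] = x[i] / (b[i] + d[i]), where b[i] = A[i,i] and
# d[i] = sum of |A[i,j]| over columns j outside row i's partition.
# `t` is the partition width (threads per block in the kernel).
function l1_scaling_reference(A::SparseMatrixCSC, x::AbstractVector, t::Integer)
    n = size(A, 1)
    b = zeros(n)
    d = zeros(n)
    for (i, j, v) in zip(findnz(A)...)
        lo = ((i - 1) ÷ t) * t + 1 # first index of row i's partition
        hi = min(lo + t - 1, n)    # last index of row i's partition
        if j == i
            b[i] = v
        elseif j < lo || j > hi
            d[i] += abs(v)
        end
    end
    return [x[i] / (b[i] + d[i]) for i in 1:n]
end

A = spdiagm(0 => 2 * ones(4), -1 => -ones(3), 1 => -ones(3))
y = l1_scaling_reference(A, ones(4), 2)
# y == [1/2, 1/3, 1/3, 1/2]: rows 2 and 3 each have one off-partition entry.
```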
New file, +24 lines:
####################################
## L1 Gauss Seidel Preconditioner ##
####################################

## Interfaces for the L1 Gauss Seidel preconditioner
abstract type AbstractPartitioning end

struct L1Preconditioner{Partitioning,MatrixType}
    partitioning::Partitioning
    A::MatrixType
end

LinearSolve.ldiv!(::VectorType, ::L1Preconditioner{Partitioning}, ::VectorType) where {VectorType <: AbstractVector, Partitioning} =
    error("Not implemented")

abstract type AbstractL1PrecBuilder end
struct CudaL1PrecBuilder <: AbstractL1PrecBuilder end

function build_l1prec(::AbstractL1PrecBuilder, ::AbstractMatrix)
    error("Not implemented")
end

(builder::AbstractL1PrecBuilder)(A::AbstractMatrix) = build_l1prec(builder, A)
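The builder interface above is the extension point for new backends: subtype `AbstractL1PrecBuilder` and add a `build_l1prec` method. A self-contained sketch with a hypothetical toy backend (`ToyL1PrecBuilder` and `ToyL1Preconditioner` are made-up names for illustration, not part of the PR):

```julia
# Standalone mock of the interface above, plus a hypothetical backend
# showing how a new device implementation would plug in.
abstract type AbstractL1PrecBuilder end

function build_l1prec(::AbstractL1PrecBuilder, ::AbstractMatrix)
    error("Not implemented")
end

(builder::AbstractL1PrecBuilder)(A::AbstractMatrix) = build_l1prec(builder, A)

struct ToyL1PrecBuilder <: AbstractL1PrecBuilder end # hypothetical backend

struct ToyL1Preconditioner{M}
    A::M
end

build_l1prec(::ToyL1PrecBuilder, A::AbstractMatrix) = ToyL1Preconditioner(A)

# The functor form mirrors the PR: calling the builder builds the preconditioner.
P = ToyL1PrecBuilder()([2.0 -1.0; -1.0 2.0])
```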
New file, +37 lines:
using SparseArrays
using CUDA
using Thunderbolt
using LinearSolve
using SparseMatricesCSR

N = 8
A = spdiagm(0 => 2 * ones(N), -1 => -ones(N-1), 1 => -ones(N-1))

cudal1prec = Thunderbolt.CudaL1PrecBuilder()
P = cudal1prec(A; n_threads=2, n_blocks=1)
x = eltype(A).(collect(0:N-1))
y = similar(x) # use a separate output vector so the residual x is not overwritten
LinearSolve.ldiv!(y, P, x)

B = SparseMatrixCSR(A)
x = eltype(A).(collect(0:N-1))
y = similar(x)
P = cudal1prec(B; n_threads=2, n_blocks=1)
LinearSolve.ldiv!(y, P, x)
## TODO: Add tests for the above code snippet

abstract type AbstractMatrixSymmetry end

struct SymmetricMatrix <: AbstractMatrixSymmetry end
struct NonSymmetricMatrix <: AbstractMatrixSymmetry end

struct DeviceDiagonalIterator{MatrixType, MatrixSymmetry <: AbstractMatrixSymmetry}
    A::MatrixType
end

matrix_symmetry_type(A::AbstractSparseMatrix) = isapprox(A, A', rtol=1e-12) ? SymmetricMatrix : NonSymmetricMatrix

DiagonalIterator(A::MatrixType) where {MatrixType} = DeviceDiagonalIterator{MatrixType, matrix_symmetry_type(A)}(A) # fixed: was `matrix_symmetry(A)`, which is undefined
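The trait returned by `matrix_symmetry_type` becomes a type parameter of the iterator, so the right `_makecache` method is chosen by dispatch rather than a runtime branch inside the kernel. A simplified host-side sketch of that pattern (no GPU types; `describe` is an illustrative stand-in for the `_makecache` methods):

```julia
using SparseArrays

abstract type AbstractMatrixSymmetry end
struct SymmetricMatrix <: AbstractMatrixSymmetry end
struct NonSymmetricMatrix <: AbstractMatrixSymmetry end

matrix_symmetry_type(A::AbstractSparseMatrix) =
    isapprox(A, A', rtol=1e-12) ? SymmetricMatrix : NonSymmetricMatrix

# Dispatch on the trait, as the PR's _makecache methods do for CSC matrices.
describe(::Type{SymmetricMatrix}) = "symmetric: reuse the CSR row scan on colPtr/rowVal"
describe(::Type{NonSymmetricMatrix}) = "non-symmetric: full column scan of the CSC matrix"

S = spdiagm(0 => 2 * ones(4), -1 => -ones(3), 1 => -ones(3)) # symmetric tridiagonal
T = spdiagm(0 => 2 * ones(4), 1 => -ones(3))                 # upper bidiagonal, not symmetric

sym_path = describe(matrix_symmetry_type(S))
nonsym_path = describe(matrix_symmetry_type(T))
```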
Review comment:
Can we import these types here from their respective packages instead of Thunderbolt? We might hit weird bugs otherwise.