L1 Gauss Seidel preconditioner #191
Merged

termi-official merged 76 commits into JuliaHealth:main from Abdelrahman912:l1-gs-smoother on May 9, 2025. Changes from 50 of the 76 commits are shown below.
Commits (all authored by Abdelrahman912 unless noted otherwise):

2382a8f  init l1 smoother
e8991ba  minor fix
a127bc0  init cuda prec setup
d216e34  init working cuda
086ed78  fix partition limit indices
99b1ab6  add comment
64a247c  minor change
343c99d  minor adjustment for csc
ee66f73  add cuda csr
fbb3338  check symmetry for csc
2b2dc17  Merge branch 'main' into l1-gs-smoother
5e1203a  rm unnecessary code
bc1cec3  Merge branch 'main' into l1-gs-smoother
c8cc291  add cpu version
457c110  Merge branch 'add-multi-threading-l1-prec' into l1-gs-smoother
7f17845  Merge branch 'main' into l1-gs-smoother
7c9b474  Merge branch 'main' into l1-gs-smoother
42026cd  init ka, working but buggy
1b3ce6f  Merge branch 'main' into l1-gs-smoother
8a5675b  Merge branch 'ka-porting' into l1-gs-smoother
476928b  Merge branch 'main' into l1-gs-smoother
8bcca25  fix ka buggy code
77f7148  add tests
aecbf67  minor fix
1aa8986  update manifest
552f8b8  Merge branch 'main' into l1-gs-smoother
dc8d4e4  merge cpu and gpu
24b72ab  Merge branch 'main' into l1-gs-smoother
a87c66f  minor fix
5778fe9  add Preconditioners submodule
b104cfb  remove unnecessary module reference
0e93f05  add cpu symmetric test
51fa896  add test path
1f56bad  minor fix
226f55a  set nparts to be ncores
292aacc  precompute blocks
36f5754  separate CPU GPU tests
33f1de7  fix ci
3b52869  minor fix
a056a67  add symmetric test
bf2cc96  rm dead code
a84ab45  comment out adapt
0794dce  rm direct solver
2532179  add doc string
8203cac  add gpu test examples
7ef9e15  minor fix
869d3db  elementwise operations refinement
9609d59  add reference
8e62678  add block partitioning to doc string + some comments for (CSC/CSR)Format
4a6454c  rm piratical code (only those which were merged into CUDA.jl) + add w…
cce6547  rm dead code
2ca65a6  init gs
1e01857  init forward_sweep
01faea3  minor fixes
df65f07  minor fixes (buggy test)
f40ee47  fix stride issues
c1fc69d  add nonsymmetric unit test
4401f18  add partsize test + fix row offset
4b855fb  add default configs
23315df  test config
9d75776  add gpu tests
255fa15  add docstring
dc57963  rm ThreadPinning
c4a3cd2  add CUDA version
ecae973  fix l1gs dostring
032c06b  minor fix
1536bd2  rm comments
39dd912  fix packages ver
9508ec6  rm cuda_utils to avoid piracy
9304ec8  change l1gs note
1f7602b  hot fix ferrite ver
a0cb91a  clean up some comments used for debugging
b0ced8e  fix ferrite ver (hopefully)
b0e6bb9  Merge branch 'main' into l1-gs-smoother (termi-official)
88d086c  change default nthreads
b62b1a3  add ThreadedCSR tests
New file (35 lines; filename not shown in this view) — CUDA extension glue for the L1 Gauss–Seidel preconditioner:

```julia
#########################################
## CUDA L1 Gauss Seidel Preconditioner ##
#########################################

# PIRACY ALERT: this code is piratical because both `adapt` and its arguments are foreign objects.
# Therefore, the behavior of `adapt` differs depending on whether `Thunderbolt` and `CuThunderboltExt` are loaded.
# Reference: https://juliatesting.github.io/Aqua.jl/stable/piracies/
# Note: the problem is with `AbstractSparseMatrix`, since the default behavior of `adapt` is to
# return the same object regardless of the backend.
# Adapt.adapt(::CUDABackend, A::CUSPARSE.AbstractCuSparseMatrix) = A
# Adapt.adapt(::CUDABackend, A::AbstractSparseMatrix) = A |> cu
# Adapt.adapt(::CUDABackend, x::Vector) = x |> cu # not needed
# Adapt.adapt(::CUDABackend, x::CuVector) = x # not needed

# TODO: remove this function if backward compatibility is not needed
Preconditioners.convert_to_backend(::CUDABackend, A::AbstractSparseMatrix) = adapt(CUDABackend(), A)

# For some reason, these properties are not automatically defined for device arrays.
# TODO: remove the following code when https://github.com/JuliaGPU/CUDA.jl/pull/2738 is merged
#SparseArrays.rowvals(A::CUSPARSE.CuSparseDeviceMatrixCSC{Tv,Ti,1}) where {Tv,Ti} = A.rowVal
#SparseArrays.getcolptr(A::CUSPARSE.CuSparseDeviceMatrixCSC{Tv,Ti,1}) where {Tv,Ti} = A.colPtr
#SparseArrays.getnzval(A::CUSPARSE.CuSparseDeviceMatrixCSC{Tv,Ti,1}) where {Tv,Ti} = A.nzVal
#SparseMatricesCSR.getnzval(A::CUSPARSE.CuSparseDeviceMatrixCSR{Tv,Ti,1}) where {Tv,Ti} = A.nzVal

# PIRACY ALERT: the following code is commented out to avoid piracy
# SparseMatricesCSR.colvals(A::CUSPARSE.CuSparseDeviceMatrixCSR{Tv,Ti,1}) where {Tv,Ti} = A.colVal
# SparseMatricesCSR.getrowptr(A::CUSPARSE.CuSparseDeviceMatrixCSR{Tv,Ti,1}) where {Tv,Ti} = A.rowPtr

# Workaround for the issue with SparseMatricesCSR.
# TODO: find a more robust solution to dispatch the correct function
Preconditioners.colvals(A::CUSPARSE.CuSparseDeviceMatrixCSR{Tv,Ti,1}) where {Tv,Ti} = A.colVal
Preconditioners.getrowptr(A::CUSPARSE.CuSparseDeviceMatrixCSR{Tv,Ti,1}) where {Tv,Ti} = A.rowPtr

Preconditioners.sparsemat_format_type(::CUSPARSE.CuSparseDeviceMatrixCSC{Tv,Ti,1}) where {Tv,Ti} = CSCFormat
Preconditioners.sparsemat_format_type(::CUSPARSE.CuSparseDeviceMatrixCSR{Tv,Ti,1}) where {Tv,Ti} = CSRFormat
```
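For context, the smoother this PR wires up augments each row's diagonal with the l1-norm of that row's off-block entries, which keeps the block sweeps safely convergent for diagonally dominant problems. Below is a minimal dense sketch of one forward sweep; it is illustrative only, and `l1gs_sweep!` and `parts` are made-up names, not the PR's actual API.

```julia
using LinearAlgebra

# One forward L1 Gauss–Seidel sweep over row blocks given by `parts`
# (a hedged dense sketch; the PR operates on sparse CSC/CSR storage).
function l1gs_sweep!(x, A, b, parts)
    for rows in parts
        for i in rows
            # L1 correction: diagonal plus the sum of |a_ij| over columns
            # OUTSIDE the current block.
            di = A[i, i] + sum(abs(A[i, j]) for j in axes(A, 2) if !(j in rows); init = 0.0)
            r  = b[i] - sum(A[i, j] * x[j] for j in axes(A, 2))  # current residual entry
            x[i] += r / di
        end
    end
    return x
end

A = [4.0 1.0; 1.0 3.0]
b = [1.0, 2.0]
x = zeros(2)
for _ in 1:200
    l1gs_sweep!(x, A, b, [1:1, 2:2])  # two single-row blocks
end
# x now approximates A \ b = [1/11, 7/11]
```

Used as a preconditioner, applying this sweep to the residual corresponds to applying the inverse of the block-lower-triangular matrix with the l1-augmented diagonal.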
New file (20 lines) — CPU ↔ GPU sparse matrix conversion utilities:

```julia
# Remove the following code once PR https://github.com/JuliaGPU/CUDA.jl/pull/2720 is merged.
CUDA.CUSPARSE.CuSparseMatrixCSR{T}(Mat::SparseMatrixCSR) where {T} =
    CUDA.CUSPARSE.CuSparseMatrixCSR{T}(CuVector{Cint}(Mat.rowptr), CuVector{Cint}(Mat.colval),
                                       CuVector{T}(Mat.nzval), size(Mat))

CUSPARSE.CuSparseMatrixCSC{T}(Mat::SparseMatrixCSR) where {T} =
    CUSPARSE.CuSparseMatrixCSC{T}(CUSPARSE.CuSparseMatrixCSR(Mat))

SparseMatricesCSR.SparseMatrixCSR(A::CUSPARSE.CuSparseMatrixCSR) =
    SparseMatrixCSR(CUSPARSE.SparseMatrixCSC(A)) # no direct conversion (gpu_CSR -> cpu_CSC -> cpu_CSR)

Adapt.adapt_storage(::Type{CuArray}, xs::SparseMatrixCSR) =
    CUSPARSE.CuSparseMatrixCSR(xs)

Adapt.adapt_storage(::Type{CuArray{T}}, xs::SparseMatrixCSR) where {T} =
    CUSPARSE.CuSparseMatrixCSR{T}(xs)

Adapt.adapt_storage(::Type{Array}, mat::CUSPARSE.CuSparseMatrixCSR) =
    SparseMatrixCSR(mat)
```
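The `gpu_CSR -> cpu_CSC -> cpu_CSR` round trip works because CSR and CSC are transposes of each other at the buffer level. A small CPU-only demonstration using just the SparseArrays stdlib (no CUDA required):

```julia
using SparseArrays

A  = sparse([1.0 2.0; 0.0 3.0])
At = sparse(transpose(A))   # materialize Aᵀ in CSC storage

# The colptr/rowval/nzval buffers of CSC(Aᵀ) play exactly the role of the
# rowptr/colval/nzval buffers of CSR(A).
rowptr, colval, nzval = At.colptr, At.rowval, At.nzval
# rowptr == [1, 3, 4], colval == [1, 2, 2], nzval == [1.0, 2.0, 3.0]
```

The same identity is why the symmetry traits in the Preconditioners module pay off: for a symmetric matrix, CSC and CSR buffers coincide, so a kernel can use whichever access pattern is faster.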
New file (48 lines) — the Preconditioners submodule with the generic traits:

```julia
module Preconditioners

using SparseArrays, SparseMatricesCSR
using LinearSolve
import LinearSolve: \
using Adapt
using UnPack
import KernelAbstractions: Backend, @kernel, @index, @ndrange, @groupsize, @print, functional,
    CPU, synchronize
import SparseArrays: getcolptr, getnzval
import SparseMatricesCSR: getnzval
import LinearAlgebra: Symmetric

## Generic Code ##

# CSR and CSC are exactly the same for symmetric matrices, so we need to hold symmetry info
# in order to exploit it in cases where one format has a better access pattern than the other.
abstract type AbstractMatrixSymmetry end
struct SymmetricMatrix <: AbstractMatrixSymmetry end
struct NonSymmetricMatrix <: AbstractMatrixSymmetry end

abstract type AbstractMatrixFormat end
struct CSRFormat <: AbstractMatrixFormat end
struct CSCFormat <: AbstractMatrixFormat end

# Why use these traits?
# We are targeting multiple backends, but unfortunately the CSC/CSR sparse matrix types across
# backends don't share a common supertype (e.g. AbstractSparseMatrixCSC/AbstractSparseMatrixCSR):
#   CUSPARSE.CuSparseDeviceMatrixCSC <: SparseArrays.AbstractSparseMatrixCSC → false
# So we define our own traits to identify the format of a sparse matrix.
sparsemat_format_type(::SparseMatrixCSC) = CSCFormat
sparsemat_format_type(::SparseMatrixCSR) = CSRFormat

# TODO: remove once https://github.com/JuliaGPU/CUDA.jl/pull/2740 is merged
convert_to_backend(backend::Backend, A::AbstractSparseMatrix) =
    adapt(backend, A) # fallback; specific backends are extended in their corresponding extensions.

# Why? Because we want to circumvent piracy when extending these functions for device
# backends (e.g. CuSparseDeviceMatrixCSR).
# TODO: find a more robust solution to dispatch the correct function
colvals(A::SparseMatrixCSR) = SparseMatricesCSR.colvals(A)
getrowptr(A::SparseMatrixCSR) = SparseMatricesCSR.getrowptr(A)

include("l1_gauss_seidel.jl")

export L1GSPrecBuilder

end
```
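To illustrate the trait pattern in isolation, here is a self-contained sketch of the same dispatch idea; `MyDeviceCSR` and `rowmajor` are hypothetical names standing in for a backend matrix type that sits outside the SparseArrays type hierarchy.

```julia
using SparseArrays

abstract type AbstractMatrixFormat end
struct CSRFormat <: AbstractMatrixFormat end
struct CSCFormat <: AbstractMatrixFormat end

# A foreign CSR-like type that does NOT subtype any SparseArrays abstract type,
# mimicking the situation with device matrices such as CuSparseDeviceMatrixCSR.
struct MyDeviceCSR{Tv, Ti}
    rowptr::Vector{Ti}
    colval::Vector{Ti}
    nzval::Vector{Tv}
end

# The trait maps every concrete type onto a shared format vocabulary...
sparsemat_format_type(::SparseMatrixCSC) = CSCFormat
sparsemat_format_type(::MyDeviceCSR)     = CSRFormat

# ...so generic code can branch on the trait instead of the concrete matrix type.
rowmajor(A) = sparsemat_format_type(A) === CSRFormat

csc = sparse([1.0 0.0; 0.0 2.0])
csr = MyDeviceCSR([1, 2, 3], [1, 2], [1.0, 2.0])
# rowmajor(csc) == false, rowmajor(csr) == true
```

This is why the CUDA extension above only needs to add `sparsemat_format_type` methods for the device types: the generic kernels never inspect the concrete type directly.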