
L1 Gauss Seidel preconditioner #191


Draft
wants to merge 51 commits into base: main

Conversation

Abdelrahman912
Collaborator

@Abdelrahman912 commented Feb 28, 2025

Rough ideas for data structures that incorporate partitioned arrays, with a CPU example of Gauss–Seidel both with and without partitions 📚
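For readers unfamiliar with the method, the core idea of a partitioned ℓ₁ Gauss–Seidel sweep can be sketched as follows. This is a minimal pure-Python illustration of the algorithm, not the PR's Julia implementation; the function name `l1_gs_sweep` and the dense-matrix layout are choices made here for clarity only.

```python
# Minimal sketch of one partitioned l1 Gauss-Seidel sweep.
# Rows inside a partition are relaxed sequentially (Gauss-Seidel);
# couplings to rows OUTSIDE the partition are not inverted but
# compensated by an l1 diagonal term, so partitions can be
# processed independently (e.g. one thread per partition).

def l1_gs_sweep(A, b, x, partitions):
    n = len(b)
    for part in partitions:
        in_part = set(part)
        for i in part:
            # l1 correction: absolute row sum over off-partition columns
            d_l1 = sum(abs(A[i][j]) for j in range(n) if j not in in_part)
            # residual using the most recent values of x
            r_i = b[i] - sum(A[i][j] * x[j] for j in range(n))
            x[i] += r_i / (A[i][i] + d_l1)
    return x
```

Because the off-partition entries only enter through the diagonal correction, the inner loops over partitions carry no cross-partition data dependencies, which is what makes the scheme attractive for threaded and GPU backends.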

@codecov-commenter

Welcome to Codecov 🎉

Once you merge this PR into your default branch, you're all set! Codecov will compare coverage reports and display results in all future pull requests.

Thanks for integrating Codecov - We've got you covered ☂️


codecov bot commented Mar 26, 2025


@Abdelrahman912 changed the title from "L1 Gauss Seidel smoother" to "L1 Gauss Seidel preconditioner" on Mar 28, 2025
@termi-official
Collaborator

termi-official commented Apr 9, 2025

Is it possible that you are just using too many threads? Also, do you pin the threads for the benchmarks? Performance is more fundamentally broken here: if you try the matrix "MathWorks/Kuu", you see a massive performance difference.

@Abdelrahman912
Collaborator Author

Unsymmetric Stats

Unpreconditioned

SimpleStats
 niter: 981
 solved: true
 inconsistent: false
 residuals: [ 6.2e+01  6.2e+01  6.2e+01 ...  1.1e-06  1.0e-06  8.2e-07 ]
 Aresiduals: []
 κ₂(A): []
 timer: 719.44ms
 status: solution good enough given atol and rtol

Preconditioned

nparts = 8 (same as cores)
partsize = 414

SimpleStats
 niter: 158
 solved: true
 inconsistent: false
 residuals: [ 4.5e+00  4.3e+00  3.9e+00 ...  1.1e-07  8.8e-08  6.5e-08 ]
 Aresiduals: []
 κ₂(A): []
 timer: 186.18ms
 status: solution good enough given atol and rtol

@Abdelrahman912
Collaborator Author

Abdelrahman912 commented Apr 10, 2025

Symmetric Problem:

bcsstk15 https://math.nist.gov/MatrixMarket/data/Harwell-Boeing/bcsstruc2/bcsstruc2.html

Unpreconditioned stats:

SimpleStats
 niter: 3067
 solved: true
 inconsistent: false
 residuals: [ 6.3e+01  5.9e+01  5.8e+01 ...  9.8e-07  9.6e-07  9.2e-07 ]
 Aresiduals: []
 κ₂(A): []
 timer: 10.24s
 status: solution good enough given atol and rtol

Preconditioned stats:

SimpleStats
 niter: 497
 solved: true
 inconsistent: false
 residuals: [ 2.4e+00  3.7e-04  2.3e-04 ...  5.4e-08  5.2e-08  5.0e-08 ]
 Aresiduals: []
 κ₂(A): []
 timer: 487.80ms
 status: solution good enough given atol and rtol

Collaborator

@termi-official left a comment


Looks great already. Some last fine tuning and we are ready to go.

Comment on lines +26 to +27
# SparseMatricesCSR.colvals(A::CUSPARSE.CuSparseDeviceMatrixCSR{Tv,Ti,1}) where {Tv,Ti} = A.colVal
# SparseMatricesCSR.getrowptr(A::CUSPARSE.CuSparseDeviceMatrixCSR{Tv,Ti,1}) where {Tv,Ti} = A.rowPtr
Collaborator Author


These two lines should be added in JuliaGPU/CUDA.jl#2720, right?

Collaborator


Actually I am not sure about this one. Probably not, because SparseMatricesCSR is not an interface package. I would discuss this in the linked PR.

Collaborator


Can you add a compat entry for the new CUDA release?


[targets]
test = ["Aqua", "DelimitedFiles", "JET", "Pkg", "StaticArrays", "Tensors", "Test"]
test = ["Aqua", "DelimitedFiles", "JET", "Pkg", "StaticArrays", "Tensors", "Test","MatrixDepot","ThreadPinning"]
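For the compat request above, a `[compat]` entry might look like the following sketch. The version bounds here are placeholders for illustration, not the actual required releases:

```toml
[compat]
# placeholder bound; replace with the first CUDA.jl release that
# ships the needed CuSparseDeviceMatrixCSR accessor dispatches
CUDA = "5.7"
```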
Collaborator


Do we need ThreadPinning for testing?

# Adapt.adapt(::CUDABackend, x::Vector) = x |> cu # not needed
# Adapt.adapt(::CUDABackend, x::CuVector) = x # not needed

# TODO: remove this function if back compatibility is not needed
Collaborator


Backward compat with what?

Collaborator


I think we are missing a test for the ThreadedSparseCSR format here

@doc raw"""
L1GSPreconditioner{Partitioning, VectorType}

The ℓ₁ Gauss–Seidel preconditioner is a robust and parallel-friendly preconditioner, particularly effective for sparse and ill-conditioned systems.
Collaborator


ill-conditioned systems

Where did you get that statement from?
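For context on the claim being questioned: the ℓ₁ Gauss–Seidel smoother is commonly attributed to the analysis of Baker, Falgout, Kolev, and Yang. The usual construction (notation here is mine, not the docstring's) applies, on each row partition $\Omega_k$, the block

```latex
M_k = D_k + L_k + D_k^{\ell_1}, \qquad
\big(D_k^{\ell_1}\big)_{ii} = \sum_{j \notin \Omega_k} |a_{ij}|,
```

where $D_k$ and $L_k$ are the diagonal and strictly lower-triangular parts of the diagonal block of $A$ restricted to $\Omega_k$. The $\ell_1$ term majorizes the dropped off-partition couplings, which gives convergence guarantees for SPD matrices; that theory supports "robust under partitioning" but not, on its own, "particularly effective for ill-conditioned systems".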

@@ -10,7 +15,7 @@ import CUDA:
import Thunderbolt:
UnPack.@unpack,
SimpleMesh,
SparseMatrixCSR, SparseMatrixCSC,
SparseMatrixCSR, SparseMatrixCSC, AbstractSparseMatrix,
Collaborator


Can we import these types here from their respective packages instead of Thunderbolt? We might hit weird bugs otherwise.

Comment on lines +5 to +7
# PIRACY ALERT: the following code is commented out to avoid piracy
# SparseMatricesCSR.colvals(A::CUSPARSE.CuSparseDeviceMatrixCSR{Tv,Ti,1}) where {Tv,Ti} = A.colVal
# SparseMatricesCSR.getrowptr(A::CUSPARSE.CuSparseDeviceMatrixCSR{Tv,Ti,1}) where {Tv,Ti} = A.rowPtr
Collaborator


Suggested change
# PIRACY ALERT: the following code is commented out to avoid piracy
# SparseMatricesCSR.colvals(A::CUSPARSE.CuSparseDeviceMatrixCSR{Tv,Ti,1}) where {Tv,Ti} = A.colVal
# SparseMatricesCSR.getrowptr(A::CUSPARSE.CuSparseDeviceMatrixCSR{Tv,Ti,1}) where {Tv,Ti} = A.rowPtr

Collaborator


Can we remove these here for now and add a comment to the tests that prior to 2720 we need to add these dispatches? I do not want to artificially postpone this PR here further due to missing optional stuff in the dependencies.

Comment on lines +101 to +105
if __cuda_version__ >= __min_cuda_version__
include("cuda/cuda_preconditioner.jl")
else
@warn("CuThunderboltExt.jl: CUDA.jl version is too old $__cuda_version__ <$__min_cuda_version__, skipping `cuda_preconditioner.jl`.")
end
Collaborator


Suggested change
if __cuda_version__ >= __min_cuda_version__
include("cuda/cuda_preconditioner.jl")
else
@warn("CuThunderboltExt.jl: CUDA.jl version is too old $__cuda_version__ <$__min_cuda_version__, skipping `cuda_preconditioner.jl`.")
end
include("cuda/cuda_preconditioner.jl")
Also, add a compat entry to the Project.toml.

# y: preconditioned residual
@unpack partitioning,B,D = P
@unpack backend = partitioning
@. y = x / (B + D)
Collaborator


Can we precompute 1/(B+D)? This should give us a speedup, as multiplication is usually way faster than division.

Also, I think we are missing the Gauss-Seidel part on the local block?
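The suggested precomputation can be sketched as follows. This is a minimal Python illustration of the idea, not the PR's Julia code; the function names are made up here:

```python
# Sketch of the reviewer's suggestion: compute 1/(B + D) once at
# setup time, so that every application of the preconditioner is a
# pointwise multiplication instead of a (slower) division.

def build_inverse_diagonal(B, D):
    # setup: one division per entry, done once
    return [1.0 / (b + d) for b, d in zip(B, D)]

def apply_preconditioner(x, inv_bd):
    # per-iteration apply: y = x .* inv_bd, multiplications only
    return [xi * ibd for xi, ibd in zip(x, inv_bd)]
```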


3 participants