Avoid adjoint sparse matrix times Diagonal #179

Merged
JoshuaLampert merged 6 commits into main from spdiagm-sgn on Apr 29, 2025

Conversation

@JoshuaLampert (Member) commented Apr 28, 2025

Using Diagonal slowed down the simulation dramatically, as observed in #178. With spdiagm this is fixed.
Running serre_green_naghdi_soliton_upwind.jl on main:

───────────────────────────────────────────────────────────────────────────────────────────
              DispersiveSWE                       Time                    Allocations      
                                         ───────────────────────   ────────────────────────
            Tot / % measured:                 3.40s /  99.8%            879MiB /  99.9%    

Section                          ncalls     time    %tot     avg     alloc    %tot      avg
───────────────────────────────────────────────────────────────────────────────────────────
rhs!                              1.62k    3.39s  100.0%  2.09ms    877MiB  100.0%   554KiB
  assembling elliptic operator    1.62k    3.27s   96.4%  2.01ms    584MiB   66.5%   368KiB
  solving elliptic system         1.62k    116ms    3.4%  71.5μs    293MiB   33.4%   185KiB
  hyperbolic terms                1.62k   4.32ms    0.1%  2.66μs     0.00B    0.0%    0.00B
  ~rhs!~                          1.62k    620μs    0.0%   382ns   1.25KiB    0.0%    0.79B
analyze solution                      3    538μs    0.0%   179μs    148KiB    0.0%  49.3KiB
───────────────────────────────────────────────────────────────────────────────────────────

This PR (old version with spdiagm):

───────────────────────────────────────────────────────────────────────────────────────────
              DispersiveSWE                       Time                    Allocations      
                                         ───────────────────────   ────────────────────────
            Tot / % measured:                 237ms /  97.5%            861MiB /  99.9%    

Section                          ncalls     time    %tot     avg     alloc    %tot      avg
───────────────────────────────────────────────────────────────────────────────────────────
rhs!                              1.62k    229ms   99.3%   141μs    860MiB  100.0%   542KiB
  solving elliptic system         1.62k    118ms   50.9%  72.4μs    293MiB   34.1%   185KiB
  assembling elliptic operator    1.62k    107ms   46.5%  66.1μs    566MiB   65.9%   357KiB
  hyperbolic terms                1.62k   4.10ms    1.8%  2.53μs     0.00B    0.0%    0.00B
  ~rhs!~                          1.62k    515μs    0.2%   317ns   1.25KiB    0.0%    0.79B
analyze solution                      3   1.52ms    0.7%   506μs    150KiB    0.0%  49.9KiB
───────────────────────────────────────────────────────────────────────────────────────────

This PR (new version with Diagonal and parentheses):

───────────────────────────────────────────────────────────────────────────────────────────
              DispersiveSWE                       Time                    Allocations      
                                         ───────────────────────   ────────────────────────
            Tot / % measured:                 208ms /  96.6%            778MiB /  99.8%    

Section                          ncalls     time    %tot     avg     alloc    %tot      avg
───────────────────────────────────────────────────────────────────────────────────────────
rhs!                              1.62k    201ms   99.8%   124μs    776MiB  100.0%   490KiB
  solving elliptic system         1.62k    116ms   57.8%  71.7μs    293MiB   37.8%   185KiB
  assembling elliptic operator    1.62k   79.5ms   39.5%  49.0μs    483MiB   62.2%   305KiB
  hyperbolic terms                1.62k   4.20ms    2.1%  2.59μs     0.00B    0.0%    0.00B
  ~rhs!~                          1.62k    703μs    0.3%   433ns   1.25KiB    0.0%    0.79B
analyze solution                      3    487μs    0.2%   162μs    148KiB    0.0%  49.3KiB
───────────────────────────────────────────────────────────────────────────────────────────

Fixes #178.
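
For illustration, here is a minimal sketch of the three assembly variants benchmarked above, using stand-in operators (D1, v, and N are illustrative names and sizes, not the package's actual variables):

```julia
using LinearAlgebra, SparseArrays

N = 1000
D1 = sprandn(N, N, 0.01)  # stand-in for the sparse derivative operator
v = rand(N)               # stand-in for the diagonal entries

# main: the middle factor is a Diagonal, so the slow adjoint-sparse times
# Diagonal product shows up while assembling the elliptic operator
M_diag = D1' * Diagonal(v) * D1

# this PR, spdiagm variant: the middle factor is itself a SparseMatrixCSC
M_spdiagm = D1' * spdiagm(0 => v) * D1

# this PR, Diagonal + parentheses variant: Diagonal(v) * D1 is a fast product
# returning a SparseMatrixCSC, so the remaining product is sparse' * sparse
M_paren = D1' * (Diagonal(v) * D1)

M_diag ≈ M_spdiagm && M_diag ≈ M_paren  # true
```

All three variants assemble the same operator; they only differ in which multiplication methods get dispatched.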

@codecov-commenter

Codecov Report

All modified and coverable lines are covered by tests ✅


@JoshuaLampert JoshuaLampert linked an issue Apr 28, 2025 that may be closed by this pull request
@github-actions bot (Contributor) commented Apr 28, 2025

Benchmark Results

|  | main | ec88361... | main/ec88361ecb248a... |
|---|---|---|---|
| bbm_1d/bbm_1d_basic.jl | 13.9 ± 0.3 μs | 13.9 ± 0.36 μs | 0.998 |
| bbm_1d/bbm_1d_fourier.jl | 0.224 ± 0.31 ms | 0.224 ± 0.31 ms | 1 |
| bbm_bbm_1d/bbm_bbm_1d_basic_reflecting.jl | 0.0812 ± 0.00033 ms | 0.0823 ± 0.00042 ms | 0.987 |
| bbm_bbm_1d/bbm_bbm_1d_dg.jl | 0.0344 ± 0.00048 ms | 0.0344 ± 0.00052 ms | 1 |
| bbm_bbm_1d/bbm_bbm_1d_relaxation.jl | 27.1 ± 0.42 μs | 27.3 ± 0.46 μs | 0.991 |
| bbm_bbm_1d/bbm_bbm_1d_upwind_relaxation.jl | 0.0484 ± 0.00055 ms | 0.0487 ± 0.0005 ms | 0.994 |
| hyperbolic_serre_green_naghdi_1d/hyperbolic_serre_green_naghdi_dingemans.jl | 4.2 ± 0.016 μs | 4.14 ± 0.016 μs | 1.01 |
| serre_green_naghdi_1d/serre_green_naghdi_well_balanced.jl | 0.197 ± 0.0061 ms | 0.195 ± 0.0061 ms | 1.01 |
| svaerd_kalisch_1d/svaerd_kalisch_1d_dingemans_relaxation.jl | 0.146 ± 0.003 ms | 0.144 ± 0.0028 ms | 1.01 |
| time_to_load | 1.95 ± 0.022 s | 1.95 ± 0.0018 s | 1 |

Benchmark Plots

A plot of the benchmark results has been uploaded as an artifact to the workflow run for this PR.
Benchmark Plot

@coveralls (Collaborator) commented Apr 28, 2025

Pull Request Test Coverage Report for Build 14727735913

Details

  • 5 of 5 (100.0%) changed or added relevant lines in 2 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage remained the same at 98.073%

Totals Coverage Status
Change from base Build 14680067559: 0.0%
Covered Lines: 1883
Relevant Lines: 1920

💛 - Coveralls

ranocha previously approved these changes Apr 29, 2025

@ranocha (Member) left a comment

Thanks! I guess the reason was that there is a more efficient implementation of the matrix-matrix product for two sparse matrices vs. a sparse matrix and a Diagonal matrix.

Could you please file an issue to check for other places where this pattern occurs, e.g.,

return Symmetric(Diagonal(M_h) + D1mat' * Diagonal(M_h3_3) * D1mat)

etc.? We can nevertheless merge this PR for now and release a new version including this fix.

@JoshuaLampert (Member, Author) commented Apr 29, 2025

> Thanks! I guess the reason was that there is a more efficient implementation of the matrix-matrix product for two sparse matrices vs. a sparse matrix and a Diagonal matrix.

I don't really understand why the sparse times sparse matrix version should be more efficient than the sparse times Diagonal matrix version, since there is a specialized method. The diagonal sparse matrix has no zeros on the diagonal, so it has the same nnz as the Diagonal. So I would think the sparse times Diagonal version should be as efficient as sparse times sparse, but whatever... I saw that, coincidentally, there was a PR in SparseArrays.jl merged just yesterday, which addressed Diagonal times sparse matrix products. Maybe this also increased performance for the 3-arg mul!, but I didn't test that.
Edit: Ahh, the issue is with the Adjoint{Float64, SparseMatrixCSC{Float64, Int64}} times Diagonal matrix product:

julia> using DispersiveShallowWater, LinearAlgebra, SparseArrays, BenchmarkTools

julia> trixi_include(joinpath(examples_dir(), "serre_green_naghdi_1d", "serre_green_naghdi_dingemans.jl"), tspan = (0.0, 1e-10));
[...]
julia> A = sparse(D1.plus);

julia> B = Diagonal(ones(N));

julia> @benchmark A * B
BenchmarkTools.Trial: 10000 samples with 9 evaluations.
 Range (min … max):  2.678 μs … 237.673 μs  ┊ GC (min … max):  0.00% … 95.07%
 Time  (median):     3.301 μs               ┊ GC (median):     0.00%
 Time  (mean ± σ):   4.008 μs ±   7.866 μs  ┊ GC (mean ± σ):  13.67% ±  7.12%

    ▄█▆█▆                                                      
  ▂▄██████▄▃▃▂▂▂▂▂▂▂▂▂▂▂▂▂▁▂▂▂▂▂▂▂▁▂▁▂▁▁▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▂▂▂▂▂ ▃
  2.68 μs         Histogram: frequency by time          10 μs <

 Memory estimate: 36.32 KiB, allocs estimate: 10.

julia> @benchmark A' * B
BenchmarkTools.Trial: 2560 samples with 1 evaluation.
 Range (min … max):  1.930 ms … 3.540 ms  ┊ GC (min … max): 0.00% … 40.94%
 Time  (median):     1.946 ms              ┊ GC (median):    0.00%
 Time  (mean ± σ):   1.952 ms ± 58.793 μs  ┊ GC (mean ± σ):  0.10% ±  1.45%

      ▃▃▄▆█▆▃▂                                                
  ▂▃▆██████████▆▅▅▄▄▃▃▃▃▃▃▂▃▃▂▂▂▃▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▁▂▂▂▂▂ ▃
  1.93 ms        Histogram: frequency by time        2.03 ms <

 Memory estimate: 136.45 KiB, allocs estimate: 15.

This seems like a missing feature in SparseArrays.jl. I guess the "proper" solution would be to add a method for the adjoint sparse times Diagonal matrix product there; I created JuliaSparse/SparseArrays.jl#619 for that.
Therefore, another way to solve this issue is to not use spdiagm, but to use parentheses: first compute the Diagonal times sparse matrix product on the right (which results in a sparse matrix) and then the (fast) adjoint sparse times sparse matrix product on the left. Some benchmarks of rhs! are below:
main:

julia> result["serre_green_naghdi_1d/serre_green_naghdi_soliton_upwind.jl"]
BenchmarkTools.Trial: 2413 samples with 1 evaluation.
 Range (min … max):  2.020 ms … 4.884 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     2.041 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   2.070 ms ± 190.473 μs  ┊ GC (mean ± σ):  0.95% ± 4.41%

  ██▄▂▁                                                        
  ██████▄▆▆▃▁▁▁▃▁▁▁▁▁▁▁▁▁▁▃▁▃▁▁▁▁▁▁▁▁▃▁▁▁▁▁▁▁▁▁▁▁▁▃▁▁▄▁▄▅▅▅▄▅ █
  2.02 ms      Histogram: log(frequency) by time      3.17 ms <

 Memory estimate: 553.49 KiB, allocs estimate: 82.

with spdiagm:

julia> result["serre_green_naghdi_1d/serre_green_naghdi_soliton_upwind.jl"]
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  93.304 μs … 5.776 ms  ┊ GC (min … max): 0.00% … 34.14%
 Time  (median):     101.072 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   118.737 μs ± 149.333 μs  ┊ GC (mean ± σ):  9.74% ±  7.95%

  █▃  ▃                                                         ▁
  ██▆▇█▅▅▄▄▄▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▃▁▁▅ █
  93.3 μs       Histogram: log(frequency) by time       1.27 ms <

 Memory estimate: 542.41 KiB, allocs estimate: 125.

with parentheses:

julia> result["serre_green_naghdi_1d/serre_green_naghdi_soliton_upwind.jl"]
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  85.128 μs … 5.753 ms  ┊ GC (min … max): 0.00% … 83.22%
 Time  (median):      95.589 μs               ┊ GC (median):     0.00%
 Time  (mean ± σ):   115.297 μs ± 167.201 μs  ┊ GC (mean ± σ):  10.12% ±  7.22%

  ▁▄▇███▇▆▆▅▅▅▄▃▃▂▂▁               ▁▁▂▂▂▂▁                      ▃
  █████████████████████▇▇▆▇▇▇▇▅▆▆▇████████▇█▇▇▇▅▅▇▆▆▅▅▅▆▁▄▃▃▅▁▄ █
  85.1 μs       Histogram: log(frequency) by time        199 μs <

 Memory estimate: 489.81 KiB, allocs estimate: 90.

So this means the parentheses solution seems to be the fastest, which also makes sense to me because it has the most structure (in contrast to the spdiagm approach, we also know where the nonzeros are, which could speed things up a little bit).
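
To make the parentheses point concrete with the same kind of stand-in operators as in the micro-benchmark above (a sketch; A, d, and N are illustrative names, and the timings will of course differ from the rhs! numbers):

```julia
using LinearAlgebra, SparseArrays, BenchmarkTools

N = 1000
A = sprandn(N, N, 0.01)  # stand-in for the sparse derivative operator
d = Diagonal(rand(N))    # stand-in for the diagonal factor

# Without parentheses, the slow adjoint-sparse times Diagonal product is
# formed as an intermediate step:
@btime $(A') * $d * $A

# With parentheses, d * A is a fast Diagonal times sparse product that
# returns a SparseMatrixCSC, so only fast sparse products remain:
@btime $(A') * ($d * $A)
```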

I double-checked the other sparse times Diagonal matrix products and updated the cases where we had an adjoint sparse times Diagonal. These were not as performance-critical, though, because they only appear when creating the cache, not in rhs!.
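
As an example of that kind of change in the cache setup, the pattern from the snippet quoted in the review above could be parenthesized like this (a sketch with stand-in names and sizes, not the package's actual code):

```julia
using LinearAlgebra, SparseArrays

N = 1000
D1mat = sprandn(N, N, 0.01)  # stand-in for the sparse derivative matrix
M_h = rand(N)                # stand-ins for the diagonal weight vectors
M_h3_3 = rand(N)

# Before: D1mat' * Diagonal(M_h3_3) hits the slow adjoint-sparse times Diagonal path.
before = Symmetric(Diagonal(M_h) + D1mat' * Diagonal(M_h3_3) * D1mat)

# After: Diagonal(M_h3_3) * D1mat is computed first and yields a SparseMatrixCSC,
# so only fast sparse products remain.
after = Symmetric(Diagonal(M_h) + D1mat' * (Diagonal(M_h3_3) * D1mat))

before ≈ after  # true
```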

JoshuaLampert and others added 3 commits April 29, 2025 11:16
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
@JoshuaLampert JoshuaLampert requested a review from ranocha April 29, 2025 09:23
@JoshuaLampert changed the title from "Use spdiagm in Serre-Green-Naghdi equations" to "Avoid adjoint sparse matrix times Diagonal" on Apr 29, 2025
@ranocha (Member) left a comment

Thanks!

@JoshuaLampert JoshuaLampert merged commit 5e4830a into main Apr 29, 2025
11 checks passed
@JoshuaLampert JoshuaLampert deleted the spdiagm-sgn branch April 29, 2025 10:06


Development

Successfully merging this pull request may close these issues.

Performance Issue in Serre Green Naghdi Matrix Solve

4 participants