
Test Implicit free surface solvers performance #4387


Open · wants to merge 10 commits into `main`

Conversation

simone-silvestri
Collaborator

Oceananigans' custom PCG has a couple of performance problems. If the objective is eventually to remove the matrix solver, we need to make sure that the PCG is up to speed. This PR is preparatory to #4386 and refurbishes some benchmark tests that we can use to improve PCG performance. The result at the moment is that the PCG performs well only on rectilinear, non-immersed grids.

Another PR will come before #4386 to bring the PCG up to speed.

The output of the benchmark is here:

32×10 DataFrame
 Row │ architectures  grid_types             free_surface_types         min         median      mean        max         memory      allocs  samples
     │ Any            Any                    Any                        Any         Any         Any         Any         Any         Any     Any
─────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
   1 │ CPU            ImmersedLatGrid        ExplicitFreeSurface        50.997 ms   51.075 ms   51.320 ms   52.579 ms   1.11 MiB    1070    10
   2 │ CPU            ImmersedLatGrid        MatrixImplicitFreeSurface  132.211 ms  142.570 ms  143.763 ms  162.111 ms  1.99 MiB    3065    10
   3 │ CPU            ImmersedLatGrid        PCGImplicitFreeSurface     373.622 ms  391.803 ms  398.345 ms  438.238 ms  136.70 MiB  35528   10
   4 │ CPU            ImmersedLatGrid        SplitExplicitFreeSurface   179.234 ms  235.579 ms  238.157 ms  309.451 ms  5.22 MiB    4323    10
   5 │ CPU            ImmersedRecGrid        ExplicitFreeSurface        41.571 ms   42.001 ms   42.036 ms   42.755 ms   746.29 KiB  1027    10
   6 │ CPU            ImmersedRecGrid        MatrixImplicitFreeSurface  142.778 ms  144.371 ms  144.757 ms  148.521 ms  1.31 MiB    3017    10
   7 │ CPU            ImmersedRecGrid        PCGImplicitFreeSurface     187.274 ms  190.594 ms  194.484 ms  215.125 ms  61.00 MiB   17563   10
   8 │ CPU            ImmersedRecGrid        SplitExplicitFreeSurface   293.361 ms  354.231 ms  352.831 ms  406.777 ms  3.78 MiB    4349    10
   9 │ CPU            LatitudeLongitudeGrid  ExplicitFreeSurface        27.922 ms   28.256 ms   28.422 ms   29.354 ms   562.52 KiB  1006    10
  10 │ CPU            LatitudeLongitudeGrid  MatrixImplicitFreeSurface  93.440 ms   94.291 ms   96.596 ms   113.497 ms  1.00 MiB    2937    10
  11 │ CPU            LatitudeLongitudeGrid  PCGImplicitFreeSurface     259.576 ms  277.398 ms  280.766 ms  301.446 ms  130.83 MiB  35400   10
  12 │ CPU            LatitudeLongitudeGrid  SplitExplicitFreeSurface   114.787 ms  119.239 ms  119.896 ms  126.970 ms  2.61 MiB    4170    10
  13 │ CPU            RectilinearGrid        ExplicitFreeSurface        24.716 ms   25.500 ms   25.504 ms   26.809 ms   374.48 KiB  963     10
  14 │ CPU            RectilinearGrid        MatrixImplicitFreeSurface  73.609 ms   80.298 ms   80.621 ms   89.221 ms   696.18 KiB  2885    10
  15 │ CPU            RectilinearGrid        PCGImplicitFreeSurface     40.315 ms   41.191 ms   41.080 ms   41.950 ms   2.85 MiB    3610    10
  16 │ CPU            RectilinearGrid        SplitExplicitFreeSurface   206.162 ms  228.030 ms  225.412 ms  240.865 ms  1.93 MiB    4196    10
  17 │ GPU            ImmersedLatGrid        ExplicitFreeSurface        50.364 ms   51.378 ms   51.420 ms   52.472 ms   1.11 MiB    1070    10
  18 │ GPU            ImmersedLatGrid        MatrixImplicitFreeSurface  115.014 ms  116.209 ms  118.472 ms  124.939 ms  1.99 MiB    3065    10
  19 │ GPU            ImmersedLatGrid        PCGImplicitFreeSurface     372.160 ms  392.106 ms  391.975 ms  410.496 ms  136.70 MiB  35530   10
  20 │ GPU            ImmersedLatGrid        SplitExplicitFreeSurface   199.555 ms  281.901 ms  270.142 ms  341.260 ms  5.22 MiB    4323    10
  21 │ GPU            ImmersedRecGrid        ExplicitFreeSurface        918.828 μs  960.679 μs  1.022 ms    1.530 ms    905.59 KiB  2112    10
  22 │ GPU            ImmersedRecGrid        MatrixImplicitFreeSurface  12.784 ms   12.935 ms   12.970 ms   13.494 ms   1.88 MiB    20020   10
  23 │ GPU            ImmersedRecGrid        PCGImplicitFreeSurface     24.841 ms   25.105 ms   25.156 ms   26.013 ms   6.65 MiB    47592   10
  24 │ GPU            ImmersedRecGrid        SplitExplicitFreeSurface   3.221 ms    3.473 ms    3.500 ms    3.958 ms    2.43 MiB    9096    10
  25 │ GPU            LatitudeLongitudeGrid  ExplicitFreeSurface        1.016 ms    1.058 ms    1.130 ms    1.701 ms    701.93 KiB  2861    10
  26 │ GPU            LatitudeLongitudeGrid  MatrixImplicitFreeSurface  19.470 ms   22.915 ms   22.875 ms   27.270 ms   1.79 MiB    31199   10
  27 │ GPU            LatitudeLongitudeGrid  PCGImplicitFreeSurface     100.270 ms  100.795 ms  105.360 ms  141.808 ms  16.34 MiB   183968  10
  28 │ GPU            LatitudeLongitudeGrid  SplitExplicitFreeSurface   3.708 ms    4.011 ms    4.010 ms    4.719 ms    2.17 MiB    10043   10
  29 │ GPU            RectilinearGrid        ExplicitFreeSurface        717.393 μs  732.261 μs  797.732 μs  1.331 ms    458.16 KiB  1849    10
  30 │ GPU            RectilinearGrid        MatrixImplicitFreeSurface  11.960 ms   12.040 ms   12.073 ms   12.491 ms   1.15 MiB    19490   10
  31 │ GPU            RectilinearGrid        PCGImplicitFreeSurface     3.365 ms    3.414 ms    3.574 ms    4.844 ms    1.04 MiB    7038    10
  32 │ GPU            RectilinearGrid        SplitExplicitFreeSurface   2.275 ms    2.302 ms    2.390 ms    3.014 ms    1.50 MiB    8542    10

@glwagner
Member

@amontoison @simone-silvestri the objective is to use Krylov solvers, not the Oceananigans PCG, right?

@glwagner
Member

Also, is this work worthwhile if we don't want to use the matrix solver? Maintenance and testing have a cost, and I am wondering whether it is worth it.

@amontoison
Collaborator

Yes, the goal is to use the solvers in Krylov.jl, which should be no less efficient than your PCG here or the one in IterativeSolvers.jl.

@amontoison
Collaborator

amontoison commented Apr 12, 2025

@simone-silvestri The KrylovSolver is now available in main. Is it easy to try it and see the difference with the current version of the PCG?

@simone-silvestri
Collaborator Author

simone-silvestri commented Apr 15, 2025

I didn't realize the goal was also to remove the custom PCG. I am very much in favor! Let me add the KrylovSolver to this PR and rerun the test. I think we can remove a lot of code.

@simone-silvestri
Collaborator Author

@amontoison I think I need your help here. With the Krylov solver, the benchmarks error with:

[2025/04/15 05:05:38.140] INFO  Benchmarking 1/8: (GPU, :RectilinearGrid, :KrylovImplicitFreeSurface)...
ERROR: LoadError: DomainError with -0.0506264973347019:
sqrt was called with a negative real argument but will only return a complex result if called with a complex argument. Try sqrt(Complex(x)).
Stacktrace:
  [1] throw_complex_domainerror(f::Symbol, x::Float64)
    @ Base.Math ./math.jl:33
  [2] sqrt(x::Float64)
    @ Base.Math ./math.jl:686
  [3] cg!(solver::Krylov.CgSolver{…}, A::Oceananigans.Solvers.KrylovOperator{…}, b::Oceananigans.Solvers.KrylovField{…}; M::Oceananigans.Solvers.KrylovPreconditioner{…}, ldiv::Bool, radius::Float64, linesearch::Bool, atol::Float64, rtol::Float64, itmax::Int64, timemax::Float64, verbose::Int64, history::Bool, callback::Krylov.var"#837#845", iostream::Core.CoreSTDOUT)
    @ Krylov ~/.julia/packages/Krylov/wgV9p/src/cg.jl:146
  [4] cg!
    @ ~/.julia/packages/Krylov/wgV9p/src/cg.jl:108 [inlined]
  [5] solve!
    @ ~/.julia/packages/Krylov/wgV9p/src/krylov_solve.jl:159 [inlined]
  [6] solve!(::Field{…}, ::KrylovSolver{…}, ::Field{…}, ::Field{…}, ::Vararg{…}; kwargs::@Kwargs{})
    @ Oceananigans.Solvers ~/development/Oceananigans.jl/src/Solvers/krylov_solver.jl:150
  [7] solve!
    @ ~/development/Oceananigans.jl/src/Solvers/krylov_solver.jl:147 [inlined]
  [8] solve!
    @ ~/development/Oceananigans.jl/src/Models/HydrostaticFreeSurfaceModels/pcg_implicit_free_surface_solver.jl:95 [inlined]
  [9] step_free_surface!(free_surface::ImplicitFreeSurface{…}, model::HydrostaticFreeSurfaceModel{…}, timestepper::Oceananigans.TimeSteppers.QuasiAdamsBashforth2TimeStepper{…}, Δt::Float64)
    @ Oceananigans.Models.HydrostaticFreeSurfaceModels ~/development/Oceananigans.jl/src/Models/HydrostaticFreeSurfaceModels/implicit_free_surface.jl:145
 [10] ab2_step!
    @ ~/development/Oceananigans.jl/src/Models/HydrostaticFreeSurfaceModels/hydrostatic_free_surface_ab2_step.jl:24 [inlined]
 [11] time_step!(model::HydrostaticFreeSurfaceModel{…}, Δt::Float64; callbacks::Vector{…}, euler::Bool)
    @ Oceananigans.TimeSteppers ~/development/Oceananigans.jl/src/TimeSteppers/quasi_adams_bashforth_2.jl:99
 [12] time_step!(model::HydrostaticFreeSurfaceModel{…}, Δt::Float64)
    @ Oceananigans.TimeSteppers ~/development/Oceananigans.jl/src/TimeSteppers/quasi_adams_bashforth_2.jl:74
 [13] benchmark_hydrostatic_model(Arch::Type, grid_type::Symbol, free_surface_type::Symbol)
    @ Main ~/development/Oceananigans.jl/benchmark/benchmark_hydrostatic_model.jl:70
 [14] run_benchmarks(benchmark_fun::Function; kwargs::@Kwargs{})
    @ Benchmarks ~/development/Oceananigans.jl/benchmark/src/Benchmarks.jl:57
 [15] top-level scope
    @ ~/development/Oceananigans.jl/benchmark/benchmark_hydrostatic_model.jl:94
 [16] include(fname::String)
    @ Base.MainInclude ./client.jl:494
 [17] top-level scope
    @ REPL[1]:1
in expression starting at /home/ssilvest/development/Oceananigans.jl/benchmark/benchmark_hydrostatic_model.jl:94
Some type information was truncated. Use `show(err)` to see complete types.

Do I need to change something with respect to the custom PCG to use the Krylov solver?

@simone-silvestri
Collaborator Author

It seems to be a problem only on the RectilinearGrid.

@simone-silvestri
Collaborator Author

simone-silvestri commented Apr 15, 2025

These are the benchmarks for the KrylovSolver on a latitude-longitude grid.

julia> df2
4×10 DataFrame
 Row │ architectures  grid_types             free_surface_types         min         median      mean        max         memory      allocs  samples
     │ Any            Any                    Any                        Any         Any         Any         Any         Any         Any     Any
─────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
   1 │ CPU            ImmersedLatGrid        KrylovImplicitFreeSurface  298.926 ms  313.780 ms  313.586 ms  328.200 ms  131.67 MiB  31468   10
   2 │ CPU            LatitudeLongitudeGrid  KrylovImplicitFreeSurface  243.694 ms  299.635 ms  294.065 ms  350.232 ms  128.77 MiB  31340   10
   3 │ GPU            ImmersedLatGrid        KrylovImplicitFreeSurface  328.853 ms  361.488 ms  372.190 ms  437.529 ms  131.67 MiB  31470   10
   4 │ GPU            LatitudeLongitudeGrid  KrylovImplicitFreeSurface  82.258 ms   87.060 ms   86.978 ms   90.977 ms   9.77 MiB    148681  10

@amontoison any suggestion for a good preconditioner to pass in?

@simone-silvestri
Collaborator Author

I think the preconditioner acts differently with the KrylovSolver than with our custom ConjugateGradientSolver. The negative-sqrt error appears only when using a preconditioner.

@simone-silvestri simone-silvestri changed the title Test custom PCG performance Test Implicit free surface solvers performance Apr 15, 2025
@simone-silvestri
Collaborator Author

As a side note, we need to improve the interface of the ImplicitFreeSurface. At the moment we pass settings collected in a dictionary and inferred in the build_implicit_solver function. This method seems overly complicated and very difficult to change or interface with. Hopefully, if we end up using just one solver, we can clean up the user interface.

@simone-silvestri
Collaborator Author

OK, these are the benchmarks without a preconditioner for the KrylovSolver:

julia> df2
24×10 DataFrame
 Row │ architectures  grid_types             free_surface_types         min         median      mean        max         memory      allocs  samples
     │ Any            Any                    Any                        Any         Any         Any         Any         Any         Any     Any
─────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
   1 │ CPU            ImmersedLatGrid        KrylovImplicitFreeSurface  334.365 ms  356.902 ms  355.691 ms  385.320 ms  131.67 MiB  31470   10
   2 │ CPU            ImmersedLatGrid        MatrixImplicitFreeSurface  173.313 ms  187.766 ms  192.494 ms  223.839 ms  1.99 MiB    3065    10
   3 │ CPU            ImmersedLatGrid        PCGImplicitFreeSurface     391.530 ms  404.177 ms  413.479 ms  455.854 ms  131.91 MiB  34366   10
   4 │ CPU            ImmersedRecGrid        KrylovImplicitFreeSurface  163.314 ms  170.190 ms  188.254 ms  238.819 ms  61.91 MiB   16136   10
   5 │ CPU            ImmersedRecGrid        MatrixImplicitFreeSurface  143.925 ms  144.347 ms  145.509 ms  153.327 ms  1.31 MiB    3017    10
   6 │ CPU            ImmersedRecGrid        PCGImplicitFreeSurface     198.658 ms  207.317 ms  211.586 ms  229.593 ms  60.83 MiB   17507   10
   7 │ CPU            LatitudeLongitudeGrid  KrylovImplicitFreeSurface  279.414 ms  294.586 ms  293.728 ms  310.094 ms  128.77 MiB  31340   10
   8 │ CPU            LatitudeLongitudeGrid  MatrixImplicitFreeSurface  139.786 ms  160.065 ms  162.874 ms  192.073 ms  1.00 MiB    2937    10
   9 │ CPU            LatitudeLongitudeGrid  PCGImplicitFreeSurface     287.915 ms  324.909 ms  327.460 ms  376.602 ms  130.59 MiB  35278   10
  10 │ CPU            RectilinearGrid        KrylovImplicitFreeSurface  133.404 ms  135.980 ms  138.901 ms  159.827 ms  62.75 MiB   16458   10
  11 │ CPU            RectilinearGrid        MatrixImplicitFreeSurface  105.586 ms  108.312 ms  115.942 ms  144.733 ms  696.24 KiB  2889    10
  12 │ CPU            RectilinearGrid        PCGImplicitFreeSurface     43.789 ms   44.243 ms   50.401 ms   66.237 ms   2.84 MiB    3605    10
  13 │ GPU            ImmersedLatGrid        KrylovImplicitFreeSurface  328.343 ms  355.066 ms  356.494 ms  390.588 ms  131.67 MiB  31468   10
  14 │ GPU            ImmersedLatGrid        MatrixImplicitFreeSurface  195.230 ms  197.134 ms  203.265 ms  237.329 ms  1.99 MiB    3065    10
  15 │ GPU            ImmersedLatGrid        PCGImplicitFreeSurface     420.760 ms  434.179 ms  435.282 ms  447.561 ms  131.91 MiB  34366   10
  16 │ GPU            ImmersedRecGrid        KrylovImplicitFreeSurface  22.045 ms   22.496 ms   22.971 ms   26.660 ms   4.16 MiB    38697   10
  17 │ GPU            ImmersedRecGrid        MatrixImplicitFreeSurface  13.435 ms   13.517 ms   13.617 ms   14.497 ms   1.88 MiB    20020   10
  18 │ GPU            ImmersedRecGrid        PCGImplicitFreeSurface     30.590 ms   31.448 ms   31.409 ms   32.398 ms   6.48 MiB    47535   10
  19 │ GPU            LatitudeLongitudeGrid  KrylovImplicitFreeSurface  90.268 ms   91.438 ms   91.393 ms   92.628 ms   9.77 MiB    148681  10
  20 │ GPU            LatitudeLongitudeGrid  MatrixImplicitFreeSurface  34.286 ms   34.673 ms   34.973 ms   36.712 ms   2.37 MiB    55856   10
  21 │ GPU            LatitudeLongitudeGrid  PCGImplicitFreeSurface     116.274 ms  118.882 ms  125.957 ms  171.908 ms  15.89 MiB   183747  10
  22 │ GPU            RectilinearGrid        KrylovImplicitFreeSurface  21.110 ms   21.435 ms   21.443 ms   21.907 ms   2.65 MiB    37871   10
  23 │ GPU            RectilinearGrid        MatrixImplicitFreeSurface  12.821 ms   14.766 ms   14.969 ms   18.419 ms   1.15 MiB    19494   10
  24 │ GPU            RectilinearGrid        PCGImplicitFreeSurface     4.281 ms    4.358 ms    4.457 ms    5.354 ms    1.04 MiB    7022    10

It seems to be a little better than the PCG solver (except on the rectilinear grid, which cannot be compared because there the PCG solver is basically just the FFT solver).

@amontoison
Collaborator

amontoison commented Apr 15, 2025

> @amontoison I think I need your help here. With the Krylov solver, the benchmarks error with `DomainError with -0.0506264973347019: sqrt was called with a negative real argument` (full stack trace quoted above). Do I need to change something with respect to the custom PCG to use the Krylov solver?

@simone-silvestri It means that your preconditioner is not SPD (symmetric positive definite).
It may be SND (symmetric negative definite), but either way something is wrong with the preconditioner.
I fixed a few issues like that in #4041 (FFTBasedPreconditioner and DiagonallyDominantPreconditioner).

@amontoison
Collaborator

What preconditioner is used when you get the error?
We probably need to multiply its output by -1.
That was the issue with the FFTBasedPreconditioner.
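To illustrate the sign-flip suggestion, here is a minimal sketch in plain Python (hypothetical values, not Oceananigans or Krylov.jl code). If the preconditioner effectively inverts a negative-definite operator, its output has the wrong sign, so `rho = dot(r, z)` goes negative, which is what ultimately sends a negative argument to `sqrt` inside CG:

```python
# Illustrative sketch: a sign-flipped preconditioner application makes
# rho = dot(r, z) negative, even though the underlying M is SPD.

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

r = [1.0, -2.0, 0.5]    # residual at some CG iteration (made-up values)
M = [2.0, 1.0, 4.0]     # diagonal SPD preconditioner (made-up values)

z_flipped = [-m * ri for m, ri in zip(M, r)]  # wrong-sign application
z_correct = [+m * ri for m, ri in zip(M, r)]  # output multiplied by -1

rho_flipped = dot(r, z_flipped)  # negative: CG sees a non-SPD preconditioner
rho_correct = dot(r, z_correct)  # positive, as required for SPD M
```

With the values above, `rho_flipped = -7.0` while `rho_correct = 7.0`; the fix only changes the sign convention, not the operator itself.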

@simone-silvestri
Collaborator Author

Both the FFTPreconditioner and the DiagonallyDominantPreconditioner apparently lead to that error. I can take a look at the preconditioner, but it should be SPD in theory.

@amontoison
Collaborator

amontoison commented Apr 15, 2025

You can check whether the PCG misbehaves too by checking whether ρ = dot(r, z) is negative.
It should never be negative, because it is equivalent to r' P r, where P is the positive-definite preconditioner.
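A minimal preconditioned-CG sketch in plain Python (illustrative only, not the Oceananigans ConjugateGradientSolver; the `pcg` function and diagonal `Minv_diag` are hypothetical) showing where such a check would sit:

```python
# Preconditioned CG on an SPD system, with the suggested check that
# rho = dot(r, z) stays positive when the preconditioner is SPD.

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def pcg(A, b, Minv_diag, tol=1e-10, maxiter=100):
    n = len(b)
    x = [0.0] * n
    r = list(b)                                  # r = b - A*x with x = 0
    z = [Minv_diag[i] * r[i] for i in range(n)]  # z = M^{-1} r
    p = list(z)
    rho = dot(r, z)
    for _ in range(maxiter):
        assert rho > 0, "preconditioner is not SPD"  # the suggested check
        Ap = [dot(row, p) for row in A]
        alpha = rho / dot(p, Ap)
        x = [x[i] + alpha * p[i] for i in range(n)]
        r = [r[i] - alpha * Ap[i] for i in range(n)]
        if max(abs(ri) for ri in r) < tol:
            break
        z = [Minv_diag[i] * r[i] for i in range(n)]
        rho, rho_old = dot(r, z), rho
        p = [z[i] + (rho / rho_old) * p[i] for i in range(n)]
    return x

# Small SPD test system with exact solution x = [1/11, 7/11].
A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
x = pcg(A, b, Minv_diag=[1 / 4, 1 / 3])
```

If the assertion fires on the first iteration, the preconditioner application itself has the wrong sign; if it fires later, the operator or preconditioner is losing definiteness somewhere.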

@glwagner
Member

@amontoison can you potentially make a change to Krylov to throw a more informative error?

@amontoison
Collaborator

amontoison commented Apr 15, 2025

Yes, I can add a check for a negative γ after this line:
https://github.com/JuliaSmoothOptimizers/Krylov.jl/blob/main/src/cg.jl#L145

and return a clear error saying that the preconditioner is not SPD.

I will do that tonight.
Update: done in JuliaSmoothOptimizers/Krylov.jl#991
