Test Implicit free surface solvers performance #4387
@amontoison @simone-silvestri the objective is to use Krylov solvers, not Oceananigans' PCG, right?

Also, is this work worthwhile if we don't want to use the matrix solver? Maintenance and testing have a cost, and I wonder whether it is worth it.

Yes, the goal is to use the solvers in Krylov.jl, which should not be less efficient than your PCG here or the one in IterativeSolvers.jl.

@simone-silvestri The [...]

I didn't realize the goal was also to remove the custom PCG. I am very much in favor! Let me add the KrylovSolver to this PR and try the test again. I think we can remove a lot of code.
@amontoison I think I need your help here. With the Krylov solver, the benchmarks error with:

[2025/04/15 05:05:38.140] INFO Benchmarking 1/8: (GPU, :RectilinearGrid, :KrylovImplicitFreeSurface)...
ERROR: LoadError: DomainError with -0.0506264973347019:
sqrt was called with a negative real argument but will only return a complex result if called with a complex argument. Try sqrt(Complex(x)).
Stacktrace:
[1] throw_complex_domainerror(f::Symbol, x::Float64)
@ Base.Math ./math.jl:33
[2] sqrt(x::Float64)
@ Base.Math ./math.jl:686
[3] cg!(solver::Krylov.CgSolver{…}, A::Oceananigans.Solvers.KrylovOperator{…}, b::Oceananigans.Solvers.KrylovField{…}; M::Oceananigans.Solvers.KrylovPreconditioner{…}, ldiv::Bool, radius::Float64, linesearch::Bool, atol::Float64, rtol::Float64, itmax::Int64, timemax::Float64, verbose::Int64, history::Bool, callback::Krylov.var"#837#845", iostream::Core.CoreSTDOUT)
@ Krylov ~/.julia/packages/Krylov/wgV9p/src/cg.jl:146
[4] cg!
@ ~/.julia/packages/Krylov/wgV9p/src/cg.jl:108 [inlined]
[5] solve!
@ ~/.julia/packages/Krylov/wgV9p/src/krylov_solve.jl:159 [inlined]
[6] solve!(::Field{…}, ::KrylovSolver{…}, ::Field{…}, ::Field{…}, ::Vararg{…}; kwargs::@Kwargs{})
@ Oceananigans.Solvers ~/development/Oceananigans.jl/src/Solvers/krylov_solver.jl:150
[7] solve!
@ ~/development/Oceananigans.jl/src/Solvers/krylov_solver.jl:147 [inlined]
[8] solve!
@ ~/development/Oceananigans.jl/src/Models/HydrostaticFreeSurfaceModels/pcg_implicit_free_surface_solver.jl:95 [inlined]
[9] step_free_surface!(free_surface::ImplicitFreeSurface{…}, model::HydrostaticFreeSurfaceModel{…}, timestepper::Oceananigans.TimeSteppers.QuasiAdamsBashforth2TimeStepper{…}, Δt::Float64)
@ Oceananigans.Models.HydrostaticFreeSurfaceModels ~/development/Oceananigans.jl/src/Models/HydrostaticFreeSurfaceModels/implicit_free_surface.jl:145
[10] ab2_step!
@ ~/development/Oceananigans.jl/src/Models/HydrostaticFreeSurfaceModels/hydrostatic_free_surface_ab2_step.jl:24 [inlined]
[11] time_step!(model::HydrostaticFreeSurfaceModel{…}, Δt::Float64; callbacks::Vector{…}, euler::Bool)
@ Oceananigans.TimeSteppers ~/development/Oceananigans.jl/src/TimeSteppers/quasi_adams_bashforth_2.jl:99
[12] time_step!(model::HydrostaticFreeSurfaceModel{…}, Δt::Float64)
@ Oceananigans.TimeSteppers ~/development/Oceananigans.jl/src/TimeSteppers/quasi_adams_bashforth_2.jl:74
[13] benchmark_hydrostatic_model(Arch::Type, grid_type::Symbol, free_surface_type::Symbol)
@ Main ~/development/Oceananigans.jl/benchmark/benchmark_hydrostatic_model.jl:70
[14] run_benchmarks(benchmark_fun::Function; kwargs::@Kwargs{…})
@ Benchmarks ~/development/Oceananigans.jl/benchmark/src/Benchmarks.jl:57
[15] top-level scope
@ ~/development/Oceananigans.jl/benchmark/benchmark_hydrostatic_model.jl:94
[16] include(fname::String)
@ Base.MainInclude ./client.jl:494
[17] top-level scope
@ REPL[1]:1
in expression starting at /home/ssilvest/development/Oceananigans.jl/benchmark/benchmark_hydrostatic_model.jl:94
Some type information was truncated. Use `show(err)` to see complete types.

Do I need to change something with respect to the custom PCG to use the Krylov solver?
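For context on why a non-SPD preconditioner produces exactly this error: preconditioned CG computes γ = ⟨r, M⁻¹r⟩ and assumes it is positive (which is guaranteed when the preconditioner is SPD), and the residual-norm update takes √γ. A minimal NumPy sketch with a hypothetical indefinite preconditioner (illustrative matrices, not the Oceananigans operators) shows γ going negative, which is the Python analogue of the `sqrt` DomainError above:

```python
import numpy as np

# Hypothetical 2x2 example; NOT the actual Oceananigans operator.
M_inv = np.array([[1.0,  0.0],
                  [0.0, -1.0]])   # indefinite "preconditioner": one negative eigenvalue
r = np.array([0.1, 1.0])          # residual vector

z = M_inv @ r                     # preconditioned residual z = M⁻¹ r
gamma = r @ z                     # gamma = <r, M⁻¹ r>; PCG assumes this is > 0

print(gamma)                      # negative here, so sqrt(gamma) is undefined over the reals
```

With an SPD preconditioner, γ is strictly positive for any nonzero residual, and the square root is always well defined.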
It seems to be a problem only on the [...]
These are the benchmarks for the [...]:

julia> df2
4×10 DataFrame
Row │ architectures grid_types free_surface_types min median mean max memory allocs samples
│ Any Any Any Any Any Any Any Any Any Any
─────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
1 │ CPU ImmersedLatGrid KrylovImplicitFreeSurface 298.926 ms 313.780 ms 313.586 ms 328.200 ms 131.67 MiB 31468 10
2 │ CPU LatitudeLongitudeGrid KrylovImplicitFreeSurface 243.694 ms 299.635 ms 294.065 ms 350.232 ms 128.77 MiB 31340 10
3 │ GPU ImmersedLatGrid KrylovImplicitFreeSurface 328.853 ms 361.488 ms 372.190 ms 437.529 ms 131.67 MiB 31470 10
4 │ GPU LatitudeLongitudeGrid KrylovImplicitFreeSurface 82.258 ms 87.060 ms 86.978 ms 90.977 ms 9.77 MiB 148681 10

@amontoison any suggestion for a good preconditioner to pass in?
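For reference, the simplest SPD candidate to try is diagonal (Jacobi) scaling, which is SPD whenever the operator's diagonal is positive and is cheap to apply matrix-free. A minimal sketch with an illustrative matrix (not the Oceananigans free-surface operator):

```python
import numpy as np

# Jacobi (diagonal) preconditioning: M⁻¹ = diag(A)⁻¹.
# SPD whenever all diagonal entries of A are positive.
A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
d = np.diag(A)
apply_M = lambda r: r / d    # cheap, matrix-free application of M⁻¹

r = np.array([1.0, 1.0, 1.0])
z = apply_M(r)               # elementwise r / diag(A)
print(z)
```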
I think the preconditioner acts differently with a [...]

As a side note, we need to improve the interface of the [...]
Ok, these are the benchmarks not using a preconditioner for the [...]:

julia> df2
24×10 DataFrame
Row │ architectures grid_types free_surface_types min median mean max memory allocs samples
│ Any Any Any Any Any Any Any Any Any Any
─────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
1 │ CPU ImmersedLatGrid KrylovImplicitFreeSurface 334.365 ms 356.902 ms 355.691 ms 385.320 ms 131.67 MiB 31470 10
2 │ CPU ImmersedLatGrid MatrixImplicitFreeSurface 173.313 ms 187.766 ms 192.494 ms 223.839 ms 1.99 MiB 3065 10
3 │ CPU ImmersedLatGrid PCGImplicitFreeSurface 391.530 ms 404.177 ms 413.479 ms 455.854 ms 131.91 MiB 34366 10
4 │ CPU ImmersedRecGrid KrylovImplicitFreeSurface 163.314 ms 170.190 ms 188.254 ms 238.819 ms 61.91 MiB 16136 10
5 │ CPU ImmersedRecGrid MatrixImplicitFreeSurface 143.925 ms 144.347 ms 145.509 ms 153.327 ms 1.31 MiB 3017 10
6 │ CPU ImmersedRecGrid PCGImplicitFreeSurface 198.658 ms 207.317 ms 211.586 ms 229.593 ms 60.83 MiB 17507 10
7 │ CPU LatitudeLongitudeGrid KrylovImplicitFreeSurface 279.414 ms 294.586 ms 293.728 ms 310.094 ms 128.77 MiB 31340 10
8 │ CPU LatitudeLongitudeGrid MatrixImplicitFreeSurface 139.786 ms 160.065 ms 162.874 ms 192.073 ms 1.00 MiB 2937 10
9 │ CPU LatitudeLongitudeGrid PCGImplicitFreeSurface 287.915 ms 324.909 ms 327.460 ms 376.602 ms 130.59 MiB 35278 10
10 │ CPU RectilinearGrid KrylovImplicitFreeSurface 133.404 ms 135.980 ms 138.901 ms 159.827 ms 62.75 MiB 16458 10
11 │ CPU RectilinearGrid MatrixImplicitFreeSurface 105.586 ms 108.312 ms 115.942 ms 144.733 ms 696.24 KiB 2889 10
12 │ CPU RectilinearGrid PCGImplicitFreeSurface 43.789 ms 44.243 ms 50.401 ms 66.237 ms 2.84 MiB 3605 10
13 │ GPU ImmersedLatGrid KrylovImplicitFreeSurface 328.343 ms 355.066 ms 356.494 ms 390.588 ms 131.67 MiB 31468 10
14 │ GPU ImmersedLatGrid MatrixImplicitFreeSurface 195.230 ms 197.134 ms 203.265 ms 237.329 ms 1.99 MiB 3065 10
15 │ GPU ImmersedLatGrid PCGImplicitFreeSurface 420.760 ms 434.179 ms 435.282 ms 447.561 ms 131.91 MiB 34366 10
16 │ GPU ImmersedRecGrid KrylovImplicitFreeSurface 22.045 ms 22.496 ms 22.971 ms 26.660 ms 4.16 MiB 38697 10
17 │ GPU ImmersedRecGrid MatrixImplicitFreeSurface 13.435 ms 13.517 ms 13.617 ms 14.497 ms 1.88 MiB 20020 10
18 │ GPU ImmersedRecGrid PCGImplicitFreeSurface 30.590 ms 31.448 ms 31.409 ms 32.398 ms 6.48 MiB 47535 10
19 │ GPU LatitudeLongitudeGrid KrylovImplicitFreeSurface 90.268 ms 91.438 ms 91.393 ms 92.628 ms 9.77 MiB 148681 10
20 │ GPU LatitudeLongitudeGrid MatrixImplicitFreeSurface 34.286 ms 34.673 ms 34.973 ms 36.712 ms 2.37 MiB 55856 10
21 │ GPU LatitudeLongitudeGrid PCGImplicitFreeSurface 116.274 ms 118.882 ms 125.957 ms 171.908 ms 15.89 MiB 183747 10
22 │ GPU RectilinearGrid KrylovImplicitFreeSurface 21.110 ms 21.435 ms 21.443 ms 21.907 ms 2.65 MiB 37871 10
23 │ GPU RectilinearGrid MatrixImplicitFreeSurface 12.821 ms 14.766 ms 14.969 ms 18.419 ms 1.15 MiB 19494 10
24 │ GPU RectilinearGrid PCGImplicitFreeSurface 4.281 ms 4.358 ms 4.457 ms 5.354 ms 1.04 MiB 7022 10

The Krylov solver seems to be a little better than the PCG solver (except on the rectilinear grid, which cannot be compared because there the PCG solver is basically just the FFT solver).
@simone-silvestri It means that your preconditioner is not SPD.

What is the preconditioner used when you have the error?
Both the [...]

You can check whether you have incorrect behavior in the PCG too by checking if [...]
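A randomized sanity check along those lines can probe whether an operator behaves like an SPD preconditioner: sample ⟨x, Mx⟩ for random vectors (it must be strictly positive for nonzero x) and compare ⟨x, My⟩ with ⟨y, Mx⟩ (symmetry). This is a generic sketch, not the Oceananigans or Krylov.jl API:

```python
import numpy as np

def looks_spd(apply_M, n, trials=100, seed=0):
    """Randomized check that apply_M behaves like an SPD operator on R^n."""
    rng = np.random.default_rng(seed)
    for _ in range(trials):
        x, y = rng.standard_normal(n), rng.standard_normal(n)
        Mx, My = apply_M(x), apply_M(y)
        if x @ Mx <= 0:                      # violates positive definiteness
            return False
        if not np.isclose(x @ My, y @ Mx):   # violates symmetry
            return False
    return True

# Example: the identity passes, a sign-flipping operator fails.
print(looks_spd(lambda v: v, 5))    # True
print(looks_spd(lambda v: -v, 5))   # False
```

Passing this check does not prove the operator is SPD, but failing it proves it is not, which is usually enough to diagnose the `sqrt` DomainError above.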
@amontoison can you potentially make a change to Krylov to throw a more informative error?
Yes, I can add a check if [...] and return a nice error saying that the preconditioner is not SPD. I will do that tonight.
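The proposed check can be sketched as: inside the PCG iteration, test the sign of γ = ⟨r, z⟩ before it is used and raise a descriptive error instead of a bare DomainError. A hedged Python sketch of textbook preconditioned CG with that guard (Krylov.jl's actual implementation will differ):

```python
import numpy as np

def pcg(A, b, apply_M, tol=1e-10, maxiter=200):
    """Preconditioned CG with an explicit guard on gamma = <r, M⁻¹r>."""
    x = np.zeros_like(b)
    r = b - A @ x
    z = apply_M(r)
    gamma = r @ z
    p = z.copy()
    for _ in range(maxiter):
        if gamma <= 0:
            # Informative failure instead of sqrt(negative) deep in the solver:
            raise ValueError("preconditioner is not positive definite: <r, M⁻¹r> ≤ 0")
        Ap = A @ p
        alpha = gamma / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) < tol:
            break
        z = apply_M(r)
        gamma_new = r @ z
        p = z + (gamma_new / gamma) * p
        gamma = gamma_new
    return x

A = np.array([[4.0, 1.0], [1.0, 3.0]])  # SPD test matrix
b = np.array([1.0, 2.0])
x = pcg(A, b, lambda v: v)              # identity preconditioner: converges
print(np.allclose(A @ x, b))            # True
```

With a non-SPD preconditioner (e.g. `lambda v: -v`), the guard fires on the first iteration with a message that names the actual problem.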
Oceananigans' custom PCG has a couple of performance problems. If the objective is eventually to remove the Matrix solver, we need to make sure that the PCG is up to speed. This PR is preparatory to #4386 and refurbishes some benchmark tests that we can use to improve the PCG performance. The result at the moment is that the PCG works well only on rectilinear, non-immersed grids. Another PR will come before #4386 to bring the PCG up to speed.
The output of the benchmark is here.