Conversation
Evaluating the potential (or it's derivatives) at a target is O(N), leading to O(M*N) scaling for M targets. I'm setting up a Barnes-Hut method to bring that down to O(log N). ImplicitBVH creates a bounded volume hierarchy which we can apply to the panels. Then we accumulate the panel properties we need (dA, n, x, and q strength) onto the nodes and use those when well outside a volume. Currently, I have the generic acculumate! and evaluate functions working. I need to add the panel specific logic and tests. Then I want to switch to an matrix free solver to avoid the O(N^2) construction cost entirely.
Working nicely on 8 panel sphere.
bvh.skips is already the cumsum. now its fixed.
Looking great.
Super simple struct to hold the panels & nodes (augmented with q vectors), bvh, and stack. Use set_q! and uₙ! functions to define the matrix-free multiplication. Use gmres to solve with O(N) memory and O(N log N) time. Error compared to direct solve is 0.2% with d²=4.
I'm trying out AcceleratedKernels since that's what is used in ImplicitBVH and it's seems pretty good. I can't preallocate the search stack, but I'm still getting >2x speed-up on my old 4 virtual thread laptop. I've also generalized the BarnesHut with ϕ,kwargs inputs, and written specialzed functions for Φ and steady_force.
AK reductions need much more specification than ThreadsX. Maybe that implies they will be faster?
|
Added PanelSystem <: AbstractPanelSystem to give a uniform interface for solving and measuring. Now we have Φ(x,sys) which specializes on the system type, and nearly everything else is uniform. I renamed the matrix-free solver to GMRESsolver!(sys) and it works on either type. It's in a new file gmres.jl with it's own tests. I added a nice surface_integral function, which is called for added_mass and steady_force. Finally, I split the BarnesHutCore functions into their own file which (potentially) could be moved into ImplicitBVH in the future. kelvin and reflected potentials are untested with BarnesHut.
I enabled easy symmetry boundary condition enforcement by adding a list of coordinate mirrors factors to the panel system. When \Phi is called, it loops through all the mirrors, applying reflections to the _target_. That way, the Barnes-Hut tree is unchanged. I also added directsolve! to balance out GMRESsolve!. The mirrors are applied to the potentials to form the correct matrix. (Like before, but easier.)
|
The code is working nicely for flows without a free surface. julia> S(θ,φ) = SA[sin(θ)*cos(φ), sin(θ)*sin(φ), cos(θ)]
S (generic function with 1 method)
julia> panels = panelize(S, 0, π/2, 0, π, hᵤ=1/64, N_max=Inf) # quite a few panels
Table with 7 columns and 12934 rows:
x n dA x₄ w₄ xᵤᵥ ⋯
┌────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
1 │ [0.00933365, 0.0038… [0.00933374, 0.0038… 9.49835e-5 SVector{3, Float64}… [1.00364e-5 1.00364… SVector{3, Float64}… ⋯
2 │ [0.00386613, 0.0093… [0.00386616, 0.0093… 9.49835e-5 SVector{3, Float64}… [1.00364e-5 1.00364… SVector{3, Float64}… ⋯
3 │ [-0.00386613, 0.009… [-0.00386616, 0.009… 9.49835e-5 SVector{3, Float64}… [1.00364e-5 1.00364… SVector{3, Float64}… ⋯
4 │ [-0.00933365, 0.003… [-0.00933374, 0.003… 9.49835e-5 SVector{3, Float64}… [1.00364e-5 1.00364… SVector{3, Float64}… ⋯
⋮ │ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋱
julia> @time extrema(panel_cp(directsolve!(PanelSystem(panels,sym_axes=(2,3)),verbose=false)))
181.588483 seconds (299.26 k allocations: 2.517 GiB, 0.52% gc time, 0.01% compilation time)
(-1.2519952210157759, 0.9997268849291908)
julia> @time extrema(panel_cp(GMRESsolve!(PanelSystem(panels,sym_axes=(2,3)),verbose=false)))
84.243938 seconds (918.91 k allocations: 82.896 MiB, 0.05% compilation time)
(-1.2519970778461436, 0.9997268847578672)
julia> @time extrema(panel_cp(GMRESsolve!(BarnesHut(panels,sym_axes=(2,3),d²=32),verbose=false)))
14.113298 seconds (595.82 k allocations: 56.185 MiB)
(-1.2478511862965824, 0.999732833082339)
julia> @time extrema(panel_cp(GMRESsolve!(BarnesHut(panels,sym_axes=(2,3),d²=16),verbose=false)))
8.959613 seconds (595.82 k allocations: 56.185 MiB)
(-1.2433446694645616, 0.999739893335223)
julia> @time extrema(panel_cp(GMRESsolve!(BarnesHut(panels,sym_axes=(2,3),d²=8),verbose=false)))
5.331441 seconds (595.82 k allocations: 56.185 MiB)
(-1.2358450518391968, 0.9997187398080566)
julia> @time extrema(panel_cp(GMRESsolve!(BarnesHut(panels,sym_axes=(2,3),d²=4),verbose=false)))
4.126204 seconds (699.34 k allocations: 64.974 MiB)
(-1.2273541396547127, 0.9998140097913244)
julia> @time extrema(panel_cp(GMRESsolve!(BarnesHut(panels,sym_axes=(2,3),d²=2),verbose=false)))
2.893607 seconds (699.34 k allocations: 65.146 MiB, 0.59% gc time)
(-1.2226066498335322, 0.9990576811980934)So we get a nice tradeoff of speed-up and error with Barnes-Hut d². Compare this to the main branch: julia> ∫G₂₃ = reflect(reflect(∫G,3),2) # 2² calls
julia> @time q = influence(panels,ϕ=∫G₂₃)\components(panels.n,1)
157.456592 seconds (1.51 G allocations: 44.998 GiB, 8.97% gc time, 1.54% compilation time)
julia> @time steady_force(q,panels,ϕ=∫G₂₃)
91.609284 seconds (2.21 G allocations: 161.880 GiB, 31.28% gc time, 0.00% compilation time)Look at those ALLOCATIONS! Good lord. 50+162 GiB -> 2.5GiB for the direct solve!! And Matrix-free gets that down to 56-83 MiB! So 1000x smaller memory footprint. Speed-wise, it's 4 min - > 3 min for direct, and 3-14 sec for GMRES+Barnet-Hut depending on d²! So 20-90x speed up. |
Identity crisis. Barnes-Hut is working, and Kelvin green's functions aren't. Time to clean house.
Added free surface panel logic but gmressolver! isn't working - even for Fn=0. The new constructor is `PanelSystem(body;freesurf=nothing)`. PanelSystem vcats the two Tables and generates views sys.body, sys.freesurf for each part. I replaced total_area with body_area since that's the area we tend to want. I also set up direct solve to only operate on sys.body (ignoring sys.freesurf completely) to avoid forming a mixed BC matrix. That's working. I also redefined BarnesHut as a PanelSystem wrapper, avoiding double implementation of all this logic.
and a few docs adjustments
I had to use an upwinded FD version of Φₓₓ to get a qualitatively reasonable freesurf result. It's extremely far from ideal (it takes O(100) gmres iterations to converge, I've hard coded the spacing h, the wave seems very diffused) but it's a start.
Porting over the PanelSystem changes from the BarnesHut PR.
Optmized as much as I can. Here's the N=12.9K case: julia> @Btime extrema(cₚ(gmressolve!(PanelSystem($panels,sym_axes=(2,3)),verbose=false))) 63.136 s (214 allocations: 2.68 MiB) (-1.2519970778461476, 0.9997268847578672) julia> @Btime extrema(cₚ(directsolve!(PanelSystem($panels,sym_axes=(2,3)),verbose=false))) 143.595 s (82993 allocations: 2.50 GiB) (-1.2519952210157665, 0.9997268849291909) And here's N=805: julia> @Btime extrema(cₚ(directsolve!(PanelSystem($panels,sym_axes=(2,3)),verbose=false))) 150.640 ms (69 allocations: 9.93 MiB) (-1.2575391174014392, 0.9955853298424956) julia> @Btime extrema(cₚ(gmressolve!(PanelSystem($panels,sym_axes=(2,3)),verbose=false))) 215.061 ms (214 allocations: 186.08 KiB) (-1.257545978734143, 0.9955853089025847)
… aren't available.
This lets me type dispatch ∫G_kernel for different panel types.
Added a nice batch of test for stl panels - so this is certainly working properly now. Also updated a few places to adapt to variable Floating point type. Fixed addedmass to be dimensionless by including an approximate body_vol() calculator using the divergence theorem. It's within a few percent for the D20.
The panels = [body;freesurf] idea was fundamentally terrible. Unleashing tons of free surface panel specific code all over the package. Instead I've walked back to only have a BodyPanelSystem, and simplifying panel_method and solvers to only operate on AbstractPanelSystems for now. BarnesHut<:AbstractPanelSystem is also wrong. BarnesHut<:AbstractVector as a Table wrapper is much much better.
Also renamed the BarnesHutCore functions and variables
The submerged spheroid test is working, but the wigley and prism tests aren't. I'm missing the freesurface contour & the thinship facilities, so that is probably the issue.
|
Benchmarks. First the big quarter sphere has around the same benchmark as above: julia> using NeumannKelvin,BenchmarkTools
julia> S(θ,φ) = SA[sin(θ)*cos(φ), sin(θ)*sin(φ), cos(θ)]
S (generic function with 1 method)
julia> panels = panelize(S, 0, π/2, 0, π, hᵤ=1/64, N_max=Inf) # quite a few panels
Table with 8 columns and 12934 rows:
x n dA xg wg verts nverts kernel
┌──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
1 │ [0.00933365, 0.0038… [0.00933374, 0.0038… 9.49835e-5 SVector{3, Float64}… [1.00364e-5 1.00364… SVector{3, Float64}… SVector{3, Float64}… QuadKernel()
2 │ [0.00386613, 0.0093… [0.00386616, 0.0093… 9.49835e-5 SVector{3, Float64}… [1.00364e-5 1.00364… SVector{3, Float64}… SVector{3, Float64}… QuadKernel()
3 │ [-0.00386613, 0.009… [-0.00386616, 0.009… 9.49835e-5 SVector{3, Float64}… [1.00364e-5 1.00364… SVector{3, Float64}… SVector{3, Float64}… QuadKernel()
⋮ │ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮
julia> @btime extrema(cₚ(gmressolve!(BodyPanelSystem($panels,sym_axes=(2,3),wrap=PanelTree),verbose=false)))
2.835 s (78309 allocations: 13.11 MiB)
(-1.2283857777503266, 0.9998190404893949)Free surface benchmarks! First the reflection body julia> function spheroid(h;L=1,Z=-1/8,r=1/12,AR=1/2r,kwargs...)
S(θ₁,θ₂) = SA[0.5L*cos(θ₁),-r*sin(θ₂)*sin(θ₁),r*cos(θ₂)*sin(θ₁)+Z]
panelize(S,0,π,0,π,hᵤ=h*√AR,hᵥ=h/√AR;kwargs...)
end
spheroid (generic function with 1 method)
julia> function halfplane(L::T,ℓ;hᵤ=0.3ℓ,hᵥ=hᵤ,s=L/2+2π*ℓ) where T
measure.((u,v)->SA[u,v,0],s:-hᵤ:-2s,(hᵥ/2:hᵥ:4s/3)',hᵤ,hᵥ,flip=true;T)
end
halfplane (generic function with 1 method)
julia> body = spheroid(1/60); PanelTree(body)
PanelTree(766 panels, 11 levels, θ²: 4)
bounds: (-0.5f0, -0.083266795f0, -0.2082668f0) to (0.5f0, -0.0f0, -0.04173321f0)
panel type: NeumannKelvin.QuadKernel
julia> @btime extrema(cₚ(gmressolve!(BodyPanelSystem($body,sym_axes=(2,3),wrap=PanelTree),verbose=false)))
179.872 ms (5210 allocations: 846.55 KiB)
(-0.18214363989174598, 0.7029211352868603)
julia> @btime extrema(cₚ(directsolve!(BodyPanelSystem($body,sym_axes=(2,3)),verbose=false)))
237.954 ms (100 allocations: 9.04 MiB)
(-0.17960233524532265, 0.7059135143873083)So the base time is 0.18s and skipping the PanelTree and using a direct solve of only 30% slower at this size. Now we can compare with the Neumann-Kelvin and the free surface panels methods. julia> ℓ=1/2π; freesurf = halfplane(1.,ℓ); PanelTree(Table(freesurf))
PanelTree(3990 panels, 13 levels, θ²: 4)
bounds: (-3.0120425f0, 0.0f0, 0.0f0) to (1.5238732f0, 2.0053523f0, 0.0f0)
panel type: NeumannKelvin.QuadKernel
julia> @btime extrema(cₚ(gmressolve!(FSPanelSystem($body,$freesurf;sym_axes=2,ℓ),verbose=false)))
23.317 s (43082 allocations: 12.91 MiB)
(-0.5750295082213954, 0.870222940023746)
julia> @btime extrema(cₚ(gmressolve!(FSPanelSystem($body,$freesurf;sym_axes=2,ℓ,θ²=16),verbose=false)))
50.816 s (42688 allocations: 12.75 MiB)
(-0.5573270678171744, 0.8721034727299559)
julia> @btime extrema(cₚ(directsolve!(NKPanelSystem($body,sym_axes=2,ℓ),verbose=false)))
7.666 s (96 allocations: 9.04 MiB)
(-0.595137720851715, 0.8843762358082723)So the free surface panel solve is 130-250 times slower than using reflection, depending on θ²=16. But considering that we have 6x the number of panels, this means the solver is only 20+ times slower, which makes perfect sense since we need O(100) iterations instead of 4-8. The NK panels are much faster, only 43 times slower than reflection, and only 32 time slower than the reflection direct solve, which is the more direct comparison. Green's functions timings are similar julia> sys = gmressolve!(BodyPanelSystem(body,sym_axes=(2,3)),verbose=false);
julia> @btime ∫G($body.x[1],$sys.body[end])
41.554 ns (0 allocations: 0 bytes)
-0.000103798557938834
julia> @btime kelvin($body.x[1],$sys.body.x[end])
312.350 ns (0 allocations: 0 bytes)
-1.0231497267700616
julia> @btime kelvin($body.x[1],$sys.body.x[2])
1.375 μs (0 allocations: 0 bytes)
-3.431773209643262which is only 8-33 times slower. I'm really happy with those numbers. |
though it doesn't change any of the tests
Giant update integrating all the changes from PR #18. Free surface source panels, tree summation, gmres solver, new multi-threading library, plotting functions, zero allocation potential evaluations and structs to keep everything organized and easy to use.
Evaluating the potential (or it's derivatives) at a target is O(N), leading to O(M*N) scaling for M targets. I'm setting up a Barnes-Hut method to bring that down to O(log N) per target.
ImplicitBVH creates a bounded volume hierarchy which we can apply to the panels. Then we accumulate the panel properties we need (dA, n, x, and q strength) onto the nodes and use those when well outside a volume. I'm also using the same AcceleratedKernel approach for multi-threading the targets as used in ImplicitBVH.
I've written a matrix-free linear operator and used it to solve the system with Krylov.gmres. Accuracy and speed are excellent compared to the full solve for basically any number of panels.
This is such a huge improvement that I should probably get rid of the convenience functions like
influence(panels)to make sure the students don't ever use them.