Skip to content

Barnes-Hut evaluation using ImplicitBVH and solution using Krylov#18

Merged
weymouth merged 68 commits intomainfrom
BarnesHut
Jan 20, 2026
Merged

Barnes-Hut evaluation using ImplicitBVH and solution using Krylov#18
weymouth merged 68 commits intomainfrom
BarnesHut

Conversation

@weymouth
Copy link
Owner

Evaluating the potential (or it's derivatives) at a target is O(N), leading to O(M*N) scaling for M targets. I'm setting up a Barnes-Hut method to bring that down to O(log N) per target.

ImplicitBVH creates a bounded volume hierarchy which we can apply to the panels. Then we accumulate the panel properties we need (dA, n, x, and q strength) onto the nodes and use those when well outside a volume. I'm also using the same AcceleratedKernel approach for multi-threading the targets as used in ImplicitBVH.

I've written a matrix-free linear operator and used it to solve the system with Krylov.gmres. Accuracy and speed are excellent compared to the full solve for basically any number of panels.

This is such a huge improvement that I should probably get rid of the convenience functions like influence(panels) to make sure the students don't ever use them.

Evaluating the potential (or it's derivatives) at a target is O(N), leading to O(M*N) scaling for M targets. I'm setting up a Barnes-Hut method to bring that down to O(log N).

ImplicitBVH creates a bounded volume hierarchy which we can apply to the panels. Then we accumulate the panel properties we need (dA, n, x, and q strength) onto the nodes and use those when well outside a volume.

Currently, I have the generic acculumate! and evaluate functions working. I need to add the panel specific logic and tests. Then I want to switch to an matrix free solver to avoid the O(N^2) construction cost entirely.
Working nicely on 8 panel sphere.
bvh.skips is already the cumsum. now its fixed.
Super simple struct to hold the panels & nodes (augmented with q vectors), bvh, and stack.

Use set_q! and uₙ! functions to define the matrix-free multiplication. Use gmres to solve with O(N) memory and O(N log N) time. Error compared to direct solve is 0.2% with d²=4.
I'm trying out AcceleratedKernels since that's what is used in ImplicitBVH and it's seems pretty good. I can't preallocate the search stack, but I'm still getting >2x speed-up on my old 4 virtual thread laptop.
I've also generalized the BarnesHut with ϕ,kwargs inputs, and written specialzed functions for Φ and steady_force.
AK reductions need much more specification than  ThreadsX. Maybe that implies they will be faster?
@weymouth
Copy link
Owner Author

weymouth commented Dec 30, 2025

  • test with reflections
  • work out distance logic for BH and panels (maybe get rid of the panel version?)
  • create BH added mass function
  • clear out or consolidate old functions

Added PanelSystem <: AbstractPanelSystem to give a uniform interface for solving and measuring. Now we have Φ(x,sys) which specializes on the system type, and nearly everything else is uniform.

I renamed the matrix-free solver to GMRESsolver!(sys) and it works on either type. It's in a new file gmres.jl with it's own tests.

I added a nice surface_integral function, which is called for added_mass and steady_force.

Finally, I split the BarnesHutCore functions into their own file which (potentially) could be moved into ImplicitBVH in the future.

kelvin and reflected potentials are untested with BarnesHut.
I enabled easy symmetry boundary condition enforcement by adding a list of coordinate mirrors factors to the panel system. When \Phi is called, it loops through all the mirrors, applying reflections to the _target_. That way, the Barnes-Hut tree is unchanged.

I also added directsolve! to balance out GMRESsolve!. The mirrors are applied to the potentials to form the correct matrix. (Like before, but easier.)
@weymouth
Copy link
Owner Author

weymouth commented Jan 1, 2026

The code is working nicely for flows without a free surface.

julia> S(θ,φ) = SA[sin(θ)*cos(φ), sin(θ)*sin(φ), cos(θ)]
S (generic function with 1 method)

julia> panels = panelize(S, 0, π/2, 0, π, hᵤ=1/64, N_max=Inf) # quite a few panels
Table with 7 columns and 12934 rows:
     x                     n                     dA          x₄                    w₄                    xᵤᵥ                   
   ┌────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
 1 │ [0.00933365, 0.0038  [0.00933374, 0.0038  9.49835e-5  SVector{3, Float64}  [1.00364e-5 1.00364  SVector{3, Float64}  
 2 │ [0.00386613, 0.0093  [0.00386616, 0.0093  9.49835e-5  SVector{3, Float64}  [1.00364e-5 1.00364  SVector{3, Float64}  
 3 │ [-0.00386613, 0.009  [-0.00386616, 0.009  9.49835e-5  SVector{3, Float64}  [1.00364e-5 1.00364  SVector{3, Float64}  
 4 │ [-0.00933365, 0.003  [-0.00933374, 0.003  9.49835e-5  SVector{3, Float64}  [1.00364e-5 1.00364  SVector{3, Float64}  
                                                                                                            

julia> @time extrema(panel_cp(directsolve!(PanelSystem(panels,sym_axes=(2,3)),verbose=false)))
181.588483 seconds (299.26 k allocations: 2.517 GiB, 0.52% gc time, 0.01% compilation time)
(-1.2519952210157759, 0.9997268849291908)

julia> @time extrema(panel_cp(GMRESsolve!(PanelSystem(panels,sym_axes=(2,3)),verbose=false)))
 84.243938 seconds (918.91 k allocations: 82.896 MiB, 0.05% compilation time)
(-1.2519970778461436, 0.9997268847578672)

julia> @time extrema(panel_cp(GMRESsolve!(BarnesHut(panels,sym_axes=(2,3),d²=32),verbose=false)))
 14.113298 seconds (595.82 k allocations: 56.185 MiB)
(-1.2478511862965824, 0.999732833082339)

julia> @time extrema(panel_cp(GMRESsolve!(BarnesHut(panels,sym_axes=(2,3),d²=16),verbose=false)))
  8.959613 seconds (595.82 k allocations: 56.185 MiB)
(-1.2433446694645616, 0.999739893335223)

julia> @time extrema(panel_cp(GMRESsolve!(BarnesHut(panels,sym_axes=(2,3),d²=8),verbose=false)))
  5.331441 seconds (595.82 k allocations: 56.185 MiB)
(-1.2358450518391968, 0.9997187398080566)

julia> @time extrema(panel_cp(GMRESsolve!(BarnesHut(panels,sym_axes=(2,3),d²=4),verbose=false)))
  4.126204 seconds (699.34 k allocations: 64.974 MiB)
(-1.2273541396547127, 0.9998140097913244)

julia> @time extrema(panel_cp(GMRESsolve!(BarnesHut(panels,sym_axes=(2,3),d²=2),verbose=false)))
  2.893607 seconds (699.34 k allocations: 65.146 MiB, 0.59% gc time)
(-1.2226066498335322, 0.9990576811980934)

So we get a nice tradeoff of speed-up and error with Barnes-Hut d². Compare this to the main branch:

julia> ∫G₂₃ = reflect(reflect(∫G,3),2)  # 2² calls

julia> @time q = influence(panels,ϕ=∫G₂₃)\components(panels.n,1)
157.456592 seconds (1.51 G allocations: 44.998 GiB, 8.97% gc time, 1.54% compilation time)

julia> @time steady_force(q,panels,ϕ=∫G₂₃)
 91.609284 seconds (2.21 G allocations: 161.880 GiB, 31.28% gc time, 0.00% compilation time)

Look at those ALLOCATIONS! Good lord. 50+162 GiB -> 2.5GiB for the direct solve!! And Matrix-free gets that down to 56-83 MiB! So 1000x smaller memory footprint.

Speed-wise, it's 4 min - > 3 min for direct, and 3-14 sec for GMRES+Barnet-Hut depending on d²! So 20-90x speed up.

Identity crisis. Barnes-Hut is working, and Kelvin green's functions aren't. Time to clean house.
Added free surface panel logic but gmressolver! isn't working - even for Fn=0.

The new constructor is `PanelSystem(body;freesurf=nothing)`. PanelSystem vcats the two Tables and generates views sys.body, sys.freesurf for each part. I replaced total_area with body_area since that's the area we tend to want. I also set up direct solve to only operate on sys.body (ignoring sys.freesurf completely) to avoid forming a mixed BC matrix. That's working.

I also redefined BarnesHut as a PanelSystem wrapper, avoiding double implementation of all this logic.
and a few docs adjustments
I had to use an upwinded FD version of Φₓₓ to get a qualitatively reasonable freesurf result. It's extremely far from ideal (it takes O(100) gmres iterations to converge, I've hard coded the spacing h, the wave seems very diffused) but it's a start.
Porting over the PanelSystem changes from the BarnesHut PR.
Optmized as much as I can. Here's the N=12.9K case:
julia> @Btime extrema(cₚ(gmressolve!(PanelSystem($panels,sym_axes=(2,3)),verbose=false)))
  63.136 s (214 allocations: 2.68 MiB)
(-1.2519970778461476, 0.9997268847578672)

julia> @Btime extrema(cₚ(directsolve!(PanelSystem($panels,sym_axes=(2,3)),verbose=false)))
  143.595 s (82993 allocations: 2.50 GiB)
(-1.2519952210157665, 0.9997268849291909)

And here's N=805:
julia> @Btime extrema(cₚ(directsolve!(PanelSystem($panels,sym_axes=(2,3)),verbose=false)))
  150.640 ms (69 allocations: 9.93 MiB)
(-1.2575391174014392, 0.9955853298424956)

julia> @Btime extrema(cₚ(gmressolve!(PanelSystem($panels,sym_axes=(2,3)),verbose=false)))
  215.061 ms (214 allocations: 186.08 KiB)
(-1.257545978734143, 0.9955853089025847)
This lets me type dispatch ∫G_kernel for different panel types.
Added a nice batch of test for stl panels - so this is certainly working properly now. Also updated a few places to adapt to variable Floating point type.

Fixed addedmass to be dimensionless by including an approximate body_vol() calculator using the divergence theorem. It's within a few percent for the D20.
The panels = [body;freesurf] idea was fundamentally terrible. Unleashing tons of free surface panel specific code all over the package. Instead I've walked back to only have a BodyPanelSystem, and simplifying panel_method and solvers to only operate on AbstractPanelSystems for now.

BarnesHut<:AbstractPanelSystem is also wrong. BarnesHut<:AbstractVector as a Table wrapper is much much better.
Also renamed the BarnesHutCore functions and variables
The submerged spheroid test is working, but the wigley and prism tests aren't. I'm missing the freesurface contour & the thinship facilities, so that is probably the issue.
@weymouth
Copy link
Owner Author

weymouth commented Jan 18, 2026

Benchmarks. First the big quarter sphere has around the same benchmark as above:

julia> using NeumannKelvin,BenchmarkTools

julia> S(θ,φ) = SA[sin(θ)*cos(φ), sin(θ)*sin(φ), cos(θ)]
S (generic function with 1 method)

julia> panels = panelize(S, 0, π/2, 0, π, hᵤ=1/64, N_max=Inf) # quite a few panels
Table with 8 columns and 12934 rows:
      x                     n                     dA           xg                    wg                    verts                 nverts                kernel
    ┌──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
 1  │ [0.00933365, 0.0038  [0.00933374, 0.0038  9.49835e-5   SVector{3, Float64}  [1.00364e-5 1.00364  SVector{3, Float64}  SVector{3, Float64}  QuadKernel()
 2  │ [0.00386613, 0.0093  [0.00386616, 0.0093  9.49835e-5   SVector{3, Float64}  [1.00364e-5 1.00364  SVector{3, Float64}  SVector{3, Float64}  QuadKernel()
 3  │ [-0.00386613, 0.009  [-0.00386616, 0.009  9.49835e-5   SVector{3, Float64}  [1.00364e-5 1.00364  SVector{3, Float64}  SVector{3, Float64}  QuadKernel()
                                                                                                                                       

julia> @btime extrema(cₚ(gmressolve!(BodyPanelSystem($panels,sym_axes=(2,3),wrap=PanelTree),verbose=false)))
  2.835 s (78309 allocations: 13.11 MiB)
(-1.2283857777503266, 0.9998190404893949)

Free surface benchmarks! First the reflection body

julia> function spheroid(h;L=1,Z=-1/8,r=1/12,AR=1/2r,kwargs...)
           S(θ₁,θ₂) = SA[0.5L*cos(θ₁),-r*sin(θ₂)*sin(θ₁),r*cos(θ₂)*sin(θ₁)+Z]
           panelize(S,0,π,0,π,hᵤ=h*√AR,hᵥ=h/√AR;kwargs...)
       end
spheroid (generic function with 1 method)

julia> function halfplane(L::T,ℓ;hᵤ=0.3ℓ,hᵥ=hᵤ,s=L/2+2π*ℓ) where T
           measure.((u,v)->SA[u,v,0],s:-hᵤ:-2s,(hᵥ/2:hᵥ:4s/3)',hᵤ,hᵥ,flip=true;T)
       end
halfplane (generic function with 1 method)

julia> body = spheroid(1/60); PanelTree(body)
PanelTree(766 panels, 11 levels, θ²: 4)
  bounds: (-0.5f0, -0.083266795f0, -0.2082668f0) to (0.5f0, -0.0f0, -0.04173321f0)
  panel type: NeumannKelvin.QuadKernel

julia> @btime extrema(cₚ(gmressolve!(BodyPanelSystem($body,sym_axes=(2,3),wrap=PanelTree),verbose=false)))
  179.872 ms (5210 allocations: 846.55 KiB)
(-0.18214363989174598, 0.7029211352868603)

julia> @btime extrema(cₚ(directsolve!(BodyPanelSystem($body,sym_axes=(2,3)),verbose=false)))
  237.954 ms (100 allocations: 9.04 MiB)
(-0.17960233524532265, 0.7059135143873083)

So the base time is 0.18s and skipping the PanelTree and using a direct solve of only 30% slower at this size. Now we can compare with the Neumann-Kelvin and the free surface panels methods.

julia>=1/2π; freesurf = halfplane(1.,ℓ); PanelTree(Table(freesurf))
PanelTree(3990 panels, 13 levels, θ²: 4)
  bounds: (-3.0120425f0, 0.0f0, 0.0f0) to (1.5238732f0, 2.0053523f0, 0.0f0)
  panel type: NeumannKelvin.QuadKernel

julia> @btime extrema(cₚ(gmressolve!(FSPanelSystem($body,$freesurf;sym_axes=2,ℓ),verbose=false)))
  23.317 s (43082 allocations: 12.91 MiB)
(-0.5750295082213954, 0.870222940023746)

julia> @btime extrema(cₚ(gmressolve!(FSPanelSystem($body,$freesurf;sym_axes=2,ℓ,θ²=16),verbose=false)))
  50.816 s (42688 allocations: 12.75 MiB)
(-0.5573270678171744, 0.8721034727299559)

julia> @btime extrema(cₚ(directsolve!(NKPanelSystem($body,sym_axes=2,ℓ),verbose=false)))
  7.666 s (96 allocations: 9.04 MiB)
(-0.595137720851715, 0.8843762358082723)

So the free surface panel solve is 130-250 times slower than using reflection, depending on θ²=16. But considering that we have 6x the number of panels, this means the solver is only 20+ times slower, which makes perfect sense since we need O(100) iterations instead of 4-8.

The NK panels are much faster, only 43 times slower than reflection, and only 32 time slower than the reflection direct solve, which is the more direct comparison. Green's functions timings are similar

julia> sys = gmressolve!(BodyPanelSystem(body,sym_axes=(2,3)),verbose=false);

julia> @btime ∫G($body.x[1],$sys.body[end])
  41.554 ns (0 allocations: 0 bytes)
-0.000103798557938834

julia> @btime kelvin($body.x[1],$sys.body.x[end])
  312.350 ns (0 allocations: 0 bytes)
-1.0231497267700616

julia> @btime kelvin($body.x[1],$sys.body.x[2])
  1.375 μs (0 allocations: 0 bytes)
-3.431773209643262

which is only 8-33 times slower. I'm really happy with those numbers.

@weymouth weymouth merged commit c68eb04 into main Jan 20, 2026
2 checks passed
weymouth added a commit that referenced this pull request Jan 21, 2026
Giant update integrating all the changes from PR #18. Free surface source panels, tree summation, gmres solver, new multi-threading library, plotting functions, zero allocation potential evaluations and structs to keep everything organized and easy to use.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant