Barnes-Hut evaluation using ImplicitBVH and solution using Krylov by weymouth · Pull Request #18 · weymouth/NeumannKelvin.jl

weymouth · 2025-12-29T23:47:24Z

Evaluating the potential (or it's derivatives) at a target is O(N), leading to O(M*N) scaling for M targets. I'm setting up a Barnes-Hut method to bring that down to O(log N) per target.

ImplicitBVH creates a bounded volume hierarchy which we can apply to the panels. Then we accumulate the panel properties we need (dA, n, x, and q strength) onto the nodes and use those when well outside a volume. I'm also using the same AcceleratedKernel approach for multi-threading the targets as used in ImplicitBVH.

I've written a matrix-free linear operator and used it to solve the system with Krylov.gmres. Accuracy and speed are excellent compared to the full solve for basically any number of panels.

This is such a huge improvement that I should probably get rid of the convenience functions like influence(panels) to make sure the students don't ever use them.

Evaluating the potential (or it's derivatives) at a target is O(N), leading to O(M*N) scaling for M targets. I'm setting up a Barnes-Hut method to bring that down to O(log N). ImplicitBVH creates a bounded volume hierarchy which we can apply to the panels. Then we accumulate the panel properties we need (dA, n, x, and q strength) onto the nodes and use those when well outside a volume. Currently, I have the generic acculumate! and evaluate functions working. I need to add the panel specific logic and tests. Then I want to switch to an matrix free solver to avoid the O(N^2) construction cost entirely.

Working nicely on 8 panel sphere.

bvh.skips is already the cumsum. now its fixed.

Looking great.

Super simple struct to hold the panels & nodes (augmented with q vectors), bvh, and stack. Use set_q! and uₙ! functions to define the matrix-free multiplication. Use gmres to solve with O(N) memory and O(N log N) time. Error compared to direct solve is 0.2% with d²=4.

I'm trying out AcceleratedKernels since that's what is used in ImplicitBVH and it's seems pretty good. I can't preallocate the search stack, but I'm still getting >2x speed-up on my old 4 virtual thread laptop. I've also generalized the BarnesHut with ϕ,kwargs inputs, and written specialzed functions for Φ and steady_force.

AK reductions need much more specification than ThreadsX. Maybe that implies they will be faster?

weymouth · 2025-12-30T07:55:33Z

test with reflections
work out distance logic for BH and panels (maybe get rid of the panel version?)
create BH added mass function
clear out or consolidate old functions

Added PanelSystem <: AbstractPanelSystem to give a uniform interface for solving and measuring. Now we have Φ(x,sys) which specializes on the system type, and nearly everything else is uniform. I renamed the matrix-free solver to GMRESsolver!(sys) and it works on either type. It's in a new file gmres.jl with it's own tests. I added a nice surface_integral function, which is called for added_mass and steady_force. Finally, I split the BarnesHutCore functions into their own file which (potentially) could be moved into ImplicitBVH in the future. kelvin and reflected potentials are untested with BarnesHut.

I enabled easy symmetry boundary condition enforcement by adding a list of coordinate mirrors factors to the panel system. When \Phi is called, it loops through all the mirrors, applying reflections to the _target_. That way, the Barnes-Hut tree is unchanged. I also added directsolve! to balance out GMRESsolve!. The mirrors are applied to the potentials to form the correct matrix. (Like before, but easier.)

weymouth · 2026-01-01T21:39:06Z

The code is working nicely for flows without a free surface.

julia> S(θ,φ) = SA[sin(θ)*cos(φ), sin(θ)*sin(φ), cos(θ)]
S (generic function with 1 method)

julia> panels = panelize(S, 0, π/2, 0, π, hᵤ=1/64, N_max=Inf) # quite a few panels
Table with 7 columns and 12934 rows:
     x                     n                     dA          x₄                    w₄                    xᵤᵥ                   ⋯
   ┌────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
 1 │ [0.00933365, 0.0038…  [0.00933374, 0.0038…  9.49835e-5  SVector{3, Float64}…  [1.00364e-5 1.00364…  SVector{3, Float64}…  ⋯
 2 │ [0.00386613, 0.0093…  [0.00386616, 0.0093…  9.49835e-5  SVector{3, Float64}…  [1.00364e-5 1.00364…  SVector{3, Float64}…  ⋯
 3 │ [-0.00386613, 0.009…  [-0.00386616, 0.009…  9.49835e-5  SVector{3, Float64}…  [1.00364e-5 1.00364…  SVector{3, Float64}…  ⋯
 4 │ [-0.00933365, 0.003…  [-0.00933374, 0.003…  9.49835e-5  SVector{3, Float64}…  [1.00364e-5 1.00364…  SVector{3, Float64}…  ⋯
 ⋮ │          ⋮                     ⋮                ⋮                ⋮                     ⋮                     ⋮            ⋱

julia> @time extrema(panel_cp(directsolve!(PanelSystem(panels,sym_axes=(2,3)),verbose=false)))
181.588483 seconds (299.26 k allocations: 2.517 GiB, 0.52% gc time, 0.01% compilation time)
(-1.2519952210157759, 0.9997268849291908)

julia> @time extrema(panel_cp(GMRESsolve!(PanelSystem(panels,sym_axes=(2,3)),verbose=false)))
 84.243938 seconds (918.91 k allocations: 82.896 MiB, 0.05% compilation time)
(-1.2519970778461436, 0.9997268847578672)

julia> @time extrema(panel_cp(GMRESsolve!(BarnesHut(panels,sym_axes=(2,3),d²=32),verbose=false)))
 14.113298 seconds (595.82 k allocations: 56.185 MiB)
(-1.2478511862965824, 0.999732833082339)

julia> @time extrema(panel_cp(GMRESsolve!(BarnesHut(panels,sym_axes=(2,3),d²=16),verbose=false)))
  8.959613 seconds (595.82 k allocations: 56.185 MiB)
(-1.2433446694645616, 0.999739893335223)

julia> @time extrema(panel_cp(GMRESsolve!(BarnesHut(panels,sym_axes=(2,3),d²=8),verbose=false)))
  5.331441 seconds (595.82 k allocations: 56.185 MiB)
(-1.2358450518391968, 0.9997187398080566)

julia> @time extrema(panel_cp(GMRESsolve!(BarnesHut(panels,sym_axes=(2,3),d²=4),verbose=false)))
  4.126204 seconds (699.34 k allocations: 64.974 MiB)
(-1.2273541396547127, 0.9998140097913244)

julia> @time extrema(panel_cp(GMRESsolve!(BarnesHut(panels,sym_axes=(2,3),d²=2),verbose=false)))
  2.893607 seconds (699.34 k allocations: 65.146 MiB, 0.59% gc time)
(-1.2226066498335322, 0.9990576811980934)

So we get a nice tradeoff of speed-up and error with Barnes-Hut d². Compare this to the main branch:

julia> ∫G₂₃ = reflect(reflect(∫G,3),2)  # 2² calls

julia> @time q = influence(panels,ϕ=∫G₂₃)\components(panels.n,1)
157.456592 seconds (1.51 G allocations: 44.998 GiB, 8.97% gc time, 1.54% compilation time)

julia> @time steady_force(q,panels,ϕ=∫G₂₃)
 91.609284 seconds (2.21 G allocations: 161.880 GiB, 31.28% gc time, 0.00% compilation time)

Look at those ALLOCATIONS! Good lord. 50+162 GiB -> 2.5GiB for the direct solve!! And Matrix-free gets that down to 56-83 MiB! So 1000x smaller memory footprint.

Speed-wise, it's 4 min - > 3 min for direct, and 3-14 sec for GMRES+Barnet-Hut depending on d²! So 20-90x speed up.

Identity crisis. Barnes-Hut is working, and Kelvin green's functions aren't. Time to clean house.

Added free surface panel logic but gmressolver! isn't working - even for Fn=0. The new constructor is `PanelSystem(body;freesurf=nothing)`. PanelSystem vcats the two Tables and generates views sys.body, sys.freesurf for each part. I replaced total_area with body_area since that's the area we tend to want. I also set up direct solve to only operate on sys.body (ignoring sys.freesurf completely) to avoid forming a mixed BC matrix. That's working. I also redefined BarnesHut as a PanelSystem wrapper, avoiding double implementation of all this logic.

and a few docs adjustments

I had to use an upwinded FD version of Φₓₓ to get a qualitatively reasonable freesurf result. It's extremely far from ideal (it takes O(100) gmres iterations to converge, I've hard coded the spacing h, the wave seems very diffused) but it's a start.

Porting over the PanelSystem changes from the BarnesHut PR.

Optmized as much as I can. Here's the N=12.9K case: julia> @Btime extrema(cₚ(gmressolve!(PanelSystem($panels,sym_axes=(2,3)),verbose=false))) 63.136 s (214 allocations: 2.68 MiB) (-1.2519970778461476, 0.9997268847578672) julia> @Btime extrema(cₚ(directsolve!(PanelSystem($panels,sym_axes=(2,3)),verbose=false))) 143.595 s (82993 allocations: 2.50 GiB) (-1.2519952210157665, 0.9997268849291909) And here's N=805: julia> @Btime extrema(cₚ(directsolve!(PanelSystem($panels,sym_axes=(2,3)),verbose=false))) 150.640 ms (69 allocations: 9.93 MiB) (-1.2575391174014392, 0.9955853298424956) julia> @Btime extrema(cₚ(gmressolve!(PanelSystem($panels,sym_axes=(2,3)),verbose=false))) 215.061 ms (214 allocations: 186.08 KiB) (-1.257545978734143, 0.9955853089025847)

… aren't available.

This lets me type dispatch ∫G_kernel for different panel types.

Added a nice batch of test for stl panels - so this is certainly working properly now. Also updated a few places to adapt to variable Floating point type. Fixed addedmass to be dimensionless by including an approximate body_vol() calculator using the divergence theorem. It's within a few percent for the D20.

The panels = [body;freesurf] idea was fundamentally terrible. Unleashing tons of free surface panel specific code all over the package. Instead I've walked back to only have a BodyPanelSystem, and simplifying panel_method and solvers to only operate on AbstractPanelSystems for now. BarnesHut<:AbstractPanelSystem is also wrong. BarnesHut<:AbstractVector as a Table wrapper is much much better.

Also renamed the BarnesHutCore functions and variables

The submerged spheroid test is working, but the wigley and prism tests aren't. I'm missing the freesurface contour & the thinship facilities, so that is probably the issue.

weymouth · 2026-01-18T14:18:47Z

Benchmarks. First the big quarter sphere has around the same benchmark as above:

julia> using NeumannKelvin,BenchmarkTools

julia> S(θ,φ) = SA[sin(θ)*cos(φ), sin(θ)*sin(φ), cos(θ)]
S (generic function with 1 method)

julia> panels = panelize(S, 0, π/2, 0, π, hᵤ=1/64, N_max=Inf) # quite a few panels
Table with 8 columns and 12934 rows:
      x                     n                     dA           xg                    wg                    verts                 nverts                kernel
    ┌──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
 1  │ [0.00933365, 0.0038…  [0.00933374, 0.0038…  9.49835e-5   SVector{3, Float64}…  [1.00364e-5 1.00364…  SVector{3, Float64}…  SVector{3, Float64}…  QuadKernel()
 2  │ [0.00386613, 0.0093…  [0.00386616, 0.0093…  9.49835e-5   SVector{3, Float64}…  [1.00364e-5 1.00364…  SVector{3, Float64}…  SVector{3, Float64}…  QuadKernel()
 3  │ [-0.00386613, 0.009…  [-0.00386616, 0.009…  9.49835e-5   SVector{3, Float64}…  [1.00364e-5 1.00364…  SVector{3, Float64}…  SVector{3, Float64}…  QuadKernel()
 ⋮  │          ⋮                     ⋮                 ⋮                ⋮                     ⋮                     ⋮                     ⋮                 ⋮

julia> @btime extrema(cₚ(gmressolve!(BodyPanelSystem($panels,sym_axes=(2,3),wrap=PanelTree),verbose=false)))
  2.835 s (78309 allocations: 13.11 MiB)
(-1.2283857777503266, 0.9998190404893949)

Free surface benchmarks! First the reflection body

julia> function spheroid(h;L=1,Z=-1/8,r=1/12,AR=1/2r,kwargs...)
           S(θ₁,θ₂) = SA[0.5L*cos(θ₁),-r*sin(θ₂)*sin(θ₁),r*cos(θ₂)*sin(θ₁)+Z]
           panelize(S,0,π,0,π,hᵤ=h*√AR,hᵥ=h/√AR;kwargs...)
       end
spheroid (generic function with 1 method)

julia> function halfplane(L::T,ℓ;hᵤ=0.3ℓ,hᵥ=hᵤ,s=L/2+2π*ℓ) where T
           measure.((u,v)->SA[u,v,0],s:-hᵤ:-2s,(hᵥ/2:hᵥ:4s/3)',hᵤ,hᵥ,flip=true;T)
       end
halfplane (generic function with 1 method)

julia> body = spheroid(1/60); PanelTree(body)
PanelTree(766 panels, 11 levels, θ²: 4)
  bounds: (-0.5f0, -0.083266795f0, -0.2082668f0) to (0.5f0, -0.0f0, -0.04173321f0)
  panel type: NeumannKelvin.QuadKernel

julia> @btime extrema(cₚ(gmressolve!(BodyPanelSystem($body,sym_axes=(2,3),wrap=PanelTree),verbose=false)))
  179.872 ms (5210 allocations: 846.55 KiB)
(-0.18214363989174598, 0.7029211352868603)

julia> @btime extrema(cₚ(directsolve!(BodyPanelSystem($body,sym_axes=(2,3)),verbose=false)))
  237.954 ms (100 allocations: 9.04 MiB)
(-0.17960233524532265, 0.7059135143873083)

So the base time is 0.18s and skipping the PanelTree and using a direct solve of only 30% slower at this size. Now we can compare with the Neumann-Kelvin and the free surface panels methods.

julia> ℓ=1/2π; freesurf = halfplane(1.,ℓ); PanelTree(Table(freesurf))
PanelTree(3990 panels, 13 levels, θ²: 4)
  bounds: (-3.0120425f0, 0.0f0, 0.0f0) to (1.5238732f0, 2.0053523f0, 0.0f0)
  panel type: NeumannKelvin.QuadKernel

julia> @btime extrema(cₚ(gmressolve!(FSPanelSystem($body,$freesurf;sym_axes=2,ℓ),verbose=false)))
  23.317 s (43082 allocations: 12.91 MiB)
(-0.5750295082213954, 0.870222940023746)

julia> @btime extrema(cₚ(gmressolve!(FSPanelSystem($body,$freesurf;sym_axes=2,ℓ,θ²=16),verbose=false)))
  50.816 s (42688 allocations: 12.75 MiB)
(-0.5573270678171744, 0.8721034727299559)

julia> @btime extrema(cₚ(directsolve!(NKPanelSystem($body,sym_axes=2,ℓ),verbose=false)))
  7.666 s (96 allocations: 9.04 MiB)
(-0.595137720851715, 0.8843762358082723)

So the free surface panel solve is 130-250 times slower than using reflection, depending on θ²=16. But considering that we have 6x the number of panels, this means the solver is only 20+ times slower, which makes perfect sense since we need O(100) iterations instead of 4-8.

The NK panels are much faster, only 43 times slower than reflection, and only 32 time slower than the reflection direct solve, which is the more direct comparison. Green's functions timings are similar

julia> sys = gmressolve!(BodyPanelSystem(body,sym_axes=(2,3)),verbose=false);

julia> @btime ∫G($body.x[1],$sys.body[end])
  41.554 ns (0 allocations: 0 bytes)
-0.000103798557938834

julia> @btime kelvin($body.x[1],$sys.body.x[end])
  312.350 ns (0 allocations: 0 bytes)
-1.0231497267700616

julia> @btime kelvin($body.x[1],$sys.body.x[2])
  1.375 μs (0 allocations: 0 bytes)
-3.431773209643262

which is only 8-33 times slower. I'm really happy with those numbers.

though it doesn't change any of the tests

Giant update integrating all the changes from PR #18. Free surface source panels, tree summation, gmres solver, new multi-threading library, plotting functions, zero allocation potential evaluations and structs to keep everything organized and easy to use.

weymouth added 11 commits December 25, 2025 10:14

add nonmutatin accumulate

0ba5214

Add BEM functions and test

b9956be

Working nicely on 8 panel sphere.

fixed typo in accumulate

e9a9edc

bvh.skips is already the cumsum. now its fixed.

Fixed gradient error test and added timing

50db5e0

Looking great.

Fix green's function allocation problem!

79d124a

Working with gradient and added alloc tests

83f967c

Add documentation tests and exports

96e0410

Fixed steady_force(BarnesHut)

d0a42d5

AK reductions need much more specification than ThreadsX. Maybe that implies they will be faster?

weymouth added 3 commits December 30, 2025 21:49

Clean up docs and add verbose evaluate flag.

8d54062

weymouth added 14 commits January 2, 2026 08:57

No more Neumann Kelvin!?!

556e736

Identity crisis. Barnes-Hut is working, and Kelvin green's functions aren't. Time to clean house.

Continue clean up

1f6af3d

Removed BarnesHutsolve and fixed Fn=0 test

514b471

Added \zeta function

574735f

and a few docs adjustments

Add preliminary FSBC

85e73f0

I had to use an upwinded FD version of Φₓₓ to get a qualitatively reasonable freesurf result. It's extremely far from ideal (it takes O(100) gmres iterations to converge, I've hard coded the spacing h, the wave seems very diffused) but it's a start.

PanelSystem

1354f57

Porting over the PanelSystem changes from the BarnesHut PR.

Updated BarnesHut to fall back on a bounding sphere when the vertices…

5f2dc97

… aren't available.

Add kernel type to panels

ccc87ff

This lets me type dispatch ∫G_kernel for different panel types.

Added in STL panel logic as GeometryBasics extension

f09e2d5

Path typo

27b509a

re-introduce multi-threaded influence(sys) function

26e1317

weymouth added 19 commits January 13, 2026 16:40

Add steadyforce(;S) keyword argument

317c1c2

Add Makie freesurface plot and PanelSystem default

b1173be

add colorange

7eee95d

untested

8773169

Added (and tested) U within PanelSystem

10d723e

Better argument error catching

fa5d7f8

Reintroduce kelvin

e657d83

Allocation free accumulate

f0eb97e

Rename PanelTree

ebd74fc

Also renamed the BarnesHutCore functions and variables

Add FSPanelSystem

437d448

Update docstrings, and show

b20b248

Add velocity function and update MakieExt

9cb9ebd

Clean up panel methods a bit

8530e97

Added swimming animal examples

a59a842

Add quantitative FSPanel test

706e6b6

Zero allocation N+W and sketch of NKPanelSystem

c81cf67

NKPanelSystem working

1e70026

The submerged spheroid test is working, but the wigley and prism tests aren't. I'm missing the freesurface contour & the thinship facilities, so that is probably the issue.

Fix test and limit benchmark time

7c6bafb

weymouth added 8 commits January 18, 2026 22:17

bug fix in finite_ranges

334955a

though it doesn't change any of the tests

Reintroduce filter and waterline contour

66e2f69

Mutable arguments for free surface systems

9aae9f2

Undo mutable ℓ and update viz

7560f07

Change default to θ²=9 and clean up allocations

2698036

Refactor MakieExt to avoid overlap.

2f30247

Working on PlotsExt

a7bb986

still not working well...

7591f55

weymouth merged commit c68eb04 into main Jan 20, 2026
2 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Barnes-Hut evaluation using ImplicitBVH and solution using Krylov#18

Barnes-Hut evaluation using ImplicitBVH and solution using Krylov#18
weymouth merged 68 commits intomainfrom
BarnesHut

weymouth commented Dec 29, 2025

Uh oh!

weymouth commented Dec 30, 2025 •

edited

Loading

Uh oh!

weymouth commented Jan 1, 2026 •

edited

Loading

Uh oh!

weymouth commented Jan 18, 2026 •

edited

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

weymouth commented Dec 29, 2025

Uh oh!

weymouth commented Dec 30, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

weymouth commented Jan 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

weymouth commented Jan 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

weymouth commented Dec 30, 2025 •

edited

Loading

weymouth commented Jan 1, 2026 •

edited

Loading

weymouth commented Jan 18, 2026 •

edited

Loading