Skip to content

new MeshBody type and BVH for fast measures#267

Draft
marinlauber wants to merge 35 commits intoWaterLily-jl:masterfrom
marinlauber:MeshBody
Draft

new MeshBody type and BVH for fast measures#267
marinlauber wants to merge 35 commits intoWaterLily-jl:masterfrom
marinlauber:MeshBody

Conversation

@marinlauber
Copy link
Copy Markdown
Member

@marinlauber marinlauber commented Dec 17, 2025

This PR implements a new MeshBody type to enable loading and measuring any geometry available in MeshIO.jl and Abaqus/Calculix .inp files into WaterLily. For now, only linear triangular elements are supported, but I have code to load and measure quads as well.

The classical measure is performed through a brute force approach that measures every single point on the mesh and uses the smallest distance.

For a large mesh on the GPU, this does not work. For this, this PR uses ImplicitBVH to generate a bounded volume hierarchy that can be traversed rapidly to measure the geometry.

This PR is still in an experimental state. The current main issue before porting to the GPU is that every leaf bounding volume must hold the list of triangles that are within this volume (to perform the final brute force measure). Since the number of triangle they contain differ, it is difficult to allocate the list of lists on the GPU. Maybe a better data structure is required.

Additionally, I didn't manage to store the Mesh type that is native to GeometryBasics in the MeshBody type, and I must unpack all the triangles into a Vector{SVector{3,3,T}} > mem from which I can then measure.

The code works on the CPU and on the GPU. The BVH still has some trouble measuring accurately in some regions.

Some of the functions that are currently on the PR should go in MeshIO.jl and FileIO.jl, I've opened two PR's JuliaIO/MeshIO.jl#110 JuliaIO/FileIO.jl#414 there, but before they are merged, this has to live here.

Finally, the meshes used in the examples are here:
meshes.zip

@weymouth
Copy link
Copy Markdown
Member

weymouth commented Dec 23, 2025

Thanks for all the work on this. It'll be really awesome to have generic STL reading in the code base.

I notice you tried to use ImplicitBVH to speed up the SDF. Why didn't you go with that? It looks perfect! I'm hoping to use it for a Barnes-Hut method in a panel code, and that will take some adjustment. But WaterLily's use case seems like it should be plug and play...

@marinlauber
Copy link
Copy Markdown
Member Author

@weymouth

I initially planned to do this, but the main issue is that the BVH only computes neighbouring pairs with the BoundingObject, it doesn't compute the distance after that. It would require writting a new function for that, which is doable.

Additionally, I couldn't figure out how to make the Mesh type GPU-friendly, the BVH works on the GPU, but because the whole body is passed to a kernel function when we measure, all it's atributes must be GPU compatible. This is more painful to solve, but my currentl implementation also phases this issue, which I solve by not sotring th Mesh in the MeshBody, only a surogate.

It is maybe true that it's worth another shot now that I think of it.

@marinlauber
Copy link
Copy Markdown
Member Author

marinlauber commented Jan 27, 2026

@weymouth I've followed the suggestion to digged deeper with ImplicitBVH and it now works on CPU and GPUs 🥳

Still not perfect everywhere when we measure, but getting there. The measure is roughly 100x faster for the dragon.

Screenshot From 2026-01-27 12-20-26

@weymouth
Copy link
Copy Markdown
Member

That's awesome!

@marinlauber
Copy link
Copy Markdown
Member Author

marinlauber commented Feb 6, 2026

This starts to shape up nicely!
With the Adapt.@adapt_structure SetBody allows us to combine a MeshBody with any other AbstractBody.
The remaining issues are:

  • Fix signed distance for points inside the mesh that never reach a leaf and never get assigned the correct signed distance. There is an edge case of periodic structures that are always kind of open.
  • Implement force routines, which will require a custom Kernel, similar to what is done here https://github.com/WaterLily-jl/flexible-cylinder-project/blob/master/src/Forces.jl
  • Specialise for 2D simulations with 3D meshes (or a mesh slice).
  • Compatibility with QuadMeshes
  • Split leaf and node primitives, maybe be better/faster

@weymouth
Copy link
Copy Markdown
Member

weymouth commented Feb 6, 2026

I think signed SDF can't be done efficiently without changing the traversal order. We need to check both sibling distances and descend into the closer one first. Once we get down to a leaf, we'll have a true signed distance, and we can prune anything farther away. If we don't descend into the closest sibling each time, you could end up checking every leaf!

It seems like this will require allocating a state vector. We will no longer be able to fast quit once we get a virtual node. And it's going to be more conservative with pruning. I'm guessing it's no better than half as fast as we could get the squared distance.

What do you think?

@marinlauber
Copy link
Copy Markdown
Member Author

marinlauber commented Feb 6, 2026

Yes, I think this is clearly the best approach. It also means that the number of measures scales with the number of levels, which is probably better on average than trying all the neighboring branches.

I've coded up something like this, see next update. It always measures the closet tri, but it's not quite as good as the depth-first approach for now.

@weymouth
Copy link
Copy Markdown
Member

What do you mean by, not quite as good? How are you testing these?

@marinlauber
Copy link
Copy Markdown
Member Author

What do you mean by, not quite as good? How are you testing these?

I mean that it works on some geometries, and on some others, I get garbage. It's probably linked to the BVH structure.

@weymouth
Copy link
Copy Markdown
Member

weymouth commented Feb 16, 2026 via email

@marinlauber
Copy link
Copy Markdown
Member Author

marinlauber commented Feb 17, 2026

I don't have the chain links specifically. I have attached a sphere, and an aorta in the original comment of the PR, see meshes.zip

There are a few others here (sphere, naca, dragon)
https://github.com/marinlauber/WaterLilyPreCICE/tree/master/meshes
and here as well
https://github.com/rhellemans1/WaterLilyPreCICE_KD/tree/master/bvh/obj

Cleaned up a few bugs/confusing bits in closest(). This is not clever (!) but I think it is now correct. It only prunes when the BV (square) distance is greater than the current best distance. This is even applied at the leaf level since that distance function is much much faster than the triangle's.

If you want a signed distance function, you need to initialize to floatmax (or the size of the total bounding box), forcing you to find the closest triangle so the sign is correct. However, when the mesh defines a membrane body (or you just want the unsigned distance) you can initialize with the max BDIM value. I've set this cutoff to 4 everywhere I noticed.

A more efficient traversal will require a stack. I'm not positive about the best way to implement a stack on each thread of a GPU... and I'm also not sure how much better it will work. So let's try this out first.
@weymouth
Copy link
Copy Markdown
Member

I updated closest() and it's working on your sphere geometry as well as being perfect on a cube. I didn't know how to allocate a stack per thread on the GPU, so I stuck with the stackless. If this is working, but still too slow, I'll try to add the stack and descend into the closer node.

Copy link
Copy Markdown
Member

@weymouth weymouth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Premerge clean-up.

Comment on lines +93 to +95
using Adapt
# make it GPU compatible
Adapt.@adapt_structure SetBody
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this needed? It would be nicer to leave Adapt as an weakdep if possible.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's needed if we combine MeshBody and AbstractBodies, if I remember correctly.

src/WaterLily.jl Outdated
export viz!, get_body, plot_body_obs!

# default GeometryBasics extention
function MeshBody end
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this throw an error or something to tell people to use GeometryBasics before using MeshBody?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good idea

Comment on lines +655 to +752
T = Float32; mem = Array
# the reference triangle
tri1 = SA{T}[0 1 0; 0 0 1; 0 0 0]
R = SA{T}[cos(π/4) -sin(π/4) 0; sin(π/4) cos(π/4) 0; 0 0 1]
normal = Base.get_extension(WaterLily, :WaterLilyGeometryBasicsExt).normal
@test all(normal(tri1) .≈ [0,0,1])
@test all(normal(R*tri1) .≈ R*normal(tri1))
hat = Base.get_extension(WaterLily, :WaterLilyGeometryBasicsExt).hat
vec = SA{T}[3,4,0]
@test all(hat(vec) .≈ vec./5)
@test all(hat(zero(vec)) .≈ zero(vec)) # edge case
d²_fast = Base.get_extension(WaterLily, :WaterLilyGeometryBasicsExt).d²_fast
@test d²_fast(SA{T}[0.1,0.1,0.1], tri1) ≈ 0.1^2
@test d²_fast(SA{T}[0.5,0.5,0.0], tri1) ≈ 0^2
@test d²_fast(R*SA{T}[0.1,0.1,0.1], R*tri1) ≈ 0.1^2 # invariant under rotation
center = Base.get_extension(WaterLily, :WaterLilyGeometryBasicsExt).center
@test all(center(tri1) .≈ SA{T}[1/3,1/3,0])
@test all(abs.(center(R*tri1) .- R*SA{T}[1/3,1/3,0]) .< eps(Float32))

# bounding volume functions
import ImplicitBVH: BBox, BSphere
inside = Base.get_extension(WaterLily, :WaterLilyGeometryBasicsExt).inside
bbox = BBox{T}(SA{T}[-1.0,-1.0,-1.0], SA{T}[1.0,1.0,1.0]) # extends from -5:5
bsphere = BSphere{T}(SA{T}[0.0,0.0,0.0], T(1.0)) # radius 5.0
@test inside(SA{T}[0.0,0.0,0.0], bbox)
@test !inside(SA{T}[5.01,0.0,0.0], bbox)
@test inside(SA{T}[0.0,0.0,0.0], bsphere)
@test !inside(SA{T}[5.01,0.0,0.0], bsphere)

# shape function and get_velocity
shape_value = Base.get_extension(WaterLily, :WaterLilyGeometryBasicsExt).shape_value
tri = SA{T}[0 1 0; 0 0 1; 0 0 0]
p = SA{T}[0.1,0.1,0.0]
# value at nodes
@test all(shape_value(SA{T}[0,0,0], tri) .≈ [1,0,0])
@test all(shape_value(SA{T}[0,1,0], tri) .≈ [0,0,1])
@test all(shape_value(SA{T}[1,0,0], tri) .≈ [0,1,0])
# value at mid edges
@test all(shape_value(SA{T}[0,.5,0], tri) .≈ [.5,0,.5])
@test all(shape_value(SA{T}[.5,0,0], tri) .≈ [.5,.5,0])
@test all(shape_value(SA{T}[.5,.5,0], tri) .≈ [0,.5,.5])
# value inside
x = rand() # in the plane of the triangle
@test sum(shape_value(SA{T}[x,1-x,0], tri)) .≈ 1 #partition to unity
# velocity interpolation
get_velocity = Base.get_extension(WaterLily, :WaterLilyGeometryBasicsExt).get_velocity
vel = SA{T}[1 1 1; 0 0 0; 0 0 0]
@test all(get_velocity(p, tri, vel) .≈ [1,0,0])
@test all(get_velocity(p, tri, R*vel) .≈ R*[1,0,0])

# test locator
locate = Base.get_extension(WaterLily, :WaterLilyGeometryBasicsExt).locate
x1 = SA{T}[0,0,0]
@test all(locate(x1, tri1) .≈ x1)
x2 = SA{T}[0.1,0.1,0.0]
@test all(locate(x2, tri1) .≈ x2)
@test all(locate(x2.+SA{T}[0,0,10.0], tri1) .≈ x2)
x3 = SA{T}[-1.0,0.5,0.0]
@test all(locate(x3, tri1) .≈ SA{T}[0.0,0.5,0.0])
x4 = SA{T}[0.5,0.5,0.0]
@test all(locate(x4, tri1) .≈ x4)
@test all(locate(SA{T}[.5,.5,10], tri1) .≈ x4)
@test all(locate(SA{T}[1,1,1], tri1) .≈ [0.5,0.5,0.0])

for mem in arrays
# test closest point search in BVH
closest = Base.get_extension(WaterLily, :WaterLilyGeometryBasicsExt).closest
mesh = mem([tri1, R*tri1 .+ 1])
bounding_boxes = ImplicitBVH.BBox{T}.(mesh)
bvh = ImplicitBVH.BVH(bounding_boxes, ImplicitBVH.BBox{T})
# trivial locate
@test GPUArrays.@allowscalar all(closest(x1, bvh, mesh) .≈ (1, 0))
# @btime $closest($x1, $bvh,$mesh) 30.225 ns (0 allocations: 0 bytes)
@test GPUArrays.@allowscalar all(closest(SA{T}[1,1,1], bvh, mesh) .≈ (2, 0))
# not so trivial
@test GPUArrays.@allowscalar all(closest(SA{T}[0.1,0.1,0.5], bvh, mesh) .≈ (1, 0.5^2))

# test measure all at once
measure = WaterLily.measure
body = MeshBody(mesh, zero(mesh), bvh)
@test GPUArrays.@allowscalar all(body.bvh.nodes[1].lo .≈ [0,0,0]) # lowest point of tri1
# we know this point
@test GPUArrays.@allowscalar all(measure(body, x1, 0) .≈ (0,[0,0,1],[0,0,0]))
@test GPUArrays.@allowscalar all(measure(body, x2, 0) .≈ (0,[0,0,1],[0,0,0]))
@test GPUArrays.@allowscalar all(measure(body, SA{T}[.5,.5,100], 0) .≈ (8,[0,0,0],[0,0,0]))
xr = SVector{3,T}(rand(3))
@test GPUArrays.@allowscalar all(measure(body, xr, 0)[1] .≈ sdf(body, xr, 0)) # same call behind, should be the same
# update! by moving the mesh by +1 in all directions
new_mesh = mem([tri1 .+ 1, R*tri1 .+ 2])
body = update!(body, new_mesh, 1.0)
@test GPUArrays.@allowscalar all(body.velocity[1] .≈ 1) && all(body.velocity[2] .≈ 1) # unit velocity is all directions
@test GPUArrays.@allowscalar all(measure(body, x1.+1, 0) .≈ (0,[0,0,1],[1,1,1])) # now we have a unit velocity
# check that bvh has aslo moved
@test GPUArrays.@allowscalar all(body.bvh.nodes[1].lo .≈ [1,1,1])
# try inside SetBody
body += AutoBody((x,t)->42.f0) # the answer!
@test GPUArrays.@allowscalar all(measure(body, x1.+1, 0) .≈ (0,[0,0,1],[1,1,1]))
end
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm usually a big fan of tests, but that 100 lines! Does it make sense to keep all of these or is there some redundancy?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess the last for loop is the most important since its the "final product", we could remove the smaller function test.

Comment on lines +7 to +25
Adapt = "79e6a3ab-5dfb-504d-930d-738a2a938a0e"
ConstructionBase = "187b0558-2788-49d3-abe0-74a17ed4e7c9"
DocStringExtensions = "ffbed154-4ef7-542d-bbb7-c09d3a79fcae"
EllipsisNotation = "da5c29d0-fa7d-589e-88eb-ea29b0a81949"
FileIO = "5789e2e9-d7fb-5bc7-8068-2c6fae9b9549"
ForwardDiff = "f6369f11-7733-5829-9624-2563aa707210"
GeometryBasics = "5c1252a2-5f33-56bf-86c9-59e7332b4326"
ImplicitBVH = "932a18dc-bb55-4cd5-bdd6-1368ec9cea29"
KernelAbstractions = "63c18a36-062a-441e-b654-da1e3ab1ce7c"
LinearAlgebra = "37e2e46d-f89d-539d-b4ee-838fcccc9c8e"
LoggingExtras = "e6f89c97-d47a-5376-807f-9c37f3926c36"
MeshIO = "7269a6da-0436-5bbc-96c2-40638cbb6118"
Preferences = "21216c6a-2e73-6563-6e65-726566657250"
Printf = "de0858da-6303-5e67-8744-51eddeeeb8d7"
Reexport = "189a3867-3050-52da-a836-e630ba90ab69"
Requires = "ae029012-a4dd-5104-9daa-d747884805df"
StaticArrays = "90137ffa-7385-5640-81b9-e52037218182"
TOML = "fa267f1f-6049-4f14-aa54-33bafae1ed76"
WriteVTK = "64499a7a-5c06-52f2-abe2-ccb03c286192"
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hopefully, most of these can be moved to weakdeps.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't cleaned this yet, but it seems they could go in weakdeps for all of them, except Adapt which might be required for SetBody.

Comment on lines 67 to 84
@@ -68,4 +84,4 @@ UnicodePlots = "b8865327-cd53-5732-bb35-84acbb429228"
WriteVTK = "64499a7a-5c06-52f2-abe2-ccb03c286192"
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't these just be weakdeps? Or do we need them here as well?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants