new MeshBody type and BVH for fast measures#267
new MeshBody type and BVH for fast measures#267marinlauber wants to merge 35 commits intoWaterLily-jl:masterfrom
MeshBody type and BVH for fast measures#267Conversation
|
Thanks for all the work on this. It'll be really awesome to have generic STL reading in the code base. I notice you tried to use ImplicitBVH to speed up the SDF. Why didn't you go with that? It looks perfect! I'm hoping to use it for a Barnes-Hut method in a panel code, and that will take some adjustment. But WaterLily's use case seems like it should be plug and play... |
|
I initially planned to do this, but the main issue is that the BVH only computes neighbouring pairs with the Additionally, I couldn't figure out how to make the It is maybe true that it's worth another shot now that I think of it. |
|
@weymouth I've followed the suggestion to digged deeper with ImplicitBVH and it now works on CPU and GPUs 🥳 Still not perfect everywhere when we measure, but getting there. The measure is roughly 100x faster for the dragon.
|
|
That's awesome! |
|
This starts to shape up nicely!
|
|
I think signed SDF can't be done efficiently without changing the traversal order. We need to check both sibling distances and descend into the closer one first. Once we get down to a leaf, we'll have a true signed distance, and we can prune anything farther away. If we don't descend into the closest sibling each time, you could end up checking every leaf! It seems like this will require allocating a state vector. We will no longer be able to fast quit once we get a virtual node. And it's going to be more conservative with pruning. I'm guessing it's no better than half as fast as we could get the squared distance. What do you think? |
|
Yes, I think this is clearly the best approach. It also means that the number of measures scales with the number of levels, which is probably better on average than trying all the neighboring branches. I've coded up something like this, see next update. It always measures the closet tri, but it's not quite as good as the depth-first approach for now. |
|
What do you mean by, not quite as good? How are you testing these? |
I mean that it works on some geometries, and on some others, I get garbage. It's probably linked to the BVH structure. |
|
Do you have a few geometries to test with? Like the chain links that Rajan
used or something?
Gabriel D Weymouth
…_______________________________________________
"Computers are useless. They can only give you answers." Pablo Picasso
On Mon, Feb 16, 2026 at 1:12 PM Marin Lauber ***@***.***> wrote:
*marinlauber* left a comment (WaterLily-jl/WaterLily.jl#267)
<#267 (comment)>
What do you mean by, not quite as good? How are you testing these?
I mean that it works on some geometries, and on some others, I get
garbage. It's probably linked to the BVH structure.
—
Reply to this email directly, view it on GitHub
<#267 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AADSKJ2TC35FUKZO3V2NO6T4MGX4JAVCNFSM6AAAAACPKAQADOVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZTSMBYGE2TANBXGY>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
|
|
I don't have the chain links specifically. I have attached a sphere, and an aorta in the original comment of the PR, see meshes.zip There are a few others here (sphere, naca, dragon) |
Cleaned up a few bugs/confusing bits in closest(). This is not clever (!) but I think it is now correct. It only prunes when the BV (square) distance is greater than the current best distance. This is even applied at the leaf level since that distance function is much much faster than the triangle's. If you want a signed distance function, you need to initialize to floatmax (or the size of the total bounding box), forcing you to find the closest triangle so the sign is correct. However, when the mesh defines a membrane body (or you just want the unsigned distance) you can initialize with the max BDIM value. I've set this cutoff to 4 everywhere I noticed. A more efficient traversal will require a stack. I'm not positive about the best way to implement a stack on each thread of a GPU... and I'm also not sure how much better it will work. So let's try this out first.
|
I updated |
| using Adapt | ||
| # make it GPU compatible | ||
| Adapt.@adapt_structure SetBody |
There was a problem hiding this comment.
Why is this needed? It would be nicer to leave Adapt as an weakdep if possible.
There was a problem hiding this comment.
It's needed if we combine MeshBody and AbstractBodies, if I remember correctly.
src/WaterLily.jl
Outdated
| export viz!, get_body, plot_body_obs! | ||
|
|
||
| # default GeometryBasics extention | ||
| function MeshBody end |
There was a problem hiding this comment.
Should this throw an error or something to tell people to use GeometryBasics before using MeshBody?
| T = Float32; mem = Array | ||
| # the reference triangle | ||
| tri1 = SA{T}[0 1 0; 0 0 1; 0 0 0] | ||
| R = SA{T}[cos(π/4) -sin(π/4) 0; sin(π/4) cos(π/4) 0; 0 0 1] | ||
| normal = Base.get_extension(WaterLily, :WaterLilyGeometryBasicsExt).normal | ||
| @test all(normal(tri1) .≈ [0,0,1]) | ||
| @test all(normal(R*tri1) .≈ R*normal(tri1)) | ||
| hat = Base.get_extension(WaterLily, :WaterLilyGeometryBasicsExt).hat | ||
| vec = SA{T}[3,4,0] | ||
| @test all(hat(vec) .≈ vec./5) | ||
| @test all(hat(zero(vec)) .≈ zero(vec)) # edge case | ||
| d²_fast = Base.get_extension(WaterLily, :WaterLilyGeometryBasicsExt).d²_fast | ||
| @test d²_fast(SA{T}[0.1,0.1,0.1], tri1) ≈ 0.1^2 | ||
| @test d²_fast(SA{T}[0.5,0.5,0.0], tri1) ≈ 0^2 | ||
| @test d²_fast(R*SA{T}[0.1,0.1,0.1], R*tri1) ≈ 0.1^2 # invariant under rotation | ||
| center = Base.get_extension(WaterLily, :WaterLilyGeometryBasicsExt).center | ||
| @test all(center(tri1) .≈ SA{T}[1/3,1/3,0]) | ||
| @test all(abs.(center(R*tri1) .- R*SA{T}[1/3,1/3,0]) .< eps(Float32)) | ||
|
|
||
| # bounding volume functions | ||
| import ImplicitBVH: BBox, BSphere | ||
| inside = Base.get_extension(WaterLily, :WaterLilyGeometryBasicsExt).inside | ||
| bbox = BBox{T}(SA{T}[-1.0,-1.0,-1.0], SA{T}[1.0,1.0,1.0]) # extends from -5:5 | ||
| bsphere = BSphere{T}(SA{T}[0.0,0.0,0.0], T(1.0)) # radius 5.0 | ||
| @test inside(SA{T}[0.0,0.0,0.0], bbox) | ||
| @test !inside(SA{T}[5.01,0.0,0.0], bbox) | ||
| @test inside(SA{T}[0.0,0.0,0.0], bsphere) | ||
| @test !inside(SA{T}[5.01,0.0,0.0], bsphere) | ||
|
|
||
| # shape function and get_velocity | ||
| shape_value = Base.get_extension(WaterLily, :WaterLilyGeometryBasicsExt).shape_value | ||
| tri = SA{T}[0 1 0; 0 0 1; 0 0 0] | ||
| p = SA{T}[0.1,0.1,0.0] | ||
| # value at nodes | ||
| @test all(shape_value(SA{T}[0,0,0], tri) .≈ [1,0,0]) | ||
| @test all(shape_value(SA{T}[0,1,0], tri) .≈ [0,0,1]) | ||
| @test all(shape_value(SA{T}[1,0,0], tri) .≈ [0,1,0]) | ||
| # value at mid edges | ||
| @test all(shape_value(SA{T}[0,.5,0], tri) .≈ [.5,0,.5]) | ||
| @test all(shape_value(SA{T}[.5,0,0], tri) .≈ [.5,.5,0]) | ||
| @test all(shape_value(SA{T}[.5,.5,0], tri) .≈ [0,.5,.5]) | ||
| # value inside | ||
| x = rand() # in the plane of the triangle | ||
| @test sum(shape_value(SA{T}[x,1-x,0], tri)) .≈ 1 #partition to unity | ||
| # velocity interpolation | ||
| get_velocity = Base.get_extension(WaterLily, :WaterLilyGeometryBasicsExt).get_velocity | ||
| vel = SA{T}[1 1 1; 0 0 0; 0 0 0] | ||
| @test all(get_velocity(p, tri, vel) .≈ [1,0,0]) | ||
| @test all(get_velocity(p, tri, R*vel) .≈ R*[1,0,0]) | ||
|
|
||
| # test locator | ||
| locate = Base.get_extension(WaterLily, :WaterLilyGeometryBasicsExt).locate | ||
| x1 = SA{T}[0,0,0] | ||
| @test all(locate(x1, tri1) .≈ x1) | ||
| x2 = SA{T}[0.1,0.1,0.0] | ||
| @test all(locate(x2, tri1) .≈ x2) | ||
| @test all(locate(x2.+SA{T}[0,0,10.0], tri1) .≈ x2) | ||
| x3 = SA{T}[-1.0,0.5,0.0] | ||
| @test all(locate(x3, tri1) .≈ SA{T}[0.0,0.5,0.0]) | ||
| x4 = SA{T}[0.5,0.5,0.0] | ||
| @test all(locate(x4, tri1) .≈ x4) | ||
| @test all(locate(SA{T}[.5,.5,10], tri1) .≈ x4) | ||
| @test all(locate(SA{T}[1,1,1], tri1) .≈ [0.5,0.5,0.0]) | ||
|
|
||
| for mem in arrays | ||
| # test closest point search in BVH | ||
| closest = Base.get_extension(WaterLily, :WaterLilyGeometryBasicsExt).closest | ||
| mesh = mem([tri1, R*tri1 .+ 1]) | ||
| bounding_boxes = ImplicitBVH.BBox{T}.(mesh) | ||
| bvh = ImplicitBVH.BVH(bounding_boxes, ImplicitBVH.BBox{T}) | ||
| # trivial locate | ||
| @test GPUArrays.@allowscalar all(closest(x1, bvh, mesh) .≈ (1, 0)) | ||
| # @btime $closest($x1, $bvh,$mesh) 30.225 ns (0 allocations: 0 bytes) | ||
| @test GPUArrays.@allowscalar all(closest(SA{T}[1,1,1], bvh, mesh) .≈ (2, 0)) | ||
| # not so trivial | ||
| @test GPUArrays.@allowscalar all(closest(SA{T}[0.1,0.1,0.5], bvh, mesh) .≈ (1, 0.5^2)) | ||
|
|
||
| # test measure all at once | ||
| measure = WaterLily.measure | ||
| body = MeshBody(mesh, zero(mesh), bvh) | ||
| @test GPUArrays.@allowscalar all(body.bvh.nodes[1].lo .≈ [0,0,0]) # lowest point of tri1 | ||
| # we know this point | ||
| @test GPUArrays.@allowscalar all(measure(body, x1, 0) .≈ (0,[0,0,1],[0,0,0])) | ||
| @test GPUArrays.@allowscalar all(measure(body, x2, 0) .≈ (0,[0,0,1],[0,0,0])) | ||
| @test GPUArrays.@allowscalar all(measure(body, SA{T}[.5,.5,100], 0) .≈ (8,[0,0,0],[0,0,0])) | ||
| xr = SVector{3,T}(rand(3)) | ||
| @test GPUArrays.@allowscalar all(measure(body, xr, 0)[1] .≈ sdf(body, xr, 0)) # same call behind, should be the same | ||
| # update! by moving the mesh by +1 in all directions | ||
| new_mesh = mem([tri1 .+ 1, R*tri1 .+ 2]) | ||
| body = update!(body, new_mesh, 1.0) | ||
| @test GPUArrays.@allowscalar all(body.velocity[1] .≈ 1) && all(body.velocity[2] .≈ 1) # unit velocity is all directions | ||
| @test GPUArrays.@allowscalar all(measure(body, x1.+1, 0) .≈ (0,[0,0,1],[1,1,1])) # now we have a unit velocity | ||
| # check that bvh has aslo moved | ||
| @test GPUArrays.@allowscalar all(body.bvh.nodes[1].lo .≈ [1,1,1]) | ||
| # try inside SetBody | ||
| body += AutoBody((x,t)->42.f0) # the answer! | ||
| @test GPUArrays.@allowscalar all(measure(body, x1.+1, 0) .≈ (0,[0,0,1],[1,1,1])) | ||
| end |
There was a problem hiding this comment.
I'm usually a big fan of tests, but that 100 lines! Does it make sense to keep all of these or is there some redundancy?
There was a problem hiding this comment.
I guess the last for loop is the most important since its the "final product", we could remove the smaller function test.
| Adapt = "79e6a3ab-5dfb-504d-930d-738a2a938a0e" | ||
| ConstructionBase = "187b0558-2788-49d3-abe0-74a17ed4e7c9" | ||
| DocStringExtensions = "ffbed154-4ef7-542d-bbb7-c09d3a79fcae" | ||
| EllipsisNotation = "da5c29d0-fa7d-589e-88eb-ea29b0a81949" | ||
| FileIO = "5789e2e9-d7fb-5bc7-8068-2c6fae9b9549" | ||
| ForwardDiff = "f6369f11-7733-5829-9624-2563aa707210" | ||
| GeometryBasics = "5c1252a2-5f33-56bf-86c9-59e7332b4326" | ||
| ImplicitBVH = "932a18dc-bb55-4cd5-bdd6-1368ec9cea29" | ||
| KernelAbstractions = "63c18a36-062a-441e-b654-da1e3ab1ce7c" | ||
| LinearAlgebra = "37e2e46d-f89d-539d-b4ee-838fcccc9c8e" | ||
| LoggingExtras = "e6f89c97-d47a-5376-807f-9c37f3926c36" | ||
| MeshIO = "7269a6da-0436-5bbc-96c2-40638cbb6118" | ||
| Preferences = "21216c6a-2e73-6563-6e65-726566657250" | ||
| Printf = "de0858da-6303-5e67-8744-51eddeeeb8d7" | ||
| Reexport = "189a3867-3050-52da-a836-e630ba90ab69" | ||
| Requires = "ae029012-a4dd-5104-9daa-d747884805df" | ||
| StaticArrays = "90137ffa-7385-5640-81b9-e52037218182" | ||
| TOML = "fa267f1f-6049-4f14-aa54-33bafae1ed76" | ||
| WriteVTK = "64499a7a-5c06-52f2-abe2-ccb03c286192" |
There was a problem hiding this comment.
Hopefully, most of these can be moved to weakdeps.
There was a problem hiding this comment.
I haven't cleaned this yet, but it seems they could go in weakdeps for all of them, except Adapt which might be required for SetBody.
| @@ -68,4 +84,4 @@ UnicodePlots = "b8865327-cd53-5732-bb35-84acbb429228" | |||
| WriteVTK = "64499a7a-5c06-52f2-abe2-ccb03c286192" | |||
There was a problem hiding this comment.
Shouldn't these just be weakdeps? Or do we need them here as well?

This PR implements a new
MeshBodytype to enable loading and measuring any geometry available inMeshIO.jland Abaqus/Calculix.inpfiles into WaterLily. For now, only linear triangular elements are supported, but I have code to load and measure quads as well.The classical measure is performed through a brute force approach that measures every single point on the mesh and uses the smallest distance.
For a large mesh on the GPU, this does not work. For this, this PR uses ImplicitBVH to generate a bounded volume hierarchy that can be traversed rapidly to measure the geometry.
This PR is still in an experimental state. The current main issue before porting to the GPU is that every leaf bounding volume must hold the list of triangles that are within this volume (to perform the final brute force measure). Since the number of triangle they contain differ, it is difficult to allocate the list of lists on the GPU. Maybe a better data structure is required.Additionally, I didn't manage to store theMeshtype that is native toGeometryBasicsin theMeshBodytype, and I must unpack all the triangles into aVector{SVector{3,3,T}} > memfrom which I can then measure.The code works on the CPU and on the GPU. The BVH still has some trouble measuring accurately in some regions.
Some of the functions that are currently on the PR should go inMeshIO.jlandFileIO.jl, I've opened two PR's JuliaIO/MeshIO.jl#110 JuliaIO/FileIO.jl#414 there, but before they are merged, this has to live here.Finally, the meshes used in the examples are here:
meshes.zip