
Non-uniform boundary conditions#165

Merged
weymouth merged 34 commits into WaterLily-jl:master from
marinlauber:add-nonuniform-BCs
Apr 19, 2025

Conversation

@marinlauber
Member

@marinlauber marinlauber commented Sep 13, 2024

This pull request adds the capability to use space- and time-varying inflow boundary conditions, as well as to prescribe a space- and time-varying body force in the momentum step.

The major change is that the u_BC::Function passed to the Flow is now of the form u_BC(i,x,t), where i is the velocity component, x the location in physical space, and t the time. The flow is then accelerated correctly, and the boundary conditions on the velocity field are applied correctly. This makes it possible to prescribe velocity profiles at the inlet and outlet.
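For illustration, a space- and time-varying inlet in the new u_BC(i,x,t) form could look like this (the parabolic profile, the channel height H, and the 10% pulsation amplitude are made-up examples, not code from the PR; WaterLily passes x as an SVector, a plain tuple is used here to keep the sketch self-contained):

```julia
# Hypothetical pulsating parabolic inlet in the new u_BC(i,x,t) form.
H = 64.0
function u_BC(i, x, t)
    i != 1 && return 0.0                      # only the streamwise component is set
    y = clamp(x[2]/H, 0, 1)                   # normalized wall distance
    4y*(1 - y)*(1 + 0.1sin(2π*t/100))         # parabola with a slow pulsation
end

u_BC(1, (0.0, 32.0, 0.0), 0.0)                # centreline velocity at t = 0 is 1.0
```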

Another feature grabbed from #174 is the ability to pass either a body_force::AbstractArray or a body_force::Function=(i,x,t)->... to mom_step! and sim_step!, which allows the user either to prescribe a custom forcing term or to accelerate the flow locally.
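The two accepted forms might be sketched as follows (illustrative names only; the exact keyword plumbing lives in mom_step!/sim_step! and is not reproduced here):

```julia
# body_force as a precomputed array (same layout as the velocity field), or
# as a function of component, position, and time. Both are sketches of the
# interface described above, not verbatim PR code.
g_array = zeros(Float32, 16, 16, 2)            # custom precomputed forcing term
g_fun(i, x, t) = i == 2 ? -0.5 : 0.0           # e.g. a constant downward acceleration
```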

@marinlauber marinlauber mentioned this pull request Dec 2, 2024
@marinlauber
Member Author

marinlauber commented Feb 27, 2025

From some basic tests: the issue was the definition of the function for the non-uniform velocity BC (it was type-unstable, which allocated a lot).
Additionally, passing an unspecified U to the BC! function was also type-unstable; here I have made two different functions, and internally one passes to the other.

The only real difference in the allocations is in accelerate!, where the @loop is a bit slower than broadcasting.

using WaterLily
using StaticArrays
using ForwardDiff
import WaterLily: accelerate!,@loop,size_u,slice
using BenchmarkTools

# classical accelerate!
# accelerate!(r,dt,g::Function,::Tuple,t=sum(dt)) = for i ∈ 1:last(size(r))
#     r[..,i] .+= g(i,t)
# end

accelerate_loop!(r,dt,g::Function,::Tuple,t=sum(dt)) = for i ∈ 1:last(size(r))
    @loop r[I,i] += g(i,t) over I ∈ CartesianIndices(Base.front(size(r)))
end

new_accelerate!(r,g::Function,t) = for i ∈ 1:last(size(r))
    @loop r[I,i] += g(i,loc(i,I,eltype(r)),t) over I ∈ CartesianIndices(Base.front(size(r)))
end

# some array to test this
N = 2^7
u = zeros(Float32,N,N,N,3);

# utils
fun(i,t)=sin(2π*t)*cos(2π*t)*cos(2π*t)
fun2(i,x,t) = sin(2π*t)*cos(2π*t)*cos(2π*t)
fun3(i,x,t) = sin(2π*x[1])*cos(2π*x[2])*cos(2π*x[3])
tu=(); ts=[1.0]; t=0

# old method
@btime accelerate!($u,$ts,$fun,$tu) # 5.265 ms (9 allocations: 24.00 MiB)
# same but loop
@btime accelerate_loop!($u,$ts,$fun,$tu) # 9.490 ms (849 allocations: 67.02 KiB)
# new method
@btime new_accelerate!($u,$fun2,$t) # 9.516 ms (849 allocations: 66.94 KiB)
@btime new_accelerate!($u,$fun3,$t) # 10.829 ms (849 allocations: 66.94 KiB)

# the new BC function
new_BC!(a,U,saveexit=false,perdir=(),t=0) = new_BC!(a,(i,x,t)->U[i],saveexit,perdir,t)
function new_BC!(a,uBC::Function,saveexit=false,perdir=(),t=0)
    N,n = size_u(a)
    for i ∈ 1:n, j ∈ 1:n
        if j in perdir
            @loop a[I,i] = a[CIj(j,I,N[j]-1),i] over I ∈ slice(N,1,j)
            @loop a[I,i] = a[CIj(j,I,2),i] over I ∈ slice(N,N[j],j)
        else
            if i==j # Normal direction, Dirichlet
                for s ∈ (1,2)
                    @loop a[I,i] = uBC(i,loc(i,I),t) over I ∈ slice(N,s,j)
                end
                (!saveexit || i>1) && (@loop a[I,i] = uBC(i,loc(i,I),t) over I ∈ slice(N,N[j],j)) # overwrite exit
            else    # Tangential directions, Neumann
                @loop a[I,i] = a[I+δ(j,I),i] over I ∈ slice(N,1,j)
                @loop a[I,i] = a[I-δ(j,I),i] over I ∈ slice(N,N[j],j)
            end
        end
    end
end

# very simple BCs
U = SA[1.0,0.,0.]
Ubc(i,I,t) = i==1 ? 1 : 0

# utils
x = SA[10.0,10.0,10.0]; t = 0.0; conv_exit=false
tu = (); i = 1

# test the bcs
@btime BC!($u,$U,$conv_exit,$tu) # 1.025 ms (5905 allocations: 477.30 KiB)
# new function with an array
@btime new_BC!($u,$U,$conv_exit,$tu,$t) # 1.038 ms (5905 allocations: 477.30 KiB)
# new function with a function
@btime new_BC!($u,$Ubc,$conv_exit,$tu,$t) # 770.551 μs (5878 allocations: 468.30 KiB)
@btime Ubc($i,$x,$t) # 1.031 ns (0 allocations: 0 bytes)

# more complex function
function u_pipe(i,x::SVector{3,T},t::T) where T
    i ≠ 1 && return zero(T)
    r = sum(abs2,SA[x[2],x[3]].-64.0)
    return 2r>125 ? zero(T) : convert(T,1-r^2/16384)
end
tf32 = 0.f0
@btime new_BC!($u,$u_pipe,$conv_exit,$tu,$tf32) # 780.826 μs (5905 allocations: 468.72 KiB)
@btime u_pipe($i,$x,$t) # 2.512 ns (0 allocations: 0 bytes)

@marinlauber marinlauber marked this pull request as ready for review February 27, 2025 14:50
@marinlauber
Member Author

@weymouth @b-fg let me know what you think. It seems we could clean up the accelerate!/body_force! interface a bit more.

@codecov

codecov bot commented Feb 27, 2025

Codecov Report

Attention: Patch coverage is 90.90909% with 5 lines in your changes missing coverage. Please review.

Files with missing lines | Patch % | Lines
src/util.jl | 72.22% | 5 Missing ⚠️

Files with missing lines | Coverage Δ
src/Flow.jl | 100.00% <100.00%> (ø)
src/Metrics.jl | 93.47% <100.00%> (ø)
src/WaterLily.jl | 96.29% <100.00%> (-0.26%) ⬇️
src/util.jl | 81.98% <72.22%> (+0.79%) ⬆️

@b-fg
Member

b-fg commented Mar 3, 2025

Just had a quick look and I like it! Also, I guess this PR solves #174 too, right? Happy to close that one if so. Also I like this body_force implementation, which can be array or function. It will be useful for including turbulence models ;)

@marinlauber
Member Author

marinlauber commented Mar 3, 2025

Yes, this fixes #174 (the body_force part of it).

There is still one part of the tests failing, related to constructing the simulation with a uBC(i,t) and internally transforming it into a uBC(i,x,t). This behaviour breaks the impulsive cylinder force test.

Additionally, I am not sure if we want to keep the possibility of constructing the sim with three different velocity BC types (Tuple, Function{Int,Number}, and Function{Int,SVector,Number}), or if we should change everything over to the second function type.

@marinlauber
Member Author

Just had a quick look and I like it! Also, I guess this PR solves #174 too, right? Happy to close that one if so. Also I like this body_force implementation, which can be array or function. It will be useful for including turbulence models ;)

That was the idea: a simple body force can be implemented as g(i,x,t), but more complex ones, like turbulence models, need access to Flow variables, which is not easy to implement with just a function ;)

@b-fg
Member

b-fg commented Mar 3, 2025

The only problem now is that passing a time-varying body force that needs the instantaneous flow field data (i.e. velocity) into sim_step! is not possible. That would require an instantaneous body_force::AbstractArray implementation, but currently the body_force in sim_step! is not updated until t_end. One could use the "single-step" sim_step! though, but there needs to be some indication of how to use it.
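A toy version of the "single-step" pattern mentioned here: advance one step at a time and rebuild the forcing array from the instantaneous velocity in between (`step!` is a hypothetical stand-in for the single-step `sim_step!`, not WaterLily API):

```julia
u = ones(8); f = zeros(8)                  # toy velocity and body-force arrays
step!(u, f) = (u .+= 0.1 .* f)             # stand-in for one solver step
for _ in 1:3
    f .= -0.5 .* u                         # rebuild forcing from the current velocity…
    step!(u, f)                            # …then take a single step with it
end
u                                          # each step scales u by 0.95
```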

@b-fg
Member

b-fg commented Mar 3, 2025

I have debugged your test error. Actually, the tests already fail on GPU for the "Flow.jl with increasing body force" test. The non-isbits error arises from this new line
https://github.com/WaterLily-jl/WaterLily.jl/pull/165/files#diff-83b57561857c34d538577fdf7a3f03622f409f3dc9279d2fbfc5b9cf070daad7R72
Apparently, the GPU does not like the u_BC = uBC aliasing; just passing it directly as uλ = (i,x)->u_BC[i] works.
I have not debugged further (the error you were referring to), but I would fix that first and then move on to the next one. Remember to always run the tests on both CPU and GPU!

@marinlauber
Member Author

@b-fg I didn't like this aliasing anyway; I'll try to find a better way to accommodate backward compatibility. Do you have any ideas on how to do that?
I was debugging on my laptop this weekend, so I didn't test on the GPU ;)

@b-fg
Member

b-fg commented Mar 4, 2025

Ha, I was working on a fix too. Are your tests ok on the GPU now?

b-fg added 4 commits March 4, 2025 13:31
…type for Simulation and Flow (Flow was Float64 previously).
…m_step with arbitrary keyword arguments.

The function can modify the ::Flow object and is called both in the predictor and corrector steps.
Added tests for its implementation of an increasing body force, but using udf instead of body_force.
All tests passing on Array and CuArray.
@b-fg
Member

b-fg commented Mar 4, 2025

The latest commit implements a general user-defined function (UDF) that is called both in predictor and corrector steps. This gives users the flexibility to implement any type of instantaneous and local function in their own scripts, with arbitrary keyword arguments, and pass it into the solver.

I do not love the fact that all g, body_force, and udf can be passed into sim_step!, but the current implementation is fairly clean otherwise. Maybe we can try to unify g and body_force since currently both of these features have a very similar utility, although them being separated also has added benefits. Thoughts @marinlauber @weymouth @TzuYaoHuang ?
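A minimal sketch of what such a UDF could look like (the `ToyFlow` struct and `my_udf!` are toy stand-ins for WaterLily's Flow and the function passed to the solver, not actual API):

```julia
mutable struct ToyFlow
    u::Vector{Float64}   # stand-in for the velocity field
    t::Float64           # current time
end

# The UDF mutates the flow state and accepts arbitrary keyword arguments;
# per the PR description, it is called in both predictor and corrector steps.
my_udf!(flow; gain=0.1) = (flow.u .+= gain * flow.t; nothing)

flow = ToyFlow(zeros(3), 2.0)
my_udf!(flow; gain=0.5)      # flow.u is now [1.0, 1.0, 1.0]
```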

@marinlauber
Member Author

I think body_force can go, because if it's only an (i,x,t) function, we can make g like that and apply it when needed. Then udf is the only additional argument to mom_step!.

@weymouth
Member

weymouth commented Apr 1, 2025 via email

@marinlauber
Member Author

I'll do the performance benchmark. It was fine when we first introduced the u_BC so it should be OK.

@marinlauber
Member Author

marinlauber commented Apr 1, 2025

Running tests...
[ Info: Test backends: Array, CuArray
Test Summary: | Pass  Total   Time
util.jl       |   54     54  41.3s
Test Summary: | Pass  Total   Time
Poisson.jl    |   14     14  27.4s
Test Summary:        | Pass  Total   Time
MultiLevelPoisson.jl |   13     13  11.3s
Test Summary: | Pass  Total   Time
Flow.jl       |   30     30  22.8s
Test Summary: | Pass  Total  Time
Body.jl       |    3      3  0.0s
Test Summary: | Pass  Total   Time
AutoBody.jl   |   16     16  13.0s
Test Summary:        | Pass  Total  Time
Flow.jl periodic TGV |    2      2  8.7s
Test Summary: | Pass  Total   Time
ForwardDiff   |    2      2  39.8s
Test Summary:                      | Pass  Total   Time
Flow.jl with increasing body force |    4      4  15.5s
Test Summary:       | Pass  Total  Time
Boundary Layer Flow |    2      2  7.7s
Test Summary:            | Pass  Total  Time
Rotating reference frame |    1      1  2.9s
Test Summary:               | Pass  Total   Time
Circle in accelerating flow |    8      8  10.4s
Test Summary: | Pass  Total   Time
Metrics.jl    |   37     37  16.8s
Test Summary: | Pass  Total  Time
WaterLily.jl  |   30     30  7.8s
Test Summary: | Pass  Total   Time
VTKExt.jl     |   28     28  21.9s
     Testing WaterLily tests passed 
▶ Allocated 8 KiB
▶ Allocated 11 KiB
Test Summary:         | Pass  Total   Time
mom_step! allocations |    2      2  33.2s
     Testing WaterLily tests passed 

The results of the benchmark (I had to run them separately, since the TGV benchmark requires some changes between the two versions):

sh benchmark.sh -w "master 01d9a24" -v "1.11" -t "1 4" -b "Array CuArray" -c "tgv jelly" -p "6,7 5,6" -s "100 100" -ft "Float32 Float64"
Benchmark environment: tgv sim_step! (max_steps=100)
▶ log2p = 6
┌────────────┬────────────────────┬────────┬───────────┬─────────────┬────────┬──────────┬──────────────────┬──────────┐
│  Backend   │     WaterLily      │ Julia  │ Precision │ Allocations │ GC [%] │ Time [s] │ Cost [ns/DOF/dt] │ Speed-up │
├────────────┼────────────────────┼────────┼───────────┼─────────────┼────────┼──────────┼──────────────────┼──────────┤
│     CPUx01 │ add-nonuniform-BCs │ 1.11.4 │   Float32 │       80936 │   0.00 │     6.40 │           244.06 │     1.00 │
│     CPUx01 │             master │ 1.11.4 │   Float32 │       80021 │   0.00 │     4.30 │           163.92 │     1.49 │
│     CPUx04 │ add-nonuniform-BCs │ 1.11.4 │   Float32 │     2343862 │   0.00 │     4.04 │           154.00 │     1.58 │
│     CPUx04 │             master │ 1.11.4 │   Float32 │     2315093 │   0.00 │    10.67 │           407.08 │     0.60 │
│ GPU-NVIDIA │ add-nonuniform-BCs │ 1.11.4 │   Float32 │     2994053 │   0.00 │     0.59 │            22.59 │    10.80 │
│ GPU-NVIDIA │             master │ 1.11.4 │   Float32 │     2828270 │   0.00 │     0.54 │            20.44 │    11.94 │
└────────────┴────────────────────┴────────┴───────────┴─────────────┴────────┴──────────┴──────────────────┴──────────┘
▶ log2p = 7
┌────────────┬────────────────────┬────────┬───────────┬─────────────┬────────┬──────────┬──────────────────┬──────────┐
│  Backend   │     WaterLily      │ Julia  │ Precision │ Allocations │ GC [%] │ Time [s] │ Cost [ns/DOF/dt] │ Speed-up │
├────────────┼────────────────────┼────────┼───────────┼─────────────┼────────┼──────────┼──────────────────┼──────────┤
│     CPUx01 │ add-nonuniform-BCs │ 1.11.4 │   Float32 │       75708 │   0.00 │    43.59 │           207.88 │     1.00 │
│     CPUx01 │             master │ 1.11.4 │   Float32 │       74816 │   0.00 │    30.43 │           145.11 │     1.43 │
│     CPUx04 │ add-nonuniform-BCs │ 1.11.4 │   Float32 │     2181049 │   0.00 │    27.45 │           130.91 │     1.59 │
│     CPUx04 │             master │ 1.11.4 │   Float32 │     2153203 │   0.00 │    47.59 │           226.90 │     0.92 │
│ GPU-NVIDIA │ add-nonuniform-BCs │ 1.11.4 │   Float32 │     2738138 │   0.00 │     2.41 │            11.48 │    18.10 │
│ GPU-NVIDIA │             master │ 1.11.4 │   Float32 │     2573981 │   0.00 │     2.11 │            10.07 │    20.64 │
└────────────┴────────────────────┴────────┴───────────┴─────────────┴────────┴──────────┴──────────────────┴──────────┘
Figure stored in /home/marin/Workspace/WaterLily-Benchmarks/plots/tgv_cost_add-nonuniform-BCs_master_1.11.4_Float32.pdf
Figure stored in /home/marin/Workspace/WaterLily-Benchmarks/plots/tgv_benchmark_add-nonuniform-BCs_master_1.11.4_Float32.pdf
Benchmark environment: jelly sim_step! (max_steps=100)
▶ log2p = 5
┌────────────┬────────────────────┬────────┬───────────┬─────────────┬────────┬──────────┬──────────────────┬──────────┐
│  Backend   │     WaterLily      │ Julia  │ Precision │ Allocations │ GC [%] │ Time [s] │ Cost [ns/DOF/dt] │ Speed-up │
├────────────┼────────────────────┼────────┼───────────┼─────────────┼────────┼──────────┼──────────────────┼──────────┤
│     CPUx01 │ add-nonuniform-BCs │ 1.11.4 │   Float64 │      173027 │   0.00 │     7.59 │           578.72 │     1.00 │
│     CPUx01 │             master │ 1.11.4 │   Float64 │      172527 │   0.00 │    14.01 │          1069.16 │     0.54 │
│     CPUx04 │ add-nonuniform-BCs │ 1.11.4 │   Float64 │     4803475 │   0.67 │     7.56 │           576.85 │     1.00 │
│     CPUx04 │             master │ 1.11.4 │   Float64 │     4804775 │   0.41 │    13.04 │           994.86 │     0.58 │
│ GPU-NVIDIA │ add-nonuniform-BCs │ 1.11.4 │   Float64 │     7126426 │   1.61 │     1.18 │            90.11 │     6.42 │
│ GPU-NVIDIA │             master │ 1.11.4 │   Float64 │     6845780 │   1.88 │     1.30 │            99.27 │     5.83 │
└────────────┴────────────────────┴────────┴───────────┴─────────────┴────────┴──────────┴──────────────────┴──────────┘
▶ log2p = 6
┌────────────┬────────────────────┬────────┬───────────┬─────────────┬────────┬──────────┬──────────────────┬──────────┐
│  Backend   │     WaterLily      │ Julia  │ Precision │ Allocations │ GC [%] │ Time [s] │ Cost [ns/DOF/dt] │ Speed-up │
├────────────┼────────────────────┼────────┼───────────┼─────────────┼────────┼──────────┼──────────────────┼──────────┤
│     CPUx01 │ add-nonuniform-BCs │ 1.11.4 │   Float64 │      224141 │   0.00 │    42.21 │           402.55 │     1.00 │
│     CPUx01 │             master │ 1.11.4 │   Float64 │      223641 │   0.00 │    89.73 │           855.70 │     0.47 │
│     CPUx04 │ add-nonuniform-BCs │ 1.11.4 │   Float64 │     6309610 │   0.11 │    41.80 │           398.62 │     1.01 │
│     CPUx04 │             master │ 1.11.4 │   Float64 │     6313610 │   0.11 │    59.71 │           569.47 │     0.71 │
│ GPU-NVIDIA │ add-nonuniform-BCs │ 1.11.4 │   Float64 │     9727169 │   0.41 │     4.78 │            45.58 │     8.83 │
│ GPU-NVIDIA │             master │ 1.11.4 │   Float64 │     9415538 │   0.54 │     5.22 │            49.79 │     8.09 │
└────────────┴────────────────────┴────────┴───────────┴─────────────┴────────┴──────────┴──────────────────┴──────────┘

@weymouth
Member

weymouth commented Apr 2, 2025

Why in the world is the PR twice as fast for the jelly? There should be no changes for that test, right?

And a 50% slow down for the TGV is a problem. Is that the cost of applying (and then ignoring) the domain BC function?

@b-fg
Member

b-fg commented Apr 2, 2025

I'd focus on the GPU timings; these are typically more consistent, and the difference is not large here. I can also run them to double-check.

Also, I see you run the TGV in Float32 and the Jelly in Float64. Better run both cases with both precisions to see what's going on.

@marinlauber
Member Author

Yeah, they are weird. I can also try to run the benchmark on Nefertem or DelftBlue; it might be a fairer comparison. I ran those on my workstation, but you never know what processes are running in the background.

@b-fg I use the first example command line for the benchmark, do you have a "better" one?

@b-fg
Member

b-fg commented Apr 2, 2025

So that I run the same benchmark as you, what exact function for the TGV are you using now in this PR? (The WaterLily-Benchmarks TGV case only works for master.)

@marinlauber
Member Author

I've modified the original one to this

function tgv(p, backend; Re=1600, T=Float32)
    L = 2^p; U = T(1); κ = T(π/L); ν = T(1/(κ*Re))
    function Uλ(i,xyz,t)
        x,y,z = @. xyz*κ
        i==1 && return -U*sin(x)*cos(y)*cos(z)
        i==2 && return  U*cos(x)*sin(y)*cos(z)
        return 0*U
    end
    Simulation((L, L, L), Uλ, 1/κ; U=U, ν=ν, T=T, mem=backend)
end

And I was wondering if this case should not be triple-periodic?

@b-fg
Member

b-fg commented Apr 2, 2025

return 0*U smart haha.

@b-fg
Member

b-fg commented Apr 2, 2025

This is what I got

Details
Benchmark environment: tgv sim_step! (max_steps=100)
▶ log2p = 6
┌────────────┬────────────────────┬────────┬───────────┬─────────────┬────────┬──────────┬──────────────────┬──────────┐
│  Backend   │     WaterLily      │ Julia  │ Precision │ Allocations │ GC [%] │ Time [s] │ Cost [ns/DOF/dt] │ Speed-up │
├────────────┼────────────────────┼────────┼───────────┼─────────────┼────────┼──────────┼──────────────────┼──────────┤
│     CPUx01 │ add-nonuniform-BCs │ 1.11.3 │   Float32 │       80936 │   0.00 │     6.93 │           264.33 │     1.00 │
│     CPUx01 │             master │ 1.11.3 │   Float32 │       80021 │   0.00 │     4.23 │           161.24 │     1.64 │
│     CPUx04 │ add-nonuniform-BCs │ 1.11.3 │   Float32 │     2343862 │   0.00 │     4.30 │           163.89 │     1.61 │
│     CPUx04 │             master │ 1.11.3 │   Float32 │     2315093 │   0.00 │     3.21 │           122.45 │     2.16 │
│ GPU-NVIDIA │ add-nonuniform-BCs │ 1.11.3 │   Float32 │     2994144 │   0.00 │     0.71 │            27.00 │     9.79 │
│ GPU-NVIDIA │             master │ 1.11.3 │   Float32 │     2828362 │   0.00 │     0.61 │            23.10 │    11.44 │
└────────────┴────────────────────┴────────┴───────────┴─────────────┴────────┴──────────┴──────────────────┴──────────┘
▶ log2p = 7
┌────────────┬────────────────────┬────────┬───────────┬─────────────┬────────┬──────────┬──────────────────┬──────────┐
│  Backend   │     WaterLily      │ Julia  │ Precision │ Allocations │ GC [%] │ Time [s] │ Cost [ns/DOF/dt] │ Speed-up │
├────────────┼────────────────────┼────────┼───────────┼─────────────┼────────┼──────────┼──────────────────┼──────────┤
│     CPUx01 │ add-nonuniform-BCs │ 1.11.3 │   Float32 │       75708 │   0.00 │    44.25 │           211.00 │     1.00 │
│     CPUx01 │             master │ 1.11.3 │   Float32 │       74816 │   0.00 │    26.27 │           125.28 │     1.68 │
│     CPUx04 │ add-nonuniform-BCs │ 1.11.3 │   Float32 │     2181049 │   0.00 │    26.16 │           124.72 │     1.69 │
│     CPUx04 │             master │ 1.11.3 │   Float32 │     2153203 │   0.00 │    19.98 │            95.29 │     2.21 │
│ GPU-NVIDIA │ add-nonuniform-BCs │ 1.11.3 │   Float32 │     2738125 │   0.00 │     3.68 │            17.54 │    12.03 │
│ GPU-NVIDIA │             master │ 1.11.3 │   Float32 │     2573948 │   0.00 │     3.15 │            15.04 │    14.03 │
└────────────┴────────────────────┴────────┴───────────┴─────────────┴────────┴──────────┴──────────────────┴──────────┘
Figure stored in /home/b-fg/Workspace/tudelft1/WaterLily-Benchmarks/plots/tgv_cost_add-nonuniform-BCs_master_1.11.3_Float32.pdf
Figure stored in /home/b-fg/Workspace/tudelft1/WaterLily-Benchmarks/plots/tgv_benchmark_add-nonuniform-BCs_master_1.11.3_Float32.pdf
Benchmark environment: jelly sim_step! (max_steps=100)
▶ log2p = 5
┌────────────┬────────────────────┬────────┬───────────┬─────────────┬────────┬──────────┬──────────────────┬──────────┐
│  Backend   │     WaterLily      │ Julia  │ Precision │ Allocations │ GC [%] │ Time [s] │ Cost [ns/DOF/dt] │ Speed-up │
├────────────┼────────────────────┼────────┼───────────┼─────────────┼────────┼──────────┼──────────────────┼──────────┤
│     CPUx01 │ add-nonuniform-BCs │ 1.11.3 │   Float32 │      161455 │   0.00 │     3.80 │           289.55 │     1.00 │
│     CPUx01 │             master │ 1.11.3 │   Float32 │      160955 │   0.00 │     3.79 │           289.13 │     1.00 │
│     CPUx04 │ add-nonuniform-BCs │ 1.11.3 │   Float32 │     4456091 │   0.51 │     3.82 │           291.56 │     0.99 │
│     CPUx04 │             master │ 1.11.3 │   Float32 │     4457391 │   0.44 │     4.17 │           318.20 │     0.91 │
│ GPU-NVIDIA │ add-nonuniform-BCs │ 1.11.3 │   Float32 │     5697955 │   0.00 │     1.03 │            78.77 │     3.68 │
│ GPU-NVIDIA │             master │ 1.11.3 │   Float32 │     5410254 │   0.00 │     1.04 │            79.28 │     3.65 │
└────────────┴────────────────────┴────────┴───────────┴─────────────┴────────┴──────────┴──────────────────┴──────────┘
▶ log2p = 6
┌────────────┬────────────────────┬────────┬───────────┬─────────────┬────────┬──────────┬──────────────────┬──────────┐
│  Backend   │     WaterLily      │ Julia  │ Precision │ Allocations │ GC [%] │ Time [s] │ Cost [ns/DOF/dt] │ Speed-up │
├────────────┼────────────────────┼────────┼───────────┼─────────────┼────────┼──────────┼──────────────────┼──────────┤
│     CPUx01 │ add-nonuniform-BCs │ 1.11.3 │   Float32 │      208466 │   0.00 │    27.52 │           262.47 │     1.00 │
│     CPUx01 │             master │ 1.11.3 │   Float32 │      207966 │   0.00 │    26.87 │           256.21 │     1.02 │
│     CPUx04 │ add-nonuniform-BCs │ 1.11.3 │   Float32 │     5839681 │   0.13 │    21.00 │           200.27 │     1.31 │
│     CPUx04 │             master │ 1.11.3 │   Float32 │     5843681 │   0.12 │    21.32 │           203.30 │     1.29 │
│ GPU-NVIDIA │ add-nonuniform-BCs │ 1.11.3 │   Float32 │     7679971 │   0.40 │     3.91 │            37.29 │     7.04 │
│ GPU-NVIDIA │             master │ 1.11.3 │   Float32 │     7369919 │   0.38 │     3.97 │            37.83 │     6.94 │
└────────────┴────────────────────┴────────┴───────────┴─────────────┴────────┴──────────┴──────────────────┴──────────┘
Figure stored in /home/b-fg/Workspace/tudelft1/WaterLily-Benchmarks/plots/jelly_cost_add-nonuniform-BCs_master_1.11.3_Float32.pdf
Figure stored in /home/b-fg/Workspace/tudelft1/WaterLily-Benchmarks/plots/jelly_benchmark_add-nonuniform-BCs_master_1.11.3_Float32.pdf

Looking at the GPU results, there is indeed something like a 10-15% slowdown on TGV (and indeed a severe slowdown on CPU). Is it because the function is now used to set the BCs at every time step? Of course, evaluating that function (also used for setting the initial condition) is more expensive than just getting a value from a Tuple (i.e. U=(0,0,0)), as we currently have in master for this test case.

@weymouth
Member

weymouth commented Apr 3, 2025

It has to either be this or the evaluation of the acceleration term, right? Those are the only two changes.

@b-fg
Member

b-fg commented Apr 13, 2025

I am working on a hacky way to fix the performance issues, using BC buffers computed during pre-processing. I will push the changes and, if you like this, we can clean it up together. This won't fix the (unavoidable) cost of space-time-varying BC functions, but it should behave like master otherwise.

@b-fg
Member

b-fg commented Apr 13, 2025

Doing some tests with the pre-processing of BCs, I get the same behaviour as we have been observing. Complex functions yield a severe slowdown on serial CPU (multi-threaded CPU or GPU does not care much, since we work with D-1 slices). Here are some results (full MWE in the details):

T = Float32
mem = CuArray
a = rand(T,150,100,50,3) |> mem
N,n = size_u(a)
normal_buffers, tangential_buffers = get_buffers(a, Uλ; T, mem)

b, c = copy(a), copy(a)
apply_BCs!(b, Uλ)
apply_BCs!(c, normal_buffers, tangential_buffers)
@assert isapprox(b, c, atol=1e-5)

@btime apply_BCs!($b, $Uλ) # CPU -t 1: 3.571 ms (54 allocations: 864 bytes) | GPU: 84.264 μs (936 allocations: 34.69 KiB)
@btime apply_BCs!($c, $normal_buffers, $tangential_buffers) # CPU -t 1: 184.782 μs (84 allocations: 1.88 KiB) | GPU: 102.383 μs (1197 allocations: 43.55 KiB)
Details
using Revise, WaterLily, CUDA, StaticArrays, BenchmarkTools

@inline CI(a...) = CartesianIndex(a...)
CIj(j,I::CartesianIndex{d},k) where d = CI(ntuple(i -> i==j ? k : I[i], d))
δ(i,::Val{N}) where N = CI(ntuple(j -> j==i ? 1 : 0, N))
δ(i,I::CartesianIndex{N}) where N = δ(i, Val{N}())
function slice(dims::NTuple{N},i,j,low=1) where N
    CartesianIndices(ntuple( k-> k==j ? (i:i) : (low:dims[k]), N))
end
@inline loc(i,I::CartesianIndex{N},T=Float32) where N = SVector{N,T}(I.I .- 1.5 .- 0.5 .* δ(i,I).I)
@inline loc(Ii::CartesianIndex,T=Float32) = loc(last(Ii),Base.front(Ii),T)
splitn(n) = Base.front(n),last(n)
size_u(u) = splitn(size(u))

function Uλ(i,xyz,t)
    x,y,z = @. xyz
    i==1 && return -sin(x)*cos(y)*cos(z)
    i==2 && return  cos(x)*sin(y)*cos(z)
    return 0
end

function apply_BCs!(a, u_BC::Function)
    N,n = size_u(a)
    for i ∈ 1:n, j ∈ 1:n
        if i==j # Normal direction, Dirichlet
            for s ∈ (1,2)
                WaterLily.@loop a[I,i] = u_BC(i,loc(i,I),0) over I ∈ slice(N,s,j)
            end
            # if i>1
                WaterLily.@loop a[I,i] = u_BC(i,loc(i,I),0) over I ∈ slice(N,N[j],j)
            # end
        else    # Tangential directions, Neumann
            WaterLily.@loop a[I,i] = u_BC(i,loc(i,I),0)+a[I+δ(j,I),i]-u_BC(i,loc(i,I+δ(j,I)),0) over I ∈ slice(N,1,j)
            WaterLily.@loop a[I,i] = u_BC(i,loc(i,I),0)+a[I-δ(j,I),i]-u_BC(i,loc(i,I-δ(j,I)),0) over I ∈ slice(N,N[j],j)
        end
    end
end

function apply_BCs!(a, normal_buffers, tangential_buffers)
    N,n = size_u(a)
    for i ∈ 1:n, j ∈ 1:n
        if i==j # Normal direction, Dirichlet
            for s ∈ (1,2)
                b = normal_buffers[i][s]
                WaterLily.@loop a[I,i] = b[I-δ(j,I)*(s-1)] over I ∈ slice(N,s,j)
            end
            # if i>1
                b = normal_buffers[i][3]
                WaterLily.@loop a[I,i] = b[I-δ(i,I)*(N[i]-1)] over I ∈ slice(N,N[j],j)
            # end
        else    # Tangential directions, Neumann
            b = tangential_buffers[i,j][1]
            WaterLily.@loop a[I,i] = a[I+δ(j,I),i] + b[I] over I ∈ slice(N,1,j)
            b = tangential_buffers[i,j][2]
            WaterLily.@loop a[I,i] = a[I-δ(j,I),i] + b[I-δ(j,I)*(N[j]-1)] over I ∈ slice(N,N[j],j)
        end
    end
end

function get_buffers(a, u_BC; T=Float32, mem=Array)
    N,n = size_u(a)
    normal_buffers = collect(mem{T,n}[] for i ∈ 1:n)
    tangential_buffers = collect(mem{T,n}[] for i ∈ 1:n, j ∈ 1:n)
    for i ∈ 1:n, j ∈ 1:n
        b = zeros(eltype(a), size(slice(N,1,j))...) |> mem
        if i == j
            for s ∈ (1,2)
                WaterLily.@loop b[I-δ(j,I)*(s-1)] = u_BC(i,loc(i,I),0) over I ∈ slice(N,s,j)
                push!(normal_buffers[i], copy(b))
            end
            WaterLily.@loop b[I-δ(j,I)*(N[j]-1)] = u_BC(i,loc(i,I),0) over I ∈ slice(N,N[j],j)
            push!(normal_buffers[i], copy(b))
        else
            WaterLily.@loop b[I] = u_BC(i,loc(i,I),0)-u_BC(i,loc(i,I+δ(j,I)),0) over I ∈ slice(N,1,j)
            push!(tangential_buffers[i,j], copy(b))
            WaterLily.@loop b[I-δ(j,I)*(N[j]-1)] = u_BC(i,loc(i,I),0)-u_BC(i,loc(i,I-δ(j,I)),0) over I ∈ slice(N,N[j],j)
            push!(tangential_buffers[i,j], copy(b))
        end
    end
    return normal_buffers, tangential_buffers
end

T = Float32
mem = CuArray
a = rand(T,150,100,50,3) |> mem
N,n = size_u(a)
normal_buffers, tangential_buffers = get_buffers(a, Uλ; T, mem)

b, c = copy(a), copy(a)
apply_BCs!(b, Uλ)
apply_BCs!(c, normal_buffers, tangential_buffers)
@assert isapprox(b, c, atol=1e-5)

@btime apply_BCs!($b, $Uλ) # CPU -t 1: 3.571 ms (54 allocations: 864 bytes) | GPU: 84.264 μs (936 allocations: 34.69 KiB)
@btime apply_BCs!($c, $normal_buffers, $tangential_buffers) # CPU -t 1: 184.782 μs (84 allocations: 1.88 KiB) | GPU: 102.383 μs (1197 allocations: 43.55 KiB)

@marinlauber
Member Author

I think that, regardless of how well the BC kernel is launched, we will never get the same evaluation time for a Tuple and a Function. Should we type-dispatch to two different functions to keep the fast runtime for Tuple?
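The type-dispatch idea could look roughly like this (toy fill functions, not the actual BC! code): the Tuple method keeps the cheap constant fill, and only the Function method pays the per-point evaluation cost.

```julia
set_bc!(a, U::Tuple) = (a .= U[1]; a)           # fast path: constant value
set_bc!(a, U::Function) = (for k in eachindex(a)
    a[k] = U(1, k, 0.0)                         # slow path: evaluate per point
end; a)

set_bc!(zeros(4), (1.0, 0.0))                   # constant fill
set_bc!(zeros(4), (i, x, t) -> 2.0x)            # per-point evaluation
```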

@marinlauber
Member Author

What I was thinking is to steal the @vecloop from BiotSavartBCs and try to launch one or two kernels per BC call: one for the normal and one for the tangential component.

@b-fg
Member

b-fg commented Apr 14, 2025

I like the @vecloop approach. I was also already dispatching by Tuple or Function (not shown in the MWE though). So I will try to push these changes, and maybe you can try to add the @vecloop?

@weymouth
Member

weymouth commented Apr 14, 2025 via email

@b-fg
Member

b-fg commented Apr 14, 2025

I have implemented the BC buffers computed during pre-processing. All tests pass locally on CPU and GPU, and I have added an extra test in @testset "util.jl" as well. Then, when I was checking performance, I saw that we do not improve much anyway (master is still much faster). So for the TGV case I set all the BCs to be periodic, so that the whole BC shenanigans are bypassed in both master and the current PR. Even then, master is still ~30% faster on serial CPU. This indicates that there is something else going on, right? Maybe it is the new dispatch? On the other hand, the performance on the jelly test case is fine...

Below are the benchmark results for the periodic TGV on serial CPU, where 01d9a24 is the current state of this PR, add-nonuniform-BCs is the latest BC-buffer implementation (still not committed here), and then master.

Benchmark environment: tgv sim_step! (max_steps=100)
▶ log2p = 7
┌─────────┬────────────────────┬────────┬───────────┬─────────────┬────────┬──────────┬──────────────────┬──────────┐
│ Backend │     WaterLily      │ Julia  │ Precision │ Allocations │ GC [%] │ Time [s] │ Cost [ns/DOF/dt] │ Speed-up │
├─────────┼────────────────────┼────────┼───────────┼─────────────┼────────┼──────────┼──────────────────┼──────────┤
│  CPUx01 │            01d9a24 │ 1.11.3 │   Float32 │      410697 │   0.00 │    67.85 │           323.52 │     1.00 │
│  CPUx01 │ add-nonuniform-BCs │ 1.11.3 │   Float32 │      410697 │   0.00 │    68.93 │           328.68 │     0.98 │
│  CPUx01 │             master │ 1.11.3 │   Float32 │      406772 │   0.00 │    51.55 │           245.80 │     1.32 │
└─────────┴────────────────────┴────────┴───────────┴─────────────┴────────┴──────────┴──────────────────┴──────────┘

@b-fg
Member

b-fg commented Apr 14, 2025

I think I know what's going on... With the current setup, accelerate! is called when running the TGV, since

accelerate!(r,t,::Nothing,U::Function) = accelerate!(r,t,(i,x,t)->ForwardDiff.derivative(τ->U(i,x,τ),t))

means any U::Function triggers the accelerate!(r,t,g,U) path.
-.-

@b-fg
Member

b-fg commented Apr 14, 2025

How should we handle this? Currently accelerate! uses both g and Uλ:

accelerate!(r,t,::Nothing,::Union{Nothing,Tuple}) = nothing
accelerate!(r,t,f::Function) = @loop r[Ii] += f(last(Ii),loc(Ii,eltype(r)),t) over Ii ∈ CartesianIndices(r)
accelerate!(r,t,g::Function,::Union{Nothing,Tuple}) = accelerate!(r,t,g)
accelerate!(r,t,::Nothing,U::Function) = accelerate!(r,t,(i,x,t)->ForwardDiff.derivative(τ->U(i,x,τ),t))
accelerate!(r,t,g::Function,U::Function) = accelerate!(r,t,(i,x,t)->g(i,x,t)+ForwardDiff.derivative(τ->U(i,x,τ),t))

I am not sure what combination we should actually support. Let me know your thoughts.
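For context, the ::Nothing,U::Function method above turns a time-varying background velocity into a frame acceleration by differentiating in time. A minimal stand-alone sketch of that mechanism (the tanh ramp and the names U and accel here are illustrative, not from the code base):

```julia
using ForwardDiff

# Illustrative background velocity: an x-component inflow that ramps up as tanh(t)
U(i, x, t) = i == 1 ? tanh(t) : 0.0

# The frame acceleration is ∂U/∂t, evaluated with ForwardDiff exactly as in the
# (i,x,t)->ForwardDiff.derivative(τ->U(i,x,τ),t) closure above
accel(i, x, t) = ForwardDiff.derivative(τ -> U(i, x, τ), t)

accel(1, (0.0, 0.0), 0.0)  # d/dt tanh(t) at t=0 is sech(0)^2 = 1.0
```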

@b-fg
Member

b-fg commented Apr 14, 2025

(Also, we need to think about why this was not caught by the tests.)

@weymouth
Member

weymouth commented Apr 15, 2025

The acceleration is required when the user supplies a background velocity, since it implies an accelerating reference frame. That background velocity should also be used to set the initial velocity. We have tests to make sure this happens...

However, the TGV is just an initial condition, not a background velocity. More logic is needed in the input conditions, I guess...

@b-fg
Member

b-fg commented Apr 15, 2025

I think we are trying to squeeze too much stuff into Uλ: IC, BC, and the accelerating frame. I think it makes sense to have a separate argument for the IC (as we had before), since the BC and the accelerating frame need to be in sync anyway.

@b-fg
Member

b-fg commented Apr 15, 2025

I am thinking to introduce this (breaking) change:

function Simulation(dims::NTuple{N}, U₀, Uλ;
                    L=1, U=nothing, Δt=0.25, ν=0., g=nothing, ϵ=1, perdir=(),
                    exitBC=false, body::AbstractBody=NoBody(),
                    T=Float32, mem=Array) where N
    @assert !(isnothing(U) && isa(Uλ,Function)) "`U` (velocity scale) must be specified if `Uλ` is a `Function`"
    isnothing(U) && (U = sum(abs2,Uλ))
    check_fn(g,N,T); check_fn(Uλ,N,T)
    flow = Flow(dims,U₀,Uλ;Δt,ν,g,T,f=mem,perdir,exitBC)
    measure!(flow,body;ϵ)
    new(U,L,ϵ,flow,body,MultiLevelPoisson(flow.p,flow.μ₀,flow.σ;perdir))
end

So when creating a Simulation, users always have to specify the initial condition (U₀) and the boundary conditions (Uλ), which seems reasonable from a physics standpoint. If Uλ::Function, it will also be used during accelerate! (even when g::Nothing) to apply the background velocity.

Alternatively, we could also have Uλ as a keyword argument and use U₀ to define the BC if Uλ is not defined.

If you like this I can update the code base and the tests. WaterLily-Examples should also be updated. So it would be reasonable to bump WaterLily to 1.4 as well. Let me know!
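To make the proposed split concrete, here is a small stand-alone sketch (not using WaterLily itself; the parabolic profile, L=32, and the way U₀ defaults to the BC are illustrative choices following the proposal above):

```julia
# Uλ plays the role of the boundary condition, U₀ the initial condition.
L = 32
Uλ(i, x) = i == 1 ? 4 * (x[2] / L) * (1 - x[2] / L) : 0.0  # parabolic inflow in x
U₀(i, x) = Uλ(i, x)                                         # default IC taken from the BC

U₀(1, (0.0, L / 2))  # peak velocity at mid-channel = 1.0
```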

@marinlauber
Member Author

But this is kind of what we had before: u_BC was always specified, as a Function or Tuple, and the user could decide whether or not to use it to initialise the fields.

@b-fg
Member

b-fg commented Apr 15, 2025

Yes, and I am not sure how to do this without adding another argument to Simulation(...). Why was this removed in the first place? Was it causing a conflict, or just simplifying things for the user?

…leaned up across Simulation and Flow. All tests passing locally on CPU and GPU.
@b-fg
Member

b-fg commented Apr 16, 2025

After a short conversation with Gabe, I have re-introduced the separate arguments uBC (boundary conditions and non-uniform background flow) and Uλ (initial condition) when creating a Simulation. uBC can be a Tuple or a Function of (i,x,t), and Uλ can be a Tuple or a Function of (i,x). If Uλ is not specified, it defaults to uBC(i,x,0), or just uBC::Tuple.

I took the chance to homogenize some of these arguments across the different objects (Simulation, Flow). All tests currently pass on CPU and GPU. Benchmarks are shown under Details below, and the performance matches master for the TGV and jelly cases. So this PR should be pretty much done.

I might run some performance tests to check the impact of non-uniform BCs and see whether using BC buffers, when uBC is only a function of space, is worth it. But that will be a separate PR, if any.
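The BC-buffer idea can be sketched in isolation: when uBC depends only on space, its boundary values can be evaluated once into a buffer and copied each time step, instead of re-evaluating the function per ghost cell per step. A minimal illustration (the sine profile, N=16, and the names are made up for this sketch):

```julia
# Space-only BC: evaluate the profile once into a precomputed buffer...
uBC(i, x) = i == 1 ? sin(π * x[2] / 16) : 0.0
N = 16
inlet = [uBC(1, (0.0, Float64(j))) for j in 1:N]  # inlet slice, x-component

# ...then each time step reduces to a cheap copy instead of N function calls:
u_inlet = zeros(N)
u_inlet .= inlet
```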

Details
Benchmark environment: tgv sim_step! (max_steps=100)
▶ log2p = 6
┌────────────┬────────────────────┬────────┬───────────┬─────────────┬────────┬──────────┬──────────────────┬──────────┐
│  Backend   │     WaterLily      │ Julia  │ Precision │ Allocations │ GC [%] │ Time [s] │ Cost [ns/DOF/dt] │ Speed-up │
├────────────┼────────────────────┼────────┼───────────┼─────────────┼────────┼──────────┼──────────────────┼──────────┤
│     CPUx01 │ add-nonuniform-BCs │ 1.11.3 │   Float32 │       80521 │   0.00 │     4.27 │           162.84 │     1.00 │
│     CPUx01 │             master │ 1.11.3 │   Float32 │       80021 │   0.00 │     4.27 │           163.07 │     1.00 │
│     CPUx04 │ add-nonuniform-BCs │ 1.11.3 │   Float32 │     2329993 │   0.00 │     3.18 │           121.38 │     1.34 │
│     CPUx04 │             master │ 1.11.3 │   Float32 │     2315093 │   0.00 │     3.20 │           122.19 │     1.33 │
│ GPU-NVIDIA │ add-nonuniform-BCs │ 1.11.3 │   Float32 │     2978759 │   0.00 │     0.60 │            22.88 │     7.12 │
│ GPU-NVIDIA │             master │ 1.11.3 │   Float32 │     2828263 │   0.00 │     0.57 │            21.90 │     7.43 │
└────────────┴────────────────────┴────────┴───────────┴─────────────┴────────┴──────────┴──────────────────┴──────────┘
▶ log2p = 7
┌────────────┬────────────────────┬────────┬───────────┬─────────────┬────────┬──────────┬──────────────────┬──────────┐
│  Backend   │     WaterLily      │ Julia  │ Precision │ Allocations │ GC [%] │ Time [s] │ Cost [ns/DOF/dt] │ Speed-up │
├────────────┼────────────────────┼────────┼───────────┼─────────────┼────────┼──────────┼──────────────────┼──────────┤
│     CPUx01 │ add-nonuniform-BCs │ 1.11.3 │   Float32 │       75316 │   0.00 │    26.66 │           127.10 │     1.00 │
│     CPUx01 │             master │ 1.11.3 │   Float32 │       74816 │   0.00 │    26.46 │           126.15 │     1.01 │
│     CPUx04 │ add-nonuniform-BCs │ 1.11.3 │   Float32 │     2168103 │   0.00 │    18.56 │            88.48 │     1.44 │
│     CPUx04 │             master │ 1.11.3 │   Float32 │     2153203 │   0.00 │    18.81 │            89.68 │     1.42 │
│ GPU-NVIDIA │ add-nonuniform-BCs │ 1.11.3 │   Float32 │     2724354 │   0.00 │     3.12 │            14.86 │     8.55 │
│ GPU-NVIDIA │             master │ 1.11.3 │   Float32 │     2573717 │   0.00 │     3.15 │            15.02 │     8.46 │
└────────────┴────────────────────┴────────┴───────────┴─────────────┴────────┴──────────┴──────────────────┴──────────┘
Benchmark environment: jelly sim_step! (max_steps=100)
▶ log2p = 5
┌────────────┬────────────────────┬────────┬───────────┬─────────────┬────────┬──────────┬──────────────────┬──────────┐
│  Backend   │     WaterLily      │ Julia  │ Precision │ Allocations │ GC [%] │ Time [s] │ Cost [ns/DOF/dt] │ Speed-up │
├────────────┼────────────────────┼────────┼───────────┼─────────────┼────────┼──────────┼──────────────────┼──────────┤
│     CPUx01 │ add-nonuniform-BCs │ 1.11.3 │   Float32 │       75436 │   0.00 │     2.06 │           156.96 │     1.00 │
│     CPUx01 │             master │ 1.11.3 │   Float32 │       74936 │   0.00 │     2.10 │           160.01 │     0.98 │
│     CPUx04 │ add-nonuniform-BCs │ 1.11.3 │   Float32 │     2173964 │   0.00 │     2.20 │           167.82 │     0.94 │
│     CPUx04 │             master │ 1.11.3 │   Float32 │     2159064 │   0.00 │     2.19 │           167.45 │     0.94 │
│ GPU-NVIDIA │ add-nonuniform-BCs │ 1.11.3 │   Float32 │     2796042 │   0.00 │     0.50 │            38.13 │     4.12 │
│ GPU-NVIDIA │             master │ 1.11.3 │   Float32 │     2645540 │   0.00 │     0.50 │            38.07 │     4.12 │
└────────────┴────────────────────┴────────┴───────────┴─────────────┴────────┴──────────┴──────────────────┴──────────┘
▶ log2p = 6
┌────────────┬────────────────────┬────────┬───────────┬─────────────┬────────┬──────────┬──────────────────┬──────────┐
│  Backend   │     WaterLily      │ Julia  │ Precision │ Allocations │ GC [%] │ Time [s] │ Cost [ns/DOF/dt] │ Speed-up │
├────────────┼────────────────────┼────────┼───────────┼─────────────┼────────┼──────────┼──────────────────┼──────────┤
│     CPUx01 │ add-nonuniform-BCs │ 1.11.3 │   Float32 │       85065 │   0.00 │    15.01 │           143.18 │     1.00 │
│     CPUx01 │             master │ 1.11.3 │   Float32 │       84565 │   0.00 │    15.29 │           145.83 │     0.98 │
│     CPUx04 │ add-nonuniform-BCs │ 1.11.3 │   Float32 │     2474753 │   0.00 │    12.64 │           120.51 │     1.19 │
│     CPUx04 │             master │ 1.11.3 │   Float32 │     2459853 │   0.00 │    12.32 │           117.54 │     1.22 │
│ GPU-NVIDIA │ add-nonuniform-BCs │ 1.11.3 │   Float32 │     3216477 │   0.00 │     1.95 │            18.64 │     7.68 │
│ GPU-NVIDIA │             master │ 1.11.3 │   Float32 │     3066785 │   0.00 │     1.97 │            18.79 │     7.62 │
└────────────┴────────────────────┴────────┴───────────┴─────────────┴────────┴──────────┴──────────────────┴──────────┘

@weymouth
Member

Sounds good to me

@weymouth weymouth merged commit 6bb77c2 into WaterLily-jl:master Apr 19, 2025
12 checks passed