Non-uniform boundary conditions #165
|
From some basic tests: the issue was the definition of the function for the non-uniform velocity BC (it was type-unstable, which allocated a lot). The only real difference in the allocations shows up in the following:

```julia
using WaterLily
using StaticArrays
using ForwardDiff
using BenchmarkTools
import WaterLily: accelerate!, @loop, size_u, slice, loc, CIj, δ

# classical accelerate!
# accelerate!(r,dt,g::Function,::Tuple,t=sum(dt)) = for i ∈ 1:last(size(r))
#     r[..,i] .+= g(i,t)
# end
accelerate_loop!(r,dt,g::Function,::Tuple,t=sum(dt)) = for i ∈ 1:last(size(r))
    @loop r[I,i] += g(i,t) over I ∈ CartesianIndices(Base.front(size(r)))
end
new_accelerate!(r,g::Function,t) = for i ∈ 1:last(size(r))
    @loop r[I,i] += g(i,loc(i,I,eltype(r)),t) over I ∈ CartesianIndices(Base.front(size(r)))
end

# some array to test this
N = 2^7
u = zeros(Float32,N,N,N,3);

# utils
fun(i,t) = sin(2π*t)*cos(2π*t)*cos(2π*t)
fun2(i,x,t) = sin(2π*t)*cos(2π*t)*cos(2π*t)
fun3(i,x,t) = sin(2π*x[1])*cos(2π*x[2])*cos(2π*x[3])
tu = (); ts = [1.0]; t = 0

# old method
@btime accelerate!($u,$ts,$fun,$tu)      # 5.265 ms (9 allocations: 24.00 MiB)
# same but loop
@btime accelerate_loop!($u,$ts,$fun,$tu) # 9.490 ms (849 allocations: 67.02 KiB)
# new method
@btime new_accelerate!($u,$fun2,$t)      # 9.516 ms (849 allocations: 66.94 KiB)
@btime new_accelerate!($u,$fun3,$t)      # 10.829 ms (849 allocations: 66.94 KiB)

# the new BC function
new_BC!(a,U,saveexit=false,perdir=(),t=0) = new_BC!(a,(i,x,t)->U[i],saveexit,perdir,t)
function new_BC!(a,uBC::Function,saveexit=false,perdir=(),t=0)
    N,n = size_u(a)
    for i ∈ 1:n, j ∈ 1:n
        if j in perdir
            @loop a[I,i] = a[CIj(j,I,N[j]-1),i] over I ∈ slice(N,1,j)
            @loop a[I,i] = a[CIj(j,I,2),i] over I ∈ slice(N,N[j],j)
        else
            if i==j # Normal direction, Dirichlet
                for s ∈ (1,2)
                    @loop a[I,i] = uBC(i,loc(i,I),t) over I ∈ slice(N,s,j)
                end
                (!saveexit || i>1) && (@loop a[I,i] = uBC(i,loc(i,I),t) over I ∈ slice(N,N[j],j)) # overwrite exit
            else # Tangential directions, Neumann
                @loop a[I,i] = a[I+δ(j,I),i] over I ∈ slice(N,1,j)
                @loop a[I,i] = a[I-δ(j,I),i] over I ∈ slice(N,N[j],j)
            end
        end
    end
end

# very simple BCs
U = SA[1.0,0.,0.]
Ubc(i,I,t) = i==1 ? 1 : 0
# utils
x = SA[10.0,10.0,10.0]; t = 0.0; conv_exit = false
tu = (); i = 1

# test the BCs
@btime BC!($u,$U,$conv_exit,$tu)          # 1.025 ms (5905 allocations: 477.30 KiB)
# new function with an array
@btime new_BC!($u,$U,$conv_exit,$tu,$t)   # 1.038 ms (5905 allocations: 477.30 KiB)
# new function with a function
@btime new_BC!($u,$Ubc,$conv_exit,$tu,$t) # 770.551 μs (5878 allocations: 468.30 KiB)
@btime Ubc($i,$x,$t)                      # 1.031 ns (0 allocations: 0 bytes)

# more complex function
function u_pipe(i,x::SVector{3,T},t::T) where T
    i ≠ 1 && return zero(T)
    r = √sum(abs2,SA[x[2],x[3]] .- 64.0)
    return 2r>125 ? zero(T) : convert(T,1-r^2/16384)
end
tf32 = 0.f0
@btime new_BC!($u,$u_pipe,$conv_exit,$tu,$tf32) # 780.826 μs (5905 allocations: 468.72 KiB)
@btime u_pipe($i,$x,$t)                         # 2.512 ns (0 allocations: 0 bytes)
```
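As an aside, the type-stability pitfall mentioned above can be seen in isolation. The `unstable`/`stable` functions below are illustrative toys, not part of WaterLily:

```julia
# Toy illustration (not WaterLily code) of why a BC function whose branches
# return different types allocates inside hot kernels.
unstable(i, x, t) = i == 1 ? 1.0 : 0    # Float64 on one branch, Int on the other
stable(i, x, t)   = i == 1 ? 1.0 : 0.0  # always Float64: concrete return type

# The mixed return type forces Union handling in @loop-style kernels:
@assert typeof(unstable(1, 0.0, 0.0)) != typeof(unstable(2, 0.0, 0.0))
@assert typeof(stable(1, 0.0, 0.0)) == typeof(stable(2, 0.0, 0.0))
```

Checking a candidate BC function with `@code_warntype` before passing it to the solver catches this early.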
|
Just had a quick look and I like it! Also, I guess this PR solves #174 too, right? Happy to close that one if so. Also I like this
|
Yes, this fixes #174 (the body_force part of it). There is still one part of the test failing, which is related to constructing the simulation with a
Additionally, I am not sure if we want to keep the possibility of constructing the sim with three different velocity BCs,
That was the idea: a simple body force can be implemented as g(i,x,t), but more complex ones like turbulence models need to access
|
The only problem now is that passing a time-varying body force which needs to make use of instantaneous flow-field data (i.e. velocity) into
|
I have debugged your test error. Actually, the tests already fail on GPU for the
|
@b-fg I didn't like this aliasing anyway; I'll try to find a better way to maintain backward compatibility. Do you have any ideas on how to do that?
|
Ha, I was working on a fix too. Are your tests ok on the GPU now?
…type for Simulation and Flow (Flow was Float64 previously).
…m_step with arbitrary keyword arguments. The function can modify the ::Flow object and is called both in the predictor and corrector steps. Added tests for its implementation of an increasing body force, but using udf instead of body_force. All tests passing on Array and CuArray.
|
The latest commit implements a general user-defined function (UDF) that is called both in predictor and corrector steps. This gives users the flexibility to implement any type of instantaneous and local function in their own scripts, with arbitrary keyword arguments, and pass it into the solver. I do not love the fact that all
|
I think body_force can go, because if it's only an
|
I think it's fine like this. I like that it's consistent with the velocity function, even if it's typically slightly overkill. We need to benchmark the performance though; I'm a little worried about that...
…On Mon, Mar 31, 2025, Marin Lauber wrote:

> It looks nice like this! So, maybe the last thing is whether we should keep g(i,x,t) or revert to g(i,t) since now the udf can be used for this complex space-varying forcing.
|
|
I'll do the performance benchmark. It was fine when we first introduced the
Testing Running tests...
[ Info: Test backends: Array, CuArray
Test Summary: | Pass Total Time
util.jl | 54 54 41.3s
Test Summary: | Pass Total Time
Poisson.jl | 14 14 27.4s
Test Summary: | Pass Total Time
MultiLevelPoisson.jl | 13 13 11.3s
Test Summary: | Pass Total Time
Flow.jl | 30 30 22.8s
Test Summary: | Pass Total Time
Body.jl | 3 3 0.0s
Test Summary: | Pass Total Time
AutoBody.jl | 16 16 13.0s
Test Summary: | Pass Total Time
Flow.jl periodic TGV | 2 2 8.7s
Test Summary: | Pass Total Time
ForwardDiff | 2 2 39.8s
Test Summary: | Pass Total Time
Flow.jl with increasing body force | 4 4 15.5s
Test Summary: | Pass Total Time
Boundary Layer Flow | 2 2 7.7s
Test Summary: | Pass Total Time
Rotating reference frame | 1 1 2.9s
Test Summary: | Pass Total Time
Circle in accelerating flow | 8 8 10.4s
Test Summary: | Pass Total Time
Metrics.jl | 37 37 16.8s
Test Summary: | Pass Total Time
WaterLily.jl | 30 30 7.8s
Test Summary: | Pass Total Time
VTKExt.jl | 28 28 21.9s
Testing WaterLily tests passed

▶ Allocated 8 KiB
▶ Allocated 11 KiB
Test Summary: | Pass Total Time
mom_step! allocations | 2 2 33.2s
Testing WaterLily tests passed

Here are the results of the benchmark (I had to run them separately since the TGV benchmark requires some changes between the two versions):

```sh
sh benchmark.sh -w "master 01d9a24" -v "1.11" -t "1 4" -b "Array CuArray" -c "tgv jelly" -p "6,7 5,6" -s "100 100" -ft "Float32 Float64"
```

Benchmark environment: tgv sim_step! (max_steps=100)
▶ log2p = 6
┌────────────┬────────────────────┬────────┬───────────┬─────────────┬────────┬──────────┬──────────────────┬──────────┐
│ Backend │ WaterLily │ Julia │ Precision │ Allocations │ GC [%] │ Time [s] │ Cost [ns/DOF/dt] │ Speed-up │
├────────────┼────────────────────┼────────┼───────────┼─────────────┼────────┼──────────┼──────────────────┼──────────┤
│ CPUx01 │ add-nonuniform-BCs │ 1.11.4 │ Float32 │ 80936 │ 0.00 │ 6.40 │ 244.06 │ 1.00 │
│ CPUx01 │ master │ 1.11.4 │ Float32 │ 80021 │ 0.00 │ 4.30 │ 163.92 │ 1.49 │
│ CPUx04 │ add-nonuniform-BCs │ 1.11.4 │ Float32 │ 2343862 │ 0.00 │ 4.04 │ 154.00 │ 1.58 │
│ CPUx04 │ master │ 1.11.4 │ Float32 │ 2315093 │ 0.00 │ 10.67 │ 407.08 │ 0.60 │
│ GPU-NVIDIA │ add-nonuniform-BCs │ 1.11.4 │ Float32 │ 2994053 │ 0.00 │ 0.59 │ 22.59 │ 10.80 │
│ GPU-NVIDIA │ master │ 1.11.4 │ Float32 │ 2828270 │ 0.00 │ 0.54 │ 20.44 │ 11.94 │
└────────────┴────────────────────┴────────┴───────────┴─────────────┴────────┴──────────┴──────────────────┴──────────┘
▶ log2p = 7
┌────────────┬────────────────────┬────────┬───────────┬─────────────┬────────┬──────────┬──────────────────┬──────────┐
│ Backend │ WaterLily │ Julia │ Precision │ Allocations │ GC [%] │ Time [s] │ Cost [ns/DOF/dt] │ Speed-up │
├────────────┼────────────────────┼────────┼───────────┼─────────────┼────────┼──────────┼──────────────────┼──────────┤
│ CPUx01 │ add-nonuniform-BCs │ 1.11.4 │ Float32 │ 75708 │ 0.00 │ 43.59 │ 207.88 │ 1.00 │
│ CPUx01 │ master │ 1.11.4 │ Float32 │ 74816 │ 0.00 │ 30.43 │ 145.11 │ 1.43 │
│ CPUx04 │ add-nonuniform-BCs │ 1.11.4 │ Float32 │ 2181049 │ 0.00 │ 27.45 │ 130.91 │ 1.59 │
│ CPUx04 │ master │ 1.11.4 │ Float32 │ 2153203 │ 0.00 │ 47.59 │ 226.90 │ 0.92 │
│ GPU-NVIDIA │ add-nonuniform-BCs │ 1.11.4 │ Float32 │ 2738138 │ 0.00 │ 2.41 │ 11.48 │ 18.10 │
│ GPU-NVIDIA │ master │ 1.11.4 │ Float32 │ 2573981 │ 0.00 │ 2.11 │ 10.07 │ 20.64 │
└────────────┴────────────────────┴────────┴───────────┴─────────────┴────────┴──────────┴──────────────────┴──────────┘
Figure stored in /home/marin/Workspace/WaterLily-Benchmarks/plots/tgv_cost_add-nonuniform-BCs_master_1.11.4_Float32.pdf
Figure stored in /home/marin/Workspace/WaterLily-Benchmarks/plots/tgv_benchmark_add-nonuniform-BCs_master_1.11.4_Float32.pdf
Benchmark environment: jelly sim_step! (max_steps=100)
▶ log2p = 5
┌────────────┬────────────────────┬────────┬───────────┬─────────────┬────────┬──────────┬──────────────────┬──────────┐
│ Backend │ WaterLily │ Julia │ Precision │ Allocations │ GC [%] │ Time [s] │ Cost [ns/DOF/dt] │ Speed-up │
├────────────┼────────────────────┼────────┼───────────┼─────────────┼────────┼──────────┼──────────────────┼──────────┤
│ CPUx01 │ add-nonuniform-BCs │ 1.11.4 │ Float64 │ 173027 │ 0.00 │ 7.59 │ 578.72 │ 1.00 │
│ CPUx01 │ master │ 1.11.4 │ Float64 │ 172527 │ 0.00 │ 14.01 │ 1069.16 │ 0.54 │
│ CPUx04 │ add-nonuniform-BCs │ 1.11.4 │ Float64 │ 4803475 │ 0.67 │ 7.56 │ 576.85 │ 1.00 │
│ CPUx04 │ master │ 1.11.4 │ Float64 │ 4804775 │ 0.41 │ 13.04 │ 994.86 │ 0.58 │
│ GPU-NVIDIA │ add-nonuniform-BCs │ 1.11.4 │ Float64 │ 7126426 │ 1.61 │ 1.18 │ 90.11 │ 6.42 │
│ GPU-NVIDIA │ master │ 1.11.4 │ Float64 │ 6845780 │ 1.88 │ 1.30 │ 99.27 │ 5.83 │
└────────────┴────────────────────┴────────┴───────────┴─────────────┴────────┴──────────┴──────────────────┴──────────┘
▶ log2p = 6
┌────────────┬────────────────────┬────────┬───────────┬─────────────┬────────┬──────────┬──────────────────┬──────────┐
│ Backend │ WaterLily │ Julia │ Precision │ Allocations │ GC [%] │ Time [s] │ Cost [ns/DOF/dt] │ Speed-up │
├────────────┼────────────────────┼────────┼───────────┼─────────────┼────────┼──────────┼──────────────────┼──────────┤
│ CPUx01 │ add-nonuniform-BCs │ 1.11.4 │ Float64 │ 224141 │ 0.00 │ 42.21 │ 402.55 │ 1.00 │
│ CPUx01 │ master │ 1.11.4 │ Float64 │ 223641 │ 0.00 │ 89.73 │ 855.70 │ 0.47 │
│ CPUx04 │ add-nonuniform-BCs │ 1.11.4 │ Float64 │ 6309610 │ 0.11 │ 41.80 │ 398.62 │ 1.01 │
│ CPUx04 │ master │ 1.11.4 │ Float64 │ 6313610 │ 0.11 │ 59.71 │ 569.47 │ 0.71 │
│ GPU-NVIDIA │ add-nonuniform-BCs │ 1.11.4 │ Float64 │ 9727169 │ 0.41 │ 4.78 │ 45.58 │ 8.83 │
│ GPU-NVIDIA │ master │ 1.11.4 │ Float64 │ 9415538 │ 0.54 │ 5.22 │ 49.79 │ 8.09 │
└────────────┴────────────────────┴────────┴───────────┴─────────────┴────────┴──────────┴──────────────────┴──────────┘
|
Why in the world is the PR twice as fast for the Jelly? There should be no changes for that test, right? And a 50% slowdown for the TGV is a problem. Is that the cost of applying (and then ignoring) the domain BC function?
|
I'd focus on the GPU timings, as these are typically more consistent, and the difference is not much here. I can also run them to double-check. Also, I see you ran the TGV in Float32 and the Jelly in Float64. Better to run both cases with both precisions to see what's going on.
|
Yeah, they are weird. I can also try to run the benchmark on Nefertem of DelftBlue; it might be a fairer comparison. I ran those on my workstation, but you never know what processes are in the background as well. @b-fg I used the first example command line for the benchmark, do you have a "better" one?
|
So that I run the same benchmark as you, what exact function for TGV are you using now for this PR? (The WaterLily-Benchmarks TGV case only works for master.)
|
I've modified the original one to this:

```julia
function tgv(p, backend; Re=1600, T=Float32)
    L = 2^p; U = T(1); κ = T(π/L); ν = T(1/(κ*Re))
    function Uλ(i,xyz,t)
        x,y,z = @. xyz*κ
        i==1 && return -U*sin(x)*cos(y)*cos(z)
        i==2 && return  U*cos(x)*sin(y)*cos(z)
        return 0*U
    end
    Simulation((L, L, L), Uλ, 1/κ; U=U, ν=ν, T=T, mem=backend)
end
```

And I was wondering if this case should not be triple-periodic?
|
This is what I got. Looking at the GPU results, there is indeed something like a 10-15% slowdown on TGV (and indeed a severe slowdown on CPU). Is it because the
|
It has to either be this or the evaluation of the acceleration term, right? Those are the only two changes.
|
I am working on a hacky way to fix the performance issues using BC buffers computed during pre-processing. I will push the changes and, if you like this, we can clean it up together. This won't fix the (unavoidable) cost of space-time-varying BC functions, but it should behave like master otherwise.
|
Doing some tests with the pre-processing of BCs, I get the same behaviour as we are observing. Complex

```julia
T = Float32
mem = CuArray
a = rand(T,150,100,50,3) |> mem
N,n = size_u(a)
normal_buffers, tangential_buffers = get_buffers(a, Uλ; T, mem)
b, c = copy(a), copy(a)
apply_BCs!(b, Uλ)
apply_BCs!(c, normal_buffers, tangential_buffers)
@assert isapprox(b, c, atol=1e-5)
@btime apply_BCs!($b, $Uλ) # CPU -t 1: 3.571 ms (54 allocations: 864 bytes) | GPU: 84.264 μs (936 allocations: 34.69 KiB)
@btime apply_BCs!($c, $normal_buffers, $tangential_buffers) # CPU -t 1: 184.782 μs (84 allocations: 1.88 KiB) | GPU: 102.383 μs (1197 allocations: 43.55 KiB)
```

Details:

```julia
using Revise, WaterLily, CUDA, StaticArrays, BenchmarkTools
```
```julia
@inline CI(a...) = CartesianIndex(a...)
CIj(j,I::CartesianIndex{d},k) where d = CI(ntuple(i -> i==j ? k : I[i], d))
δ(i,::Val{N}) where N = CI(ntuple(j -> j==i ? 1 : 0, N))
δ(i,I::CartesianIndex{N}) where N = δ(i, Val{N}())
function slice(dims::NTuple{N},i,j,low=1) where N
    CartesianIndices(ntuple(k -> k==j ? (i:i) : (low:dims[k]), N))
end
@inline loc(i,I::CartesianIndex{N},T=Float32) where N = SVector{N,T}(I.I .- 1.5 .- 0.5 .* δ(i,I).I)
@inline loc(Ii::CartesianIndex,T=Float32) = loc(last(Ii),Base.front(Ii),T)
splitn(n) = Base.front(n),last(n)
size_u(u) = splitn(size(u))
function Uλ(i,xyz,t)
    x,y,z = @. xyz
    i==1 && return -sin(x)*cos(y)*cos(z)
    i==2 && return  cos(x)*sin(y)*cos(z)
    return 0
end
function apply_BCs!(a, u_BC::Function)
    N,n = size_u(a)
    for i ∈ 1:n, j ∈ 1:n
        if i==j # Normal direction, Dirichlet
            for s ∈ (1,2)
                WaterLily.@loop a[I,i] = u_BC(i,loc(i,I),0) over I ∈ slice(N,s,j)
            end
            # if i>1
            WaterLily.@loop a[I,i] = u_BC(i,loc(i,I),0) over I ∈ slice(N,N[j],j)
            # end
        else # Tangential directions, Neumann
            WaterLily.@loop a[I,i] = u_BC(i,loc(i,I),0)+a[I+δ(j,I),i]-u_BC(i,loc(i,I+δ(j,I)),0) over I ∈ slice(N,1,j)
            WaterLily.@loop a[I,i] = u_BC(i,loc(i,I),0)+a[I-δ(j,I),i]-u_BC(i,loc(i,I-δ(j,I)),0) over I ∈ slice(N,N[j],j)
        end
    end
end
function apply_BCs!(a, normal_buffers, tangential_buffers)
    N,n = size_u(a)
    for i ∈ 1:n, j ∈ 1:n
        if i==j # Normal direction, Dirichlet
            for s ∈ (1,2)
                b = normal_buffers[i][s]
                WaterLily.@loop a[I,i] = b[I-δ(j,I)*(s-1)] over I ∈ slice(N,s,j)
            end
            # if i>1
            b = normal_buffers[i][3]
            WaterLily.@loop a[I,i] = b[I-δ(i,I)*(N[i]-1)] over I ∈ slice(N,N[j],j)
            # end
        else # Tangential directions, Neumann
            b = tangential_buffers[i,j][1]
            WaterLily.@loop a[I,i] = a[I+δ(j,I),i] + b[I] over I ∈ slice(N,1,j)
            b = tangential_buffers[i,j][2]
            WaterLily.@loop a[I,i] = a[I-δ(j,I),i] + b[I-δ(j,I)*(N[j]-1)] over I ∈ slice(N,N[j],j)
        end
    end
end
function get_buffers(a, u_BC; T=Float32, mem=Array)
    N,n = size_u(a)
    normal_buffers = collect(mem{T,n}[] for i ∈ 1:n)
    tangential_buffers = collect(mem{T,n}[] for i ∈ 1:n, j ∈ 1:n)
    for i ∈ 1:n, j ∈ 1:n
        b = zeros(eltype(a), size(slice(N,1,j))...) |> mem
        if i == j
            for s ∈ (1,2)
                WaterLily.@loop b[I-δ(j,I)*(s-1)] = u_BC(i,loc(i,I),0) over I ∈ slice(N,s,j)
                push!(normal_buffers[i], copy(b))
            end
            WaterLily.@loop b[I-δ(j,I)*(N[j]-1)] = u_BC(i,loc(i,I),0) over I ∈ slice(N,N[j],j)
            push!(normal_buffers[i], copy(b))
        else
            WaterLily.@loop b[I] = u_BC(i,loc(i,I),0)-u_BC(i,loc(i,I+δ(j,I)),0) over I ∈ slice(N,1,j)
            push!(tangential_buffers[i,j], copy(b))
            WaterLily.@loop b[I-δ(j,I)*(N[j]-1)] = u_BC(i,loc(i,I),0)-u_BC(i,loc(i,I-δ(j,I)),0) over I ∈ slice(N,N[j],j)
            push!(tangential_buffers[i,j], copy(b))
        end
    end
    return normal_buffers, tangential_buffers
end
T = Float32
mem = CuArray
a = rand(T,150,100,50,3) |> mem
N,n = size_u(a)
normal_buffers, tangential_buffers = get_buffers(a, Uλ; T, mem)
b, c = copy(a), copy(a)
apply_BCs!(b, Uλ)
apply_BCs!(c, normal_buffers, tangential_buffers)
@assert isapprox(b, c, atol=1e-5)
@btime apply_BCs!($b, $Uλ) # CPU -t 1: 3.571 ms (54 allocations: 864 bytes) | GPU: 84.264 μs (936 allocations: 34.69 KiB)
@btime apply_BCs!($c, $normal_buffers, $tangential_buffers) # CPU -t 1: 184.782 μs (84 allocations: 1.88 KiB) | GPU: 102.383 μs (1197 allocations: 43.55 KiB)
```
|
I think, regardless of how well the BC kernel is launched, we will never be able to get the same evaluation time for
|
What I was thinking is to steal the
|
I like the @vecloop approach. I was also already dispatching by Tuple of Function (not shown in the MWE though). So I will try to push these changes and you maybe can try to add the @vecloop?
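A minimal sketch of what dispatching on a uniform Tuple versus an (i,x,t) Function could look like (the `bc_value` helper is hypothetical, not the PR's actual implementation):

```julia
# Hypothetical helper: uniform BCs just index the tuple, non-uniform BCs
# evaluate the user function at (component, location, time).
bc_value(U::Tuple, i, x, t) = U[i]
bc_value(U::Function, i, x, t) = U(i, x, t)

@assert bc_value((1.0, 0.0), 1, (0.0, 0.0), 0.0) == 1.0
@assert bc_value((i, x, t) -> i == 1 ? x[1] : 0.0, 1, (2.0, 5.0), 0.0) == 2.0
```

Both methods stay type-stable as long as the user function itself is, so the compiler can specialize the BC kernels per method.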
|
FYI: I couldn't get vecloop to be nearly as fast as loop when doing the same task. Maybe I was making some kind of mistake...
…On Mon, Apr 14, 2025, Bernat Font wrote:

> I like the @vecloop approach. I was also already dispatching by Tuple of Function (not shown in the MWE though). So I will try to push these changes and you maybe can try to add the @vecloop?
|
|
I have implemented the BC buffers computed during pre-processing. All tests passing locally on CPU and GPU, and I have added an extra test in
Below are the benchmark results for the periodic TGV on serial CPU, where
|
I think I know what's going on... With the current setup,

```julia
accelerate!(r,t,::Nothing,U::Function) = accelerate!(r,t,(i,x,t)->ForwardDiff.derivative(τ->U(i,x,τ),t))
```

where
|
How should we handle this? Currently accelerate! uses both

```julia
accelerate!(r,t,::Nothing,::Union{Nothing,Tuple}) = nothing
accelerate!(r,t,f::Function) = @loop r[Ii] += f(last(Ii),loc(Ii,eltype(r)),t) over Ii ∈ CartesianIndices(r)
accelerate!(r,t,g::Function,::Union{Nothing,Tuple}) = accelerate!(r,t,g)
accelerate!(r,t,::Nothing,U::Function) = accelerate!(r,t,(i,x,t)->ForwardDiff.derivative(τ->U(i,x,τ),t))
accelerate!(r,t,g::Function,U::Function) = accelerate!(r,t,(i,x,t)->g(i,x,t)+ForwardDiff.derivative(τ->U(i,x,τ),t))
```

I am not sure what combination we should actually support. Let me know your thoughts.
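For reference, the ForwardDiff.derivative trick used in these methods can be checked standalone (the BC function `Uex` below is illustrative, not from the PR):

```julia
using ForwardDiff

# Illustrative time-varying BC: oscillating x-velocity, zero elsewhere.
Uex(i, x, t) = i == 1 ? sin(2π*t) : 0.0
# Frame acceleration as derived above: dU/dt via forward-mode AD.
accel(i, x, t) = ForwardDiff.derivative(τ -> Uex(i, x, τ), t)

@assert accel(1, 0.0, 0.0) ≈ 2π      # d/dt sin(2πt) at t=0 is 2π
@assert accel(2, 0.0, 0.3) == 0.0    # constant-in-time component: zero
```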
|
(Also, we need to think about why this was not caught by the tests.)
|
The acceleration is required when the user supplies a background velocity, since it implies an accelerating reference frame. That background velocity should also be used to set the initial velocity; we have tests to make sure this happens. However, the TGV is just an initial condition, not a background velocity. More logic is needed in the input conditions, I guess...
|
I think we are trying to squeeze too much stuff in
|
I am thinking to introduce this (breaking) change:

```julia
function Simulation(dims::NTuple{N}, U₀, Uλ;
        L=1, U=nothing, Δt=0.25, ν=0., g=nothing, ϵ=1, perdir=(),
        exitBC=false, body::AbstractBody=NoBody(),
        T=Float32, mem=Array) where N
    @assert !(isnothing(U) && isa(Uλ,Function)) "`U` (velocity scale) must be specified if `Uλ` is a `Function`"
    isnothing(U) && (U = √sum(abs2,Uλ))
    check_fn(g,N,T); check_fn(Uλ,N,T)
    flow = Flow(dims,U₀,Uλ;Δt,ν,g,T,f=mem,perdir,exitBC)
    measure!(flow,body;ϵ)
    new(U,L,ϵ,flow,body,MultiLevelPoisson(flow.p,flow.μ₀,flow.σ;perdir))
end
```

So when creating a
Alternatively, we could also have
If you like this I can update the code base and the tests. WaterLily-Examples should also be updated. So it would be reasonable to bump WaterLily to 1.4 as well. Let me know!
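The U-inference logic in this sketch can be exercised on its own; the standalone `velocity_scale` helper below mirrors the @assert/isnothing lines and is not the actual implementation:

```julia
# Mirrors the proposal above: require an explicit velocity scale U when the
# BC is a Function, otherwise infer it as the norm of the BC tuple.
function velocity_scale(U, Uλ)
    @assert !(isnothing(U) && isa(Uλ, Function)) "`U` must be specified if `Uλ` is a `Function`"
    return isnothing(U) ? √sum(abs2, Uλ) : U
end

@assert velocity_scale(nothing, (3.0, 4.0)) == 5.0    # inferred from tuple BC
@assert velocity_scale(2.0, (i, x, t) -> 0.0) == 2.0  # explicit scale with Function BC
```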
|
But this is kind of what we had before,
|
Yes, and I am not sure how to do this without using another argument in
…leaned up across Simulation and Flow. All tests passing locally on CPU and GPU.
|
After a short conversation with Gabe, I have re-introduced the different arguments. I took the chance to homogenize some of these arguments across the different objects. I might make some performance tests to check the impact of non-uniform BCs and see if using BC buffers, when
|
Sounds good to me
This pull request adds the capability of using space- and time-varying inflow boundary conditions, as well as prescribing a space-time body force in the momentum step.

The major change is that the `u_BC::Function` passed to the `Flow` is of the form `u_BC(i,x,t)`, where `i` is the component, `x` the location in physical space, and `t` the time. The flow is then accelerated correctly, and the boundary conditions on the velocity field are applied correctly. This allows prescribing velocity profiles at the inlet and outlet.

Another feature, grabbed from #174, is to enable passing either a `body_force::AbstractArray` or a `body_force::Function = (i,x,t) -> ...` to `mom_step!` and `sim_step!`, which allows the user to either prescribe some custom term or accelerate the flow locally.
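To illustrate the `u_BC(i,x,t)` form described above, a space- and time-varying inflow can be written as follows. The pulsating parabolic profile and the half-height of 32 are choices of mine for illustration, not from the PR:

```julia
# Example u_BC(i,x,t): pulsating parabolic inflow in x, zero transverse.
# Half-height 32 is an arbitrary choice for a 64-cell domain.
u_BC(i, x, t) = i == 1 ? (1.0 + 0.1*sin(2π*t)) * (1.0 - (x[2]/32 - 1.0)^2) : 0.0

@assert u_BC(1, (0.0, 32.0), 0.0) == 1.0  # centerline, t = 0
@assert u_BC(1, (0.0,  0.0), 0.0) == 0.0  # wall
@assert u_BC(2, (0.0, 32.0), 0.0) == 0.0  # no transverse inflow
```

A function like this would then be passed to `Flow`/`Simulation` in place of a constant velocity tuple.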