Turbulence utils: SGS model and on-the-fly temporal averaging#198

Merged
b-fg merged 26 commits into master from turbulence_modelling
May 24, 2025
Conversation

@b-fg
Member

@b-fg b-fg commented Mar 10, 2025

Implements some utilities for turbulence modelling, such as eddy-viscosity models for subgrid stresses (based on #165) and temporal-averaging routines.
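For context, the kind of eddy-viscosity closure involved can be sketched as follows. This is an illustrative sketch only, not this PR's implementation: the classic Smagorinsky model sets ν_t = (C_s Δ)² |S|, with |S| = √(2 S_ij S_ij) the strain-rate magnitude on a grid of spacing Δ; the names `smagorinsky` and `strain_squared` are hypothetical.

```julia
# Illustrative sketch, not this PR's implementation: the classic Smagorinsky
# subgrid model, ν_t = (Cs*Δ)^2 * |S|, where |S| = √(2 S_ij S_ij).
smagorinsky(S2; Cs=0.17, Δ=1.0) = (Cs * Δ)^2 * sqrt(2S2)

# S2 = S_ij S_ij from a (constant) velocity-gradient matrix ∇u:
# take the symmetric part, then its squared Frobenius norm.
strain_squared(∇u) = sum(abs2, 0.5 .* (∇u .+ ∇u'))

νt = smagorinsky(strain_squared([0.0 1.0; 0.0 0.0]))  # simple shear, du/dy = 1
```

For the simple-shear example, |S| = 1, so ν_t reduces to (C_s Δ)².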

@b-fg b-fg added the enhancement New feature or request label Mar 10, 2025
@weymouth
Member

Looking pretty clean

@b-fg
Member Author

b-fg commented Mar 11, 2025

This builds on #165, so we should integrate that first.

@b-fg b-fg changed the title Turbulence modelling Turbulence utils: SGS model and on-the-fly temporal averaging May 13, 2025
@b-fg
Member Author

b-fg commented May 13, 2025

Still needs tests, and a time-averaging restart (I/O) based on JLD2 (once #213 is merged, this can be added in WaterLilyJLD2Ext.jl).

@b-fg b-fg marked this pull request as ready for review May 13, 2025 18:26
@b-fg b-fg requested a review from weymouth May 13, 2025 18:27
@b-fg
Member Author

b-fg commented May 13, 2025

Here is an MWE of how this works:

using Revise, WaterLily, GLMakie

function circle(n,m;Re=100,U=1)
    radius, center = m/8, m/2-1
    sdf(x,t) = √sum(abs2, x .- center) - radius # signed distance to the circle surface
    Simulation((n,m), (U,0), 2radius; ν=U*2radius/Re, body=AutoBody(sdf))
end

sim = circle(4*32,32)
sim_step!(sim, 40; remeasure=false, verbose=true) # developing flow
meanflow = MeanFlow(sim.flow)
sim_step!(sim, 190; remeasure=false, verbose=true, meanflow) # temporal average

function vort!(cpu_array, sim)
    a = sim.flow.σ
    WaterLily.@inside a[I] = WaterLily.curl(3,I,sim.flow.u)*sim.L/sim.U
    copyto!(cpu_array, a[inside(a)]) # copy to CPU
end

viz!(sim, vort!; colormap=:balance, levels=50) # Flow plot
WaterLily.copy!(sim.flow, meanflow)
viz!(sim, vort!; colormap=:balance, levels=50) # MeanFlow plot

Instantaneous flow
Screenshot from 2025-05-13 21-13-42
Mean flow
Screenshot from 2025-05-13 21-13-51

@b-fg
Member Author

b-fg commented May 13, 2025

There is something allocating when passing the convective scheme λ to Flow... Needs fixing.

b-fg added 4 commits May 19, 2025 18:04
…locations.

This is currently done using dispatch by Val, which requires repeated code.
Without this, trying to get automatically specialized flux functions always allocates at runtime.
@b-fg
Member Author

b-fg commented May 19, 2025

Passing a flux function λ into mom_step! is unfortunately allocating, which might be caused by a lack of function specialization. Currently, the only workaround I have found is to dispatch to the specific function by Val{::Symbol}. That is, I have to use the following implementation

@inline ϕu(a,I,f,u,::Val{:quick}) = @inbounds u>0 ? u*quick(f[I-2δ(a,I)],f[I-δ(a,I)],f[I]) : u*quick(f[I+δ(a,I)],f[I],f[I-δ(a,I)])
@inline ϕuP(a,Ip,I,f,u,::Val{:quick}) = @inbounds u>0 ? u*quick(f[Ip],f[I-δ(a,I)],f[I]) : u*quick(f[I+δ(a,I)],f[I],f[I-δ(a,I)])
@inline ϕuL(a,I,f,u,::Val{:quick}) = @inbounds u>0 ? u*ϕ(a,I,f) : u*quick(f[I+δ(a,I)],f[I],f[I-δ(a,I)])
@inline ϕuR(a,I,f,u,::Val{:quick}) = @inbounds u<0 ? u*ϕ(a,I,f) : u*quick(f[I-2δ(a,I)],f[I-δ(a,I)],f[I])

@inline ϕu(a,I,f,u,::Val{:cds}) = @inbounds u>0 ? u*cds(f[I-2δ(a,I)],f[I-δ(a,I)],f[I]) : u*cds(f[I+δ(a,I)],f[I],f[I-δ(a,I)])
@inline ϕuP(a,Ip,I,f,u,::Val{:cds}) = @inbounds u>0 ? u*cds(f[Ip],f[I-δ(a,I)],f[I]) : u*cds(f[I+δ(a,I)],f[I],f[I-δ(a,I)])
@inline ϕuL(a,I,f,u,::Val{:cds}) = @inbounds u>0 ? u*ϕ(a,I,f) : u*cds(f[I+δ(a,I)],f[I],f[I-δ(a,I)])
@inline ϕuR(a,I,f,u,::Val{:cds}) = @inbounds u<0 ? u*ϕ(a,I,f) : u*cds(f[I-2δ(a,I)],f[I-δ(a,I)],f[I])

Instead of the cleaner version

@inline ϕu(a,I,f,u,λ::F) where {F} =  @inbounds u>0 ? u*λ(f[I-2δ(a,I)],f[I-δ(a,I)],f[I]) : u*λ(f[I+δ(a,I)],f[I],f[I-δ(a,I)])
@inline ϕuP(a,Ip,I,f,u,λ::F) where {F} = @inbounds u>0 ? u*λ(f[Ip],f[I-δ(a,I)],f[I]) : u*λ(f[I+δ(a,I)],f[I],f[I-δ(a,I)])
@inline ϕuL(a,I,f,u,λ::F) where {F} = @inbounds u>0 ? u*ϕ(a,I,f) : u*λ(f[I+δ(a,I)],f[I],f[I-δ(a,I)])
@inline ϕuR(a,I,f,u,λ::F) where {F} = @inbounds u<0 ? u*ϕ(a,I,f) : u*λ(f[I-2δ(a,I)],f[I-δ(a,I)],f[I])

since the latter version results in allocations, as measured in allocatest.jl for single-threaded execution. I am trying to force specialization with ::F where F, but it seems to have no effect.
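The suspected mechanism can be reproduced in a self-contained sketch (illustrative names, not WaterLily code): per the Julia performance docs, a method that only forwards a Function argument may not specialize on it, while a `λ::F` annotation with a `where {F}` type parameter forces specialization.

```julia
# Illustrative sketch, not WaterLily code. Julia's heuristics may skip
# specialization on a Function argument that is only forwarded to another
# call; an explicit type parameter λ::F where {F} forces specialization.
stencil(a, b, c) = (3a + 6b - c) / 8          # stand-in for a QUICK-like scheme

call_it(λ, a, b, c) = λ(a, b, c)
forward_unspec(λ, a, b, c) = call_it(λ, a, b, c)             # λ only forwarded
forward_spec(λ::F, a, b, c) where {F} = call_it(λ, a, b, c)  # forced specialization
```

Comparing `@allocated forward_unspec(stencil, 1.0, 2.0, 3.0)` against the specialized version (after a warm-up call) is one way to check whether this heuristic is in play.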

Maybe @weymouth or @vchuravy have some insights on this?

@vchuravy
Contributor

How are you passing it into mom_step!? E.g. is the caller of ϕu properly specialized?

@b-fg
Member Author

b-fg commented May 20, 2025

The way I planned for users to select the convection scheme is by passing the function themselves into sim_step!, from which it is passed all the way down to ϕu. The call chain is: sim_step!(...; λ=quick) > mom_step!(...; λ) > conv_diff!(..., λ; ...) > lowerBoundary! or upperBoundary! (or none) > ϕu(..., λ).

I have tried to force specialization in conv_diff! using the function header conv_diff!(r,u,Φ,λ::F;ν=0.1,perdir=()) where {F}, as well as in lowerBoundary! and upperBoundary!, and in ϕu (and the other related functions ϕuP, ϕuL, ϕuR), ϕu(a,I,f,u,λ::F) where {F}, and it is still allocating...

@vchuravy
Contributor

It's hard to tell without digging deeper into the code. I would use the allocation profiler to determine first where the allocations are stemming from (potentially setting the sampling to 1).

Then I would try not passing lambda through a keyword argument; that will likely cause an allocation at some point, since the specialized function eventually has to be called.
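The profiler workflow suggested here can be sketched as follows (Julia ≥ 1.8; `my_step!` and `buf` are placeholders for the routine under investigation, not WaterLily code):

```julia
using Profile

my_step!(buf) = (buf .= 2 .* buf; nothing)   # placeholder workload
buf = zeros(16)
my_step!(buf)                                # warm up to exclude compilation

# sample_rate=1 records every allocation, as suggested above
Profile.Allocs.@profile sample_rate=1 my_step!(buf)
results = Profile.Allocs.fetch()             # entries carry .type, .size, .stacktrace
Profile.Allocs.clear()
```

The collected `results` are typically inspected with a viewer such as PProf.jl to locate the allocating call sites.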

@b-fg
Member Author

b-fg commented May 20, 2025

cc @weymouth, I traced the allocations down, and it is a problem with the @loop macro, which is probably preventing specialization. The following test was conducted with one thread (-t 1), so our @loop macro results in @simd for $I ∈ $R instead of using the KA kernel.

So, within conv_diff!, the following (simplified) statement

@loop Φ[I] = ϕu(j,CI(I,i),u,ϕ(i,CI(I,j),u),λ) over I ∈ inside_u(N,j)

is allocating when passing λ. Instead, the equivalent statement

for I ∈ inside_u(N,j)
    Φ[I] = ϕu(j,CI(I,i),u,ϕ(i,CI(I,j),u),λ)
end

is not allocating. Thus, there is something not quite right with the macro that prevents specialization, as far as I can see. Maybe it's just that the macro generates a higher-level function that ultimately calls ϕu,

function $kern($(rep.(sym)...),::Val{1})
   @simd for $I ∈ $R
       @fastmath @inbounds $ex
   end
end

but then the λ::F where F trick to enforce specialization cannot be imposed. So it is a limitation of the current code design.

@vchuravy
Contributor

Maybe it's just that the macro results in a higher level function that ultimately calls

Yeah that sounds about right. Julia specialization heuristics strike again. You could generate a kernel that specializes on each of the arguments?

@b-fg
Member Author

b-fg commented May 21, 2025

Do you mean like a macro generating

function kern(a::A,b::B,c::C,...,::Val{1}) where {A,B,C,...}
    ...
end

That might work indeed, but I am not sure my metaprogramming skills are up to the task haha. Maybe @weymouth? 👀
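Such generated code could look like the following sketch (illustrative, using `@eval` and hypothetical names): every argument gets its own type parameter, so Julia specializes on all of them, including a Function argument like λ.

```julia
# Illustrative sketch of the suggestion above: generate a method where every
# argument has its own type parameter, forcing specialization on all of them
# (including Function arguments like λ). All names here are hypothetical.
argnames = (:λ, :x, :y)
Ts  = [Symbol(:T, i) for i in 1:length(argnames)]       # T1, T2, T3
sig = [:($a::$T) for (a, T) in zip(argnames, Ts)]       # λ::T1, x::T2, y::T3
@eval function kern($(sig...)) where {$(Ts...)}
    λ(x, y)   # body stands in for the generated loop kernel
end
```

A call like `kern(+, 1, 2)` then dispatches to a method fully specialized on `typeof(+)`.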

@b-fg
Member Author

b-fg commented May 21, 2025

Also, would this be something that benefits KA kernels, @vchuravy? Something like

@kernel function $kern_($(rep.(sym)...),@Const(I0))
    $I = @index(Global,Cartesian)
    $I += I0
    @fastmath @inbounds $ex
end
function $kern(a::A,b::B,...,_) where {A,B,...}
    $kern_(get_backend($(sym[1])),64)($(sym...),$R[1]-oneunit($R[1]),ndrange=size($R))
end

@b-fg
Member Author

b-fg commented May 21, 2025

Another option for -t 1 would be to just return @simd for $I ∈ $R; @fastmath @inbounds $ex; end instead of the function wrapper. Otherwise, we can return the KA setup: the KA kernel, the KA wrapper, and the function call.

@b-fg
Member Author

b-fg commented May 22, 2025

To do

b-fg added 2 commits May 23, 2025 19:30
…w be passed as a function without allocating.

sim_step! now takes an additional keyword argument, λ, which can be the quick or cds function, or any other that the user wants to implement.
Added a new allocation test for when λ is passed, and added a test for the new central difference scheme as well.
@b-fg
Member Author

b-fg commented May 23, 2025

Merged #133, and now a convective scheme λ can be passed into sim_step! without allocating. Added a new allocation test to confirm this.

Thank you @vchuravy for the suggestion of specializing the kernel wrapper function! Works here and has improved performance a bit :)

@b-fg
Member Author

b-fg commented May 24, 2025

Benchmarks looking good, should be ready to merge.

Benchmarks
Benchmark environment: tgv sim_step! (max_steps=100)
▶ log2p = 6
┌────────────┬──────────────────────┬────────┬───────────┬─────────────┬────────┬──────────┬──────────────────┬──────────┐
│  Backend   │      WaterLily       │ Julia  │ Precision │ Allocations │ GC [%] │ Time [s] │ Cost [ns/DOF/dt] │ Speed-up │
├────────────┼──────────────────────┼────────┼───────────┼─────────────┼────────┼──────────┼──────────────────┼──────────┤
│     CPUx01 │               master │ 1.11.3 │   Float32 │        1807 │   0.00 │     4.22 │           160.86 │     1.00 │
│     CPUx01 │ turbulence_modelling │ 1.11.3 │   Float32 │        1807 │   0.00 │     4.24 │           161.63 │     1.00 │
│     CPUx04 │               master │ 1.11.3 │   Float32 │     2273557 │   0.00 │     2.98 │           113.62 │     1.42 │
│     CPUx04 │ turbulence_modelling │ 1.11.3 │   Float32 │     2273557 │   0.00 │     2.96 │           112.98 │     1.42 │
│ GPU-NVIDIA │               master │ 1.11.3 │   Float32 │     2933920 │   0.00 │     0.60 │            22.84 │     7.04 │
│ GPU-NVIDIA │ turbulence_modelling │ 1.11.3 │   Float32 │     2990861 │   0.00 │     0.59 │            22.32 │     7.21 │
└────────────┴──────────────────────┴────────┴───────────┴─────────────┴────────┴──────────┴──────────────────┴──────────┘
▶ log2p = 7
┌────────────┬──────────────────────┬────────┬───────────┬─────────────┬────────┬──────────┬──────────────────┬──────────┐
│  Backend   │      WaterLily       │ Julia  │ Precision │ Allocations │ GC [%] │ Time [s] │ Cost [ns/DOF/dt] │ Speed-up │
├────────────┼──────────────────────┼────────┼───────────┼─────────────┼────────┼──────────┼──────────────────┼──────────┤
│     CPUx01 │               master │ 1.11.3 │   Float32 │        1807 │   0.00 │    25.84 │           123.21 │     1.00 │
│     CPUx01 │ turbulence_modelling │ 1.11.3 │   Float32 │        1807 │   0.00 │    26.12 │           124.54 │     0.99 │
│     CPUx04 │               master │ 1.11.3 │   Float32 │     2116285 │   0.00 │    17.36 │            82.76 │     1.49 │
│     CPUx04 │ turbulence_modelling │ 1.11.3 │   Float32 │     2116285 │   0.00 │    17.12 │            81.61 │     1.51 │
│ GPU-NVIDIA │               master │ 1.11.3 │   Float32 │     2679958 │   0.00 │     3.00 │            14.28 │     8.63 │
│ GPU-NVIDIA │ turbulence_modelling │ 1.11.3 │   Float32 │     2735667 │   0.00 │     3.00 │            14.29 │     8.62 │
└────────────┴──────────────────────┴────────┴───────────┴─────────────┴────────┴──────────┴──────────────────┴──────────┘
Benchmark environment: jelly sim_step! (max_steps=100)
▶ log2p = 5
┌────────────┬──────────────────────┬────────┬───────────┬─────────────┬────────┬──────────┬──────────────────┬──────────┐
│  Backend   │      WaterLily       │ Julia  │ Precision │ Allocations │ GC [%] │ Time [s] │ Cost [ns/DOF/dt] │ Speed-up │
├────────────┼──────────────────────┼────────┼───────────┼─────────────┼────────┼──────────┼──────────────────┼──────────┤
│     CPUx01 │               master │ 1.11.3 │   Float32 │        7107 │   0.00 │     3.81 │           291.04 │     1.00 │
│     CPUx01 │ turbulence_modelling │ 1.11.3 │   Float32 │        7107 │   0.00 │     3.83 │           292.09 │     1.00 │
│     CPUx04 │               master │ 1.11.3 │   Float32 │     4357685 │   0.55 │     3.52 │           268.33 │     1.08 │
│     CPUx04 │ turbulence_modelling │ 1.11.3 │   Float32 │     4357685 │   0.58 │     3.51 │           267.76 │     1.09 │
│ GPU-NVIDIA │               master │ 1.11.3 │   Float32 │     5636204 │   1.29 │     1.09 │            83.37 │     3.49 │
│ GPU-NVIDIA │ turbulence_modelling │ 1.11.3 │   Float32 │     5676609 │   1.32 │     1.08 │            82.39 │     3.53 │
└────────────┴──────────────────────┴────────┴───────────┴─────────────┴────────┴──────────┴──────────────────┴──────────┘
▶ log2p = 6
┌────────────┬──────────────────────┬────────┬───────────┬─────────────┬────────┬──────────┬──────────────────┬──────────┐
│  Backend   │      WaterLily       │ Julia  │ Precision │ Allocations │ GC [%] │ Time [s] │ Cost [ns/DOF/dt] │ Speed-up │
├────────────┼──────────────────────┼────────┼───────────┼─────────────┼────────┼──────────┼──────────────────┼──────────┤
│     CPUx01 │               master │ 1.11.3 │   Float32 │        8307 │   0.00 │    27.12 │           258.61 │     1.00 │
│     CPUx01 │ turbulence_modelling │ 1.11.3 │   Float32 │        8307 │   0.00 │    27.41 │           261.42 │     0.99 │
│     CPUx04 │               master │ 1.11.3 │   Float32 │     5712303 │   0.15 │    17.91 │           170.76 │     1.51 │
│     CPUx04 │ turbulence_modelling │ 1.11.3 │   Float32 │     5712303 │   0.15 │    17.93 │           170.99 │     1.51 │
│ GPU-NVIDIA │               master │ 1.11.3 │   Float32 │     7595601 │   0.48 │     3.82 │            36.39 │     7.11 │
│ GPU-NVIDIA │ turbulence_modelling │ 1.11.3 │   Float32 │     7644915 │   0.45 │     3.80 │            36.26 │     7.13 │
└────────────┴──────────────────────┴────────┴───────────┴─────────────┴────────┴──────────┴──────────────────┴──────────┘

@b-fg b-fg merged commit 620ff38 into master May 24, 2025
7 checks passed