Turbulence utils: SGS model and on-the-fly temporal averaging#198

Merged
b-fg merged 26 commits into master from turbulence_modelling
May 24, 2025
Conversation

@b-fg
Member

@b-fg b-fg commented Mar 10, 2025

Implements some utilities for turbulence modelling, such as eddy-viscosity models for subgrid stresses (based on #165) and temporal-averaging routines.
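For context, the kind of eddy-viscosity closure involved can be sketched as follows. This is an illustrative sketch only, not this PR's implementation: the classic Smagorinsky model sets ν_t = (C_s Δ)² |S|, with |S| = √(2 S_ij S_ij) the strain-rate magnitude on a grid of spacing Δ; the names `smagorinsky` and `strain_squared` are hypothetical.

```julia
# Illustrative sketch, not this PR's implementation: the classic Smagorinsky
# subgrid model, ν_t = (Cs*Δ)^2 * |S|, where |S| = √(2 S_ij S_ij).
smagorinsky(S2; Cs=0.17, Δ=1.0) = (Cs * Δ)^2 * sqrt(2S2)

# S2 = S_ij S_ij from a (constant) velocity-gradient matrix ∇u:
# take the symmetric part, then its squared Frobenius norm.
strain_squared(∇u) = sum(abs2, 0.5 .* (∇u .+ ∇u'))

νt = smagorinsky(strain_squared([0.0 1.0; 0.0 0.0]))  # simple shear, du/dy = 1
```

For the simple-shear example, |S| = 1, so ν_t reduces to (C_s Δ)².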

@b-fg b-fg added the enhancement New feature or request label Mar 10, 2025
@weymouth
Member

Looking pretty clean

@b-fg
Member Author

b-fg commented Mar 11, 2025

This builds on #165, so we should integrate that first.

@b-fg b-fg changed the title Turbulence modelling Turbulence utils: SGS model and on-the-fly temporal averaging May 13, 2025
@b-fg
Member Author

b-fg commented May 13, 2025

Still needs tests, and a time-averaging restart (I/O) based on JLD2 (once #213 is merged, this can be added in WaterLilyJLD2Ext.jl).

@b-fg b-fg marked this pull request as ready for review May 13, 2025 18:26
@b-fg b-fg requested a review from weymouth May 13, 2025 18:27
@b-fg
Member Author

b-fg commented May 13, 2025

Here is an MWE of how this works:

using Revise, WaterLily, GLMakie

function circle(n,m;Re=100,U=1)
    radius, center = m/8, m/2-1
    sdf(x,t) = √sum(abs2, x .- center) - radius # signed distance to the circle surface
    Simulation((n,m), (U,0), 2radius; ν=U*2radius/Re, body=AutoBody(sdf))
end

sim = circle(4*32,32)
sim_step!(sim, 40; remeasure=false, verbose=true) # developing flow
meanflow = MeanFlow(sim.flow)
sim_step!(sim, 190; remeasure=false, verbose=true, meanflow) # temporal average

function vort!(cpu_array, sim)
    a = sim.flow.σ
    WaterLily.@inside a[I] = WaterLily.curl(3,I,sim.flow.u)*sim.L/sim.U
    copyto!(cpu_array, a[inside(a)]) # copy to CPU
end

viz!(sim, vort!; colormap=:balance, levels=50) # Flow plot
WaterLily.copy!(sim.flow, meanflow)
viz!(sim, vort!; colormap=:balance, levels=50) # MeanFlow plot

Instantaneous flow
Screenshot from 2025-05-13 21-13-42
Mean flow
Screenshot from 2025-05-13 21-13-51

@b-fg
Member Author

b-fg commented May 13, 2025

There is something allocating when passing the convective scheme λ to Flow... Needs fixing.

b-fg added 4 commits May 19, 2025 18:04
…locations.

This is currently done using dispatch by Val, which requires repeated code.
Without this, trying to get automatically specialized flux functions always allocates at runtime.
@b-fg
Member Author

b-fg commented May 19, 2025

Passing a flux function λ into mom_step! is unfortunately allocating, which might be caused by a lack of function specialization. Currently, the only workaround I have found is to dispatch to the specific function by Val{::Symbol}. That is, I have to use the following implementation

@inline ϕu(a,I,f,u,::Val{:quick}) = @inbounds u>0 ? u*quick(f[I-2δ(a,I)],f[I-δ(a,I)],f[I]) : u*quick(f[I+δ(a,I)],f[I],f[I-δ(a,I)])
@inline ϕuP(a,Ip,I,f,u,::Val{:quick}) = @inbounds u>0 ? u*quick(f[Ip],f[I-δ(a,I)],f[I]) : u*quick(f[I+δ(a,I)],f[I],f[I-δ(a,I)])
@inline ϕuL(a,I,f,u,::Val{:quick}) = @inbounds u>0 ? u*ϕ(a,I,f) : u*quick(f[I+δ(a,I)],f[I],f[I-δ(a,I)])
@inline ϕuR(a,I,f,u,::Val{:quick}) = @inbounds u<0 ? u*ϕ(a,I,f) : u*quick(f[I-2δ(a,I)],f[I-δ(a,I)],f[I])

@inline ϕu(a,I,f,u,::Val{:cds}) = @inbounds u>0 ? u*cds(f[I-2δ(a,I)],f[I-δ(a,I)],f[I]) : u*cds(f[I+δ(a,I)],f[I],f[I-δ(a,I)])
@inline ϕuP(a,Ip,I,f,u,::Val{:cds}) = @inbounds u>0 ? u*cds(f[Ip],f[I-δ(a,I)],f[I]) : u*cds(f[I+δ(a,I)],f[I],f[I-δ(a,I)])
@inline ϕuL(a,I,f,u,::Val{:cds}) = @inbounds u>0 ? u*ϕ(a,I,f) : u*cds(f[I+δ(a,I)],f[I],f[I-δ(a,I)])
@inline ϕuR(a,I,f,u,::Val{:cds}) = @inbounds u<0 ? u*ϕ(a,I,f) : u*cds(f[I-2δ(a,I)],f[I-δ(a,I)],f[I])

Instead of the cleaner version

@inline ϕu(a,I,f,u,λ::F) where {F} =  @inbounds u>0 ? u*λ(f[I-2δ(a,I)],f[I-δ(a,I)],f[I]) : u*λ(f[I+δ(a,I)],f[I],f[I-δ(a,I)])
@inline ϕuP(a,Ip,I,f,u,λ::F) where {F} = @inbounds u>0 ? u*λ(f[Ip],f[I-δ(a,I)],f[I]) : u*λ(f[I+δ(a,I)],f[I],f[I-δ(a,I)])
@inline ϕuL(a,I,f,u,λ::F) where {F} = @inbounds u>0 ? u*ϕ(a,I,f) : u*λ(f[I+δ(a,I)],f[I],f[I-δ(a,I)])
@inline ϕuR(a,I,f,u,λ::F) where {F} = @inbounds u<0 ? u*ϕ(a,I,f) : u*λ(f[I-2δ(a,I)],f[I-δ(a,I)],f[I])

since the latter version results in allocations, as measured in allocatest.jl for single-threaded execution. I am trying to force specialization with ::F where F, but it seems to have no effect.
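The suspected mechanism can be reproduced in a self-contained sketch (illustrative names, not WaterLily code): per the Julia performance docs, a method that only forwards a Function argument may not specialize on it, while a `λ::F` annotation with a `where {F}` type parameter forces specialization.

```julia
# Illustrative sketch, not WaterLily code. Julia's heuristics may skip
# specialization on a Function argument that is only forwarded to another
# call; an explicit type parameter λ::F where {F} forces specialization.
stencil(a, b, c) = (3a + 6b - c) / 8          # stand-in for a QUICK-like scheme

call_it(λ, a, b, c) = λ(a, b, c)
forward_unspec(λ, a, b, c) = call_it(λ, a, b, c)             # λ only forwarded
forward_spec(λ::F, a, b, c) where {F} = call_it(λ, a, b, c)  # forced specialization
```

Comparing `@allocated forward_unspec(stencil, 1.0, 2.0, 3.0)` against the specialized version (after a warm-up call) is one way to check whether this heuristic is in play.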

Maybe @weymouth or @vchuravy have some insights on this?

@vchuravy
Contributor

How are you passing it into mom_step!? E.g. is the caller of ϕu properly specialized?

@b-fg
Member Author

b-fg commented May 20, 2025

The way I planned for users to select the convection scheme is by passing the function themselves into sim_step!, from which it is passed all the way down to ϕu. The call chain is: sim_step!(...; λ=quick) > mom_step!(...; λ) > conv_diff!(..., λ; ...) > lowerBoundary! or upperBoundary! (or none) > ϕu(..., λ).

I have tried to force specialization in conv_diff! using the function header conv_diff!(r,u,Φ,λ::F;ν=0.1,perdir=()) where {F}, as well as in lowerBoundary! and upperBoundary!, and in ϕu (and the other related functions ϕuP, ϕuL, ϕuR), ϕu(a,I,f,u,λ::F) where {F}, and it is still allocating...

@vchuravy
Contributor

It's hard to tell without digging deeper into the code. I would use the allocation profiler to determine first where the allocations are stemming from (potentially setting the sampling to 1).

Then I would try not passing lambda through a keyword argument; that will likely cause an allocation at some point, since the specialized function eventually has to be called.
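The profiler workflow suggested here can be sketched as follows (Julia ≥ 1.8; `my_step!` and `buf` are placeholders for the routine under investigation, not WaterLily code):

```julia
using Profile

my_step!(buf) = (buf .= 2 .* buf; nothing)   # placeholder workload
buf = zeros(16)
my_step!(buf)                                # warm up to exclude compilation

# sample_rate=1 records every allocation, as suggested above
Profile.Allocs.@profile sample_rate=1 my_step!(buf)
results = Profile.Allocs.fetch()             # entries carry .type, .size, .stacktrace
Profile.Allocs.clear()
```

The collected `results` are typically inspected with a viewer such as PProf.jl to locate the allocating call sites.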

@b-fg
Member Author

b-fg commented May 20, 2025

cc @weymouth, I traced the allocations down, and it is a problem with the @loop macro, which is probably preventing specialization. The following test was conducted with one thread (-t 1), so our @loop macro results in @simd for $I ∈ $R instead of using the KA kernel.

So, within conv_diff!, the following (simplified) statement

@loop Φ[I] = ϕu(j,CI(I,i),u,ϕ(i,CI(I,j),u),λ) over I ∈ inside_u(N,j)

is allocating when passing λ. Instead, the equivalent statement

for I ∈ inside_u(N,j)
    Φ[I] = ϕu(j,CI(I,i),u,ϕ(i,CI(I,j),u),λ)
end

is not allocating. Thus, there is something not quite right with the macro that prevents specialization, as far as I can see. Maybe it's just that the macro generates a higher-level function that ultimately calls ϕu,

function $kern($(rep.(sym)...),::Val{1})
   @simd for $I ∈ $R
       @fastmath @inbounds $ex
   end
end

but then the λ::F where F trick to enforce specialization cannot be imposed. So it is a limitation of the current code design.

@vchuravy
Contributor

Maybe it's just that the macro results in a higher level function that ultimately calls

Yeah that sounds about right. Julia specialization heuristics strike again. You could generate a kernel that specializes on each of the arguments?

@b-fg
Member Author

b-fg commented May 21, 2025

Do you mean like a macro generating

function kern(a::A,b::B,c::C,...,::Val{1}) where {A,B,C,...}
    ...
end

That might work indeed, but I am not sure my metaprogramming skills are up to the task haha. Maybe @weymouth? 👀
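Such generated code could look like the following sketch (illustrative, using `@eval` and hypothetical names): every argument gets its own type parameter, so Julia specializes on all of them, including a Function argument like λ.

```julia
# Illustrative sketch of the suggestion above: generate a method where every
# argument has its own type parameter, forcing specialization on all of them
# (including Function arguments like λ). All names here are hypothetical.
argnames = (:λ, :x, :y)
Ts  = [Symbol(:T, i) for i in 1:length(argnames)]       # T1, T2, T3
sig = [:($a::$T) for (a, T) in zip(argnames, Ts)]       # λ::T1, x::T2, y::T3
@eval function kern($(sig...)) where {$(Ts...)}
    λ(x, y)   # body stands in for the generated loop kernel
end
```

A call like `kern(+, 1, 2)` then dispatches to a method fully specialized on `typeof(+)`.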

@b-fg
Member Author

b-fg commented May 21, 2025

Also, would this be something that benefits KA kernels, @vchuravy? Something like

@kernel function $kern_($(rep.(sym)...),@Const(I0))
    $I = @index(Global,Cartesian)
    $I += I0
    @fastmath @inbounds $ex
end
function $kern(a::A,b::B,...,_) where {A,B,...}
    $kern_(get_backend($(sym[1])),64)($(sym...),$R[1]-oneunit($R[1]),ndrange=size($R))
end

@b-fg
Member Author

b-fg commented May 21, 2025

Another option for -t 1 would be to just return @simd for $I ∈ $R; @fastmath @inbounds $ex; end instead of the function wrapper. Otherwise, we can return the KA setup: the KA kernel, the KA wrapper, and the function call.

@b-fg
Member Author

b-fg commented May 22, 2025

To do

b-fg added 2 commits May 23, 2025 19:30
…w be passed as a function without allocating.

sim_step! now takes an additional keyword argument, λ, which can be the quick or cds function, or any other that the user wants to implement.
Added a new allocation test for when λ is passed, and added a test for the new central difference scheme as well.
@b-fg
Member Author

b-fg commented May 23, 2025

Merged #133, and now a convective scheme λ can be passed into sim_step! without allocating. Added a new allocation test to confirm this.

Thank you @vchuravy for the suggestion of specializing the kernel wrapper function! Works here and has improved performance a bit :)

@b-fg
Member Author

b-fg commented May 24, 2025

Benchmarks looking good, should be ready to merge.

Benchmarks
Benchmark environment: tgv sim_step! (max_steps=100)
▶ log2p = 6
┌────────────┬──────────────────────┬────────┬───────────┬─────────────┬────────┬──────────┬──────────────────┬──────────┐
│  Backend   │      WaterLily       │ Julia  │ Precision │ Allocations │ GC [%] │ Time [s] │ Cost [ns/DOF/dt] │ Speed-up │
├────────────┼──────────────────────┼────────┼───────────┼─────────────┼────────┼──────────┼──────────────────┼──────────┤
│     CPUx01 │               master │ 1.11.3 │   Float32 │        1807 │   0.00 │     4.22 │           160.86 │     1.00 │
│     CPUx01 │ turbulence_modelling │ 1.11.3 │   Float32 │        1807 │   0.00 │     4.24 │           161.63 │     1.00 │
│     CPUx04 │               master │ 1.11.3 │   Float32 │     2273557 │   0.00 │     2.98 │           113.62 │     1.42 │
│     CPUx04 │ turbulence_modelling │ 1.11.3 │   Float32 │     2273557 │   0.00 │     2.96 │           112.98 │     1.42 │
│ GPU-NVIDIA │               master │ 1.11.3 │   Float32 │     2933920 │   0.00 │     0.60 │            22.84 │     7.04 │
│ GPU-NVIDIA │ turbulence_modelling │ 1.11.3 │   Float32 │     2990861 │   0.00 │     0.59 │            22.32 │     7.21 │
└────────────┴──────────────────────┴────────┴───────────┴─────────────┴────────┴──────────┴──────────────────┴──────────┘
▶ log2p = 7
┌────────────┬──────────────────────┬────────┬───────────┬─────────────┬────────┬──────────┬──────────────────┬──────────┐
│  Backend   │      WaterLily       │ Julia  │ Precision │ Allocations │ GC [%] │ Time [s] │ Cost [ns/DOF/dt] │ Speed-up │
├────────────┼──────────────────────┼────────┼───────────┼─────────────┼────────┼──────────┼──────────────────┼──────────┤
│     CPUx01 │               master │ 1.11.3 │   Float32 │        1807 │   0.00 │    25.84 │           123.21 │     1.00 │
│     CPUx01 │ turbulence_modelling │ 1.11.3 │   Float32 │        1807 │   0.00 │    26.12 │           124.54 │     0.99 │
│     CPUx04 │               master │ 1.11.3 │   Float32 │     2116285 │   0.00 │    17.36 │            82.76 │     1.49 │
│     CPUx04 │ turbulence_modelling │ 1.11.3 │   Float32 │     2116285 │   0.00 │    17.12 │            81.61 │     1.51 │
│ GPU-NVIDIA │               master │ 1.11.3 │   Float32 │     2679958 │   0.00 │     3.00 │            14.28 │     8.63 │
│ GPU-NVIDIA │ turbulence_modelling │ 1.11.3 │   Float32 │     2735667 │   0.00 │     3.00 │            14.29 │     8.62 │
└────────────┴──────────────────────┴────────┴───────────┴─────────────┴────────┴──────────┴──────────────────┴──────────┘
Benchmark environment: jelly sim_step! (max_steps=100)
▶ log2p = 5
┌────────────┬──────────────────────┬────────┬───────────┬─────────────┬────────┬──────────┬──────────────────┬──────────┐
│  Backend   │      WaterLily       │ Julia  │ Precision │ Allocations │ GC [%] │ Time [s] │ Cost [ns/DOF/dt] │ Speed-up │
├────────────┼──────────────────────┼────────┼───────────┼─────────────┼────────┼──────────┼──────────────────┼──────────┤
│     CPUx01 │               master │ 1.11.3 │   Float32 │        7107 │   0.00 │     3.81 │           291.04 │     1.00 │
│     CPUx01 │ turbulence_modelling │ 1.11.3 │   Float32 │        7107 │   0.00 │     3.83 │           292.09 │     1.00 │
│     CPUx04 │               master │ 1.11.3 │   Float32 │     4357685 │   0.55 │     3.52 │           268.33 │     1.08 │
│     CPUx04 │ turbulence_modelling │ 1.11.3 │   Float32 │     4357685 │   0.58 │     3.51 │           267.76 │     1.09 │
│ GPU-NVIDIA │               master │ 1.11.3 │   Float32 │     5636204 │   1.29 │     1.09 │            83.37 │     3.49 │
│ GPU-NVIDIA │ turbulence_modelling │ 1.11.3 │   Float32 │     5676609 │   1.32 │     1.08 │            82.39 │     3.53 │
└────────────┴──────────────────────┴────────┴───────────┴─────────────┴────────┴──────────┴──────────────────┴──────────┘
▶ log2p = 6
┌────────────┬──────────────────────┬────────┬───────────┬─────────────┬────────┬──────────┬──────────────────┬──────────┐
│  Backend   │      WaterLily       │ Julia  │ Precision │ Allocations │ GC [%] │ Time [s] │ Cost [ns/DOF/dt] │ Speed-up │
├────────────┼──────────────────────┼────────┼───────────┼─────────────┼────────┼──────────┼──────────────────┼──────────┤
│     CPUx01 │               master │ 1.11.3 │   Float32 │        8307 │   0.00 │    27.12 │           258.61 │     1.00 │
│     CPUx01 │ turbulence_modelling │ 1.11.3 │   Float32 │        8307 │   0.00 │    27.41 │           261.42 │     0.99 │
│     CPUx04 │               master │ 1.11.3 │   Float32 │     5712303 │   0.15 │    17.91 │           170.76 │     1.51 │
│     CPUx04 │ turbulence_modelling │ 1.11.3 │   Float32 │     5712303 │   0.15 │    17.93 │           170.99 │     1.51 │
│ GPU-NVIDIA │               master │ 1.11.3 │   Float32 │     7595601 │   0.48 │     3.82 │            36.39 │     7.11 │
│ GPU-NVIDIA │ turbulence_modelling │ 1.11.3 │   Float32 │     7644915 │   0.45 │     3.80 │            36.26 │     7.13 │
└────────────┴──────────────────────┴────────┴───────────┴─────────────┴────────┴──────────┴──────────────────┴──────────┘

@b-fg b-fg merged commit 620ff38 into master May 24, 2025
7 checks passed