Flux agnostic `@generated` 3D conservative volume turbo kernel by MarcoArtiano · Pull Request #3090 · trixi-framework/Trixi.jl

MarcoArtiano · 2026-06-18T23:28:27Z

After some discussion with @ranocha, it would be nice to have a generic volume turbo kernel, without the need to copy-paste the whole machinery and specialize it for each flux. In TrixiAtmo, that process would be quite annoying. @ranocha suggested looking at @generated functions, since we need to unroll by hand the loops over a generic number of variables (nvariables) and of precomputed variables. Therefore, we need a kernel that writes the equivalent of the hand-written code in dg_compressible_euler_3d but is generic with respect to these two numbers.
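To illustrate the idea behind using @generated functions here, the following is a minimal sketch, not the kernel from this PR. It assumes a hypothetical function `sum_variables` that loads one scalar local per variable (`u_1`, `u_2`, ...) and adds them up, with the loop over variables unrolled when the method is compiled for a given `NVARS`.

```julia
# Illustrative only: `sum_variables` is a hypothetical name, not part of this PR.
@generated function sum_variables(u::AbstractMatrix, node::Integer,
                                  ::Val{NVARS}) where {NVARS}
    # The generator runs once per value of NVARS; only types are known here.
    names = [Symbol(:u_, v) for v in 1:NVARS]                # u_1, u_2, ...
    loads = [:($(names[v]) = u[$v, node]) for v in 1:NVARS]  # u_v = u[v, node]
    return quote
        $(loads...)            # unrolled scalar loads, one line per variable
        return +($(names...))  # u_1 + u_2 + ... + u_NVARS
    end
end

u = [1.0 2.0; 3.0 4.0; 5.0 6.0]  # 3 variables at 2 nodes
total = sum_variables(u, 2, Val(3))
```

The returned expression contains only scalar locals and no loop over the variables, which is the shape of code that the hand-written kernels spell out manually.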

Claude AI has assisted me in the creation of the PR.

16 threads, for p4est_3d_dgsem/elixir_euler_ec.jl with tspan = (10.0, 0.0).

Plain turbo flux_ranocha_turbo

BenchmarkTools.Trial: 4 samples with 1 evaluation per sample.
 Range (min … max):  1.572 s …   1.624 s  ┊ GC (min … max): 0.00% … 1.62%
 Time  (median):     1.579 s              ┊ GC (median):    0.00%
 Time  (mean ± σ):   1.589 s ± 24.180 ms  ┊ GC (mean ± σ):  0.41% ± 0.81%

  █    █    █                                             █
  █▁▁▁▁█▁▁▁▁█▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█ ▁
  1.57 s         Histogram: frequency by time        1.62 s <

 Memory estimate: 27.09 MiB, allocs estimate: 45651.

Generated code with FluxVolumeTurbo(flux_ranocha)

BenchmarkTools.Trial: 4 samples with 1 evaluation per sample.
 Range (min … max):  1.596 s …   1.671 s  ┊ GC (min … max): 0.00% … 1.87%
 Time  (median):     1.609 s              ┊ GC (median):    0.00%
 Time  (mean ± σ):   1.621 s ± 35.386 ms  ┊ GC (mean ± σ):  0.48% ± 0.93%

  █                  ▁                                    ▁
  █▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█ ▁
  1.6 s          Histogram: frequency by time        1.67 s <

 Memory estimate: 27.10 MiB, allocs estimate: 45679.

Plain implementation with flux_ranocha (just for completeness)

BenchmarkTools.Trial: 3 samples with 1 evaluation per sample.
 Range (min … max):  2.430 s …   2.554 s  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     2.432 s              ┊ GC (median):    0.00%
 Time  (mean ± σ):   2.472 s ± 71.102 ms  ┊ GC (mean ± σ):  0.00% ± 0.00%

  █                                                       ▁
  █▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█ ▁
  2.43 s         Histogram: frequency by time        2.55 s <

 Memory estimate: 27.09 MiB, allocs estimate: 45653.

The current implementation also allows us to use any volume flux, although it is not guaranteed that every flux will work inside the @turbo macro.

Plain implementation for flux_shima_etal

BenchmarkTools.Trial: 3 samples with 1 evaluation per sample.
 Range (min … max):  2.028 s …   2.066 s  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     2.037 s              ┊ GC (median):    0.00%
 Time  (mean ± σ):   2.043 s ± 19.837 ms  ┊ GC (mean ± σ):  0.00% ± 0.00%

  █            █                                          █
  █▁▁▁▁▁▁▁▁▁▁▁▁█▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█ ▁
  2.03 s         Histogram: frequency by time        2.07 s <

 Memory estimate: 27.09 MiB, allocs estimate: 45654.

FluxVolumeTurbo(flux_shima_etal). Note that this flux has no specialization in the generated code: it uses the generated kernel, but we are not precomputing the primitive variables.

BenchmarkTools.Trial: 3 samples with 1 evaluation per sample.
 Range (min … max):  1.661 s …   1.693 s  ┊ GC (min … max): 0.00% … 0.36%
 Time  (median):     1.662 s              ┊ GC (median):    0.00%
 Time  (mean ± σ):   1.672 s ± 18.437 ms  ┊ GC (mean ± σ):  0.12% ± 0.21%

  █                                                       ▁
  █▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█ ▁
  1.66 s         Histogram: frequency by time        1.69 s <

 Memory estimate: 27.10 MiB, allocs estimate: 45680.

Hand-written turbo implementation with flux_shima_etal_turbo. Here we are precomputing the primitive variables, as we are using the pre-existing optimization. The same precomputation can also be specified for the generic generated code.

BenchmarkTools.Trial: 4 samples with 1 evaluation per sample.
 Range (min … max):  1.446 s …   1.473 s  ┊ GC (min … max): 0.00% … 0.60%
 Time  (median):     1.448 s              ┊ GC (median):    0.00%
 Time  (mean ± σ):   1.454 s ± 13.157 ms  ┊ GC (mean ± σ):  0.15% ± 0.30%

  █  ██                                                   █
  █▁▁██▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█ ▁
  1.45 s         Histogram: frequency by time        1.47 s <

 Memory estimate: 27.09 MiB, allocs estimate: 45649.
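For a quick comparison, the medians reported in the benchmarks above can be turned into speedup factors relative to the plain implementations. This is only a small arithmetic sketch; the names `median_s` and `speedup` are made up for illustration, and the numbers are copied from the BenchmarkTools output above.

```julia
# Median run times in seconds, copied from the benchmark output above.
median_s = (ranocha_plain = 2.432, ranocha_generated = 1.609,
            ranocha_handwritten = 1.579,
            shima_plain = 2.037, shima_generated = 1.662,
            shima_handwritten = 1.448)

# Speedup of `candidate` relative to `baseline`, rounded to two digits.
speedup(baseline, candidate) = round(baseline / candidate; digits = 2)

ranocha_generated_speedup = speedup(median_s.ranocha_plain, median_s.ranocha_generated)
ranocha_handwritten_speedup = speedup(median_s.ranocha_plain, median_s.ranocha_handwritten)
shima_generated_speedup = speedup(median_s.shima_plain, median_s.shima_generated)
shima_handwritten_speedup = speedup(median_s.shima_plain, median_s.shima_handwritten)
```

With only three or four samples per run, these factors should be read as rough indications rather than precise measurements.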

Nonconservative terms benchmarks

The nonconservative implementation has been moved to #3094.

MHD with nonconservative terms for p4est_3d_dgsem/elixir_mhd_alfven_wave_nonperiodic.jl

Plain implementation volume_flux = (flux_hindenlang_gassner, flux_nonconservative_powell)

BenchmarkTools.Trial: 4 samples with 1 evaluation per sample.
 Range (min … max):  1.318 s …   1.366 s  ┊ GC (min … max): 0.00% … 2.25%
 Time  (median):     1.324 s              ┊ GC (median):    0.00%
 Time  (mean ± σ):   1.333 s ± 22.366 ms  ┊ GC (mean ± σ):  0.58% ± 1.13%

  ██           █                                          █
  ██▁▁▁▁▁▁▁▁▁▁▁█▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█ ▁
  1.32 s         Histogram: frequency by time        1.37 s <

 Memory estimate: 33.32 MiB, allocs estimate: 9460.

FluxVolumeTurbo(flux_hindenlang_gassner, flux_nonconservative_powell); this does not precompute the primitive variables.

BenchmarkTools.Trial: 6 samples with 1 evaluation per sample.
 Range (min … max):  982.983 ms …   1.035 s  ┊ GC (min … max): 0.00% … 2.90%
 Time  (median):     984.219 ms              ┊ GC (median):    0.00%
 Time  (mean ± σ):   993.311 ms ± 20.522 ms  ┊ GC (mean ± σ):  0.50% ± 1.19%

  ██      ▁                                                  ▁
  ██▁▁▁▁▁▁█▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█ ▁
  983 ms          Histogram: frequency by time          1.03 s <

 Memory estimate: 33.35 MiB, allocs estimate: 9512.

Combined approach in p4est_3d_dgsem/elixir_mhd_alfven_wave_combined_fluxes_nonperiodic.jl

BenchmarkTools.Trial: 4 samples with 1 evaluation per sample.
 Range (min … max):  1.226 s …    1.565 s  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     1.231 s               ┊ GC (median):    0.00%
 Time  (mean ± σ):   1.313 s ± 167.761 ms  ┊ GC (mean ± σ):  0.11% ± 0.23%

  █▁                                                       ▁
  ██▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█ ▁
  1.23 s         Histogram: frequency by time         1.56 s <

 Memory estimate: 33.32 MiB, allocs estimate: 9446.

So the generic code, which in principle accepts any volume flux, already provides a decent speedup without any specialization effort (for flux_shima_etal, the median drops from 2.037 s to 1.662 s). If someone is willing to invest the time to write the two small specialized functions, one to precompute the primitive variables and one for the flux in terms of primitive variables, then we reach essentially the same speedup as the preexisting hand-written implementation, with much less effort than copy-pasting the whole volume kernel and writing the flux in each direction. In summary, three ingredients need to be specified:

number of flux auxiliary variables (or precomputed variables)
transformation from cons to flux precomputed variables
numerical flux that accepts directly the precomputed variables
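The three ingredients listed above can be sketched for a toy 1D Euler-like system. Everything here is an assumption made for illustration: `ToyEuler1D`, `n_turbo_vars`, `cons2turbo`, and `flux_shima_like` are hypothetical names and not the interface introduced by this PR, and the flux is a 1D flux in the style of flux_shima_etal written in terms of precomputed primitive variables.

```julia
# Hypothetical toy system, only for illustration.
struct ToyEuler1D
    gamma::Float64
end

# Ingredient 1: number of precomputed (auxiliary) variables.
n_turbo_vars(::ToyEuler1D) = 3

# Ingredient 2: transformation from conserved to precomputed variables.
function cons2turbo(u, eq::ToyEuler1D)
    rho, rho_v, rho_e = u
    v = rho_v / rho
    p = (eq.gamma - 1) * (rho_e - 0.5 * rho_v * v)
    return (rho, v, p)
end

# Ingredient 3: numerical flux taking the precomputed variables as flat scalars.
function flux_shima_like(rho_ll, v_ll, p_ll, rho_rr, v_rr, p_rr, eq::ToyEuler1D)
    rho_avg = 0.5 * (rho_ll + rho_rr)
    v_avg = 0.5 * (v_ll + v_rr)
    p_avg = 0.5 * (p_ll + p_rr)
    kin_avg = 0.5 * v_ll * v_rr
    pv_avg = 0.5 * (p_ll * v_rr + p_rr * v_ll)
    f1 = rho_avg * v_avg
    f2 = f1 * v_avg + p_avg
    f3 = p_avg * v_avg / (eq.gamma - 1) + f1 * kin_avg + pv_avg
    return (f1, f2, f3)
end

eq = ToyEuler1D(1.4)
u = (1.0, 0.5, 2.0)             # (rho, rho * v, rho * e)
w = cons2turbo(u, eq)           # precomputed once per node
flux = flux_shima_like(w..., w..., eq)
```

Calling the flux with identical left and right states, as in the last line, should reproduce the physical flux, which is a cheap consistency check for any newly written specialization.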

github-actions · 2026-06-18T23:28:38Z

codecov · 2026-06-18T23:57:12Z

Codecov Report

❌ Patch coverage is 93.18182% with 12 lines in your changes missing coverage. Please review.
✅ Project coverage is 96.87%. Comparing base (c9f3657) to head (72547ef).

Files with missing lines	Patch %	Lines
src/auxiliary/math.jl	7.69%	12 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #3090      +/-   ##
==========================================
- Coverage   96.88%   96.87%   -0.01%     
==========================================
  Files         647      648       +1     
  Lines       50035    50211     +176     
==========================================
+ Hits        48475    48639     +164     
- Misses       1560     1572      +12

Flag	Coverage Δ
unittests	`96.87% <93.18%> (-0.01%)`	⬇️


MarcoArtiano · 2026-06-19T10:26:27Z

+@inline function volume_flux_turbo(volume_flux::typeof(flux_ranocha_turbo),
+                                   have_nonconservative_terms::False,


I'm not happy with this design choice, as I would like to avoid dispatching on the type of the turbo flux. However, this choice avoids repeating the can_turbo line for each new turbo flux.
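The trade-off discussed here can be seen in a stripped-down sketch of dispatching on the type of a function. The names `flux_a`, `flux_b`, and `kernel_path` are hypothetical and only illustrate the mechanism; they are not functions from Trixi.jl.

```julia
# Two hypothetical flux functions, only for illustration.
flux_a(u_ll, u_rr) = 0.5 * (u_ll + u_rr)
flux_b(u_ll, u_rr) = 0.5 * (u_ll * u_ll + u_rr * u_rr)

# Generic fallback: any flux takes the generic path.
kernel_path(flux) = :generic

# Specialization selected purely by the type of the flux function.
kernel_path(::typeof(flux_a)) = :specialized

path_a = kernel_path(flux_a)
path_b = kernel_path(flux_b)
```

Each specialized flux needs one extra method tied to its type, but no separate trait definition has to be repeated for it.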

ranocha

Thanks! Can you please add a NEWS.md entry as well?

ranocha

Thanks! Could you please also add a test with FluxTurbo for something where no specialization is implemented, e.g., on a mesh not supported at the moment or with some strange numerical types not supported by LoopVectorization.jl, e.g., BigFloat on a simple and small 1D problem?

ranocha · 2026-06-23T19:03:06Z

+@inline function volume_flux_turbo(volume_flux, have_nonconservative_terms::False,
+                                   aux_and_normals_and_equations...)
+    equations = last(aux_and_normals_and_equations)
+    n = nvariables(equations)
+    u_ll = SVector(ntuple(v -> aux_and_normals_and_equations[v], Val(n)))
+    u_rr = SVector(ntuple(v -> aux_and_normals_and_equations[n + v], Val(n)))
+    normal_direction = SVector(aux_and_normals_and_equations[end - 3],
+                               aux_and_normals_and_equations[end - 2],
+                               aux_and_normals_and_equations[end - 1])
+    return volume_flux(u_ll, u_rr, normal_direction, equations)
+end


Can you please adapt the "aux" name here as well, e.g., turbovars or something like that for consistency with the other names?
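For readers unfamiliar with this calling convention, the fallback above receives all left-state values, all right-state values, the three components of the normal direction, and finally the equations as one flat argument list. A dependency-free sketch of the same unpacking, using plain tuples instead of SVector and the hypothetical names `ToyEquations`, `nvars`, and `unpack_flat`, could look like this:

```julia
# Hypothetical stand-ins for the equations type and nvariables(equations).
struct ToyEquations end
nvars(::ToyEquations) = 2

# Split a flat argument list into left state, right state, normal, equations.
function unpack_flat(args...)
    equations = last(args)
    n = nvars(equations)
    u_ll = ntuple(v -> args[v], n)        # first n entries
    u_rr = ntuple(v -> args[n + v], n)    # next n entries
    normal_direction = (args[end - 3], args[end - 2], args[end - 1])
    return u_ll, u_rr, normal_direction, equations
end

u_ll, u_rr, normal_direction, equations =
    unpack_flat(1.0, 2.0, 3.0, 4.0, 0.0, 1.0, 0.0, ToyEquations())
```

The indexing from `end` works because the equations object is always the last argument, so the three entries before it are the normal components.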

MarcoArtiano added 4 commits June 19, 2026 00:19

add first working version

610c7e4

add tests

8555250

format

eb860cf

minor fix

ab3063a

MarcoArtiano added 6 commits June 19, 2026 11:11

nonconservative unstable kernel

f3bc84d

format

6567042

fix nonconservative kernel + tests

3d06855

increase readability

690bfd4

fix tests

8bba097

rename normal to normal direction

8256ff1

MarcoArtiano changed the title ~~WIP: @generated 3D volume turbo kernel~~ WIP: Flux agnostic @generated 3D volume turbo kernel Jun 19, 2026

explicit AbstractSIMD

160023b

MarcoArtiano commented Jun 19, 2026

Comment thread src/solvers/dgsem_structured/dg_3d_turbo.jl Outdated

MarcoArtiano commented Jun 19, 2026

Comment thread src/solvers/dgsem_structured/dg_3d_turbo.jl Outdated

MarcoArtiano commented Jun 19, 2026

MarcoArtiano added the labels "discussion" and "performance" (We are greedy) Jun 19, 2026

MarcoArtiano and others added 7 commits June 19, 2026 12:34

Update src/solvers/dgsem_structured/dg_3d_turbo.jl

61e96fa

Update src/solvers/dgsem_structured/dg_3d_turbo.jl

ac3a2d8

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

simplify flux args

80a9813

add comments

4a0a3c1

remove nonconservative part

2a8c436

minor change

86ffed2

Merge branch 'main' into ma/generated_turbo

9e731e5

MarcoArtiano changed the title ~~WIP: Flux agnostic @generated 3D volume turbo kernel~~ Flux agnostic @generated 3D conservative volume turbo kernel Jun 20, 2026

MarcoArtiano marked this pull request as ready for review June 20, 2026 08:09

MarcoArtiano commented Jun 20, 2026

Comment thread src/solvers/dgsem_structured/dg_3d_turbo.jl Outdated

move fluxes to numerical fluxes

8c718d4

change export position

11e3551

MarcoArtiano requested a review from ranocha June 20, 2026 08:25

MarcoArtiano commented Jun 20, 2026

Comment thread src/equations/numerical_fluxes.jl Outdated

MarcoArtiano and others added 3 commits June 20, 2026 10:43

fix typo

d34c047

Co-authored-by: Marco Artiano <57838732+MarcoArtiano@users.noreply.github.com>

Update src/equations/numerical_fluxes.jl

93f0373

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

Merge branch 'main' into ma/generated_turbo

72547ef

ranocha requested changes Jun 23, 2026

MarcoArtiano and others added 9 commits June 23, 2026 17:15

Apply suggestions from code review

3659c1e

Co-authored-by: Hendrik Ranocha <ranocha@users.noreply.github.com>

apply some code review suggestions

4c9dedb

more robust tests

7a51c08

remove inner constructor

2a1f47a

add news and minor changes

c2fe52d

Merge branch 'main' into ma/generated_turbo

18c98d3

improve docstring

5eac91d

fix tests

8446f34

fix name

d70ceab

ranocha requested changes Jun 23, 2026

Apply suggestions from code review

8a61bb7

Co-authored-by: Hendrik Ranocha <ranocha@users.noreply.github.com>
