
[WIP] Flux agnostic @generated 3D nonconservative volume turbo kernel #3094

Draft
MarcoArtiano wants to merge 5 commits into ma/generated_turbo from ma/generated_turbo_noncons


@MarcoArtiano commented Jun 20, 2026


The generated code mimics the optimized hand-written code in trixi-framework/TrixiAtmo.jl#128.

Benchmarks with nonconservative terms

MHD with nonconservative terms, using p4est_3d_dgsem/elixir_mhd_alfven_wave_nonperiodic.jl.
Plain implementation with volume_flux = (flux_hindenlang_gassner, flux_nonconservative_powell):

BenchmarkTools.Trial: 4 samples with 1 evaluation per sample.
 Range (min … max):  1.318 s …  1.366 s  ┊ GC (min … max): 0.00% … 2.25%
 Time  (median):     1.324 s              ┊ GC (median):    0.00%
 Time  (mean ± σ):   1.333 s ± 22.366 ms  ┊ GC (mean ± σ):  0.58% ± 1.13%

  ██           █                                          █
  ██▁▁▁▁▁▁▁▁▁▁▁█▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█ ▁
  1.32 s         Histogram: frequency by time        1.37 s <

 Memory estimate: 33.32 MiB, allocs estimate: 9460.

FluxVolumeTurbo(flux_hindenlang_gassner, flux_nonconservative_powell). Note that this version does not precompute the primitive variables:

BenchmarkTools.Trial: 6 samples with 1 evaluation per sample.
 Range (min … max):  982.983 ms …  1.035 s  ┊ GC (min … max): 0.00% … 2.90%
 Time  (median):     984.219 ms              ┊ GC (median):    0.00%
 Time  (mean ± σ):   993.311 ms ± 20.522 ms  ┊ GC (mean ± σ):  0.50% ± 1.19%

  ██      ▁                                                  ▁
  ██▁▁▁▁▁▁█▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█ ▁
  983 ms          Histogram: frequency by time          1.03 s <

 Memory estimate: 33.35 MiB, allocs estimate: 9512.

Combined approach, using p4est_3d_dgsem/elixir_mhd_alfven_wave_combined_fluxes_nonperiodic.jl:

BenchmarkTools.Trial: 4 samples with 1 evaluation per sample.
 Range (min … max):  1.226 s …   1.565 s  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     1.231 s               ┊ GC (median):    0.00%
 Time  (mean ± σ):   1.313 s ± 167.761 ms  ┊ GC (mean ± σ):  0.11% ± 0.23%

  █▁                                                       ▁
  ██▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█ ▁
  1.23 s         Histogram: frequency by time         1.56 s <

 Memory estimate: 33.32 MiB, allocs estimate: 9446.
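For a quick comparison, the median times above translate into the following speedups relative to the plain implementation. This is only back-of-the-envelope arithmetic on the reported medians (few samples per trial, and a wide spread of 1.226 s to 1.565 s for the combined approach), not an additional measurement:

```math
\begin{aligned}
S_{\text{turbo}} &= \frac{t_{\text{plain}}}{t_{\text{turbo}}} = \frac{1.324\ \text{s}}{0.984\ \text{s}} \approx 1.35, \\
S_{\text{combined}} &= \frac{t_{\text{plain}}}{t_{\text{combined}}} = \frac{1.324\ \text{s}}{1.231\ \text{s}} \approx 1.08.
\end{aligned}
```

In other words, FluxVolumeTurbo reduces the median run time by roughly $1 - 0.984/1.324 \approx 26\%$, while the combined approach reduces it by roughly 7%.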

@github-actions


Review checklist

This checklist is meant to assist creators of PRs (to let them know what reviewers will typically look for) and reviewers (to guide them in a structured review process). Items do not need to be checked explicitly for a PR to be eligible for merging.

Purpose and scope

  • The PR has a single goal that is clear from the PR title and/or description.
  • All code changes represent a single set of modifications that logically belong together.
  • No more than 500 lines of code are changed or there is no obvious way to split the PR into multiple PRs.

Code quality

  • The code can be understood easily.
  • Newly introduced names for variables etc. are self-descriptive and consistent with existing naming conventions.
  • There are no redundancies that can be removed by simple modularization/refactoring.
  • There are no leftover debug statements or commented code sections.
  • The code adheres to our conventions and style guide, and to the Julia guidelines.

Documentation

  • New functions and types are documented with a docstring or top-level comment.
  • Relevant publications are referenced in docstrings (see example for formatting).
  • Inline comments are used to document longer or unusual code sections.
  • Comments describe intent ("why?") and not just functionality ("what?").
  • If the PR introduces a significant change or new feature, it is documented in NEWS.md with its PR number.

Testing

  • The PR passes all tests.
  • New or modified lines of code are covered by tests.
  • New or modified tests run in less than 10 seconds.

Performance

  • There are no type instabilities or memory allocations in performance-critical parts.
  • If the PR intent is to improve performance, before/after time measurements are posted in the PR.

Verification

  • The correctness of the code was verified using appropriate tests.
  • If new equations/methods are added, a convergence test has been run and the results
    are posted in the PR.

Created with ❤️ by the Trixi.jl community.

@MarcoArtiano MarcoArtiano changed the base branch from main to ma/generated_turbo June 20, 2026 08:06
@MarcoArtiano MarcoArtiano changed the title [WIP] Flux agnostic @generated 3D nonconservative volume turbo kernel#3090 [WIP] Flux agnostic @generated 3D nonconservative volume turbo kernel Jun 20, 2026
@MarcoArtiano MarcoArtiano added the performance We are greedy label Jun 20, 2026