Flux agnostic @generated 3D conservative volume turbo kernel#3090
Flux agnostic @generated 3D conservative volume turbo kernel#3090MarcoArtiano wants to merge 33 commits into
@generated 3D conservative volume turbo kernel#3090Conversation
Review checklistThis checklist is meant to assist creators of PRs (to let them know what reviewers will typically look for) and reviewers (to guide them in a structured review process). Items do not need to be checked explicitly for a PR to be eligible for merging. Purpose and scope
Code quality
Documentation
Testing
Performance
Verification
Created with ❤️ by the Trixi.jl community. |
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #3090 +/- ##
==========================================
- Coverage 96.88% 96.87% -0.01%
==========================================
Files 647 648 +1
Lines 50035 50211 +176
==========================================
+ Hits 48475 48639 +164
- Misses 1560 1572 +12
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
@generated 3D volume turbo kernel@generated 3D volume turbo kernel
| @inline function volume_flux_turbo(volume_flux::typeof(flux_ranocha_turbo), | ||
| have_nonconservative_terms::False, |
There was a problem hiding this comment.
I'm not happy with this design choice, as I would like to avoid the type of the turbo flux. However, this choice avoids repeating the line can_turbo for each new turbo flux.
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
@generated 3D volume turbo kernel@generated 3D conservative volume turbo kernel
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
ranocha
left a comment
There was a problem hiding this comment.
Thanks! Can you please add a NEWS.md entry as well?
Co-authored-by: Hendrik Ranocha <ranocha@users.noreply.github.com>
ranocha
left a comment
There was a problem hiding this comment.
Thanks! Could you please also add a test with FluxTurbo for something where no specialization is implemented, e.g., on a mesh not supported at the moment or with some strange numerical types not supported by LoopVectorization.jl, e.g., BigFloat on a simple and small 1D problem?
| @inline function volume_flux_turbo(volume_flux, have_nonconservative_terms::False, | ||
| aux_and_normals_and_equations...) | ||
| equations = last(aux_and_normals_and_equations) | ||
| n = nvariables(equations) | ||
| u_ll = SVector(ntuple(v -> aux_and_normals_and_equations[v], Val(n))) | ||
| u_rr = SVector(ntuple(v -> aux_and_normals_and_equations[n + v], Val(n))) | ||
| normal_direction = SVector(aux_and_normals_and_equations[end - 3], | ||
| aux_and_normals_and_equations[end - 2], | ||
| aux_and_normals_and_equations[end - 1]) | ||
| return volume_flux(u_ll, u_rr, normal_direction, equations) | ||
| end |
There was a problem hiding this comment.
Can you please adapt the "aux" name here as well, e.g., turbovars or something like that for consistency with the other names?
Co-authored-by: Hendrik Ranocha <ranocha@users.noreply.github.com>
After some discussion with @ranocha , it would be nice to have a generic volume turbo kernel, without the need of copy pasting the whole machinery and specialize it for each flux. In TrixiAtmo, that process would be quite annoying. @ranocha suggested to look at
@generatedfunctions as we need hand loop over genericnvariablesand precomputed variables. Therefore we need a kernel that writes the equivalent hand written code indg_compressible_euler_3d, but it is general for these two variables.Claude AI has assisted me in the creation of the PR.
16 threads, for
p4est_3d_dgsem/elixir_euler_ec.jlwithtspan = (10.0, 0.0).Plain turbo
flux_ranocha_turboGenerated code with
FluxVolumeTurbo(flux_ranocha)Plain implementation with
flux_ranocha(just for completeness)The current implementation allows us also to use any volume flux, even though it is not guaranteed that the code will work with the macro
@turbo.Plain implementation for
flux_shima_etalFluxVolumeTurbo(flux_shima_etal), note that this flux does not have the specialization in the generated code. It means that is is using the generated code, but we hare not precomputing the primitive variables.Turbo hand-written implementation of
flux_shima_etal_turbo. Here we are precomputing the primitive variables, as we are using the pre-existing optimization. That can also be specified with the generic generated code.Nonconservative terms benchmarks
The nonconservative implementation has been moved to #3094.
MHD with nonconservative terms for
p4est_3d_dgsem/elixir_mhd_alfven_wave_nonperiodic.jlPlain implementation
volume_flux = (flux_hindenlang_gassner, flux_nonconservative_powell)FluxVolumeTurbo(flux_hindenlang_gassner, flux_nonconservative_powell)this does not precompute the primitive variables.Combined approach in
p4est_3d_dgsem/elixir_mhd_alfven_wave_combined_fluxes_nonperiodic.jlSo, the generic code that accepts theoretically any volume flux, without the additional effort of specializing, already provides a decent speed up. If someone is willing to invest the time to write down the small 2 specialized functions, to precompute primitive variables and the flux in terms of primitive variables, then, as for the preexisting implementation, we reach the same speed up, with reduce effort of copy pasting the whole volume kernel and writing the flux in each direction. In summary, three ingredients need to be specified: