Add combined nonconservative GPU volume and boundary condition kernels#3065
Add combined nonconservative GPU volume and boundary condition kernels#3065MarcoArtiano wants to merge 40 commits into
Conversation
Review checklistThis checklist is meant to assist creators of PRs (to let them know what reviewers will typically look for) and reviewers (to guide them in a structured review process). Items do not need to be checked explicitly for a PR to be eligible for merging. Purpose and scope
Code quality
Documentation
Testing
Performance
Verification
Created with ❤️ by the Trixi.jl community. |
…ramework/Trixi.jl into ma/combined_noncons_mpi
Co-authored-by: Marco Artiano <57838732+MarcoArtiano@users.noreply.github.com>
Co-authored-by: Marco Artiano <57838732+MarcoArtiano@users.noreply.github.com>
Co-authored-by: Marco Artiano <57838732+MarcoArtiano@users.noreply.github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #3065 +/- ##
==========================================
- Coverage 96.88% 96.84% -0.05%
==========================================
Files 647 647
Lines 50035 50079 +44
==========================================
+ Hits 48475 48494 +19
- Misses 1560 1585 +25
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
benegee
left a comment
There was a problem hiding this comment.
Thanks again @MarcoArtiano !
I just have a few questions.
|
|
||
| callbacks = CallbackSet(summary_callback, | ||
| analysis_callback, alive_callback, | ||
| save_solution, |
There was a problem hiding this comment.
I'm sorry, that is when I was trying to debug and understand which callback was failing on GPU. I restored the callback.
| Ja1_avg = 0.5f0 * (Ja1_node + Ja1_node_ii) | ||
| # compute the contravariant volume flux in the direction of the | ||
| # averaged contravariant vector | ||
| fluxtilde1_left, fluxtilde1_right = volume_flux(u_node, u_node_ii, Ja1_avg, |
There was a problem hiding this comment.
This (and similar lines below) is main difference compared to the combine_conservative_and_nonconservative_fluxes::False version, right?
| # Note the factor 0.5 necessary for the nonconservative fluxes based on | ||
| # the interpretation of global SBP operators coupled discontinuously via | ||
| # central fluxes/SATs |
There was a problem hiding this comment.
No, this one should be removed.
| @inline function calc_boundary_flux!(u, surface_flux_values, t, boundary_condition, | ||
| MeshT::Type{<:Union{P4estMesh{3}, | ||
| T8codeMesh{3}}}, | ||
| have_nonconservative_terms::True, | ||
| combine_conservative_and_nonconservative_fluxes::True, | ||
| equations, | ||
| surface_integral, dg::DG, cache, i_index, j_index, | ||
| k_index, i_node_index, j_node_index, | ||
| direction_index, | ||
| element_index, boundary_index, node_coordinates, | ||
| contravariant_vectors) |
There was a problem hiding this comment.
I'm certainly missing a detail, but this function body looks identical to the one for have_nonconservative_terms::False.
There was a problem hiding this comment.
This is consistent with:
Trixi.jl/src/solvers/dgsem_p4est/dg_3d.jl
Lines 612 to 649 in 099f4e1
ranocha
left a comment
There was a problem hiding this comment.
Thanks! I have a few questions/comments.
| @trixi_testset "elixir_mhd_alfven_wave_combined_fluxes_nonperiodic.jl Float32" begin | ||
| using Trixi | ||
| @test_trixi_include(joinpath(EXAMPLES_DIR, "p4est_3d_dgsem", | ||
| "elixir_mhd_alfven_wave_combined_fluxes_nonperiodic.jl"), | ||
| l2=Float32[0.00021050235826592327, | ||
| 0.0006558863204839041, | ||
| 0.0002821364444400733, | ||
| 0.000794748435433683, | ||
| 0.0006839039307848098, | ||
| 0.0006743445524692008, | ||
| 0.000318156924452865, | ||
| 0.0007885451771559438, | ||
| 4.811726173404515e-5], | ||
| linf=Float32[0.0012031070350810857, | ||
| 0.004106999758487398, | ||
| 0.001783097816025008, | ||
| 0.004780625055122056, | ||
| 0.005095902318184908, | ||
| 0.003922455893839549, | ||
| 0.002515549802432071, | ||
| 0.004448527671538249, | ||
| 0.00019839944646198146], | ||
| RealT_for_test_tolerances=Float32, | ||
| real_type=Float32) | ||
| # Ensure that we do not have excessive memory allocations | ||
| # (e.g., from type instabilities) | ||
| semi = ode.p # `semidiscretize` adapts the semi, so we need to obtain it from the ODE problem. | ||
| @test_allocations(Trixi.rhs!, semi, sol, 2_000_000) | ||
| end |
There was a problem hiding this comment.
Can you please add the tests whether the types of everything are correct to this new testset, soo?
There was a problem hiding this comment.
For the tests with KA as backend, we have never provided these type of tests. However, for the GPU backends type tests are present.
I'm not sure about the reason why these tests are missing and eventually we should open an issue and provide all the KA tests with type tests, similarly to the GPU tests.
Co-authored-by: Hendrik Ranocha <ranocha@users.noreply.github.com>
Co-authored-by: Marco Artiano <57838732+MarcoArtiano@users.noreply.github.com>
The callback for the divergence is broken on the GPU, because we pass the full cache on the GPU, which is not filtering to not contain not-bitstype. I just avoided having these callbacks in the tests when running on the GPU by changing the
analysis_callback.Some additional notes: the volume kernel implemented only works if the volume flux is implemented following the
combine_conservative_and_nonconservative_fluxes. The boundary conditions instead are for the moment only for the original implementation of the nonconservative terms. To summarize, nonconservative systems on GPU now runs only if the volume flux is passed ascombine_conservative_and_nonconservative_fluxesand the surface flux is passed assurface_flux = (flux_conservative, flux_nonconservative).Note that the
calc_interface_flux!already provides the flexibility to run with plain original implementation and combined option for nonconservative systems on the GPU.