Add combined nonconservative GPU volume and boundary condition kernels #3065

Open
MarcoArtiano wants to merge 40 commits into main from ma/noncons_gpu

Conversation

MarcoArtiano commented Jun 4, 2026

Contributor

The divergence callback is broken on the GPU because we pass the full cache to the GPU, and this cache is not filtered to exclude entries that are not bits types. For now, I simply avoid these callbacks in the GPU tests by changing the analysis_callback.
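To illustrate the kind of filtering meant here, the following is a minimal, hypothetical Julia sketch that keeps only the cache entries whose values are bits types. The helper name filter_isbits_cache and the toy cache are assumptions for illustration, not Trixi.jl API. In a real GPU run, host arrays would first be adapted to device arrays (which are isbits), so this toy only models the final isbits check.

```julia
# Hypothetical helper (not part of Trixi.jl): keep only cache entries whose
# values are bits types, since GPU kernel arguments must be isbits.
function filter_isbits_cache(cache::NamedTuple)
    kept = Tuple(k for k in keys(cache) if isbits(cache[k]))
    # NamedTuple{names}(nt) selects the given fields from nt
    return NamedTuple{kept}(cache)
end

# Toy cache: a scalar (isbits), a host Vector (not isbits), a tuple of floats (isbits)
cache = (alpha = 0.5, buffer = zeros(3), weights = (1.0, 2.0))
gpu_cache = filter_isbits_cache(cache)
println(keys(gpu_cache))  # (:alpha, :weights)
```

The check uses isbits on each value rather than isbitstype on a declared field type, so a tuple of floats passes while a Vector is dropped.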

Some additional notes: the implemented volume kernel only works if the volume flux follows the combine_conservative_and_nonconservative_fluxes interface. The boundary conditions, in contrast, currently support only the original implementation of the nonconservative terms. To summarize, nonconservative systems currently run on the GPU only if the volume flux is a combined flux (i.e., combine_conservative_and_nonconservative_fluxes is True for it) and the surface flux is passed as surface_flux = (flux_conservative, flux_nonconservative).

Note that calc_interface_flux! already supports both the original implementation and the combined option for nonconservative systems on the GPU.

github-actions Bot commented Jun 4, 2026

Contributor

Review checklist

This checklist is meant to assist creators of PRs (to let them know what reviewers will typically look for) and reviewers (to guide them in a structured review process). Items do not need to be checked explicitly for a PR to be eligible for merging.

Purpose and scope

  • The PR has a single goal that is clear from the PR title and/or description.
  • All code changes represent a single set of modifications that logically belong together.
  • No more than 500 lines of code are changed or there is no obvious way to split the PR into multiple PRs.

Code quality

  • The code can be understood easily.
  • Newly introduced names for variables etc. are self-descriptive and consistent with existing naming conventions.
  • There are no redundancies that can be removed by simple modularization/refactoring.
  • There are no leftover debug statements or commented code sections.
  • The code adheres to our conventions and style guide, and to the Julia guidelines.

Documentation

  • New functions and types are documented with a docstring or top-level comment.
  • Relevant publications are referenced in docstrings (see example for formatting).
  • Inline comments are used to document longer or unusual code sections.
  • Comments describe intent ("why?") and not just functionality ("what?").
  • If the PR introduces a significant change or new feature, it is documented in NEWS.md with its PR number.

Testing

  • The PR passes all tests.
  • New or modified lines of code are covered by tests.
  • New or modified tests run in less than 10 seconds.

Performance

  • There are no type instabilities or memory allocations in performance-critical parts.
  • If the PR intent is to improve performance, before/after time measurements are posted in the PR.

Verification

  • The correctness of the code was verified using appropriate tests.
  • If new equations/methods are added, a convergence test has been run and the results are posted in the PR.

Created with ❤️ by the Trixi.jl community.

MarcoArtiano and others added 7 commits June 11, 2026 19:32
…ramework/Trixi.jl into ma/combined_noncons_mpi
Co-authored-by: Marco Artiano <57838732+MarcoArtiano@users.noreply.github.com>
Co-authored-by: Marco Artiano <57838732+MarcoArtiano@users.noreply.github.com>
MarcoArtiano changed the base branch from main to ma/combined_noncons_mpi June 11, 2026 19:39
Comment thread examples/p4est_3d_dgsem/elixir_mhd_alfven_wave_nonperiodic.jl Outdated
Co-authored-by: Marco Artiano <57838732+MarcoArtiano@users.noreply.github.com>
Base automatically changed from ma/combined_noncons_mpi to main June 12, 2026 08:25
MarcoArtiano and others added 2 commits June 12, 2026 12:13
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Comment thread test/test_amdgpu_3d.jl Outdated
Comment thread src/callbacks_step/glm_speed.jl Outdated
codecov Bot commented Jun 22, 2026


Codecov Report

❌ Patch coverage is 48.97959% with 25 lines in your changes missing coverage. Please review.
✅ Project coverage is 96.84%. Comparing base (c9f3657) to head (0f3e153).
⚠️ Report is 1 commits behind head on main.

Files with missing lines | Patch % | Lines
src/solvers/dgsem_p4est/dg_3d_gpu.jl | 46.81% | 25 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3065      +/-   ##
==========================================
- Coverage   96.88%   96.84%   -0.05%     
==========================================
  Files         647      647              
  Lines       50035    50079      +44     
==========================================
+ Hits        48475    48494      +19     
- Misses       1560     1585      +25     
Flag | Coverage Δ
unittests | 96.84% <48.98%> (-0.05%) ⬇️

Flags with carried forward coverage won't be shown.

MarcoArtiano marked this pull request as ready for review June 22, 2026 10:56

benegee left a comment

Contributor

Thanks again @MarcoArtiano!

I just have a few questions.


callbacks = CallbackSet(summary_callback,
                        analysis_callback, alive_callback,
                        save_solution,

Contributor

Why was this dropped?

Member

Contributor Author

Sorry, that was left over from when I was trying to find out which callback was failing on the GPU. I have restored the callback.

Ja1_avg = 0.5f0 * (Ja1_node + Ja1_node_ii)
# compute the contravariant volume flux in the direction of the
# averaged contravariant vector
fluxtilde1_left, fluxtilde1_right = volume_flux(u_node, u_node_ii, Ja1_avg,

Contributor

This (and similar lines below) is the main difference compared to the combine_conservative_and_nonconservative_fluxes::False version, right?

Contributor Author

Yes, exactly.
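To make the difference concrete, here is a hypothetical scalar toy in Julia (not Trixi.jl code). The split kernel accumulates a symmetric conservative flux plus 0.5 times a nonconservative term, while a combined flux returns a (left, right) pair with the 0.5 already folded in, so the kernel needs no separate factor. The toy fluxes, the matrix D standing in for derivative_split, and the left/right convention are assumptions for illustration; under them, both loops give the same result.

```julia
# Toy symmetric "conservative" two-point flux and a toy nonconservative term
f_cons(uL, uR) = 0.5 * (uL + uR)
f_noncons(uL, uR) = uL * (uR - uL)

# Hypothetical combined flux: contributions for the left and right node,
# with the factor 0.5 of the nonconservative part already included
function f_combined(uL, uR)
    c = f_cons(uL, uR)
    return c + 0.5 * f_noncons(uL, uR), c + 0.5 * f_noncons(uR, uL)
end

# Split version: conservative part over symmetric pairs, nonconservative
# part over all pairs with an explicit factor 0.5
function volume_split(u, D)
    n = length(u)
    du = zeros(n)
    for i in 1:n, ii in (i + 1):n
        fc = f_cons(u[i], u[ii])
        du[i] += D[i, ii] * fc
        du[ii] += D[ii, i] * fc
    end
    for i in 1:n
        acc = 0.0
        for ii in 1:n
            acc += D[i, ii] * f_noncons(u[i], u[ii])
        end
        du[i] += 0.5 * acc
    end
    return du
end

# Combined version: a single loop over symmetric pairs, no extra factor
function volume_combined(u, D)
    n = length(u)
    du = zeros(n)
    for i in 1:n, ii in (i + 1):n
        fl, fr = f_combined(u[i], u[ii])
        du[i] += D[i, ii] * fl
        du[ii] += D[ii, i] * fr
    end
    return du
end

u = [1.0, 2.0, 0.5, 3.0]
D = [0.0 1.0 -2.0 0.5; -1.0 0.0 1.5 2.0; 2.0 -1.5 0.0 1.0; -0.5 -2.0 -1.0 0.0]
println(volume_split(u, D) ≈ volume_combined(u, D))  # true
```

The equivalence relies on the toy conservative flux being symmetric and on D having a zero diagonal; the results agree up to floating-point summation order, hence the ≈ comparison.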

Comment thread src/solvers/dgsem_p4est/dg_3d_gpu.jl Outdated
Comment on lines +562 to +564
# Note the factor 0.5 necessary for the nonconservative fluxes based on
# the interpretation of global SBP operators coupled discontinuously via
# central fluxes/SATs

Contributor

Does this still apply?

Contributor Author

No, this one should be removed.

Comment on lines +530 to +540
@inline function calc_boundary_flux!(u, surface_flux_values, t, boundary_condition,
                                     MeshT::Type{<:Union{P4estMesh{3},
                                                         T8codeMesh{3}}},
                                     have_nonconservative_terms::True,
                                     combine_conservative_and_nonconservative_fluxes::True,
                                     equations,
                                     surface_integral, dg::DG, cache, i_index, j_index,
                                     k_index, i_node_index, j_node_index,
                                     direction_index,
                                     element_index, boundary_index, node_coordinates,
                                     contravariant_vectors)

Contributor

I'm certainly missing a detail, but this function body looks identical to the one for have_nonconservative_terms::False.

MarcoArtiano Jun 23, 2026

Contributor Author

This is consistent with:

@inline function calc_boundary_flux!(surface_flux_values, t, boundary_condition,
                                     mesh::Union{P4estMesh{3}, T8codeMesh{3}},
                                     have_nonconservative_terms::True,
                                     combine_conservative_and_nonconservative_fluxes::True,
                                     equations,
                                     surface_integral, dg::DG, cache, i_index, j_index,
                                     k_index, i_node_index, j_node_index,
                                     direction_index,
                                     element_index, boundary_index)
    @unpack boundaries = cache
    @unpack node_coordinates, contravariant_vectors = cache.elements
    @unpack surface_flux = surface_integral
    # Extract solution data from boundary container
    u_inner = get_node_vars(boundaries.u, equations, dg, i_node_index, j_node_index,
                            boundary_index)
    # Outward-pointing normal direction (not normalized)
    normal_direction = get_normal_direction(direction_index, contravariant_vectors,
                                            i_index, j_index, k_index, element_index)
    # Coordinates at boundary node
    x = get_node_coords(node_coordinates, equations, dg,
                        i_index, j_index, k_index, element_index)
    # Call pointwise numerical flux functions for the conservative and nonconservative part
    # in the normal direction on the boundary
    flux = boundary_condition(u_inner, normal_direction, x, t,
                              surface_flux, equations)
    # Copy flux to element storage in the correct orientation
    for v in eachvariable(equations)
        surface_flux_values[v, i_node_index, j_node_index,
                            direction_index, element_index] = flux[v]
    end
    return nothing
end

ranocha added the performance label (We are greedy) Jun 23, 2026

ranocha left a comment

Member

Thanks! I have a few questions/comments.

Comment thread src/callbacks_step/glm_speed.jl Outdated
Comment thread src/solvers/dgsem_p4est/dg_3d_gpu.jl
Comment thread src/solvers/dgsem_p4est/dg_3d_gpu.jl
Comment on lines +211 to +239
@trixi_testset "elixir_mhd_alfven_wave_combined_fluxes_nonperiodic.jl Float32" begin
    using Trixi
    @test_trixi_include(joinpath(EXAMPLES_DIR, "p4est_3d_dgsem",
                                 "elixir_mhd_alfven_wave_combined_fluxes_nonperiodic.jl"),
                        l2=Float32[0.00021050235826592327,
                                   0.0006558863204839041,
                                   0.0002821364444400733,
                                   0.000794748435433683,
                                   0.0006839039307848098,
                                   0.0006743445524692008,
                                   0.000318156924452865,
                                   0.0007885451771559438,
                                   4.811726173404515e-5],
                        linf=Float32[0.0012031070350810857,
                                     0.004106999758487398,
                                     0.001783097816025008,
                                     0.004780625055122056,
                                     0.005095902318184908,
                                     0.003922455893839549,
                                     0.002515549802432071,
                                     0.004448527671538249,
                                     0.00019839944646198146],
                        RealT_for_test_tolerances=Float32,
                        real_type=Float32)
    # Ensure that we do not have excessive memory allocations
    # (e.g., from type instabilities)
    semi = ode.p # `semidiscretize` adapts the semi, so we need to obtain it from the ODE problem.
    @test_allocations(Trixi.rhs!, semi, sol, 2_000_000)
end

Member

Can you please add tests checking that the types of everything are correct to this new testset, too?

Contributor Author

For the tests with KA (KernelAbstractions.jl) as the backend, we have never provided this type of test. However, type tests are present for the GPU backends.

I'm not sure why these tests are missing. Eventually, we should open an issue and add type tests to all the KA tests, similar to the GPU tests.

Member

Yes, that would be good 👍
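The type tests discussed here check that a computation keeps the requested real type. A minimal sketch using Julia's Test standard library, with a hypothetical toy flux rather than the actual Trixi.jl type tests, might look like this:

```julia
using Test

# Hypothetical toy two-point flux standing in for a real numerical flux;
# the Float32 literal 0.5f0 avoids promoting Float32 inputs to Float64
toy_flux(uL, uR) = 0.5f0 .* (uL .+ uR)

@testset "toy type stability" begin
    for RealT in (Float32, Float64)
        uL = (one(RealT), RealT(2))
        uR = (RealT(3), RealT(4))
        # @inferred throws if the return type cannot be inferred
        result = @inferred toy_flux(uL, uR)
        @test eltype(result) == RealT
    end
end
```

Replacing 0.5f0 with the Float64 literal 0.5 would promote the Float32 result to Float64 and make the eltype check fail, which is the kind of issue such type tests are meant to catch.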

Co-authored-by: Hendrik Ranocha <ranocha@users.noreply.github.com>
Comment thread src/solvers/dgsem_p4est/dg_3d_gpu.jl Outdated
Co-authored-by: Marco Artiano <57838732+MarcoArtiano@users.noreply.github.com>

Labels

gpu, performance (We are greedy)

Projects

None yet

3 participants