Add GPU kernel for calc_sources!
#3012
Merged
36 commits
d50532f add source term gpu kernel (MarcoArtiano)
453f04d add dispatch for no source terms (MarcoArtiano)
5b7e0e8 add 1d and 3d source gpu kernels (MarcoArtiano)
158e481 specify signature (MarcoArtiano)
8f8b0ef add tests (MarcoArtiano)
5add261 Update test/test_amdgpu_2d.jl (JoshuaLampert)
0a55992 Update test/test_amdgpu_2d.jl (JoshuaLampert)
244504d refactoring (MarcoArtiano)
d897a73 format (MarcoArtiano)
74e140d refactor also 3D kernels (MarcoArtiano)
7ecf686 Merge branch 'main' into ma/source_gpu (MarcoArtiano)
f1209fe format (MarcoArtiano)
c978036 fix tests (MarcoArtiano)
5a5948f fix tests (MarcoArtiano)
d66c664 fix elixirs (MarcoArtiano)
b9bbea5 fix tests (MarcoArtiano)
69f13e4 Merge branch 'main' into ma/source_gpu (MarcoArtiano)
55ded1f fix tests (MarcoArtiano)
303cd35 Apply suggestions from code review (MarcoArtiano)
8733fdd rename tests, delete 1D P4est (MarcoArtiano)
01c2764 add kernel abstraction tests (MarcoArtiano)
cf75145 add cuda tests (MarcoArtiano)
20e890d fix tests (MarcoArtiano)
6d8c9d0 Merge branch 'main' into ma/source_gpu (ranocha)
031528d fix allocations (MarcoArtiano)
6044e49 fix allocations 3D (MarcoArtiano)
e19fe5f fix allocations (MarcoArtiano)
8fda9ee Merge branch 'main' into ma/source_gpu (ranocha)
f3e650d Apply suggestions from code review (MarcoArtiano)
b15439e Update test/test_amdgpu_3d.jl (MarcoArtiano)
594eefd Merge branch 'main' into ma/source_gpu (MarcoArtiano)
7f62e6e Apply suggestions from code review (MarcoArtiano)
a743ff5 Merge branch 'main' into ma/source_gpu (MarcoArtiano)
d4f6ff9 add comments about storage_type and real_type choices (MarcoArtiano)
d66b690 Merge branch 'main' into ma/source_gpu (MarcoArtiano)
34215b6 Apply suggestions from code review (MarcoArtiano)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,70 @@ | ||
| # The same setup as tree_2d_dgsem/elixir_euler_source_terms.jl | ||
| # to verify the P4estMesh implementation against TreeMesh | ||
| | ||
| using OrdinaryDiffEqLowStorageRK | ||
| using Trixi | ||
| | ||
| ############################################################################### | ||
| # semidiscretization of the compressible Euler equations | ||
| gamma = 1.4 | ||
| equations = CompressibleEulerEquations2D(gamma) | ||
| | ||
| initial_condition = initial_condition_convergence_test | ||
| | ||
| # Up to version 0.13.0, `max_abs_speed_naive` was used as the default wave speed estimate of | ||
| # `const flux_lax_friedrichs = FluxLaxFriedrichs()`, i.e., `FluxLaxFriedrichs(max_abs_speed = max_abs_speed_naive)`. | ||
| # In the `StepsizeCallback`, though, the less diffusive `max_abs_speeds` is employed which is consistent with `max_abs_speed`. | ||
| # Thus, we exchanged in PR#2458 the default wave speed used in the LLF flux to `max_abs_speed`. | ||
| # To ensure that every example still runs we specify explicitly `FluxLaxFriedrichs(max_abs_speed_naive)`. | ||
| # We remark, however, that the now default `max_abs_speed` is in general recommended due to compliance with the | ||
| # `StepsizeCallback` (CFL-Condition) and less diffusion. | ||
| solver = DGSEM(polydeg = 3, surface_flux = FluxLaxFriedrichs(max_abs_speed_naive)) | ||
| | ||
| coordinates_min = (0.0, 0.0) | ||
| coordinates_max = (2.0, 2.0) | ||
| | ||
| trees_per_dimension = (16, 16) | ||
| mesh = P4estMesh(trees_per_dimension, | ||
| polydeg = 3, initial_refinement_level = 0, | ||
| coordinates_min = coordinates_min, coordinates_max = coordinates_max, | ||
| periodicity = true) | ||
| | ||
| semi = SemidiscretizationHyperbolic(mesh, equations, initial_condition, solver; | ||
| source_terms = source_terms_convergence_test, | ||
| boundary_conditions = boundary_condition_periodic) | ||
| | ||
| ############################################################################### | ||
| # ODE solvers, callbacks etc. | ||
| | ||
| tspan = (0.0, 2.0) | ||
| # Create ODE problem with time span from 0.0 to 2.0 | ||
| # Setting `real_type` allows changing the real number type, e.g., to `Float32`. | ||
| # This is particularly useful when changing the `storage_type` to a GPU array | ||
| # type such as `ROCArray` (AMD) or `CuArray` (NVIDIA CUDA). | ||
| ode = semidiscretize(semi, tspan; real_type = nothing, storage_type = nothing) | ||
| | ||
| summary_callback = SummaryCallback() | ||
| | ||
| analysis_interval = 100 | ||
| analysis_callback = AnalysisCallback(semi, interval = analysis_interval) | ||
| | ||
| alive_callback = AliveCallback(analysis_interval = analysis_interval) | ||
| | ||
| save_solution = SaveSolutionCallback(interval = 100, | ||
| save_initial_solution = true, | ||
| save_final_solution = true, | ||
| solution_variables = cons2prim) | ||
| | ||
| stepsize_callback = StepsizeCallback(cfl = 1.0) | ||
| | ||
| callbacks = CallbackSet(summary_callback, | ||
| analysis_callback, alive_callback, | ||
| save_solution, | ||
| stepsize_callback) | ||
| | ||
| ############################################################################### | ||
| # run the simulation | ||
| | ||
| sol = solve(ode, CarpenterKennedy2N54(williamson_condition = false); | ||
| dt = 1.0, # solve needs some value here but it will be overwritten by the stepsize_callback | ||
| ode_default_options()..., callback = callbacks); | ||
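The comment on `real_type` and `storage_type` in the elixir above can be illustrated with a minimal sketch. It assumes that `semidiscretize` accepts `real_type = Float32` together with `storage_type = Array`, so that it runs on the CPU without any GPU package. The coarser `(4, 4)` mesh is a choice made here to keep the example small and is not part of the PR. On a GPU machine one would presumably load `CUDA` or `AMDGPU` first and pass `CuArray` or `ROCArray` instead.

```julia
using Trixi

# Same ingredients as the elixir above, on a smaller mesh (assumption: 4 x 4 trees)
equations = CompressibleEulerEquations2D(1.4)
solver = DGSEM(polydeg = 3, surface_flux = FluxLaxFriedrichs(max_abs_speed_naive))
mesh = P4estMesh((4, 4), polydeg = 3, initial_refinement_level = 0,
                 coordinates_min = (0.0, 0.0), coordinates_max = (2.0, 2.0),
                 periodicity = true)
semi = SemidiscretizationHyperbolic(mesh, equations,
                                    initial_condition_convergence_test, solver;
                                    source_terms = source_terms_convergence_test,
                                    boundary_conditions = boundary_condition_periodic)

# Single precision on plain CPU arrays. Replacing `Array` by a GPU array type
# (e.g. `CuArray` after `using CUDA`) is the case the elixir comment refers to.
ode = semidiscretize(semi, (0.0, 2.0); real_type = Float32, storage_type = Array)
```

Keeping both keywords at `nothing` in the elixir itself leaves the default CPU behavior unchanged, while tests or users can override the two values from outside.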
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,72 @@ | ||
| # The same setup as tree_3d_dgsem/elixir_euler_source_terms.jl | ||
| # to verify the P4estMesh implementation against TreeMesh | ||
| | ||
| using OrdinaryDiffEqLowStorageRK | ||
| using Trixi | ||
| | ||
| ############################################################################### | ||
| # semidiscretization of the compressible Euler equations | ||
| gamma = 1.4 | ||
| equations = CompressibleEulerEquations3D(gamma) | ||
| | ||
| initial_condition = initial_condition_convergence_test | ||
| | ||
| # Up to version 0.13.0, `max_abs_speed_naive` was used as the default wave speed estimate of | ||
| # `const flux_lax_friedrichs = FluxLaxFriedrichs()`, i.e., `FluxLaxFriedrichs(max_abs_speed = max_abs_speed_naive)`. | ||
| # In the `StepsizeCallback`, though, the less diffusive `max_abs_speeds` is employed which is consistent with `max_abs_speed`. | ||
| # Thus, we exchanged in PR#2458 the default wave speed used in the LLF flux to `max_abs_speed`. | ||
| # To ensure that every example still runs we specify explicitly `FluxLaxFriedrichs(max_abs_speed_naive)`. | ||
| # We remark, however, that the now default `max_abs_speed` is in general recommended due to compliance with the | ||
| # `StepsizeCallback` (CFL-Condition) and less diffusion. | ||
| solver = DGSEM(polydeg = 3, surface_flux = FluxLaxFriedrichs(max_abs_speed_naive), | ||
| volume_integral = VolumeIntegralWeakForm()) | ||
| | ||
| coordinates_min = (0.0, 0.0, 0.0) | ||
| coordinates_max = (2.0, 2.0, 2.0) | ||
| | ||
| trees_per_dimension = (4, 4, 4) | ||
| | ||
| mesh = P4estMesh(trees_per_dimension, polydeg = 3, | ||
| coordinates_min = coordinates_min, coordinates_max = coordinates_max, | ||
| initial_refinement_level = 1, | ||
| periodicity = true) | ||
| | ||
| semi = SemidiscretizationHyperbolic(mesh, equations, initial_condition, solver; | ||
| source_terms = source_terms_convergence_test, | ||
| boundary_conditions = boundary_condition_periodic) | ||
| | ||
| ############################################################################### | ||
| # ODE solvers, callbacks etc. | ||
| | ||
| tspan = (0.0, 5.0) | ||
| # Create ODE problem with time span from 0.0 to 5.0 | ||
| # Setting `real_type` allows changing the real number type, e.g., to `Float32`. | ||
| # This is particularly useful when changing the `storage_type` to a GPU array | ||
| # type such as `ROCArray` (AMD) or `CuArray` (NVIDIA CUDA). | ||
| ode = semidiscretize(semi, tspan; real_type = nothing, storage_type = nothing) | ||
| | ||
| summary_callback = SummaryCallback() | ||
| | ||
| analysis_interval = 100 | ||
| analysis_callback = AnalysisCallback(semi, interval = analysis_interval) | ||
| | ||
| alive_callback = AliveCallback(analysis_interval = analysis_interval) | ||
| | ||
| save_solution = SaveSolutionCallback(interval = 100, | ||
| save_initial_solution = true, | ||
| save_final_solution = true, | ||
| solution_variables = cons2prim) | ||
| | ||
| stepsize_callback = StepsizeCallback(cfl = 0.6) | ||
| | ||
| callbacks = CallbackSet(summary_callback, | ||
| analysis_callback, alive_callback, | ||
| save_solution, | ||
| stepsize_callback) | ||
| | ||
| ############################################################################### | ||
| # run the simulation | ||
| | ||
| sol = solve(ode, CarpenterKennedy2N54(williamson_condition = false); | ||
| dt = 1.0, # solve needs some value here but it will be overwritten by the stepsize_callback | ||
| ode_default_options()..., callback = callbacks); | ||
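Both elixirs pass `source_terms = source_terms_convergence_test` to the semidiscretization. As a rough illustration of what such a callable looks like, here is a sketch of a pointwise source term with the `(u, x, t, equations)` signature. The function `source_terms_momentum_damping` and the damping rate `sigma` are hypothetical and not part of the PR. The sketch derives its number type from `eltype(u)` and returns an `SVector`, which is presumably the kind of type-generic, allocation-free code a GPU kernel for `calc_sources!` relies on.

```julia
using Trixi

# Hypothetical source term (not from the PR): linear damping of the momentum.
@inline function source_terms_momentum_damping(u, x, t,
                                               equations::CompressibleEulerEquations2D)
    RealT = eltype(u)            # follow the state's number type (Float32 or Float64)
    rho, rho_v1, rho_v2, rho_e = u
    sigma = convert(RealT, 0.1)  # hypothetical damping rate

    # Momentum decays at rate sigma; the energy loses the matching kinetic part.
    return SVector(zero(RealT),
                   -sigma * rho_v1,
                   -sigma * rho_v2,
                   -sigma * (rho_v1^2 + rho_v2^2) / rho)
end

equations = CompressibleEulerEquations2D(1.4)
u = SVector(1.0f0, 0.5f0, -0.5f0, 2.5f0)  # conserved variables in Float32
x = SVector(0.0f0, 0.0f0)
s = source_terms_momentum_damping(u, x, 0.0f0, equations)
```

A function like this could be passed as `source_terms = source_terms_momentum_damping` in place of the convergence-test source term, although whether the result is physically meaningful depends on the setup.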