Implement stage-level coupling for split integration by efaulhaber · Pull Request #1049 · trixi-framework/TrixiParticles.jl

efaulhaber · 2026-01-16T17:16:36Z

This PR

- implements stage-level coupling as opposed to the previously used step-level coupling.
- fixes a memory "leak". The sub-integrator was storing the split solution for every single sub-integration call (each fluid time step), which caused massive VRAM allocations for long-running GPU simulations.

In step-level coupling, the fluid is advanced a full step before the structure is advanced a full step. This reduces the stability of the main time integration, reducing the maximum stable time step by a factor of 2 (in my tests with Carpenter-Kennedy).
Stage-level coupling calls the sub-integration in every fluid RK stage. The advantage is that the stability properties of the time integrator are preserved. In the current version, this does not have a significant performance impact and is therefore the default (edit: it does not work for some RK schemes with non-monotonic stage times and is therefore not the default). Only for small ratios (like 2-5x smaller time step for the structure), the step-level coupling might be more efficient. For details on how this is implemented, check out my comment below. The implemented version is the one without "restart", and "predict" is enabled by default but can be disabled via kwarg.
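As a rough illustration of the caveat about non-monotonic stage times, the sketch below computes the stage times of one RK step and checks whether they are non-decreasing. It assumes that the sub-integrator can only be advanced forward in time, which is my reading of the description and not something taken from the PR's code. The function names and the stage nodes are hypothetical and do not belong to any real scheme.

```julia
# Stage times t_n + c_i * dt for one step of an explicit RK scheme.
# `nodes` holds the stage nodes c_i as fractions of the step.
stage_times(t_n, dt, nodes) = [t_n + c * dt for c in nodes]

# Assumption: a forward-only sub-integrator needs non-decreasing stage times.
has_monotonic_stage_times(nodes) = issorted(nodes)

# Hypothetical stage nodes (NOT the coefficients of a real scheme).
monotonic_nodes = [0 // 1, 1 // 3, 2 // 3]
nonmonotonic_nodes = [0 // 1, 2 // 3, 1 // 3]

# Rationals keep the arithmetic exact for this illustration.
times = stage_times(0 // 1, 1 // 1000, monotonic_nodes)

println(times)
println(has_monotonic_stage_times(monotonic_nodes))
println(has_monotonic_stage_times(nonmonotonic_nodes))
```

With the second set of nodes, the third stage lies before the second one in time, so a sub-integrator that has already advanced to the second stage time would have to step backwards.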

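| Δt | Step-level coupling | Stage-level coupling |
|---|---|---|
| 9e-4 | (not preserved) | (not preserved) |
| 1e-3 | (not preserved) | (not preserved) |
| 1.8e-3 | (not preserved) | (not preserved) |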

For 1.9e-3, I get "instability detected" with stage-level coupling. The maximum stable time step of 1.8e-3 with stage-level coupling is the same as when making the ball a moving solid wall boundary (non-elastic).
Note that we cannot do this with a CFL number because the StepsizeCallback has an upper limit of 1.2e-3 due to #1048.

Note that something is still not right, as the larger time step makes the ball fall deeper with stage-level coupling. (Edit: This is fixed with "predict" below.)

efaulhaber · 2026-01-20T15:56:16Z

I have now implemented a slightly different benchmark simulation. A TLSPH square with E=1e8 and rho=1200 is fully submerged in a fluid with rho=1000. It starts with zero velocity and then slowly sinks. I measure stability as the largest stable time step and accuracy as the deviation from a simulation with a very small time step.

The version I showed above was stage-coupling with "restart" (the first row in the table below), which means I reset the sub-integrator in every stage to the state of the previous time step. Without "restart", the sub-integrator is integrated to the first stage time with the fluid state prediction of the first stage, and then continues from there in the second stage, etc.
"Predict" means I apply an explicit Euler step with the structure velocity to predict the structure position at the stage time as `u += v * (t_new - t_previous)`. Then I use this prediction to compute $F_\text{fluid}$.
"Deviation" is the (normalized) difference in the y-coordinates of the square at t=1.5 between the split integration and the non-split integration (fluid and structure integrated together at Δt=1.44e-4), which I consider the reference for this simulation (time integration error almost zero). A negative deviation means the square sank too fast, a positive deviation means it didn't sink fast enough.

| Stage-coupling | Restart | Predict | max Δt | Deviation @ Δt=8e-4 | Deviation @ Δt=1.6e-3 | #Sub-steps @ Δt=8e-4 | #Sub-steps @ Δt=1.6e-3 |
|---|---|---|---|---|---|---|---|
| ✅ | ✅ | ❌ | 1.6e-3 | -8.47e-2 | -1.91e-1 | 38k | 35.2k |
| ✅ | ✅ | ✅ | 1.6e-3 | -2.50e-4 | 1.47e-3 | 38k | 35.2k |
| ✅ | ❌ | ❌ | 1.6e-3 | -2.01e-2 | -3.86e-2 | 15k | 12.3k |
| ✅ | ❌ | ✅ | 1.6e-3 | -2.91e-4 | -3.95e-4 | 15k | 12.3k |
| ❌ | — | ❌ | 8.0e-4 | -4.39e-2 | — | 11.2k | — |
| ❌ | — | ✅ | 8.0e-4 | 3.70e-2 | — | 11.2k | — |

Interpreting these results, we can see that the methods without position prediction all have a significant negative deviation, indicating an underestimation of $F_\text{fluid}$. For the stage-level coupling, which is still stable at the larger time step, the deviation is even more pronounced there. This is reasonable because $F_\text{fluid}$ is computed at the previous time step (or at the previous stage for the non-restart methods), at which the square is higher up, so the forces from the fluid are underestimated. This effect is smaller for non-restart methods because they compute $F_\text{fluid}$ at the previous stage instead of going further back to the previous step. Prediction significantly increases the accuracy.
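The position prediction discussed above is a single explicit Euler update of the structure positions. A minimal sketch follows, assuming positions and velocities are stored as plain arrays of the same shape. The function name `predict_positions!` and the sample values are made up for illustration and are not taken from the PR's code.

```julia
# Explicit Euler prediction of the structure positions at the stage time:
# u += v * (t_new - t_previous), applied in place to every entry.
function predict_positions!(u, v, t_new, t_previous)
    u .+= v .* (t_new - t_previous)
    return u
end

# Hypothetical sample data: one particle at y = 1.0 sinking with v_y = -2.0.
u = [0.0, 1.0]
v = [0.0, -2.0]

predicted = predict_positions!(u, v, 0.5, 0.0)
println(predicted)
```

The fluid forces would then be evaluated with the predicted positions instead of the positions from the previous stage or step.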

The "restart" methods compute the final structure state by integrating from $t_n$ to $t_{n+1}$ with a constant $F_\text{fluid}$ computed at $t_n$. The non-restart methods instead compute $F_\text{fluid}$ based on the predicted fluid state for each stage (which can be considered less accurate since it's a prediction), but re-compute $F_\text{fluid}$ for each stage (which, in turn, is more accurate). The resulting method has similar accuracy in this benchmark (higher even for the larger time step) and the same stability, but the number of sub-steps is close to the step-coupling method, whereas the restart methods require 2-3x more sub-steps.
Note that a larger ratio between fluid and structure time step further reduces the overhead of the non-restart methods compared to the step-coupling method.
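The sub-step counts can be made plausible with a simple counting model. The sketch below is only an illustration under stated assumptions: the stage nodes are hypothetical (not the Carpenter-Kennedy coefficients), the structure sub-step size is fixed, every sub-integration rounds up to whole sub-steps, and the restart variant performs one full integration from $t_n$ to $t_{n+1}$ at the end of the step. None of the function names come from the PR.

```julia
# Number of structure sub-steps needed to cover the fraction `frac` of one
# fluid step, where `ratio` is fluid time step / structure sub-step size.
substeps(frac, ratio) = ceil(Int, frac * ratio)

# Step-level coupling: one sub-integration over the full fluid step.
substeps_step_coupling(ratio) = substeps(1 // 1, ratio)

# Stage-level coupling with restart: every stage starts again at t_n,
# followed by a final integration over the full step.
function substeps_restart(stage_nodes, ratio)
    per_stage = sum(c -> substeps(c, ratio), stage_nodes)
    return per_stage + substeps(1 // 1, ratio)
end

# Stage-level coupling without restart: continue from the previous stage time.
# Assumes non-decreasing stage nodes.
function substeps_no_restart(stage_nodes, ratio)
    total = 0
    previous = 0 // 1
    for c in stage_nodes
        total += substeps(c - previous, ratio)
        previous = c
    end
    return total + substeps(1 // 1 - previous, ratio)
end

# Hypothetical stage nodes, for illustration only.
stage_nodes = [0 // 1, 1 // 4, 1 // 2, 3 // 4]

for ratio in (20, 2)
    counts = (substeps_step_coupling(ratio),
              substeps_restart(stage_nodes, ratio),
              substeps_no_restart(stage_nodes, ratio))
    println("ratio = ", ratio, ": ", counts)
end
```

In this model, a ratio of 20 gives 20, 50, and 20 sub-steps per fluid step for step-level, restart, and non-restart coupling, respectively. With a ratio of 2, rounding up in every stage gives 2, 6, and 4, so the relative overhead of the non-restart method shrinks as the ratio grows.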

In summary, the non-restart stage-coupling method with prediction is more stable (allows for a 2x larger fluid time step), more accurate, and not more expensive (does not require significantly more structure sub-steps) than the previous implementation of step-level coupling. For testing purposes, I added kwargs, so all methods can still be tested. I am not sure if the non-restart will work as well with every time integrator as it did with Carpenter-Kennedy.

Copilot

Pull request overview

This PR implements stage-level coupling for split integration of TotalLagrangianSPHSystems, and refactors the ODE problem parameter p to carry both the semidiscretization and split-integration runtime payload.

Changes:

- Change semidiscretize/ODE p payload from Semidiscretization to a NamedTuple with p.semi plus p.split_integration_data.
- Extend SplitIntegrationCallback with stage_coupling / predict_positions options and stage-time integration support.
- Update callbacks/IO/visualization code paths to use integrator.p.semi / sol.prob.p.semi.
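A small sketch of the payload change listed above, using stand-ins so that it runs without TrixiParticles.jl. `MockSemidiscretization` and `mock_integrator` are hypothetical placeholders for the real semidiscretization and integrator objects. Only the `NamedTuple` shape `(; semi, split_integration_data=nothing)` and the `integrator.p.semi` access pattern come from the review summary.

```julia
# Hypothetical stand-in for the real `Semidiscretization` type.
struct MockSemidiscretization
    n_systems::Int
end

semi = MockSemidiscretization(2)

# New payload shape: a NamedTuple instead of the bare semidiscretization.
p = (; semi, split_integration_data=nothing)

# Hypothetical stand-in for an integrator that carries the payload as `p`.
mock_integrator = (; p, t=0.0)

# Code that previously used `integrator.p` directly now reads `integrator.p.semi`.
semi_from_integrator = mock_integrator.p.semi
println(semi_from_integrator.n_systems)
```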

Reviewed changes

Copilot reviewed 14 out of 14 changed files in this pull request and generated 2 comments.

| File | Description |
|---|---|
| `test/callbacks/info.jl` | Update mock integrator payload to match `p.semi` access pattern. |
| `src/visualization/recipes_plots.jl` | Adapt plotting recipe and solution type alias to the new `p` payload shape. |
| `src/io/io.jl` | Read semidiscretization/metadata from `integrator.p.semi`. |
| `src/general/semidiscretization.jl` | Construct `p=(; semi, split_integration_data=nothing)` and adjust `kick!/drift!` signatures accordingly. |
| `src/callbacks/update.jl` | Use `integrator.p.semi`. |
| `src/callbacks/stepsize.jl` | Use `integrator.p.semi`; broaden callback type check for parametric `SplitIntegrationCallback`. |
| `src/callbacks/steady_state_reached.jl` | Use `integrator.p.semi`. |
| `src/callbacks/split_integration.jl` | Implement stage coupling + new payload storage under `p.split_integration_data`. |
| `src/callbacks/solution_saving.jl` | Use `integrator.p.semi`. |
| `src/callbacks/post_process.jl` | Use `integrator.p.semi`. |
| `src/callbacks/info.jl` | Use `integrator.p.semi`. |
| `src/callbacks/density_reinit.jl` | Use `integrator.p.semi`. |
| `ext/TrixiParticlesOrdinaryDiffEqExt.jl` | Adjust extension code to read `p.semi`. |
| `docs/src/refs.bib` | Remove JabRef metadata comments. |


codecov · 2026-03-02T11:46:23Z

Codecov Report

❌ Patch coverage is 93.46405% with 10 lines in your changes missing coverage. Please review.
✅ Project coverage is 89.52%. Comparing base (3afa38f) to head (df670ab).
⚠️ Report is 1 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/callbacks/split_integration.jl | 92.03% | 9 Missing ⚠️ |
| src/visualization/recipes_plots.jl | 0.00% | 1 Missing ⚠️ |

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1049      +/-   ##
==========================================
- Coverage   89.54%   89.52%   -0.03%     
==========================================
  Files         127      127              
  Lines        9654     9710      +56     
==========================================
+ Hits         8645     8693      +48     
- Misses       1009     1017       +8

| Flag | Coverage Δ | |
|---|---|---|
| total | `89.52% <93.46%> (-0.03%)` | ⬇️ |
| unit | `67.30% <15.23%> (-0.32%)` | ⬇️ |


efaulhaber · 2026-03-02T14:58:44Z

/run-gpu-tests

efaulhaber · 2026-03-04T07:20:02Z

@copilot Find the issue with the failing CI runs and suggest a fix.


efaulhaber · 2026-03-04T09:12:28Z

/run-gpu-tests


svchb · 2026-03-05T16:18:39Z

Since you are changing the extension, can you please address issue #1047?


efaulhaber · 2026-03-31T08:29:45Z

/run-gpu-tests

svchb · 2026-03-31T09:55:37Z

src/callbacks/split_integration.jl

+        # Tell OrdinaryDiffEq that `u` has NOT been modified.
+        # Theoretically, the TLSPH part has been modified, but in the FSAL case,
+        # the time at the last stage is the same as the step time, so the split integration
+        # above is skipped and `u` is not modified at all.
+        # Therefore, the derivative at the last stage can be reused for the next step.


# Tell OrdinaryDiffEq that `u` was not modified.
# In the FSAL case, the last stage occurs at the step time,
# so the split integration is skipped and `u` remains unchanged.
# Therefore, the derivative at the last stage can be reused for the next step.

svchb · 2026-03-31T10:05:06Z

src/callbacks/split_integration.jl

+        # We modify `v_ode` and `u_ode`, which is technically not allowed during stages,
+        # hence there are no guarantees about the structure part of `v_ode` and `u_ode`.
+        # By copying the current split integration values, we make sure that it's correct.


svchb · 2026-03-31T10:17:53Z

src/callbacks/split_integration.jl

+    foreach_system(semi_split) do system
+        # Construct string for the interactions timer.
+        # Avoid allocations from string construction when no timers are used.
+        # TODO do we need to disable timers in split integration?


efaulhaber self-assigned this Jan 16, 2026

efaulhaber added the enhancement label Jan 16, 2026

efaulhaber mentioned this pull request Feb 27, 2026: Fin simulation prototyping branch #844 (Draft)

efaulhaber requested a review from Copilot March 2, 2026 11:32

Copilot AI reviewed Mar 2, 2026

- src/callbacks/split_integration.jl (outdated, resolved)
- src/callbacks/split_integration.jl (resolved)

efaulhaber requested a review from Copilot March 2, 2026 11:49

Copilot AI reviewed Mar 2, 2026

- src/callbacks/stepsize.jl (outdated, resolved)
- src/general/semidiscretization.jl (resolved)

efaulhaber requested a review from Copilot March 2, 2026 12:25

Copilot AI reviewed Mar 2, 2026

- src/callbacks/split_integration.jl (resolved)
- src/callbacks/split_integration.jl (outdated, resolved)

efaulhaber requested a review from Copilot March 2, 2026 13:28

Copilot AI reviewed Mar 2, 2026

- src/callbacks/split_integration.jl (resolved)

efaulhaber marked this pull request as ready for review March 2, 2026 13:38

efaulhaber requested review from LasNikas and svchb March 2, 2026 13:38

efaulhaber marked this pull request as draft March 2, 2026 17:20

efaulhaber mentioned this pull request Mar 2, 2026: get_du is super annoying #1080 (Closed)

efaulhaber force-pushed the stage-coupling branch 2 times, most recently from c798ff9 to d749eb3 March 3, 2026 13:36

efaulhaber added 5 commits March 3, 2026 22:14

- 5d873d5 Implement stage-level coupling for split integration
- 7cbdeb1 Fix timers
- c9297ab Fix first stages
- c2d4750 Implement split integration restart and position prediction
- c9089d3 Reformat

efaulhaber added 5 commits March 3, 2026 22:14

- 6245696 Fix split integration tests
- 3d9cae8 Add kwarg coordinated_eltype to example file
- 0c32e83 Fix rebase
- d129e5d Fix count_allocations
- 741c9a6 Fix tests

efaulhaber force-pushed the stage-coupling branch from dcf99cf to 741c9a6 March 3, 2026 21:17

- f888390 Reformat

efaulhaber added 2 commits March 4, 2026 09:16

- c5ee99f Fix n-body tests
- 7174e88 Fix tests

efaulhaber requested a review from Copilot March 4, 2026 08:48

Copilot AI reviewed Mar 4, 2026

- test/examples/gpu.jl (resolved)
- test/count_allocations.jl (resolved)
- src/callbacks/split_integration.jl (resolved)
- test/count_allocations.jl (outdated, resolved)

efaulhaber added 2 commits March 4, 2026 10:03

- 9c73fae Fix GPU tests
- 58699f1 Fix comment

efaulhaber marked this pull request as ready for review March 4, 2026 09:12

svchb requested changes Mar 5, 2026

svchb added the high priority label Mar 9, 2026

LasNikas requested changes Mar 19, 2026

- src/general/semidiscretization.jl (resolved)

efaulhaber added 5 commits March 25, 2026 17:46

- 3848ba0 Merge branch 'main' into stage-coupling
- f8f8409 Implement suggestions
- 5fb458e Fix unit tests
- af8bea9 Fix tests
- df670ab Merge branch 'main' into stage-coupling

efaulhaber requested review from LasNikas and svchb March 31, 2026 08:30

svchb requested changes Mar 31, 2026
