Fuse multi-output microphysical tendencies for GPU performance#676
Fuse multi-output microphysical tendencies for GPU performance#676kaiyuan-cheng wants to merge 5 commits into
Conversation
The MPNE1M and WPNE2M schemes both have a single bundle function (`mpne1m_tendencies`, `wpne2m_tendencies`) that returns all prognostic tendencies, plus per-tracer `microphysical_tendency` dispatchers that each call the bundle and extract one field. With one tendency-kernel launch per prognostic field, the bundle was being recomputed once per tracer per RK stage — five times per cell for MPNE1M, six times for WPNE2M — even though each call returns identical values. This change adds a `compute_microphysical_tendencies!(model)` interface (no-op by default) that schemes can override to write all bundled tendency contributions in one fused launch. MPNE1M and WPNE2M now override `grid_microphysical_tendency` to zero (so per-tracer kernels skip the microphysics term) and provide a fused kernel that computes the bundle once per cell and `+=`s every output into the corresponding `Gⁿ` field. On H100, splitting_supercell.jl (Float32, 168×168×40, AcousticSSPRungeKutta3) shows a 1.26× steady-state speedup for MPNE1M and 1.52× for WPNE2M. The 2M case benefits more because (a) one extra tracer in the bundle, (b) coupled α-limiters cross-couple every process rate so the compiler has very little to prune via DCE, (c) the Abdul-Razzak–Ghan aerosol activation contains expensive transcendentals (`erf`, `log`). Switches `examples/splitting_supercell.jl` from Kessler to WPNE2M to demonstrate the fused path on a non-trivial 2M scheme; the animation is extended to 5 panels (w, qᶜˡ, nᶜˡ, qʳ, nʳ). Tests: cloud_microphysics_1M and cloud_microphysics_1M_warm_ice_deposition test sets pass with the fused implementation. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
This PR adds an opt-in interface for fusing multi-output microphysics tendency evaluation into a single kernel pass, reducing redundant per-prognostic recomputation and improving GPU performance. It wires the new hook into the model tendency pipeline, implements the fused path for the CloudMicrophysics-backed 1M/2M non-equilibrium schemes, and updates an example to exercise the new 2M path.
Changes:
- Add
compute_microphysical_tendencies!(model)(default no-op) and call it fromcompute_tendencies!so microphysics schemes can optionally contribute tendencies via a single fused kernel. - Implement fused tendency kernels for MPNE1M and WPNE2M, and disable their per-tracer grid microphysics term by overriding
grid_microphysical_tendencyto return zero. - Update
examples/splitting_supercell.jlto run with TwoMomentCloudMicrophysics (WPNE2M) and produce a 5-panel animation including number concentrations.
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
src/AtmosphereModels/update_atmosphere_model_state.jl |
Calls the new fused microphysics hook during tendency assembly. |
src/AtmosphereModels/microphysics_interface.jl |
Defines the compute_microphysical_tendencies! interface and default no-op implementation. |
src/AtmosphereModels/AtmosphereModels.jl |
Exports compute_microphysical_tendencies!. |
ext/BreezeCloudMicrophysicsExt/two_moment_microphysics.jl |
Implements fused WPNE2M tendency kernel and disables per-tracer grid microphysics term. |
ext/BreezeCloudMicrophysicsExt/one_moment_microphysics.jl |
Implements fused MPNE1M tendency kernel and disables per-tracer grid microphysics term. |
ext/BreezeCloudMicrophysicsExt/BreezeCloudMicrophysicsExt.jl |
Adds imports needed for launch! and KernelAbstractions kernels. |
examples/splitting_supercell.jl |
Switches the supercell example to TwoMomentCloudMicrophysics and expands diagnostics/animation panels. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| # Skip the per-tracer microphysics term: handled by the fused kernel below. | ||
| @inline AM.grid_microphysical_tendency(i, j, k, grid, microphysics::MPNE1M, | ||
| name, ρ, fields, 𝒰, constants, velocities) = zero(grid) |
Codecov Report❌ Patch coverage is 📢 Thoughts on this report? Let us know! |
| @inline AM.grid_microphysical_tendency(i, j, k, grid, microphysics::MPNE1M, | ||
| name, ρ, fields, 𝒰, constants, velocities) = zero(grid) | ||
|
|
||
| @kernel function _compute_mpne1m_tendencies!(Gρqᵛ, Gρqᶜˡ, Gρqᶜⁱ, Gρqʳ, Gρqˢ, |
There was a problem hiding this comment.
Is there any possibility to re-use the grid_microphysical_tendency interface? This design basically negates that interface.
Another possibility is to excise the grid_microphysical_tendency interface in favor of this new interface. The downside is that we permanently bake-in splitting of the microphysics tendency from advection tendency. However, if that is faster in most situations, I think it's a good approach.
I think more generally it might be nice to provide just "one way" of extending microphysics, for simplicity.
Co-authored-by: Mosè Giordano <765740+giordano@users.noreply.github.com>
Replace the dual `grid_microphysical_tendency` + `compute_microphysical_tendencies!` interface with a single entry point. The model now invokes microphysics solely via `compute_microphysical_tendencies!`, called after the per-tracer dynamics kernels have written advection + diffusion + forcing to `Gⁿ`. Schemes plug in by extending one of two methods: * `microphysical_tendency` (state-based, per-name) for schemes whose tendencies factor naturally per-prognostic. The default `compute_microphysical_tendencies!` launches a single fused kernel that builds the microphysical and thermodynamic states once per cell and `+=`s `microphysical_tendency` for each prognostic name into the corresponding `G` field. * `compute_microphysical_tendencies!(microphysics, model)` directly, for bundle schemes (MPNE1M, WPNE2M) that compute many process rates feeding multiple prognostic tendencies in one pass. Removes `grid_microphysical_tendency` from the dynamics tendency kernels (`scalar_tendency`, `potential_temperature_tendency`, `static_energy_tendency`) and from the public API. The MPNE1M/WPNE2M zero-overrides on `grid_microphysical_tendency` are no longer needed. ZMCM's `:ρqᵉ` precipitation-removal tendency moves from the grid-indexed override to the state-based `microphysical_tendency`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Pushed a follow-up commit that addresses the reviewer's question about the dual interface (5e0ff88). Replaced Schemes plug in by extending one of two methods:
The MPNE1M/WPNE2M zero-overrides on
|
Drop the MPNE1M/2M experiments from the example file. The microphysics refactor work belongs in the source modules; the example remains on the Kessler scheme used by DCMIP2016 on main. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
| @inbounds qˢ = microphysical_fields.ρqˢ[i, j, k] / ρ | ||
| ℳ = MixedPhaseOneMomentState(qᶜˡ, qᶜⁱ, qʳ, qˢ) | ||
|
|
||
| G = mpne1m_tendencies(microphysics, ρ, ℳ, 𝒰, constants) |
There was a problem hiding this comment.
Should we remove this interface and put the body of this function straight into the kernel?
| # so we need the full grid_microphysical_state which interpolates u, v, w to centers. | ||
| ℳ = AtmosphereModels.grid_microphysical_state(i, j, k, grid, microphysics, microphysical_fields, ρ, 𝒰, velocities) | ||
|
|
||
| G = wpne2m_tendencies(microphysics, ρ, ℳ, 𝒰, constants) |
There was a problem hiding this comment.
ditto here --- the new interface for building microphysics schemes requires defining this kernel, I think...
glwagner
left a comment
There was a problem hiding this comment.
Do we have to update the microphysics interface docs to reflect the fact that now, after this PR, building a microphysics scheme cannot be done merely by defining a set of functions? This PR changes the interface so that developers have to write a kernel. The documentation should also explain how to write kernels with links to KernelAbstractions and launch! in Oceananigans I think.
Summary
compute_microphysical_tendencies!(model)interface (no-op by default) so schemes whose tendency computation bundles many process rates feeding multiple prognostics can fuse the work into a single launch instead of recomputing the bundle once per tracer.grid_microphysical_tendencyto zero and writes all bundled outputs toGⁿfrom one kernel.examples/splitting_supercell.jlto WPNE2M with a 5-panel animation (w, qᶜˡ, nᶜˡ, qʳ, nʳ) to exercise the new path.Why
The per-tracer dispatchers like
microphysical_tendency(::MPNE1M, ::Val{:ρqˢ}, …)each callmpne1m_tendencies(…)and discard 4 of 5 outputs. With one tendency-kernel launch per prognostic field and AcousticSSPRungeKutta3 having 3 stages, that's 15 full bundle evaluations per cell per timestep for MPNE1M (18 for WPNE2M). The compiler's DCE prunes some of it, but the conservation-routed numerical guards inside the bundle make most rates live for any single field extraction, so most of that work is genuinely redundant.Measured speedup (H100, Float32, 168×168×40, AcousticSSPRungeKutta3)
The 2M case sees a larger speedup because the bundle has one extra prognostic, the coupled α-limiters cross-couple all process rates (so DCE has very little to remove), and aerosol activation (Abdul-Razzak–Ghan) contains expensive transcendentals (
erf,log).Test plan
cloud_microphysics_1Mtest set passes (81 tests, including MPNE1M time-stepping)cloud_microphysics_1M_warm_ice_depositiontest set passestime_step!with WPNE2M succeeds end-to-endexamples/splitting_supercell.jlruns to completion on H100 with WPNE2Mcloud_microphysics_2Mtest set@time_nsto confirm fused kernel is one of the cheaper kernels (sanity check that we didn't blow up register usage)Notes for reviewers
grid_microphysical_state(which interpolates u/v/w to centers) because WPNE2M'smicrophysical_statestores velocities (aerosol activation depends on vertical velocity). The 1M kernel reads the prognostic fields directly since MPNE1M'smicrophysical_statedoesn't use velocities.microphysical_tendency(::MPNE1M, ::Val{:ρq?}, …)andmicrophysical_tendency(::WPNE2M, ::Val{:ρ??}, …)methods are kept — they are still used by parcel models and the unit tests, which call them directly with a constructed state. The grid path now bypasses them via thegrid_microphysical_tendencyoverride.Gρq = adv + diff + 𝓜 + forcingin one expression; fused path doesGρq = adv + diff + forcingthenGρq += 𝓜), so trajectories diverge at ULP level and amplify chaotically over thousands of timesteps. Bulk diagnostics still agree — see Float64 tests for bit-level correctness.🤖 Generated with Claude Code