Skip to content

Fuse multi-output microphysical tendencies for GPU performance#676

Open
kaiyuan-cheng wants to merge 5 commits into
mainfrom
kyc/mp_gpu
Open

Fuse multi-output microphysical tendencies for GPU performance#676
kaiyuan-cheng wants to merge 5 commits into
mainfrom
kyc/mp_gpu

Conversation

@kaiyuan-cheng
Copy link
Copy Markdown
Collaborator

Summary

  • Adds a compute_microphysical_tendencies!(model) interface (no-op by default) so schemes whose tendency computation bundles many process rates feeding multiple prognostics can fuse the work into a single launch instead of recomputing the bundle once per tracer.
  • Implements the fused path for MPNE1M (mixed-phase non-equilibrium 1M) and WPNE2M (warm-phase non-equilibrium 2M). Each scheme overrides grid_microphysical_tendency to zero and writes all bundled outputs to Gⁿ from one kernel.
  • Switches examples/splitting_supercell.jl to WPNE2M with a 5-panel animation (w, qᶜˡ, nᶜˡ, qʳ, nʳ) to exercise the new path.

Why

The per-tracer dispatchers like microphysical_tendency(::MPNE1M, ::Val{:ρqˢ}, …) each call mpne1m_tendencies(…) and discard 4 of 5 outputs. With one tendency-kernel launch per prognostic field and AcousticSSPRungeKutta3 having 3 stages, that's 15 full bundle evaluations per cell per timestep for MPNE1M (18 for WPNE2M). The compiler's DCE prunes some of it, but the conservation-routed numerical guards inside the bundle make most rates live for any single field extraction, so most of that work is genuinely redundant.

Measured speedup (H100, Float32, 168×168×40, AcousticSSPRungeKutta3)

Scheme Unfused ms/iter Fused ms/iter Speedup
MPNE1M 11.04 8.75 1.26×
WPNE2M 15.28 10.08 1.52×

The 2M case sees a larger speedup because the bundle has one extra prognostic, the coupled α-limiters cross-couple all process rates (so DCE has very little to remove), and aerosol activation (Abdul-Razzak–Ghan) contains expensive transcendentals (erf, log).

Test plan

  • cloud_microphysics_1M test set passes (81 tests, including MPNE1M time-stepping)
  • cloud_microphysics_1M_warm_ice_deposition test set passes
  • Smoke test on CPU: 2-step time_step! with WPNE2M succeeds end-to-end
  • examples/splitting_supercell.jl runs to completion on H100 with WPNE2M
  • Reviewer: confirm no regression in cloud_microphysics_2M test set
  • Reviewer: profile or @time_ns to confirm fused kernel is one of the cheaper kernels (sanity check that we didn't blow up register usage)

Notes for reviewers

  • The fused 2M kernel uses grid_microphysical_state (which interpolates u/v/w to centers) because WPNE2M's microphysical_state stores velocities (aerosol activation depends on vertical velocity). The 1M kernel reads the prognostic fields directly since MPNE1M's microphysical_state doesn't use velocities.
  • The per-tracer microphysical_tendency(::MPNE1M, ::Val{:ρq?}, …) and microphysical_tendency(::WPNE2M, ::Val{:ρ??}, …) methods are kept — they are still used by parcel models and the unit tests, which call them directly with a constructed state. The grid path now bypasses them via the grid_microphysical_tendency override.
  • Float32 accumulation order changes between the two paths (per-tracer kernel does Gρq = adv + diff + 𝓜 + forcing in one expression; fused path does Gρq = adv + diff + forcing then Gρq += 𝓜), so trajectories diverge at ULP level and amplify chaotically over thousands of timesteps. Bulk diagnostics still agree — see Float64 tests for bit-level correctness.

🤖 Generated with Claude Code

The MPNE1M and WPNE2M schemes both have a single bundle function
(`mpne1m_tendencies`, `wpne2m_tendencies`) that returns all prognostic
tendencies, plus per-tracer `microphysical_tendency` dispatchers that each
call the bundle and extract one field. With one tendency-kernel launch per
prognostic field, the bundle was being recomputed once per tracer per RK
stage — five times per cell for MPNE1M, six times for WPNE2M — even though
each call returns identical values.

This change adds a `compute_microphysical_tendencies!(model)` interface
(no-op by default) that schemes can override to write all bundled tendency
contributions in one fused launch. MPNE1M and WPNE2M now override
`grid_microphysical_tendency` to zero (so per-tracer kernels skip the
microphysics term) and provide a fused kernel that computes the bundle
once per cell and `+=`s every output into the corresponding `Gⁿ` field.

On H100, splitting_supercell.jl (Float32, 168×168×40, AcousticSSPRungeKutta3)
shows a 1.26× steady-state speedup for MPNE1M and 1.52× for WPNE2M. The 2M
case benefits more because (a) one extra tracer in the bundle, (b) coupled
α-limiters cross-couple every process rate so the compiler has very little
to prune via DCE, (c) the Abdul-Razzak–Ghan aerosol activation contains
expensive transcendentals (`erf`, `log`).

Switches `examples/splitting_supercell.jl` from Kessler to WPNE2M to
demonstrate the fused path on a non-trivial 2M scheme; the animation is
extended to 5 panels (w, qᶜˡ, nᶜˡ, qʳ, nʳ).

Tests: cloud_microphysics_1M and cloud_microphysics_1M_warm_ice_deposition
test sets pass with the fused implementation.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings May 6, 2026 21:26
Comment thread src/AtmosphereModels/microphysics_interface.jl Outdated
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds an opt-in interface for fusing multi-output microphysics tendency evaluation into a single kernel pass, reducing redundant per-prognostic recomputation and improving GPU performance. It wires the new hook into the model tendency pipeline, implements the fused path for the CloudMicrophysics-backed 1M/2M non-equilibrium schemes, and updates an example to exercise the new 2M path.

Changes:

  • Add compute_microphysical_tendencies!(model) (default no-op) and call it from compute_tendencies! so microphysics schemes can optionally contribute tendencies via a single fused kernel.
  • Implement fused tendency kernels for MPNE1M and WPNE2M, and disable their per-tracer grid microphysics term by overriding grid_microphysical_tendency to return zero.
  • Update examples/splitting_supercell.jl to run with TwoMomentCloudMicrophysics (WPNE2M) and produce a 5-panel animation including number concentrations.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
src/AtmosphereModels/update_atmosphere_model_state.jl Calls the new fused microphysics hook during tendency assembly.
src/AtmosphereModels/microphysics_interface.jl Defines the compute_microphysical_tendencies! interface and default no-op implementation.
src/AtmosphereModels/AtmosphereModels.jl Exports compute_microphysical_tendencies!.
ext/BreezeCloudMicrophysicsExt/two_moment_microphysics.jl Implements fused WPNE2M tendency kernel and disables per-tracer grid microphysics term.
ext/BreezeCloudMicrophysicsExt/one_moment_microphysics.jl Implements fused MPNE1M tendency kernel and disables per-tracer grid microphysics term.
ext/BreezeCloudMicrophysicsExt/BreezeCloudMicrophysicsExt.jl Adds imports needed for launch! and KernelAbstractions kernels.
examples/splitting_supercell.jl Switches the supercell example to TwoMomentCloudMicrophysics and expands diagnostics/animation panels.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread ext/BreezeCloudMicrophysicsExt/two_moment_microphysics.jl Outdated
Comment on lines +942 to +944
# Skip the per-tracer microphysics term: handled by the fused kernel below.
@inline AM.grid_microphysical_tendency(i, j, k, grid, microphysics::MPNE1M,
name, ρ, fields, 𝒰, constants, velocities) = zero(grid)
Comment thread examples/splitting_supercell.jl Outdated
@codecov
Copy link
Copy Markdown

codecov Bot commented May 6, 2026

Codecov Report

❌ Patch coverage is 92.00000% with 6 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
...ezeCloudMicrophysicsExt/one_moment_microphysics.jl 91.66% 2 Missing ⚠️
...ezeCloudMicrophysicsExt/two_moment_microphysics.jl 90.47% 2 Missing ⚠️
src/AtmosphereModels/microphysics_interface.jl 92.85% 2 Missing ⚠️

📢 Thoughts on this report? Let us know!

@inline AM.grid_microphysical_tendency(i, j, k, grid, microphysics::MPNE1M,
name, ρ, fields, 𝒰, constants, velocities) = zero(grid)

@kernel function _compute_mpne1m_tendencies!(Gρqᵛ, Gρqᶜˡ, Gρqᶜⁱ, Gρqʳ, Gρqˢ,
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any possibility to re-use the grid_microphysical_tendency interface? This design basically negates that interface.

Another possibility is to excise the grid_microphysical_tendency interface in favor of this new interface. The downside is that we permanently bake-in splitting of the microphysics tendency from advection tendency. However, if that is faster in most situations, I think it's a good approach.

I think more generally it might be nice to provide just "one way" of extending microphysics, for simplicity.

kaiyuan-cheng and others added 2 commits May 6, 2026 18:32
Co-authored-by: Mosè Giordano <765740+giordano@users.noreply.github.com>
Replace the dual `grid_microphysical_tendency` + `compute_microphysical_tendencies!`
interface with a single entry point. The model now invokes microphysics solely via
`compute_microphysical_tendencies!`, called after the per-tracer dynamics kernels
have written advection + diffusion + forcing to `Gⁿ`.

Schemes plug in by extending one of two methods:

* `microphysical_tendency` (state-based, per-name) for schemes whose tendencies
  factor naturally per-prognostic. The default `compute_microphysical_tendencies!`
  launches a single fused kernel that builds the microphysical and thermodynamic
  states once per cell and `+=`s `microphysical_tendency` for each prognostic name
  into the corresponding `G` field.

* `compute_microphysical_tendencies!(microphysics, model)` directly, for bundle
  schemes (MPNE1M, WPNE2M) that compute many process rates feeding multiple
  prognostic tendencies in one pass.

Removes `grid_microphysical_tendency` from the dynamics tendency kernels
(`scalar_tendency`, `potential_temperature_tendency`, `static_energy_tendency`)
and from the public API. The MPNE1M/WPNE2M zero-overrides on
`grid_microphysical_tendency` are no longer needed.

ZMCM's `:ρqᵉ` precipitation-removal tendency moves from the grid-indexed
override to the state-based `microphysical_tendency`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@kaiyuan-cheng
Copy link
Copy Markdown
Collaborator Author

Pushed a follow-up commit that addresses the reviewer's question about the dual interface (5e0ff88).

Replaced grid_microphysical_tendency + compute_microphysical_tendencies! with a single entry point. The atmosphere model now invokes microphysics solely through compute_microphysical_tendencies!, called after the per-tracer dynamics kernels write advection + diffusion + forcing to Gⁿ.

Schemes plug in by extending one of two methods:

  • microphysical_tendency (state-based, per-name): the default compute_microphysical_tendencies! launches a single fused kernel that builds and 𝒰 once per cell and +=s the result into each prognostic G field. Simple schemes (Kessler, saturation adjustment, ZMCM) plug in here.
  • compute_microphysical_tendencies!(microphysics, model) directly: bundle schemes (MPNE1M, WPNE2M) override this with their fused-bundle kernel.

The MPNE1M/WPNE2M zero-overrides on grid_microphysical_tendency (and the grid_microphysical_tendency calls inside scalar_tendency, potential_temperature_tendency, static_energy_tendency) are gone. Verified locally:

  • cloud_microphysics_0M + cloud_microphysics_1M: 95/95 pass
  • cloud_microphysics_2M: 71/71 pass
  • ExplicitImports: clean

Drop the MPNE1M/2M experiments from the example file. The microphysics
refactor work belongs in the source modules; the example remains on the
Kessler scheme used by DCMIP2016 on main.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@inbounds qˢ = microphysical_fields.ρqˢ[i, j, k] / ρ
ℳ = MixedPhaseOneMomentState(qᶜˡ, qᶜⁱ, qʳ, qˢ)

G = mpne1m_tendencies(microphysics, ρ, ℳ, 𝒰, constants)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we remove this interface and put the body of this function straight into the kernel?

# so we need the full grid_microphysical_state which interpolates u, v, w to centers.
ℳ = AtmosphereModels.grid_microphysical_state(i, j, k, grid, microphysics, microphysical_fields, ρ, 𝒰, velocities)

G = wpne2m_tendencies(microphysics, ρ, ℳ, 𝒰, constants)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto here --- the new interface for building microphysics schemes requires defining this kernel, I think...

Copy link
Copy Markdown
Member

@glwagner glwagner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have to update the microphysics interface docs to reflect the fact that now, after this PR, building a microphysics scheme cannot be done merely by defining a set of functions? This PR changes the interface so that developers have to write a kernel. The documentation should also explain how to write kernels with links to KernelAbstractions and launch! in Oceananigans I think.

@giordano giordano added the breaking change 💔 Concerning a change which breaks the API label May 10, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

breaking change 💔 Concerning a change which breaks the API

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants