Skip to content

[Optimization] Photoionization ODE performance#1946

Closed
BenWibking wants to merge 27 commits into
developmentfrom
BenWibking/photochemistry-flux-elim
Closed

[Optimization] Photoionization ODE performance#1946
BenWibking wants to merge 27 commits into
developmentfrom
BenWibking/photochemistry-flux-elim

Conversation

@BenWibking

@BenWibking BenWibking commented Jun 12, 2026

Copy link
Copy Markdown
Collaborator

Description

More performance optimizations (+measured improvement for DTypeFront problem):

  • Use ROS2S integrator [~24% faster]. (This improvement is both on development and on this branch.)
  • RHS optimizations:
    • Selectively use fast low-precision approximations for sqrt and pow in cooling terms.
      (Note: Rate terms are NOT modified. exp is NEVER modified anywhere.)
    • Group common subexpressions and other algebraic rewrites
  • Use a bespoke 6x6 LU solver for 6-equation ODE systems (e.g., the photoionization network)

ROS2S is the fastest integrator (measured on 1 H200 GPU), both before and after the other optimizations.

However, the other changes in this PR now appear to make it slower. This is under investigation.

Measured 23 Jun 2026 on development:

  ┌─────────┬─────────────┬───────────┬───────────┬────────┬────────┬─────────────┬─────────┬──────────────┬──────────────────┬────────┐
  │ Method  │ FoM us/zone │ Mupdate/s │ TP wall s │ Chem s │ Chem % │ Rad-noODE % │ Hydro % │ Hydro-only x │ RadSub/HydroStep │ Subcyc │
  ├─────────┼─────────────┼───────────┼───────────┼────────┼────────┼─────────────┼─────────┼──────────────┼──────────────────┼────────┤
  │ ROS2S   │      0.0877 │     11.40 │     19.24 │   6.47 │  33.65 │       42.80 │   14.78 │         6.77 │             0.52 │   9.99 │
  │ Rodas3P │      0.0895 │     11.17 │     19.30 │   7.07 │  36.62 │       41.82 │   14.43 │         6.93 │             0.54 │   9.99 │
  │ Rodas4P │      0.0957 │     10.45 │     20.73 │   7.95 │  38.35 │       40.64 │   13.51 │         7.40 │             0.58 │   9.99 │
  │ VODE    │      0.1089 │      9.18 │     23.20 │  10.67 │  45.98 │       36.15 │   12.05 │         8.30 │             0.68 │   9.99 │
  │ Rodas5P │      0.1214 │      8.24 │     25.93 │  13.09 │  50.47 │       32.48 │   11.08 │         9.03 │             0.75 │   9.99 │
  └─────────┴─────────────┴───────────┴───────────┴────────┴────────┴─────────────┴─────────┴──────────────┴──────────────────┴────────┘

Measured 23 Jun 2026 on this PR:

  ┌─────────┬─────────────┬───────────┬───────────┬────────┬────────┬─────────────┬─────────┬──────────────┬──────────────────┬────────┐
  │ Method  │ FoM us/zone │ Mupdate/s │ TP wall s │ Chem s │ Chem % │ Rad-noODE % │ Hydro % │ Hydro-only x │ RadSub/HydroStep │ Subcyc │
  ├─────────┼─────────────┼───────────┼───────────┼────────┼────────┼─────────────┼─────────┼──────────────┼──────────────────┼────────┤
  │ ROS2S   │      0.0965 │     10.36 │     19.67 │   7.99 │  40.63 │       43.08 │   14.61 │         6.84 │             0.57 │   9.99 │
  │ Rodas3P │      0.0979 │     10.21 │     19.91 │   8.31 │  41.72 │       42.42 │   14.43 │         6.93 │             0.58 │   9.99 │
  │ Rodas4P │      0.1011 │      9.90 │     20.55 │   8.63 │  42.01 │       42.60 │   13.98 │         7.15 │             0.61 │   9.99 │
  │ VODE    │      0.1194 │      8.38 │     24.27 │  12.49 │  51.46 │       35.58 │   11.77 │         8.50 │             0.74 │   9.99 │
  │ Rodas5P │      0.1251 │      7.99 │     25.43 │  13.77 │  54.14 │       33.40 │   11.33 │         8.83 │             0.77 │   9.99 │
  └─────────┴─────────────┴───────────┴───────────┴────────┴────────┴─────────────┴─────────┴──────────────┴──────────────────┴────────┘

Each batch of performance improvements is generated with variations on this prompt:

/goal Optimize the performance of the DTypeFront problem. The figure of merit is time to solution on 1 GPU for 20 hydro timesteps with the default timestepping. Do not modify any code other than the photochemistry ODE integration. Do not change the physics solved or the mathematical content of the equations. However, you can apply any performance optimizations in the ODE integrator or the Jacobian or the RHS evaluation as long as the final result passes the built-in verification test. After 10 iterations of optimizations, stop and report your results. Each iteration should take no longer than 5 minutes.

Either a specific area of the code to focus on or an optimization strategy for the agent to follow can be appended to this generic prompt.

Related issues

N/A

Checklist

Before this pull request can be reviewed, all of these tasks should be completed. Denote completed tasks with an x inside the square brackets [ ] in the Markdown source below:

  • I have added a description (see above).
  • I have added a link to any related issues (if applicable; see above).
  • I have read the Contributing Guide.
  • I have added tests for any new physics that this PR adds to the code.
  • (For quokka-astro org members) I have manually triggered the GPU tests with the magic comment /azp run.

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request refactors the photochemistry solver by removing the radiation flux from the ODE integration and instead attenuating it algebraically based on the change in photon density. It also introduces a configuration option to skip radius plotting in the DTypeFront test. The review feedback correctly points out that the algebraic flux attenuation logic currently assumes a single chemical band and should be updated to store and compute attenuation factors per-band to support multi-band networks.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment thread src/radiation/photochemistry.hpp Outdated
Comment thread src/radiation/photochemistry.hpp Outdated
Comment thread src/radiation/photochemistry.hpp
@BenWibking BenWibking changed the title [WIP] algebraically eliminate radiation flux from photoionization network Algebraically eliminate radiation flux from photoionization network Jun 12, 2026
@BenWibking

Copy link
Copy Markdown
Collaborator Author

/azp run

@azure-pipelines

Copy link
Copy Markdown
Azure Pipelines successfully started running 2 pipeline(s).

@BenWibking

Copy link
Copy Markdown
Collaborator Author

Hmm, not sure what's causing the GPU failures here. It works on my local NVIDIA H200 node and on my laptop. I'll have to look at this next week.

@markkrumholz

Copy link
Copy Markdown
Collaborator

Unfortunately I think we should not merge this because, while it speeds up this particular case, it is not a generalizable solution that we can use generically — which is why the code was written the way it was in the first place. The problem is that this approach works only if photons can only be absorbed, not if they can also be created. Consider a situation in which we have an external radiation field entering a cell that supplies a non-zero photon number density and photon flux, which is being absorbed in the cell, but we also have some process in the cell that creates photons in the same band. In this case we could have a situation where essentially all the incoming photons get absorbed, but at the same time the emission process creates an equal number of photons. The emitted photons will be isotropic (neglecting the small effects of relativistic beaming), so in this situation the right answer is that the flux goes down to ~0, but the photon number density remains unchanged. In this situation you clearly can’t deduce the change in flux just from the change in photon density.

The situation I’m describing it’s just an idle thought experiment — there are plenty of real astrophysical systems where this sort of effect is important. An example is photoevaporation of protoplanetary disks, where the disk may be shadowed from direct stellar radiation by a puffed-up inner rim, but the disk surface nonetheless receives a significant ionizing photon flux because recombinations to the ground state in the disk corona create ionizing photons that are isotropic in their direction and are produced at positions above the disk that aren’t shadowed.

@BenWibking

Copy link
Copy Markdown
Collaborator Author

Unfortunately I think we should not merge this because, while it speeds up this particular case, it is not a generalizable solution that we can use generically — which is why the code was written the way it was in the first place. The problem is that this approach works only if photons can only be absorbed, not if they can also be created. Consider a situation in which we have an external radiation field entering a cell that supplies a non-zero photon number density and photon flux, which is being absorbed in the cell, but we also have some process in the cell that creates photons in the same band. In this case we could have a situation where essentially all the incoming photons get absorbed, but at the same time the emission process creates an equal number of photons. The emitted photons will be isotropic (neglecting the small effects of relativistic beaming), so in this situation the right answer is that the flux goes down to ~0, but the photon number density remains unchanged. In this situation you clearly can’t deduce the change in flux just from the change in photon density.

The situation I’m describing it’s just an idle thought experiment — there are plenty of real astrophysical systems where this sort of effect is important. An example is photoevaporation of protoplanetary disks, where the disk may be shadowed from direct stellar radiation by a puffed-up inner rim, but the disk surface nonetheless receives a significant ionizing photon flux because recombinations to the ground state in the disk corona create ionizing photons that are isotropic in their direction and are produced at positions above the disk that aren’t shadowed.

Sure, I agree that in general this will not work. But I think we do want to optimize the photoionization network for Gen2 if possible. 10% is a lot of node-hours.

@markkrumholz

Copy link
Copy Markdown
Collaborator

Unfortunately I think we should not merge this because, while it speeds up this particular case, it is not a generalizable solution that we can use generically — which is why the code was written the way it was in the first place. The problem is that this approach works only if photons can only be absorbed, not if they can also be created. Consider a situation in which we have an external radiation field entering a cell that supplies a non-zero photon number density and photon flux, which is being absorbed in the cell, but we also have some process in the cell that creates photons in the same band. In this case we could have a situation where essentially all the incoming photons get absorbed, but at the same time the emission process creates an equal number of photons. The emitted photons will be isotropic (neglecting the small effects of relativistic beaming), so in this situation the right answer is that the flux goes down to ~0, but the photon number density remains unchanged. In this situation you clearly can’t deduce the change in flux just from the change in photon density.
The situation I’m describing it’s just an idle thought experiment — there are plenty of real astrophysical systems where this sort of effect is important. An example is photoevaporation of protoplanetary disks, where the disk may be shadowed from direct stellar radiation by a puffed-up inner rim, but the disk surface nonetheless receives a significant ionizing photon flux because recombinations to the ground state in the disk corona create ionizing photons that are isotropic in their direction and are produced at positions above the disk that aren’t shadowed.

Sure, I agree that in general this will not work. But I think we do want to optimize the photoionization network for Gen2 if possible. 10% is a lot of node-hours.

Ok, but how are you proposing this be handled in the future? The current network is a toy example; it’s not the one we’re planning on using for the Gen 2 runs. For production runs we will be using a more realistic cooling treatment that will have to be generated with JAFF, so the optimizations you’re making here won’t have any effect on the production runs. Moreover, for the production runs I would advocate that we should use a network that properly does photon redistribution by having a case A recombination coefficient and keeping the recombination to the ground state terms, rather than just adopting the on-the-spot approximation as in this toy example. We’re doing so here only because the similarity solution to such we’re comparing also uses the on-the-spot approximation. If we drop that approximation for the production runs, your trick won’t work for them.

@BenWibking

Copy link
Copy Markdown
Collaborator Author

Unfortunately I think we should not merge this because, while it speeds up this particular case, it is not a generalizable solution that we can use generically — which is why the code was written the way it was in the first place. The problem is that this approach works only if photons can only be absorbed, not if they can also be created. Consider a situation in which we have an external radiation field entering a cell that supplies a non-zero photon number density and photon flux, which is being absorbed in the cell, but we also have some process in the cell that creates photons in the same band. In this case we could have a situation where essentially all the incoming photons get absorbed, but at the same time the emission process creates an equal number of photons. The emitted photons will be isotropic (neglecting the small effects of relativistic beaming), so in this situation the right answer is that the flux goes down to ~0, but the photon number density remains unchanged. In this situation you clearly can’t deduce the change in flux just from the change in photon density.
The situation I’m describing it’s just an idle thought experiment — there are plenty of real astrophysical systems where this sort of effect is important. An example is photoevaporation of protoplanetary disks, where the disk may be shadowed from direct stellar radiation by a puffed-up inner rim, but the disk surface nonetheless receives a significant ionizing photon flux because recombinations to the ground state in the disk corona create ionizing photons that are isotropic in their direction and are produced at positions above the disk that aren’t shadowed.

Sure, I agree that in general this will not work. But I think we do want to optimize the photoionization network for Gen2 if possible. 10% is a lot of node-hours.

Ok, but how are you proposing this be handled in the future? The current network is a toy example; it’s not the one we’re planning on using for the Gen 2 runs. For production runs we will be using a more realistic cooling treatment that will have to be generated with JAFF, so the optimizations you’re making here won’t have any effect on the production runs. Moreover, for the production runs I would advocate that we should use a network that properly does photon redistribution by having a case A recombination coefficient and keeping the recombination to the ground state terms, rather than just adopting the on-the-spot approximation as in this toy example. We’re doing so here only because the similarity solution to such we’re comparing also uses the on-the-spot approximation. If we drop that approximation for the production runs, your trick won’t work for them.

Right, this is just a performance prototype. I propose we have an option in JAFF to turn on this optimization. It can be off by default, since that's more general. I would advocate for using case B for the production simulations. I am skeptical that M1 can accurately compute the radiation anisotropy for case A, so I don't see how that would give us a better answer in practice.

@markkrumholz

Copy link
Copy Markdown
Collaborator

Unfortunately I think we should not merge this because, while it speeds up this particular case, it is not a generalizable solution that we can use generically — which is why the code was written the way it was in the first place. The problem is that this approach works only if photons can only be absorbed, not if they can also be created. Consider a situation in which we have an external radiation field entering a cell that supplies a non-zero photon number density and photon flux, which is being absorbed in the cell, but we also have some process in the cell that creates photons in the same band. In this case we could have a situation where essentially all the incoming photons get absorbed, but at the same time the emission process creates an equal number of photons. The emitted photons will be isotropic (neglecting the small effects of relativistic beaming), so in this situation the right answer is that the flux goes down to ~0, but the photon number density remains unchanged. In this situation you clearly can’t deduce the change in flux just from the change in photon density.
The situation I’m describing it’s just an idle thought experiment — there are plenty of real astrophysical systems where this sort of effect is important. An example is photoevaporation of protoplanetary disks, where the disk may be shadowed from direct stellar radiation by a puffed-up inner rim, but the disk surface nonetheless receives a significant ionizing photon flux because recombinations to the ground state in the disk corona create ionizing photons that are isotropic in their direction and are produced at positions above the disk that aren’t shadowed.

Sure, I agree that in general this will not work. But I think we do want to optimize the photoionization network for Gen2 if possible. 10% is a lot of node-hours.

Ok, but how are you proposing this be handled in the future? The current network is a toy example; it’s not the one we’re planning on using for the Gen 2 runs. For production runs we will be using a more realistic cooling treatment that will have to be generated with JAFF, so the optimizations you’re making here won’t have any effect on the production runs. Moreover, for the production runs I would advocate that we should use a network that properly does photon redistribution by having a case A recombination coefficient and keeping the recombination to the ground state terms, rather than just adopting the on-the-spot approximation as in this toy example. We’re doing so here only because the similarity solution to such we’re comparing also uses the on-the-spot approximation. If we drop that approximation for the production runs, your trick won’t work for them.

Right, this is just a performance prototype. I propose we have an option in JAFF to turn on this optimization. It can be off by default, since that's more general. I would advocate for using case B for the production simulations. I am skeptical that M1 can accurately compute the radiation anisotropy for case A, so I don't see how that would give us a better answer in practice.

I don’t agree on case B, but we don’t need to resolve that now. I’m fine with going ahead with this PR as long as it is understood that what we’re doing is just scoping out a feature to add to JAFF.

@BenWibking BenWibking changed the title Algebraically eliminate radiation flux from photoionization network More performance optimizations for photoionization ODE Jun 13, 2026
@BenWibking

BenWibking commented Jun 13, 2026

Copy link
Copy Markdown
Collaborator Author

Just to clarify: the goal of this PR is an existence proof for how fast this ODE system can be solved and a real-world proof of the LLM-driven performance optimization strategy.

Once we have this version optimized, we can try to ensure that JAFF reaches (or at least gets close) to this performance baseline for the final version of the network that we use in production.

@BenWibking BenWibking changed the title More performance optimizations for photoionization ODE [Optimization] Photoionization ODE performance Jun 13, 2026
@BenWibking

BenWibking commented Jun 13, 2026

Copy link
Copy Markdown
Collaborator Author

@markkrumholz Ok, eliminating the radiation flux actually makes this 10 times faster. But this is actually an artifact of how extremely small the current absolute radiation flux tolerance is.

@James471

Copy link
Copy Markdown
Contributor

@BenWibking If we really want to remove flux for some networks / problems and have them for others, I can add a preprocessor symbol that keeps or removes the flux variable from Microphysics and photochemistry.hpp.

@markkrumholz

Copy link
Copy Markdown
Collaborator

@BenWibking If we really want to remove flux for some networks / problems and have them for others, I can add a preprocessor symbol that keeps or removes the flux variable from Microphysics and photochemistry.hpp.

I think this is the sort of thing that would be better to put into JAFF, rather than putting it in Quokka. JAFF already has an option for whether, when emitting a photochemistry network, if should emit equations for the photon number densities only or also for the fluxes. This is just a slight variation on that.

@James471

Copy link
Copy Markdown
Contributor

You would need to change array sizes. Right now, it's just
constexpr int MicrophysicsNumRadVarsPerGroup = 1;
This would have to be wrapped in an ifdef if we want a choice.

@BenWibking

Copy link
Copy Markdown
Collaborator Author

/azp run

@azure-pipelines

Copy link
Copy Markdown
Azure Pipelines successfully started running 2 pipeline(s).

@BenWibking

Copy link
Copy Markdown
Collaborator Author

For concreteness, this table shows the latest performance results with a column for the relative cost of one radiation substep compared to one hydro-only step:


  ┌────────────┬──────────┬──────────┬──────────┬─────────┬────────┬──────────┬───────────────────┬──────────┬───────────────────┬────────────┐
  │ Integrator │    Steps │      RHS │      Jac │    Chem │  ODE % │      Rad │     Rad substeps/ │      Rad │               FoM │ Hydro-only │
  │            │          │          │          │         │        │    excl. │             hydro │   step / │                   │            │
  │            │          │          │          │         │        │    ODE % │                   │    Hydro │                   │            │
  │            │          │          │          │         │        │          │                   │     step │                   │            │
  ├────────────┼──────────┼──────────┼──────────┼─────────┼────────┼──────────┼───────────────────┼──────────┼───────────────────┼────────────┤
  │ ROS2S      │ 1.003933 │ 3.011799 │ 1.003933 │ 2.352 s │ 54.93% │   38.30% │              9.94 │    0.83x │ 0.131311 us/zone- │      9.89x │
  │            │          │          │          │         │        │          │                   │          │            update │            │
  │ Rodas3P    │ 1.018669 │ 5.093346 │ 1.018669 │ 2.373 s │ 54.86% │   38.45% │              9.94 │    0.83x │ 0.132762 us/zone- │      9.96x │
  │            │          │          │          │         │        │          │                   │          │            update │            │
  │ Rodas4P    │ 1.022179 │ 6.133072 │ 1.022179 │ 2.420 s │ 53.81% │   38.57% │              9.94 │    0.84x │ 0.135083 us/zone- │     10.25x │
  │            │          │          │          │         │        │          │                   │          │            update │            │
  │ Rodas5P    │ 1.022797 │ 8.182373 │ 1.022797 │ 3.492 s │ 61.13% │   31.39% │              9.94 │    1.06x │ 0.173532 us/zone- │     12.45x │
  │            │          │          │          │         │        │          │                   │          │            update │            │
  │ VODE       │ 3.070011 │ 6.100708 │ 3.014877 │ 2.684 s │ 55.57% │   34.94% │              9.94 │    0.90x │ 0.143236 us/zone- │     11.06x │
  │            │          │          │          │         │        │          │                   │          │            update │            │
  └────────────┴──────────┴──────────┴──────────┴─────────┴────────┴──────────┴───────────────────┴──────────┴───────────────────┴────────────┘

@BenWibking

Copy link
Copy Markdown
Collaborator Author

One warning to fix:

/home/runner/work/quokka/quokka/extern/Microphysics/integration/Rosenbrock/rosenbrock_integrator.H: In function ‘int rosenbrock_integrator(BurnT&, rosenbrock_t<integrator_neqs<BurnT>()>&)’:
/home/runner/work/quokka/quokka/extern/Microphysics/integration/Rosenbrock/rosenbrock_integrator.H:480:55: error: ‘rosenbrock_timestep_controller’ is not a member of ‘integrator_rp’
[ 64%] Building CXX object src/problems/PrimordialChem/CMakeFiles/PrimordialChem.dir/__/__/io/DerivedParticleDeposition.cpp.o
  480 |                   << " controller=" << integrator_rp::rosenbrock_timestep_controller
      |                                                       ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

@chongchonghe

Copy link
Copy Markdown
Contributor

Unfortunately I think we should not merge this because, while it speeds up this particular case, it is not a generalizable solution that we can use generically — which is why the code was written the way it was in the first place. The problem is that this approach works only if photons can only be absorbed, not if they can also be created. Consider a situation in which we have an external radiation field entering a cell that supplies a non-zero photon number density and photon flux, which is being absorbed in the cell, but we also have some process in the cell that creates photons in the same band. In this case we could have a situation where essentially all the incoming photons get absorbed, but at the same time the emission process creates an equal number of photons. The emitted photons will be isotropic (neglecting the small effects of relativistic beaming), so in this situation the right answer is that the flux goes down to ~0, but the photon number density remains unchanged. In this situation you clearly can’t deduce the change in flux just from the change in photon density.

The situation I’m describing it’s just an idle thought experiment — there are plenty of real astrophysical systems where this sort of effect is important. An example is photoevaporation of protoplanetary disks, where the disk may be shadowed from direct stellar radiation by a puffed-up inner rim, but the disk surface nonetheless receives a significant ionizing photon flux because recombinations to the ground state in the disk corona create ionizing photons that are isotropic in their direction and are produced at positions above the disk that aren’t shadowed.

Right. This can be understood by the fact that we do not include the same isotropic, recombination term in flux as in photon number density. 

Removing the convergence check on radiation flux ( #1956 ) resolves this concern and gives the benefits of both worlds.

@BenWibking

BenWibking commented Jun 16, 2026

Copy link
Copy Markdown
Collaborator Author

Removing the convergence check on radiation flux ( #1956 ) resolves this concern and gives the benefits of both worlds.

Yes, I agree that is a better solution. It's both faster and more accurate. I'll put the flux equation back into this PR, but keep the other optimiztions. Those optimizations might still make it up to ~30% faster.

@BenWibking

Copy link
Copy Markdown
Collaborator Author

The remaining microoptimizations don't appear to have a significant effect now that the tolerances are much looser. ROS2S does make a significant (+20%) walltime improvment on DTypeFront, but it can already be used on the development branch. Closing.

@BenWibking BenWibking closed this Jun 25, 2026
@BenWibking BenWibking deleted the BenWibking/photochemistry-flux-elim branch June 25, 2026 02:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants