Commit e2e25f5 (1 parent: 0fb1a1a)

Add ccpp_constituent_prop_mod.F90.patch and doc/cam4_fwaut_constituent_order.md

3 files changed: 199 additions, 1 deletion


capgen-ng/ccpp_capgen_ng.py (1 addition, 1 deletion)

```
@@ -690,7 +690,7 @@ def _load_metadata_files(
 ('horizontal_loop_begin', 'integer', 'lower horizontal slice bound at scheme call sites'),
 ('horizontal_loop_end', 'integer', 'upper horizontal slice bound at scheme call sites'),
 ('number_of_physics_threads','integer', 'physics-internal thread budget (pass 1 if unused)'),
-('ccpp_error_code', 'integer', 'CCPP error flag'),
+('ccpp_error_code', 'integer', 'CCPP error code'),
 ('ccpp_error_message', 'character', 'CCPP error message'),
 ]
 # NOTE: the threading index/count (``thread_number`` / ``number_of_threads``)
```

ccpp_constituent_prop_mod.F90.patch (new file, 47 additions)

```
--- capgen-ng/src/ccpp_constituent_prop_mod.F90
+++ capgen-ng/src/ccpp_constituent_prop_mod.F90
@@ -1392,6 +1392,17 @@
 type(ccpp_constituent_properties_t), pointer :: cprop
 character(len=dimname_len) :: dimname
 character(len=*), parameter :: subname = 'ccp_model_const_table_lock'
+ ! === ONE-OFF cam4 constituent-reorder experiment ===
+ ! When .true., force the cam4 advected water species into original-capgen
+ ! order [cloud_liquid=1, cloud_ice=2, water_vapor=3] instead of hash-table
+ ! order, to prove the FWAUT b4b diff is driven purely by constituent order.
+ ! Only the 3 cam4 water-species std-names are remapped; everything else keeps
+ ! its normal hash-order index, so other suites are unaffected unless they
+ ! advect exactly these names. Flip to .false. (or delete) to restore.
+ logical, parameter :: l_const_reorder = .true.
+ integer :: const_pos
+ character(len=512) :: sname_reorder
+ ! === end experiment ===

 astat = 0
 errcode_local = 0
@@ -1460,9 +1471,24 @@
 errcode_local = errcode_local + 1
 exit
 end if
- call cprop%set_const_index(index_advect, &
+ ! === ONE-OFF cam4 constituent-reorder experiment ===
+ const_pos = index_advect
+ if (l_const_reorder) then
+ call cprop%standard_name(sname_reorder, &
+ errcode=errcode, errmsg=errmsg)
+ select case (trim(sname_reorder))
+ case ('cloud_liquid_water_mixing_ratio_wrt_moist_air_and_condensed_water')
+ const_pos = 1
+ case ('cloud_ice_mixing_ratio_wrt_moist_air_and_condensed_water')
+ const_pos = 2
+ case ('water_vapor_mixing_ratio_wrt_moist_air_and_condensed_water')
+ const_pos = 3
+ end select
+ end if
+ call cprop%set_const_index(const_pos, &
 errcode=errcode, errmsg=errmsg)
- call this%const_metadata(index_advect)%set(cprop)
+ call this%const_metadata(const_pos)%set(cprop)
+ ! === end experiment ===
 else
 index_const = index_const + 1
 if (index_const > num_vars) then
```

doc/cam4_fwaut_constituent_order.md (new file, 151 additions)

# cam4 (QPC4) bit-for-bit difference: root cause is constituent registration order

**Status:** root cause found and **proven**. Decision requested from CAM-SIMA.
**Date:** 2026-06-11 **Author:** D. Heinzeller

## Executive summary

The CAM-SIMA test
`SMS_D_Ln9.mpasa120_mpasa120.QPC4.derecho_intel.cam-outfrq_analy_ic_cam4`
(full cam4 physics on the MPAS dynamical core) fails its bit-for-bit (b4b)
comparison against the capgen baseline. The difference is **machine-epsilon
roundoff**: state and flux fields agree to 14–17 significant digits, and the
comparison shows large differences only in RK-microphysics *ratio* diagnostics
(e.g. `FWAUT`, RMS ≈ 4.24e-2). These are ratios of two near-zero autoconversion
rates, so they amplify any roundoff. The behavior is identical under GNU and Intel.
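
A generic sketch of why a ratio of near-zero rates magnifies roundoff. It assumes, purely for illustration, that each rate is a small difference of two O(1) numbers, so most leading digits cancel; `near_zero_rate` and `ratio_diagnostic` are hypothetical stand-ins, not the RK autoconversion code behind `FWAUT`.

```python
import math

def near_zero_rate(x: float, threshold: float = 1.0) -> float:
    """Hypothetical rate: a small difference of two O(1) numbers (cancellation)."""
    return x - threshold

def ratio_diagnostic(x_num: float, x_den: float) -> float:
    """Hypothetical ratio of two near-zero rates, standing in for a diagnostic like FWAUT."""
    return near_zero_rate(x_num) / near_zero_rate(x_den)

# Inputs just above the threshold, so both rates are tiny.
x_num = 1.0 + 2.0**-52      # 1.0 plus one ulp (unit in the last place)
x_den = 1.0 + 2.0**-50      # 1.0 plus four ulps

# Nudge the numerator input up by one ulp (a relative change of about 2e-16).
x_num_nudged = math.nextafter(x_num, 2.0)

r0 = ratio_diagnostic(x_num, x_den)
r1 = ratio_diagnostic(x_num_nudged, x_den)

input_rel_change = (x_num_nudged - x_num) / x_num
output_rel_change = abs(r1 - r0) / abs(r0)
print(f"relative change of input: {input_rel_change:.1e}")
print(f"relative change of ratio: {output_rel_change:.1e}")
```

Here a one-ulp nudge to one input moves the numerator rate from 2**-52 to 2**-51 and doubles the ratio from 0.25 to 0.5, while every O(1) input changed only in its last bit.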

The physics source, `suite_cam4.xml`, and `src/data/registry.xml` are
**byte-identical** between the two builds; the difference lies entirely in the
generated CCPP caps. We have traced it to a single cause and **proven** it:

> **capgen-ng registers the advected constituents in a different order than the
> original capgen.** Specifically, `cloud_liquid` and `cloud_ice` are swapped.
> This changes the floating-point summation order in the energy and water
> thermodynamic diagnostics; the energy fixer then spreads the resulting roundoff
> difference across all columns as a tiny, pervasive heating, which is the source
> of the b4b difference.

A one-off patch that forces capgen-ng's advected water species into the
original-capgen order makes **QPC4 bit-for-bit identical** to the baseline.

## The difference (runtime constituent list, `debug_output = 2`)

| index | original capgen (baseline) | capgen-ng |
|------:|----------------------------|-----------|
| 1 | **cloud_liquid** (advected) | **cloud_ice** (advected) |
| 2 | **cloud_ice** (advected) | **cloud_liquid** (advected) |
| 3 | water_vapor (advected) | water_vapor (advected) |
| 4–10 | CFC12, O3, CH4, O2, N2O, CFC11, CO2 | CFC12, O2, CH4, CO2, O3, N2O, CFC11 |

Indices 1–3 are the advected water species; 4–10 are non-advected trace gases.
Only the advected block matters (see Mechanism): the trace gases are also
permuted, but they are not part of the advected water-species sums. `water_vapor`
is index 3 in both builds, so the only advected difference is the
**cloud_liquid ↔ cloud_ice swap**.

## Mechanism

1. `air_composition` builds `thermodynamic_active_species_idx` by walking the
   advected constituents in **constituent-index order**.
2. `get_hydrostatic_energy` (`cam_thermo`) sums the water species in that order.
   The baseline sums `cloud_liquid + cloud_ice + water_vapor`; capgen-ng sums
   `cloud_ice + cloud_liquid + water_vapor`. The values are the same, but the
   **floating-point accumulation order** differs, and because floating-point
   addition is not associative, the rounded totals can differ in the last bit.
3. The resulting machine-epsilon difference in total energy and water is picked up
   by the global energy fixer (`check_energy_fix`), which redistributes it as a
   uniform heating across all columns. From that point on, the two runs differ at
   roundoff level everywhere, which shows up prominently only in ratio diagnostics
   such as `FWAUT`.
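
The effect in step 2 can be reproduced in a few lines of Python. One subtlety: IEEE-754 addition of two operands is commutative, so `(cloud_liquid + cloud_ice) + water_vapor` and `(cloud_ice + cloud_liquid) + water_vapor`, summed from zero, give identical bits. The order only matters once the species are added onto a running total that already holds something (or are combined with other terms). The sketch assumes such a running total; the starting value and the species magnitudes are hypothetical, chosen at a rounding boundary so the last-bit difference is visible, and are not taken from `cam_thermo`.

```python
import math

def accumulate(start: float, species_order: list[str], values: dict[str, float]) -> float:
    """Add each species onto a running total in the given (constituent-index) order."""
    total = start
    for name in species_order:
        total += values[name]
    return total

# Hypothetical magnitudes (not model data): condensate terms of 1/2 ulp and
# 1 ulp of the running total, so IEEE round-half-to-even ties are exercised.
values = {
    "cloud_liquid": 2.0**-53,
    "cloud_ice": 2.0**-52,
    "water_vapor": 2.0**-40,
}
baseline_order = ["cloud_liquid", "cloud_ice", "water_vapor"]   # original capgen
capgen_ng_order = ["cloud_ice", "cloud_liquid", "water_vapor"]  # capgen-ng

# Hypothetical non-zero running total that the species are added onto.
start = 1.0
baseline = accumulate(start, baseline_order, values)
capgen_ng = accumulate(start, capgen_ng_order, values)
print(baseline == capgen_ng)                        # False
print(abs(baseline - capgen_ng) == math.ulp(1.0))   # True: one ulp apart

# Starting from zero, swapping the first two operands changes nothing,
# because two-operand IEEE addition is commutative.
from_zero_equal = (accumulate(0.0, baseline_order, values)
                   == accumulate(0.0, capgen_ng_order, values))
print(from_zero_equal)                              # True
```

In the baseline order the half-ulp `cloud_liquid` term is rounded away before `cloud_ice` arrives; in the capgen-ng order it lands on a tie after `cloud_ice` and rounds up. Both results are equally valid roundings of the same exact sum.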

`air_composition.F90` and `cam_constituents.F90` are byte-identical between the
two builds, so the entire difference originates in the registration order that the
generated cap produces. In the CCPP framework, registration order is the
hash-table iteration order in `ccpp_model_constituents_t%lock_table` (with advected
constituents packed first), i.e. an arbitrary, generator-dependent order rather than
a deliberate physical ordering.
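
Why a hash-table iteration order is arbitrary can be illustrated with a toy table. The model below is hypothetical (it is not `ccpp_model_constituents_t`): it uses `zlib.crc32` as the hash with separate chaining, iterates bucket by bucket, and packs advected constituents first, as described above. Changing only the bucket count, an internal implementation detail, can change the resulting order while the set of constituents stays the same.

```python
import zlib

# name -> advected?  (short names from the runtime constituent list)
CONSTITUENTS = {
    "water_vapor": True, "cloud_liquid": True, "cloud_ice": True,
    "CFC12": False, "O3": False, "CH4": False, "O2": False,
    "N2O": False, "CFC11": False, "CO2": False,
}

def hash_table_order(names, num_buckets):
    """Iterate a toy chained hash table bucket by bucket (hypothetical model)."""
    buckets = [[] for _ in range(num_buckets)]
    for name in names:
        buckets[zlib.crc32(name.encode()) % num_buckets].append(name)
    return [name for bucket in buckets for name in bucket]

def registration_order(num_buckets):
    """Advected constituents packed first, each block in hash-table order."""
    order = hash_table_order(CONSTITUENTS, num_buckets)
    advected = [n for n in order if CONSTITUENTS[n]]
    others = [n for n in order if not CONSTITUENTS[n]]
    return advected + others

for num_buckets in (16, 64):
    print(num_buckets, registration_order(num_buckets))
```

Each order is deterministic for a fixed table, which is why a given build reproduces itself; it simply has no reason to match the order another generator produces.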

## Proof

Forcing capgen-ng's advected water species into the baseline order
`[cloud_liquid = 1, cloud_ice = 2, water_vapor = 3]` (a flag-guarded one-off
patch in the framework's `ccp_model_const_table_lock`) makes QPC4 reproduce the
original-capgen baseline **bit-for-bit** (cprnc: all fields identical). This
isolates constituent ordering as the *sole* cause. The patch is committed as
`ccpp_constituent_prop_mod.F90.patch` in the top-level directory of the
`feature/capgen-ng` ccpp-framework branch:

```
--- capgen-ng/src/ccpp_constituent_prop_mod.F90
+++ capgen-ng/src/ccpp_constituent_prop_mod.F90
@@ -1392,6 +1392,17 @@
 type(ccpp_constituent_properties_t), pointer :: cprop
 character(len=dimname_len) :: dimname
 character(len=*), parameter :: subname = 'ccp_model_const_table_lock'
+ ! === ONE-OFF cam4 constituent-reorder experiment ===
+ ! When .true., force the cam4 advected water species into original-capgen
+ ! order [cloud_liquid=1, cloud_ice=2, water_vapor=3] instead of hash-table
+ ! order, to prove the FWAUT b4b diff is driven purely by constituent order.
+ ! Only the 3 cam4 water-species std-names are remapped; everything else keeps
+ ! its normal hash-order index, so other suites are unaffected unless they
+ ! advect exactly these names. Flip to .false. (or delete) to restore.
+ logical, parameter :: l_const_reorder = .true.
+ integer :: const_pos
+ character(len=512) :: sname_reorder
+ ! === end experiment ===

 astat = 0
 errcode_local = 0
@@ -1460,9 +1471,24 @@
 errcode_local = errcode_local + 1
 exit
 end if
- call cprop%set_const_index(index_advect, &
+ ! === ONE-OFF cam4 constituent-reorder experiment ===
+ const_pos = index_advect
+ if (l_const_reorder) then
+ call cprop%standard_name(sname_reorder, &
+ errcode=errcode, errmsg=errmsg)
+ select case (trim(sname_reorder))
+ case ('cloud_liquid_water_mixing_ratio_wrt_moist_air_and_condensed_water')
+ const_pos = 1
+ case ('cloud_ice_mixing_ratio_wrt_moist_air_and_condensed_water')
+ const_pos = 2
+ case ('water_vapor_mixing_ratio_wrt_moist_air_and_condensed_water')
+ const_pos = 3
+ end select
+ end if
+ call cprop%set_const_index(const_pos, &
 errcode=errcode, errmsg=errmsg)
- call this%const_metadata(index_advect)%set(cprop)
+ call this%const_metadata(const_pos)%set(cprop)
+ ! === end experiment ===
 else
 index_const = index_const + 1
 if (index_const > num_vars) then
```
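
The remap in the patch can be modeled in Python to make its safety condition explicit. Slots 1–3 are forced by standard name while every other advected constituent keeps its hash-order index, so the result is a valid permutation only if no other advected constituent already holds index 1, 2 or 3. In cam4 the three water species are the only advected constituents (indices 1–3 in the table), so the remap is a pure swap there; with additional advected constituents, two entries could target the same `const_metadata` slot. `reorder` and the extra tracer below are hypothetical, a model of the patched loop rather than framework code.

```python
# Standard names and forced slots copied from the patch's select case.
REMAP = {
    "cloud_liquid_water_mixing_ratio_wrt_moist_air_and_condensed_water": 1,
    "cloud_ice_mixing_ratio_wrt_moist_air_and_condensed_water": 2,
    "water_vapor_mixing_ratio_wrt_moist_air_and_condensed_water": 3,
}

def reorder(hash_order: list[str]) -> dict[int, str]:
    """Model of the patched loop: slot = forced position, else hash-order index."""
    slots: dict[int, str] = {}
    for index_advect, std_name in enumerate(hash_order, start=1):  # 1-based, as in Fortran
        const_pos = REMAP.get(std_name, index_advect)
        if const_pos in slots:
            # The patch itself adds no such check; two constituents would target one slot.
            raise ValueError(f"slot {const_pos} already holds {slots[const_pos]}")
        slots[const_pos] = std_name
    if sorted(slots) != list(range(1, len(hash_order) + 1)):
        raise ValueError("slots are not a contiguous 1..n permutation")
    return slots

# capgen-ng hash order for cam4 (from the runtime constituent list).
cam4_hash_order = [
    "cloud_ice_mixing_ratio_wrt_moist_air_and_condensed_water",
    "cloud_liquid_water_mixing_ratio_wrt_moist_air_and_condensed_water",
    "water_vapor_mixing_ratio_wrt_moist_air_and_condensed_water",
]
slots = reorder(cam4_hash_order)
print([slots[i] for i in sorted(slots)])  # liquid, ice, vapor: the baseline order

# A hypothetical suite with one more advected constituent hashed to index 1.
try:
    reorder(["hypothetical_extra_tracer"] + cam4_hash_order)
except ValueError as err:
    print("unsafe remap:", err)
```

This is consistent with the patch being a one-off experiment for cam4 rather than a general fix.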

## Assessment: neither order is "wrong"

Both builds register the same constituents with identical properties; the
ordering carries no physical meaning, and the two solutions are
roundoff-equivalent and equally physically correct. The b4b failure reflects only
that capgen-ng's (arbitrary) order differs from the (equally arbitrary) order
the capgen baseline happened to produce.

## Decision requested

To resolve QPC4 (and any other case sensitive to constituent order), we propose:

1. Give capgen-ng a **deterministic, documented** constituent-registration order
   (e.g. water vapor first, with a clear rule for where every other constituent
   lands in the array), replacing today's hash-bucket order.
2. Adopt the new documented order and **re-baseline** the affected CAM-SIMA cases once.
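
One possible shape of the rule in item 1, sketched under assumptions: water vapor first (the example given above), then the remaining advected constituents, then the non-advected ones, each group sorted by name. The grouping and the alphabetical tie-break are hypothetical choices, not a capgen-ng design. Because the sort key depends only on each constituent's own properties, the result is independent of arrival order and of any hash table.

```python
def registration_key(name: str, advected: bool) -> tuple[int, str]:
    """Hypothetical documented rule: vapor, other advected, non-advected; then by name."""
    if name == "water_vapor":
        group = 0
    elif advected:
        group = 1
    else:
        group = 2
    return (group, name)

def deterministic_order(constituents: dict[str, bool]) -> list[str]:
    """Sort constituent names (name -> advected?) by the documented key."""
    return sorted(constituents, key=lambda n: registration_key(n, constituents[n]))

# Two different arrival orders of the same cam4 constituents (name -> advected?).
arrival_a = {"cloud_ice": True, "cloud_liquid": True, "water_vapor": True,
             "CFC12": False, "O2": False, "CH4": False, "CO2": False,
             "O3": False, "N2O": False, "CFC11": False}
arrival_b = dict(reversed(list(arrival_a.items())))

print(deterministic_order(arrival_a))
print(deterministic_order(arrival_a) == deterministic_order(arrival_b))  # True
```

This particular rule places `cloud_ice` before `cloud_liquid`, which differs from the current baseline; any new rule will likely differ from at least one existing order, which is why item 2 calls for a one-time re-baseline.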

The temporary proof patch will be removed once a path forward is agreed.

## Artifacts

- **Patch (git diff):** `<FILL IN: path to the .patch / repo+commit>`; reproduce with
  `git -C EXT/cam-sima-ng/ccpp_framework/capgen-ng diff src/ccpp_constituent_prop_mod.F90`.
- **Run directories (Derecho):**
  - Baseline (original capgen): `<FILL IN>`
  - capgen-ng, unpatched (shows the FWAUT diff): `<FILL IN>`
  - capgen-ng + reorder patch (**b4b**): `<FILL IN>`
- **cprnc summaries:**
  - unpatched vs. baseline: `<FILL IN: FWAUT RMS ≈ 4.24e-2, state fields ~15 digits>`
  - patched vs. baseline: `<FILL IN: all fields identical (b4b)>`
- **Constituent lists (`debug_output = 2`, `atm.log`):** as tabulated above.
