Rb/callmip phase1a by braghiere · Pull Request #1693 · CliMA/ClimaLand.jl

braghiere · 2026-03-30T02:14:42Z

Purpose

CalLMIP Phase 1a submission for the DK-Sor site (Sorø, Denmark — temperate beech forest).
16-parameter EKI calibration of ClimaLand, followed by CES uncertainty quantification,
producing 4 NetCDF files submitted to modelevaluation.org.

Result: NEE RMSE −27% cal / −35% val vs prior. Bias reduced −84%.
4 ALMA-compliant NetCDF files submitted. CES emulator failure diagnosed and documented
for Phase 1b — see CES_Phase2_Issues_for_OLLie.md.

Pipeline

Step 0 — Build observation vector ~2 min
calibrate_dk_sor/generate_observations.jl (slurm_gen_observations.sh)
→ calibrate_dk_sor/observations.jld2

Step 1 — EKI calibration (33 members × 10 iterations) ~9 h
calibrate_dk_sor/run_calibration.jl (slurm_calibration.sh)
uses: model_interface.jl | priors.jl 1 node · 24 cores · ~192 GB
→ calibrate_dk_sor/output/iteration_NNN/

Step 2 — CES: train GP emulator + run MCMC posterior ~3 h
callmip_uq_dk_sor/emulate_sample.jl (slurm_emulate_sample.sh)
uses: ../calibrate_dk_sor/priors.jl 1 node · 8 CPUs · 64 GB
→ output_posterior_uq/ GP on 264 pts; 100k MCMC steps

Step 3 — Run 50-member posterior ensemble + analyze ~2–3 h
callmip_uq_dk_sor/run_posterior_ensemble.jl (slurm_posterior_ensemble.sh)
callmip_uq_dk_sor/analyze_posterior_ensemble.jl
→ output_posterior_ensemble/, output_posterior_analysis/

Step 4 — Run CalLMIP simulations (prior + EKI-optimal) ~73 min
callmip_uq_dk_sor/run_callmip_simulations.jl (slurm_callmip_simulations.sh)
uses: callmip_model_interface.jl
uses: ../calibrate_dk_sor/prior_mean_parameters.toml (member 001)
uses: ../calibrate_dk_sor/priors.jl (member 002)
→ output_callmip_sims/

Step 5 — Write NetCDF + evaluate ~10 min
callmip_uq_dk_sor/write_callmip_netcdf.jl (slurm_postprocess.sh)
callmip_uq_dk_sor/evaluate_calibration.jl
→ callmip_output/*.nc, output_evaluation/

Total: ~16 h wall · ~400 CPU-h

✅ Phase 1a result: EKI calibration works

EKI converges in 9 iterations. NEE RMSE drops 4 orders of magnitude from the
blown-up initial ensemble to iter 9.

	Prior	EKI optimal	Change
NEE RMSE cal (gC m⁻² d⁻¹)	3.053	2.228	−27%
NEE RMSE val (gC m⁻² d⁻¹)	3.749	2.420	−35%
NEE bias cal	+0.437	+0.070	−84%
Qle RMSE cal (W m⁻²)	38.9	29.8	−23%
Qh RMSE cal (W m⁻²)	61.1	51.5	−16%

✅ What the calibration learned

The three new respiration parameters — root_leaf_nitrogen_ratio,
stem_leaf_nitrogen_ratio (autotrophic) and soilCO2_activation_energy
(heterotrophic/DAMM Arrhenius) — are the dominant signal, both strongly shifted
and agreed upon by EKI and MCMC. canopy_K_lw and canopy_z_0m_coeff are
tightly data-informed. moisture_stress_c, leaf_Cd, both Michaelis constants,
and soilCO2_pre_exponential_factor remain underdetermined from NEE+Qle+Qh alone.

⚠️ Phase 1b problem 1: Structural — winter respiration

Both prior and EKI-optimal produce near-zero NEE in Oct–Mar. FLUXNET shows
+1.5 to +2.4 gC m⁻² d⁻¹ sustained CO₂ source. Despite three dedicated
respiration parameters, this seems to be a structural limitation, not
resolvable by parameter calibration alone.

[Annual means 2003–2013 — upload prior_vs_posterior_annual.png]

⚠️ Phase 1b problem 2: CES emulator failure

The GP emulator uses CES v1 encoder_schedule API. Validation scatter looks
reasonable in the bulk, but the MCMC posterior disagrees sharply with the EKI
optimum for pmodel_β, moisture_stress_c, and canopy_z_0m_coeff.

Root cause: EKI collapsed tightly to pmodel_β ≈ 19 → all 264 GP training
points cluster there → GP incorrectly extrapolates low cost near prior mean
pmodel_β ≈ 38 → MCMC settles near prior → near-zero GPP → flat NEE posterior.

CES 90% bands cover only 31.5% of NEE observations (expected 90%).
The EKI-optimal member is the scientifically meaningful Phase 1a result.
Full diagnosis + proposed fixes: CES_Phase2_Issues_for_OLLie.md.

Grey = prior, blue = MCMC posterior, red dashed = EKI optimum. The disagreement between blue and red is the emulator failure signature.

Submitted files

callmip_output/
ClimaLand.CalLMIP1.0_Expt1_DK-Sor_Cal_Prior.nc
ClimaLand.CalLMIP1.0_Expt1_DK-Sor_Cal_Posterior.nc
ClimaLand.CalLMIP1.0_Expt1_DK-Sor_Val_Temporal_Prior.nc
ClimaLand.CalLMIP1.0_Expt1_DK-Sor_Val_Temporal_Posterior.nc

13 ALMA variables. 3 all-NaN (ESoil, Qg, TotAbovBioMass — not in current model).
56 days masked in EKI-optimal member (numerical blowup, cold-season transitions).

Phase 1b targets

Winter respiration — structural fix in heterotrophic/autotrophic respiration
CES emulator — retrain GP with broader ensemble coverage (see CES_Phase2_Issues_for_OLLie.md)
ESoil / Qg — enable soil evaporation and ground heat flux diagnostics
Numerical instability — 56 blowup days in EKI-optimal during cold-season transitions

odunbar

I've only looked at
emulate_sample.jl - Looks like a great start!

My main comment is that

I noticed that all the kwargs for building emulators are deprecated. are you using CES v1? If so, then you need to do something that matches the online docs (Emulator) page, or the template I provided
There is very little emulator validation, beyond MSE. I think it is critical that we have sufficient plotting etc of emulator predictions to validate how well it is doing. Bad emulators will lead to MCMC non-convergence and problematic posteriors. I made some small note of this here

AlexisRenchon · 2026-04-08T16:14:02Z

just a note - RAI is actually often larger than LAI, from what I see in the literature

PFT | Typical RAI range | Notes
-- | -- | --
Tropical evergreen forest | 10 – 40 | Huge fine-root surface area
Temperate deciduous forest | 8 – 25 | Strong seasonal turnover
Boreal forest | 5 – 15 | Limited by cold soils
Shrubland | 5 – 20 | Often deep + opportunistic
Grassland | 10 – 30 | Dense fibrous roots
Crops | 5 – 20 | Depends on species & stage
Tundra | 2 – 10 | Shallow due to permafrost

https://www.pnas.org/doi/10.1073/pnas.94.14.7362

[skip ci]

…faults)

Critical bug fixes for DK-Sor CalLMIP Phase 1a calibration: Bug fixes (model): - RAI = FT(1.5) (was 17.5, 10x too large → Ra_root >> GPP → NEE≈0 always) - LAI: Copernicus 'LAI' (was MODIS 'LAI_alternative', 2x lower peak) - SOC consumption tendency added to DAMM biogeochemistry for carbon conservation - LandSimulation type param relaxed: I (was I <: DEIntegrator) Calibration (run_calibration.jl, slurm_calibration.sh): - DT = 900 s (was 450) - 12-param Alexis priors, no AR params (was 14-param) - ntasks = 25 (= 2*12+1 UKI sigma points) - Removed Pkg.update() from slurm script (saves ~3.5h) CES pipeline (emulate_sample.jl): - find_latest_ekp_path: picks up latest calibration iteration - N_train = 8 (10-iter calibration -> max usable = 8) - init_stepsize = 1e-3, max_iter = 500 for pCN optimize_stepsize - 12-param priors (matches calibration exactly) CalLMIP sims (callmip_model_interface.jl, run_callmip_simulations.jl): - RAI = FT(1.5) - 12-param Alexis priors New scripts: - submit_calibrate_then_ces.sh: chains calibration -> emulate_sample via --dependency=afterok - plot_calibration_check.jl: seasonality diagnostic plot (prior vs posterior) - run_alexis_test.jl / slurm_alexis_test.sh: forward run with Alexis posterior Known open issue: - emulate_sample.jl MCMC (pCN-MH) fails: pCN beta -> 1e-95, acceptance 0.0005 Root cause: 1257-dim obs vector creates extremely peaked GP posterior Questions documented in ISSUES_AND_QUESTIONS.md

…submission - Add priors.jl (16-param prior definitions, used by calibration + CES + CalLMIP sims) - Add plot_multiyear_check.jl (multi-year EKI posterior stability check) - Add evaluate_calibration.jl + slurm_postprocess.sh (full skill scores + figures) - Add plot_calibration_improvement.jl, plot_posterior_distributions.jl, plot_callmip_netcdf.py, calibration_rmse_convergence.jl (diagnostic scripts) - Add CALLMIP_Phase1a_Report.md (complete Phase 1a results, steps 1-6) - Add CALLMIP_Phase1a_ModelDescription.md (paper section 12) - Add CES_Phase2_Issues_for_OLLie.md (GP emulator failure diagnosis) - Update emulate_sample.jl to CES v1 encoder_schedule API - Update write_callmip_netcdf.jl: fix JLD2 sub-dict lookup, column_integral, empty diagnostic guard, blowup masking, Val_Temporal naming - Update run_callmip_simulations.jl, callmip_model_interface.jl: 16 params - Expand priors to 16 parameters (+ root/stem N ratios, soilCO2_activation_energy) CalLMIP Phase 1a results (DK-Sor, 2003-2014): NEE RMSE: 3.053 -> 2.228 gC/m2/d (-27% cal), -35% val, bias -84% Qle RMSE: 38.9 -> 29.8 W/m2 (-23% cal) 4 NetCDF files submitted to modelevaluation.org

odunbar

Looks great! Very happy to see the pipeline go through with the new dim reduction and kwargs. Emulator looks fine

I recommended a couple more checks for validation
Despite Claude saying it failed, I think the experiment has actually worked. There is no constraint that the MCMC needs to line up with UKI optimum, this is just the principle for selecting training points so it helps us get a good emulator if they remain consistent. I think it is worth noting as well that we are only plotting the 1D marginals of a high dimensional correlated distribution which only tell us part of the story.

My initial impression, just from the UKI, is that this experiment illustrates that a couple of our priors are a bit off, I have a suggestion for a follow up validation/calibration items that might help with this.

odunbar · 2026-04-10T17:48:54Z

+# Per-output-dimension RMSE (in the decorrelated/reduced space)
+rmse_per_dim_train = sqrt.(mean((y_true_train .- y_mean_train) .^ 2; dims = 2))
+rmse_per_dim_test  = sqrt.(mean((y_true_test  .- y_mean_test)  .^ 2; dims = 2))
+


I think it would also be useful to look at the y_var_train/y_var_test (it is also a covariance, so maybe "cov" is more suitable name). In the transform_to_real = true configuration, they should be representing the coupled uncertainty of

y_cov = obs_noise_cov + gp_uncertainty_in_prediction

And if things are working OK you would expect near where there is a good amount of data, the y_var to be close to the obs_noise_cov. if certain parameter regimes have very large y_var then we can look at where these are (i would expect somewhere far from the posterior). The reason i mention this, is that the y_var is also used in the sampling stage not just the y_mean.

odunbar · 2026-04-10T18:17:57Z

+
+function build_dk_sor_priors()
+    priors_vec = [
+        # ── Canopy / conductance ─────────────────────────────────────────────


how confident are we on these priors?
In particular moisture_stress_c and root_leaf_nitrogen_ratio, stem_leaf_nitrogen_ratio etc.

I ask because the EKI optimum is very far from the prior mass. ( In results at 4/10) In a fully Bayesian setting we fully trust the prior, and so this would indicate it is a very rare result. In the case that we don't, we may find it useful to run an experiment that breaks the Bayesian setting slightly.

I wonder if recentering the prior at the current EKI optimum (though keeping a large spread) and rerunning the calibration whether these numbers would be more stable and well sampled. It would be nice to finish the calibration with the parameter value not too far out in the tail of the prior.

This also leads to better sampling and emulator UQ.

odunbar · 2026-04-10T18:27:49Z

+    rmse_path = joinpath(out_dir,
+        "emulator_validation_rmse_its$(first(train_iterations))to$(last(train_iterations)).png")
+    savefig(p_rmse, rmse_path)
+    @info "Emulator per-dimension RMSE plot → $rmse_path"


As a stretch goal, I think it would be great to take a couple of representative parameter values that perform for example (best/median/worst) and plot the statistics in physical space against the true parameters I'm not sure what variables we are taking, but i think the physicists/land scientists would like something more interpretable to validate than in the decorrelated space (even though mathematically this is probably more correct)

odunbar · 2026-04-10T18:52:12Z

+)
+display(chain)
+
+posterior = MarkovChainMonteCarlo.get_posterior(mcmc, chain)


I would be interested in seeing whether the posterior is sampling in a regime that is good / bad for the emulator. I propose the following test:

Take a bunch of the posterior samples

Use them like the y_var test set above to evaluate if the emulator was in a regime where y_var is very large, or is it close to obs_noise_cov. (comparing against the test data from before). If large then we could deduce that the MCMC is not sampling in a region where the emulator is most accurate.

braghiere requested review from AlexisRenchon, kmdeck and odunbar March 30, 2026 02:15

odunbar reviewed Mar 30, 2026

View reviewed changes

Comment thread experiments/callmip_uq_dk_sor/emulate_sample.jl Outdated

Comment thread experiments/callmip_uq_dk_sor/emulate_sample.jl Outdated

Comment thread experiments/callmip_uq_dk_sor/emulate_sample.jl Outdated

Comment thread experiments/callmip_uq_dk_sor/emulate_sample.jl Outdated

AlexisRenchon and others added 25 commits April 9, 2026 15:46

Calibrate NEON soil co2

48e79d9

wip

c60a3fa

use site meteorology

63067b0

[skip ci]

change prior

09dad95

wip

d117cc6

wip [skip ci]

d3af287

local changes

5d807cf

biomass height, clamp bug, precip factor, precip split

be8f7a2

wip [skip ci]

ce34687

fix fluxes [skip ci]

2009d08

fix parameter [skip ci]

4496e85

site specific params

da9e73b

use hi res soil data to get site parameters

920a943

calibration on hpc with climacalibrate

68193b2

addressing Kat review comments

a0e7ffa

small fix

ac9cb7b

days valid all years for same dim

251a539

pad missing day as 0, large variance

2e30626

refactor no minibatching

8de6033

fix autotrophic respiration

7ebfdfe

fix setup

a4dc954

wip

fae17b2

wip

78bf9a4

fix tr diag

6c40f88

update manifest

7a84209

Renato Braghiere added 3 commits April 9, 2026 15:46

wip: fix callmip UQ pipeline bugs (pkgdir, LAI, 12-param prior, AR de…

0b33281

…faults)

braghiere force-pushed the rb/callmip_phase1a branch from ddaab69 to 4df8db9 Compare April 9, 2026 22:58

odunbar reviewed Apr 10, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Rb/callmip phase1a#1693

Rb/callmip phase1a#1693
braghiere wants to merge 28 commits intomainfrom
rb/callmip_phase1a

braghiere commented Mar 30, 2026 •

edited

Loading

Uh oh!

odunbar left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

AlexisRenchon commented Apr 8, 2026 •

edited

Loading

Uh oh!

odunbar left a comment

Uh oh!

odunbar Apr 10, 2026

Uh oh!

odunbar Apr 10, 2026

Uh oh!

odunbar Apr 10, 2026

Uh oh!

odunbar Apr 10, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

braghiere commented Mar 30, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Purpose

Pipeline

✅ Phase 1a result: EKI calibration works

✅ What the calibration learned

⚠️ Phase 1b problem 1: Structural — winter respiration

⚠️ Phase 1b problem 2: CES emulator failure

Submitted files

Phase 1b targets

Uh oh!

odunbar left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

AlexisRenchon commented Apr 8, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

odunbar left a comment

Choose a reason for hiding this comment

Uh oh!

odunbar Apr 10, 2026

Choose a reason for hiding this comment

Uh oh!

odunbar Apr 10, 2026

Choose a reason for hiding this comment

Uh oh!

odunbar Apr 10, 2026

Choose a reason for hiding this comment

Uh oh!

odunbar Apr 10, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

braghiere commented Mar 30, 2026 •

edited

Loading

AlexisRenchon commented Apr 8, 2026 •

edited

Loading