Skip to content

Rb/callmip phase1a#1693

Draft
braghiere wants to merge 28 commits intomainfrom
rb/callmip_phase1a
Draft

Rb/callmip phase1a#1693
braghiere wants to merge 28 commits intomainfrom
rb/callmip_phase1a

Conversation

@braghiere
Copy link
Copy Markdown
Member

@braghiere braghiere commented Mar 30, 2026

Purpose

CalLMIP Phase 1a submission for the DK-Sor site (Sorø, Denmark — temperate beech forest).
16-parameter EKI calibration of ClimaLand, followed by CES uncertainty quantification,
producing 4 NetCDF files submitted to modelevaluation.org.

Result: NEE RMSE −27% cal / −35% val vs prior. Bias reduced −84%.
4 ALMA-compliant NetCDF files submitted. CES emulator failure diagnosed and documented
for Phase 1b — see CES_Phase2_Issues_for_OLLie.md.


Pipeline

Step 0 — Build observation vector ~2 min
calibrate_dk_sor/generate_observations.jl (slurm_gen_observations.sh)
→ calibrate_dk_sor/observations.jld2

Step 1 — EKI calibration (33 members × 10 iterations) ~9 h
calibrate_dk_sor/run_calibration.jl (slurm_calibration.sh)
uses: model_interface.jl | priors.jl 1 node · 24 cores · ~192 GB
→ calibrate_dk_sor/output/iteration_NNN/

Step 2 — CES: train GP emulator + run MCMC posterior ~3 h
callmip_uq_dk_sor/emulate_sample.jl (slurm_emulate_sample.sh)
uses: ../calibrate_dk_sor/priors.jl 1 node · 8 CPUs · 64 GB
→ output_posterior_uq/ GP on 264 pts; 100k MCMC steps

Step 3 — Run 50-member posterior ensemble + analyze ~2–3 h
callmip_uq_dk_sor/run_posterior_ensemble.jl (slurm_posterior_ensemble.sh)
callmip_uq_dk_sor/analyze_posterior_ensemble.jl
→ output_posterior_ensemble/, output_posterior_analysis/

Step 4 — Run CalLMIP simulations (prior + EKI-optimal) ~73 min
callmip_uq_dk_sor/run_callmip_simulations.jl (slurm_callmip_simulations.sh)
uses: callmip_model_interface.jl
uses: ../calibrate_dk_sor/prior_mean_parameters.toml (member 001)
uses: ../calibrate_dk_sor/priors.jl (member 002)
→ output_callmip_sims/

Step 5 — Write NetCDF + evaluate ~10 min
callmip_uq_dk_sor/write_callmip_netcdf.jl (slurm_postprocess.sh)
callmip_uq_dk_sor/evaluate_calibration.jl
→ callmip_output/*.nc, output_evaluation/

Total: ~16 h wall · ~400 CPU-h


✅ Phase 1a result: EKI calibration works

EKI converges in 9 iterations. NEE RMSE drops 4 orders of magnitude from the
blown-up initial ensemble to iter 9.

Prior EKI optimal Change
NEE RMSE cal (gC m⁻² d⁻¹) 3.053 2.228 −27%
NEE RMSE val (gC m⁻² d⁻¹) 3.749 2.420 −35%
NEE bias cal +0.437 +0.070 −84%
Qle RMSE cal (W m⁻²) 38.9 29.8 −23%
Qh RMSE cal (W m⁻²) 61.1 51.5 −16%
calibration_improvement callmip_nc_ces_bands callmip_nc_seasonal

✅ What the calibration learned

The three new respiration parameters — root_leaf_nitrogen_ratio,
stem_leaf_nitrogen_ratio (autotrophic) and soilCO2_activation_energy
(heterotrophic/DAMM Arrhenius) — are the dominant signal, both strongly shifted
and agreed upon by EKI and MCMC. canopy_K_lw and canopy_z_0m_coeff are
tightly data-informed. moisture_stress_c, leaf_Cd, both Michaelis constants,
and soilCO2_pre_exponential_factor remain underdetermined from NEE+Qle+Qh alone.

parameter_shifts uncertainty_reduction

⚠️ Phase 1b problem 1: Structural — winter respiration

Both prior and EKI-optimal produce near-zero NEE in Oct–Mar. FLUXNET shows
+1.5 to +2.4 gC m⁻² d⁻¹ sustained CO₂ source. Despite three dedicated
respiration parameters, this seems to be a structural limitation, not
resolvable by parameter calibration alone.

[Annual means 2003–2013 — upload prior_vs_posterior_annual.png]


⚠️ Phase 1b problem 2: CES emulator failure

The GP emulator uses CES v1 encoder_schedule API. Validation scatter looks
reasonable in the bulk, but the MCMC posterior disagrees sharply with the EKI
optimum for pmodel_β, moisture_stress_c, and canopy_z_0m_coeff.

Root cause: EKI collapsed tightly to pmodel_β ≈ 19 → all 264 GP training
points cluster there → GP incorrectly extrapolates low cost near prior mean
pmodel_β ≈ 38 → MCMC settles near prior → near-zero GPP → flat NEE posterior.

CES 90% bands cover only 31.5% of NEE observations (expected 90%).
The EKI-optimal member is the scientifically meaningful Phase 1a result.
Full diagnosis + proposed fixes: CES_Phase2_Issues_for_OLLie.md.

Screenshot 2026-04-09 at 3 27 08 PM Grey = prior, blue = MCMC posterior, red dashed = EKI optimum. The disagreement between blue and red is the emulator failure signature.

Submitted files

callmip_output/
ClimaLand.CalLMIP1.0_Expt1_DK-Sor_Cal_Prior.nc
ClimaLand.CalLMIP1.0_Expt1_DK-Sor_Cal_Posterior.nc
ClimaLand.CalLMIP1.0_Expt1_DK-Sor_Val_Temporal_Prior.nc
ClimaLand.CalLMIP1.0_Expt1_DK-Sor_Val_Temporal_Posterior.nc

13 ALMA variables. 3 all-NaN (ESoil, Qg, TotAbovBioMass — not in current model).
56 days masked in EKI-optimal member (numerical blowup, cold-season transitions).


Phase 1b targets

  1. Winter respiration — structural fix in heterotrophic/autotrophic respiration
  2. CES emulator — retrain GP with broader ensemble coverage (see CES_Phase2_Issues_for_OLLie.md)
  3. ESoil / Qg — enable soil evaporation and ground heat flux diagnostics
  4. Numerical instability — 56 blowup days in EKI-optimal during cold-season transitions

Copy link
Copy Markdown
Member

@odunbar odunbar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've only looked at
emulate_sample.jl - Looks like a great start!

My main comment is that

  1. I noticed that all the kwargs for building emulators are deprecated. are you using CES v1? If so, then you need to do something that matches the online docs (Emulator) page, or the template I provided
  2. There is very little emulator validation, beyond MSE. I think it is critical that we have sufficient plotting etc of emulator predictions to validate how well it is doing. Bad emulators will lead to MCMC non-convergence and problematic posteriors. I made some small note of this here

Comment thread experiments/callmip_uq_dk_sor/emulate_sample.jl Outdated
Comment thread experiments/callmip_uq_dk_sor/emulate_sample.jl Outdated
Comment thread experiments/callmip_uq_dk_sor/emulate_sample.jl Outdated
Comment thread experiments/callmip_uq_dk_sor/emulate_sample.jl Outdated
@AlexisRenchon
Copy link
Copy Markdown
Member

AlexisRenchon commented Apr 8, 2026

just a note - RAI is actually often larger than LAI, from what I see in the literature

PFT | Typical RAI range | Notes
-- | -- | --
Tropical evergreen forest | 10 – 40 | Huge fine-root surface area
Temperate deciduous forest | 8 – 25 | Strong seasonal turnover
Boreal forest | 5 – 15 | Limited by cold soils
Shrubland | 5 – 20 | Often deep + opportunistic
Grassland | 10 – 30 | Dense fibrous roots
Crops | 5 – 20 | Depends on species & stage
Tundra | 2 – 10 | Shallow due to permafrost

https://www.pnas.org/doi/10.1073/pnas.94.14.7362

Renato Braghiere added 3 commits April 9, 2026 15:46
Critical bug fixes for DK-Sor CalLMIP Phase 1a calibration:

Bug fixes (model):
- RAI = FT(1.5) (was 17.5, 10x too large → Ra_root >> GPP → NEE≈0 always)
- LAI: Copernicus 'LAI' (was MODIS 'LAI_alternative', 2x lower peak)
- SOC consumption tendency added to DAMM biogeochemistry for carbon conservation
- LandSimulation type param relaxed: I (was I <: DEIntegrator)

Calibration (run_calibration.jl, slurm_calibration.sh):
- DT = 900 s (was 450)
- 12-param Alexis priors, no AR params (was 14-param)
- ntasks = 25 (= 2*12+1 UKI sigma points)
- Removed Pkg.update() from slurm script (saves ~3.5h)

CES pipeline (emulate_sample.jl):
- find_latest_ekp_path: picks up latest calibration iteration
- N_train = 8 (10-iter calibration -> max usable = 8)
- init_stepsize = 1e-3, max_iter = 500 for pCN optimize_stepsize
- 12-param priors (matches calibration exactly)

CalLMIP sims (callmip_model_interface.jl, run_callmip_simulations.jl):
- RAI = FT(1.5)
- 12-param Alexis priors

New scripts:
- submit_calibrate_then_ces.sh: chains calibration -> emulate_sample via --dependency=afterok
- plot_calibration_check.jl: seasonality diagnostic plot (prior vs posterior)
- run_alexis_test.jl / slurm_alexis_test.sh: forward run with Alexis posterior

Known open issue:
- emulate_sample.jl MCMC (pCN-MH) fails: pCN beta -> 1e-95, acceptance 0.0005
  Root cause: 1257-dim obs vector creates extremely peaked GP posterior
  Questions documented in ISSUES_AND_QUESTIONS.md
…submission

- Add priors.jl (16-param prior definitions, used by calibration + CES + CalLMIP sims)
- Add plot_multiyear_check.jl (multi-year EKI posterior stability check)
- Add evaluate_calibration.jl + slurm_postprocess.sh (full skill scores + figures)
- Add plot_calibration_improvement.jl, plot_posterior_distributions.jl,
  plot_callmip_netcdf.py, calibration_rmse_convergence.jl (diagnostic scripts)
- Add CALLMIP_Phase1a_Report.md (complete Phase 1a results, steps 1-6)
- Add CALLMIP_Phase1a_ModelDescription.md (paper section 12)
- Add CES_Phase2_Issues_for_OLLie.md (GP emulator failure diagnosis)
- Update emulate_sample.jl to CES v1 encoder_schedule API
- Update write_callmip_netcdf.jl: fix JLD2 sub-dict lookup, column_integral,
  empty diagnostic guard, blowup masking, Val_Temporal naming
- Update run_callmip_simulations.jl, callmip_model_interface.jl: 16 params
- Expand priors to 16 parameters (+ root/stem N ratios, soilCO2_activation_energy)

CalLMIP Phase 1a results (DK-Sor, 2003-2014):
  NEE RMSE: 3.053 -> 2.228 gC/m2/d (-27% cal), -35% val, bias -84%
  Qle RMSE: 38.9 -> 29.8 W/m2 (-23% cal)
  4 NetCDF files submitted to modelevaluation.org
@braghiere braghiere force-pushed the rb/callmip_phase1a branch from ddaab69 to 4df8db9 Compare April 9, 2026 22:58
Copy link
Copy Markdown
Member

@odunbar odunbar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! Very happy to see the pipeline go through with the new dim reduction and kwargs. Emulator looks fine

  • I recommended a couple more checks for validation
  • Despite Claude saying it failed, I think the experiment has actually worked. There is no constraint that the MCMC needs to line up with UKI optimum, this is just the principle for selecting training points so it helps us get a good emulator if they remain consistent. I think it is worth noting as well that we are only plotting the 1D marginals of a high dimensional correlated distribution which only tell us part of the story.

My initial impression, just from the UKI, is that this experiment illustrates that a couple of our priors are a bit off, I have a suggestion for a follow up validation/calibration items that might help with this.

# Per-output-dimension RMSE (in the decorrelated/reduced space)
rmse_per_dim_train = sqrt.(mean((y_true_train .- y_mean_train) .^ 2; dims = 2))
rmse_per_dim_test = sqrt.(mean((y_true_test .- y_mean_test) .^ 2; dims = 2))

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would also be useful to look at the y_var_train/y_var_test (it is also a covariance, so maybe "cov" is more suitable name). In the transform_to_real = true configuration, they should be representing the coupled uncertainty of

y_cov = obs_noise_cov + gp_uncertainty_in_prediction

And if things are working OK you would expect near where there is a good amount of data, the y_var to be close to the obs_noise_cov. if certain parameter regimes have very large y_var then we can look at where these are (i would expect somewhere far from the posterior). The reason i mention this, is that the y_var is also used in the sampling stage not just the y_mean.


function build_dk_sor_priors()
priors_vec = [
# ── Canopy / conductance ─────────────────────────────────────────────
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how confident are we on these priors?
In particular moisture_stress_c and root_leaf_nitrogen_ratio, stem_leaf_nitrogen_ratio etc.

I ask because the EKI optimum is very far from the prior mass. ( In results at 4/10) In a fully Bayesian setting we fully trust the prior, and so this would indicate it is a very rare result. In the case that we don't, we may find it useful to run an experiment that breaks the Bayesian setting slightly.

I wonder if recentering the prior at the current EKI optimum (though keeping a large spread) and rerunning the calibration whether these numbers would be more stable and well sampled. It would be nice to finish the calibration with the parameter value not too far out in the tail of the prior.

This also leads to better sampling and emulator UQ.

rmse_path = joinpath(out_dir,
"emulator_validation_rmse_its$(first(train_iterations))to$(last(train_iterations)).png")
savefig(p_rmse, rmse_path)
@info "Emulator per-dimension RMSE plot → $rmse_path"
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a stretch goal, I think it would be great to take a couple of representative parameter values that perform for example (best/median/worst) and plot the statistics in physical space against the true parameters I'm not sure what variables we are taking, but i think the physicists/land scientists would like something more interpretable to validate than in the decorrelated space (even though mathematically this is probably more correct)

)
display(chain)

posterior = MarkovChainMonteCarlo.get_posterior(mcmc, chain)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would be interested in seeing whether the posterior is sampling in a regime that is good / bad for the emulator. I propose the following test:

  • Take a bunch of the posterior samples
  • Use them like the y_var test set above to evaluate if the emulator was in a regime where y_var is very large, or is it close to obs_noise_cov. (comparing against the test data from before). If large then we could deduce that the MCMC is not sampling in a region where the emulator is most accurate.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants