Conversation
odunbar left a comment:
I've only looked at `emulate_sample.jl` - looks like a great start!
My main comments:

- I noticed that all the kwargs for building emulators are deprecated. Are you using CES v1? If so, you need to do something that matches the online docs (`Emulator` page) or the template I provided.
- There is very little emulator validation beyond MSE. I think it is critical that we have sufficient plotting etc. of emulator predictions to validate how well it is doing. Bad emulators will lead to MCMC non-convergence and problematic posteriors. I made a small note of this here.
Just a note - RAI is actually often larger than LAI, from what I see in the literature.
Critical bug fixes for DK-Sor CalLMIP Phase 1a calibration [skip ci]

Bug fixes (model):
- RAI = FT(1.5) (was 17.5, 10x too large → Ra_root >> GPP → NEE ≈ 0 always)
- LAI: Copernicus 'LAI' (was MODIS 'LAI_alternative', 2x lower peak)
- SOC consumption tendency added to DAMM biogeochemistry for carbon conservation
- LandSimulation type param relaxed: I (was I <: DEIntegrator)

Calibration (run_calibration.jl, slurm_calibration.sh):
- DT = 900 s (was 450)
- 12-param Alexis priors, no AR params (was 14-param)
- ntasks = 25 (= 2*12+1 UKI sigma points)
- Removed Pkg.update() from slurm script (saves ~3.5 h)

CES pipeline (emulate_sample.jl):
- find_latest_ekp_path: picks up latest calibration iteration
- N_train = 8 (10-iter calibration -> max usable = 8)
- init_stepsize = 1e-3, max_iter = 500 for pCN optimize_stepsize
- 12-param priors (matches calibration exactly)

CalLMIP sims (callmip_model_interface.jl, run_callmip_simulations.jl):
- RAI = FT(1.5)
- 12-param Alexis priors

New scripts:
- submit_calibrate_then_ces.sh: chains calibration -> emulate_sample via --dependency=afterok
- plot_calibration_check.jl: seasonality diagnostic plot (prior vs posterior)
- run_alexis_test.jl / slurm_alexis_test.sh: forward run with Alexis posterior

Known open issue:
- emulate_sample.jl MCMC (pCN-MH) fails: pCN beta -> 1e-95, acceptance 0.0005. Root cause: the 1257-dim obs vector creates an extremely peaked GP posterior. Questions documented in ISSUES_AND_QUESTIONS.md
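The `find_latest_ekp_path` behavior described above could be sketched roughly as follows (a hypothetical reimplementation for illustration; the actual function lives in `emulate_sample.jl` and may differ):

```julia
# Hypothetical sketch: pick the highest-numbered iteration_NNN directory under
# a calibration output root. The directory layout follows the pipeline
# description in this PR; the function body is an assumption, not the real code.
function find_latest_ekp_path(output_dir::AbstractString)
    iters = filter(d -> startswith(d, "iteration_"), readdir(output_dir))
    isempty(iters) && error("no iteration_* directories found in $output_dir")
    latest = sort(iters; by = d -> parse(Int, split(d, "_")[end]))[end]
    return joinpath(output_dir, latest)
end
```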
…submission

- Add priors.jl (16-param prior definitions, used by calibration + CES + CalLMIP sims)
- Add plot_multiyear_check.jl (multi-year EKI posterior stability check)
- Add evaluate_calibration.jl + slurm_postprocess.sh (full skill scores + figures)
- Add plot_calibration_improvement.jl, plot_posterior_distributions.jl, plot_callmip_netcdf.py, calibration_rmse_convergence.jl (diagnostic scripts)
- Add CALLMIP_Phase1a_Report.md (complete Phase 1a results, steps 1-6)
- Add CALLMIP_Phase1a_ModelDescription.md (paper section 12)
- Add CES_Phase2_Issues_for_OLLie.md (GP emulator failure diagnosis)
- Update emulate_sample.jl to CES v1 encoder_schedule API
- Update write_callmip_netcdf.jl: fix JLD2 sub-dict lookup, column_integral, empty diagnostic guard, blowup masking, Val_Temporal naming
- Update run_callmip_simulations.jl, callmip_model_interface.jl: 16 params
- Expand priors to 16 parameters (+ root/stem N ratios, soilCO2_activation_energy)

CalLMIP Phase 1a results (DK-Sor, 2003-2014):
- NEE RMSE: 3.053 -> 2.228 gC/m2/d (-27% cal), -35% val, bias -84%
- Qle RMSE: 38.9 -> 29.8 W/m2 (-23% cal)
- 4 NetCDF files submitted to modelevaluation.org
Force-pushed from ddaab69 to 4df8db9.
odunbar left a comment:
Looks great! Very happy to see the pipeline go through with the new dimension reduction and kwargs. The emulator looks fine.

- I recommend a couple more validation checks.
- Despite Claude saying it failed, I think the experiment has actually worked. There is no constraint that the MCMC needs to line up with the UKI optimum; that is just the principle for selecting training points, so it helps us get a good emulator if they remain consistent. It is also worth noting that we are only plotting the 1D marginals of a high-dimensional correlated distribution, which tell us only part of the story.

My initial impression, just from the UKI, is that this experiment illustrates that a couple of our priors are a bit off; I have suggestions for follow-up validation/calibration items that might help with this.
```julia
# Per-output-dimension RMSE (in the decorrelated/reduced space)
rmse_per_dim_train = sqrt.(mean((y_true_train .- y_mean_train) .^ 2; dims = 2))
rmse_per_dim_test = sqrt.(mean((y_true_test .- y_mean_test) .^ 2; dims = 2))
```
I think it would also be useful to look at `y_var_train`/`y_var_test` (each is actually a covariance, so maybe "cov" is a more suitable name). In the `transform_to_real = true` configuration, they should represent the coupled uncertainty

y_cov = obs_noise_cov + gp_uncertainty_in_prediction

If things are working OK, you would expect that near regions with a good amount of data, `y_var` is close to `obs_noise_cov`. If certain parameter regimes have very large `y_var`, we can look at where these are (I would expect somewhere far from the posterior). The reason I mention this is that `y_var` is also used in the sampling stage, not just `y_mean`.
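A minimal sketch of this check, assuming the `emulator`, the test inputs `u_test` (parameters as columns), and `obs_noise_cov` from the surrounding script; the exact return shape of `Emulators.predict` here (a vector of per-point covariance matrices) is an assumption:

```julia
using LinearAlgebra
using CalibrateEmulateSample.Emulators

# Predict in real (physical) space, where the returned covariance should
# represent y_cov ≈ obs_noise_cov + GP uncertainty in the prediction.
y_mean_test, y_cov_test = Emulators.predict(emulator, u_test; transform_to_real = true)

# For each test point, compare predicted covariance to observation noise:
# a ratio near 1 means the GP adds little extra uncertainty there.
ratios = [norm(y_cov_test[i]) / norm(obs_noise_cov) for i in 1:length(y_cov_test)]
@info "y_cov / obs_noise_cov Frobenius-norm ratios" extrema(ratios)
```

Test points with ratios much larger than 1 are where the emulator is extrapolating; plotting the corresponding parameter values would show whether they sit far from the posterior.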
```julia
function build_dk_sor_priors()
    priors_vec = [
        # ── Canopy / conductance ─────────────────────────────────────────────
```
How confident are we in these priors? In particular `moisture_stress_c`, `root_leaf_nitrogen_ratio`, `stem_leaf_nitrogen_ratio`, etc.

I ask because the EKI optimum is very far from the prior mass (in the results at 4/10). In a fully Bayesian setting we fully trust the prior, so this would indicate a very rare result. If we don't fully trust it, we may find it useful to run an experiment that breaks the Bayesian setting slightly.

I wonder, if we recentered the prior at the current EKI optimum (while keeping a large spread) and reran the calibration, whether these numbers would be more stable and well sampled. It would be nice to finish the calibration with the parameter value not too far out in the tail of the prior. This also leads to better sampling and emulator UQ.
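Recentering one prior could be sketched like this, using the `constrained_gaussian` constructor from EnsembleKalmanProcesses.jl; the numeric values and bounds below are placeholders, not the values in `priors.jl`:

```julia
using EnsembleKalmanProcesses.ParameterDistributions

# Hypothetical example: recenter moisture_stress_c at the EKI optimum while
# keeping a wide spread. 0.8 and 0.5 are placeholder mean/std values.
eki_optimum = 0.8
recentered = constrained_gaussian("moisture_stress_c", eki_optimum, 0.5, 0.0, Inf)

# The recentered prior would then be combined with the unchanged ones, e.g.
# priors = combine_distributions([recentered, other_priors...])
```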
```julia
rmse_path = joinpath(out_dir,
    "emulator_validation_rmse_its$(first(train_iterations))to$(last(train_iterations)).png")
savefig(p_rmse, rmse_path)
@info "Emulator per-dimension RMSE plot → $rmse_path"
```
As a stretch goal, I think it would be great to take a couple of representative parameter values (for example best/median/worst performers) and plot their statistics in physical space against the truth. I'm not sure which variables we are taking, but I think the physicists/land scientists would find physical space more interpretable for validation than the decorrelated space (even though the latter is probably more correct mathematically).
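Selecting the representative points could be sketched as below, reusing `y_true_test` and `y_mean_test` (outputs as columns) from the validation code above; the per-point ranking logic is an assumption:

```julia
using Statistics

# Rank test points by per-point RMSE in the reduced space, then pick
# best / median / worst candidates for plotting in physical space.
rmse_per_point = vec(sqrt.(mean((y_true_test .- y_mean_test) .^ 2; dims = 1)))
order = sortperm(rmse_per_point)
picks = (best = order[1], median = order[cld(end, 2)], worst = order[end])
@info "Representative test points (column indices)" picks
```

The forward model would then be run at these parameter values and the resulting NEE/Qle/Qh time series plotted against FLUXNET.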
```julia
)
display(chain)

posterior = MarkovChainMonteCarlo.get_posterior(mcmc, chain)
```
I would be interested in seeing whether the posterior is sampling in a regime that is good or bad for the emulator. I propose the following test:

- Take a bunch of the posterior samples.
- Use them like the `y_var` test set above to evaluate whether the emulator was in a regime where `y_var` is very large, or close to `obs_noise_cov` (comparing against the test data from before). If large, we could deduce that the MCMC is not sampling in a region where the emulator is most accurate.
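A sketch of that test, assuming the `posterior`, `emulator`, and `obs_noise_cov` objects from the surrounding script, and that posterior samples come back as columns in the space the emulator was trained in (both assumptions):

```julia
using LinearAlgebra
using CalibrateEmulateSample.Emulators
using CalibrateEmulateSample.ParameterDistributions

# Draw posterior samples and evaluate the emulator's predicted covariance there.
u_post = sample(posterior, 100)   # parameters as columns (assumed shape)
_, y_cov_post = Emulators.predict(emulator, u_post; transform_to_real = true)

# Compare against obs_noise_cov, as for the test set: large ratios flag
# regions where the MCMC sampled outside the emulator's accurate regime.
ratios = [norm(c) / norm(obs_noise_cov) for c in y_cov_post]
@info "Posterior-sample y_cov / obs_noise_cov ratios" extrema(ratios)
```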
Purpose
CalLMIP Phase 1a submission for the DK-Sor site (Sorø, Denmark — temperate beech forest).
16-parameter EKI calibration of ClimaLand, followed by CES uncertainty quantification,
producing 4 NetCDF files submitted to modelevaluation.org.
Result: NEE RMSE −27% cal / −35% val vs prior. Bias reduced −84%.
4 ALMA-compliant NetCDF files submitted. CES emulator failure diagnosed and documented for Phase 1b — see CES_Phase2_Issues_for_OLLie.md.

Pipeline
Step 0 — Build observation vector ~2 min
calibrate_dk_sor/generate_observations.jl (slurm_gen_observations.sh)
→ calibrate_dk_sor/observations.jld2
Step 1 — EKI calibration (33 members × 10 iterations) ~9 h
calibrate_dk_sor/run_calibration.jl (slurm_calibration.sh)
uses: model_interface.jl | priors.jl 1 node · 24 cores · ~192 GB
→ calibrate_dk_sor/output/iteration_NNN/
Step 2 — CES: train GP emulator + run MCMC posterior ~3 h
callmip_uq_dk_sor/emulate_sample.jl (slurm_emulate_sample.sh)
uses: ../calibrate_dk_sor/priors.jl 1 node · 8 CPUs · 64 GB
→ output_posterior_uq/ GP on 264 pts; 100k MCMC steps
Step 3 — Run 50-member posterior ensemble + analyze ~2–3 h
callmip_uq_dk_sor/run_posterior_ensemble.jl (slurm_posterior_ensemble.sh)
callmip_uq_dk_sor/analyze_posterior_ensemble.jl
→ output_posterior_ensemble/, output_posterior_analysis/
Step 4 — Run CalLMIP simulations (prior + EKI-optimal) ~73 min
callmip_uq_dk_sor/run_callmip_simulations.jl (slurm_callmip_simulations.sh)
uses: callmip_model_interface.jl
uses: ../calibrate_dk_sor/prior_mean_parameters.toml (member 001)
uses: ../calibrate_dk_sor/priors.jl (member 002)
→ output_callmip_sims/
Step 5 — Write NetCDF + evaluate ~10 min
callmip_uq_dk_sor/write_callmip_netcdf.jl (slurm_postprocess.sh)
callmip_uq_dk_sor/evaluate_calibration.jl
→ callmip_output/*.nc, output_evaluation/
Total: ~16 h wall · ~400 CPU-h
✅ Phase 1a result: EKI calibration works
EKI converges in 9 iterations. NEE RMSE drops 4 orders of magnitude from the
blown-up initial ensemble to iter 9.
✅ What the calibration learned
The three new respiration parameters — `root_leaf_nitrogen_ratio`, `stem_leaf_nitrogen_ratio` (autotrophic) and `soilCO2_activation_energy` (heterotrophic/DAMM Arrhenius) — are the dominant signal, both strongly shifted and agreed upon by EKI and MCMC.

`canopy_K_lw` and `canopy_z_0m_coeff` are tightly data-informed.

`moisture_stress_c`, `leaf_Cd`, both Michaelis constants, and `soilCO2_pre_exponential_factor` remain underdetermined from NEE+Qle+Qh alone.

Both prior and EKI-optimal produce near-zero NEE in Oct–Mar, whereas FLUXNET shows a sustained CO₂ source of +1.5 to +2.4 gC m⁻² d⁻¹. Despite three dedicated respiration parameters, this seems to be a structural limitation, not resolvable by parameter calibration alone.
[Annual means 2003–2013 — upload prior_vs_posterior_annual.png]
The GP emulator uses the CES v1 `encoder_schedule` API. Validation scatter looks reasonable in the bulk, but the MCMC posterior disagrees sharply with the EKI optimum for `pmodel_β`, `moisture_stress_c`, and `canopy_z_0m_coeff`.

Root cause: EKI collapsed tightly to `pmodel_β ≈ 19` → all 264 GP training points cluster there → the GP incorrectly extrapolates low cost near the prior mean `pmodel_β ≈ 38` → MCMC settles near the prior → near-zero GPP → flat NEE posterior. CES 90% bands cover only 31.5% of NEE observations (expected 90%).
The EKI-optimal member is the scientifically meaningful Phase 1a result.
Full diagnosis + proposed fixes: CES_Phase2_Issues_for_OLLie.md.

Submitted files
callmip_output/
ClimaLand.CalLMIP1.0_Expt1_DK-Sor_Cal_Prior.nc
ClimaLand.CalLMIP1.0_Expt1_DK-Sor_Cal_Posterior.nc
ClimaLand.CalLMIP1.0_Expt1_DK-Sor_Val_Temporal_Prior.nc
ClimaLand.CalLMIP1.0_Expt1_DK-Sor_Val_Temporal_Posterior.nc
13 ALMA variables. 3 all-NaN (ESoil, Qg, TotAbovBioMass — not in current model).
56 days masked in EKI-optimal member (numerical blowup, cold-season transitions).
Phase 1b targets