Add Model 2a: same-parameter cross-biomarker covariance as a JAGS extension of model.jags (strictly nests Chapter 1) + Chapter-1 comparison#245
Conversation
…AGS extension of model.jags Add Model 2a, an additive extension of model.jags that introduces same-parameter cross-biomarker covariance via a shared per-parameter latent factor, leaving every Chapter 1 within-biomarker block unchanged. Chapter 1 is recovered exactly when the loadings are zero (strict nesting). - model: inst/extdata/model_2a.jags - fit: run_mod_2a() with jags_data_2a, make_inits_2a, prep_priors_2a, add_factor_priors; reuses prep_data() unchanged - cross-biomarker summaries: summarize_cross_2a, cross_cov_from_loadings, marginal_var_2a, cross_cor_from_draw_2a (+ internal jags_node_utils) - comparison vs Chapter 1: compare_mod_2a, fit_chapter1_lean, summarize_curve_params_2a - simulation/validation: sim_params_2a, sim_case_data_2a, build_sigma_2a, validate_recovery_2a, validate_nesting_2a - tests: 7 pure-function unit tests + 1 JAGS-gated smoke test - example: run_mod_2a-examples.R (one fit on nepal_sees) No new dependencies; no Chapter 1 code modified.
This comment has been minimized.
This comment has been minimized.
Merge remote-tracking branch 'origin/main' into chapter2-extend # Conflicts: # DESCRIPTION # inst/WORDLIST
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
|
📖 https://ucd-serg.github.io/serodynamics/preview/pr245 |
Tooling/style only; model and functions unchanged. stop()->cli_abort, line/indent/semicolon fixes, object_name_linter exemption for *_2a files, sim_params tolerance 0.05, suppressWarnings on JAGS smoke tests,
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
|
@claude Please fix the remaining open review comments 1–3 on branch
The corrected join should follow this structure:
Use this single-line version:
Please only make these targeted changes for comments 1–3. |
|
Claude finished @Kwan-Jenny's task in 1m 8s —— View job Tasks
SummaryAll three targeted fixes applied and pushed as commit
Note on fix 1I kept |
- compare_mod_2a(): use join_by() and set relationship = "one-to-one" on the shared-parameter left_join (CLAUDE.md requirement). - jags_data_2a(): flatten the multi-biomarker warning to a single line so cli does not render an embedded newline plus indentation. - sim_case_data_2a(): drop the hardcoded n * 2 pre-allocation and for (k in 1:2) loop so the simulator honours length(biomarkers). Co-authored-by: Kwan-Jenny <Kwan-Jenny@users.noreply.github.com>
|
Claude finished @Kwan-Jenny's task in 5m 27s —— View job Code Review: Model 2a PR #245 — after commits
|
| Item | File | Status |
|---|---|---|
left_join() missing relationship |
compare_mod_2a.R:60-66 |
✓ Fixed (join_by() + relationship = "one-to-one" + # nolint for linter) |
sim_case_data_2a hardcoded 1:2 |
sim_case_data_2a.R:37,40 |
✓ Fixed (length(biomarkers) / seq_along) |
@returns doc on run_mod_2a |
run_mod_2a.R |
✓ Present |
stop() → cli::cli_abort() |
all four files | ✓ Done |
base::merge() → dplyr::left_join() |
compare_mod_2a.R |
✓ Done |
fit_chapter1_lean memory footprint |
fit_chapter1_lean.R |
✓ Lean (monitors mu.par, prec.par only) |
Positional probs row indexing |
summarize_cross_2a.R:72-74 |
Still open — low priority (unchanged) |
| CLI warning embedded newline | jags_data_2a.R:29-30 |
⚠ Re-opened (see above) |
JAGS model and math ✓
inst/extdata/model_2a.jags is unchanged and correct:
- Factor construction
par = mu.par + w + lambda * fis sound - Identifiability constraint
lambda[1,p] T(0,)is correct lambda[k,p] ~ dnorm(0, prec.lambda)fork >= 2leaves the sign ofc_pfree vialambda[2,p]w ~ dmnorm(zero_p, prec.par)with the Chapter 1 Wishart prior is unchanged- Setting all
lambda = 0reduces to Chapter 1 exactly
The marginal variance algebra in marginal_var_2a(), the cross-covariance formula in cross_cov_from_loadings(), and the correlation in cross_cor_from_draw_2a() are all mathematically correct.
Tests ✓
Unit tests cover:
build_sigma_2a— block placement, Chapter 1 nesting, PD rejectioncross_cov_from_loadings,marginal_var_2a,cross_cor_from_draw_2a— algebraic correctnesssummarize_cross_2a— end-to-end with synthetic fixed drawssim_params_2a— large-n covariance recoverysim_case_data_2a— shape, biomarker labels, reproducibilitysummarize_curve_params_2a— means, variances (with and without loadings)make_inits_2a,add_factor_priors— structure and validation
validate_recovery_2a and validate_nesting_2a have 0% CI coverage by design (full-MCMC; documented as locally-run). The 0% lines in cross_cov_from_loadings.R are the error branch (nrow < 2), which is acceptable.
Summary
1 item to resolve before merge: fix the re-introduced embedded newline in jags_data_2a.R:29-30 by shortening the warning message to fit within 80 characters on one line. Everything else — model math, dplyr idioms, documentation, tests, and the three items fixed in b3ba634 — is clean.
· Branch
|
@claude The only CI failure now is an R CMD check NOTE, plus one re-opened review item. Please make these two minimal changes and KEEP them — do not re-introduce join_by() and do not split the CLI string. Then run
Remove the now-unnecessary
Keep it on a single line; do not add a line break inside the string. If it is still >80 with the current indentation, shorten the wording further (keep the meaning) rather than splitting the line. |
|
Claude finished @Kwan-Jenny's task in 12m 38s —— View job Claude finished @Kwan-Jenny's task — · Branch Tasks
ResultPushed as commit 1) shared <- dplyr::left_join(
s1, s2,
by = c("biomarker", "param"), # character `by`: no NSE, so this avoids
relationship = "one-to-one", # the R CMD check no-visible-binding NOTE
suffix = c("_ch1", "_2a")
)2) "i" = "found {n_bio} biomarkers; rank-1 factor couples all, subset to 2."Verification
Notes on what I did not run
|
- compare_mod_2a.R: switch dplyr::left_join back to character `by` so R CMD check's codetools no longer flags `biomarker`/`param` as undefined globals. Keep `relationship = "one-to-one"` and drop the obsolete `# nolint` (lintr no longer needs to be silenced). - jags_data_2a.R: replace the two-line "i" cli message with a single line under 80 chars, so cli no longer renders a literal mid-sentence indent and line_length_linter stays clean. Co-Authored-By: Kwan-Jenny <Kwan-Jenny@users.noreply.github.com>
|
@claude review |
|
Claude finished @Kwan-Jenny's task in 2m 53s —— View job Code Review: Model 2a PR #245 — after commits
|
| Item | File | Status |
|---|---|---|
left_join() missing relationship |
compare_mod_2a.R:60-65 |
✓ Fixed (character by, relationship = "one-to-one", # nolint removed) |
sim_case_data_2a hardcoded 1:2 |
sim_case_data_2a.R:37,40 |
✓ Fixed (length(biomarkers) / seq_along(biomarkers)) |
| CLI multi-line warning | jags_data_2a.R:29 |
✓ Fixed (79 chars on one line, including indent) |
@returns doc on run_mod_2a |
run_mod_2a.R |
✓ Present |
stop() → cli::cli_abort() |
all four files | ✓ Done |
base::merge() → dplyr::left_join() |
compare_mod_2a.R |
✓ Done |
fit_chapter1_lean memory footprint |
fit_chapter1_lean.R |
✓ Lean (monitors mu.par, prec.par only) |
The model file inst/extdata/model_2a.jags, the factor construction (par = mu + w + lambda * f), the identifiability constraint lambda[1, p] T(0,), and the marginal-variance algebra in marginal_var_2a() / cross_cov_from_loadings() / cross_cor_from_draw_2a() are all mathematically correct. Setting lambda = 0 reduces to Chapter 1 exactly. ✓
New findings
1. sim_case_data_2a.R:33-41 — latent inconsistency between the (now-generalized) loop and the 2-biomarker sim_params_2a() (minor — correctness)
Fix 3 generalized the outer loop to for (k in seq_along(biomarkers)), but sim_params_2a() still hardcodes two biomarkers (it builds mu = c(mu_g, mu_a) and returns an n × 2P log_par). So a caller who passes biomarkers = c("A", "B", "C") will hit truth$log_par[i, ((k-1)*p+1):(k*p)] with k = 3 → columns 11:15 of a 10-column matrix → subscript-out-of-bounds. The docs (@param biomarkers Length-2 character) still say length 2, so this is a latent assumption, not a current crash — but the loop now lies about its own generality. Cheapest fix is an early assert near the top:
if (length(biomarkers) != 2L) {
cli::cli_abort("{.arg biomarkers} must have length 2 (Model 2a is a pair).")
}Alternatively, generalize sim_params_2a() to take a list of mean vectors. The targeted fix-3 instruction was the right local change; an assert would make the constraint explicit. Fix this →
2. summarize_cross_2a.R:39-41, 67-74 — probs is row-indexed positionally, not by sorted probability (minor — silent-wrong-result risk)
probs is user-facing (defaulting to c(0.025, 0.5, 0.975)) and only validated for length 3. apply(..., quantile, probs = probs) returns rows in the order of probs, then the code reads cov_q[1, ] as cov_lo, cov_q[2, ] as cov_med, cov_q[3, ] as cov_hi. A call like summarize_cross_2a(fit$mcmc, probs = c(0.5, 0.025, 0.975)) populates cov_med with the lower quantile with no error. One-line fix right after the length check:
probs <- sort(probs)3. compare_mod_2a.R:74-79 — try(..., silent = TRUE) returns a "try-error" object, not NULL (minor — return-type consistency)
The roxygen at lines 40 says "dic_ch1, dic_2a: raw DIC objects when dic = TRUE (else NULL)". But on failure try() returns a "try-error" object, so dic_ch1 ends up as a try-error rather than NULL. Use tryCatch(..., error = function(e) NULL) if the documented contract ("NULL or DIC object") is the intended behavior. Fix this →
4. compare_mod_2a.R:43-55 — ... is forwarded to both fits, which leaks Model-2a-only args (minor — API design)
... is forwarded to both fit_chapter1_lean(...) (which forwards to prep_priors()) and run_mod_2a(...) (which has extra_monitors, prec_lambda, plus its own ...). prec_lambda is named explicitly, fine. But extra_monitors = "y0" via ... would route to prep_priors() on the Chapter 1 side and either silently drop or error. The "shared = Chapter 1 prior args only" constraint isn't documented anywhere. Suggest either documenting it explicitly in @param ..., or pulling out a monitors_2a parameter.
5. CLAUDE.md guidance now contradicts the dplyr join in compare_mod_2a.R (documentation only, out of scope for this PR)
CLAUDE.md says "Prefer dplyr::left_join() with by = join_by(...)". The fix here intentionally uses character by to avoid the R CMD check NOTE that the bare-symbol join_by() produces. Both forms work; the maintainer's preference is documented at the call site via the inline comment. A short note in CLAUDE.md (when a join's bywould otherwise trigger a no-visible-binding NOTE for the bare column names, characterby = c(...) is the accepted form) would close the loop for future reviewers and prevent a fourth round-trip.
Tests, lint, NAMESPACE — clean
NAMESPACEexports all 17 new Model 2a functions plus existinginitsfunction..lintr.Rexempts the_2afiles fromobject_name_linteronly (correct narrow scope; the2asuffix can't satisfy snake_case ACRO regex).- Unit tests cover the pure algebra (
build_sigma_2a,cross_cov_from_loadings,marginal_var_2a,cross_cor_from_draw_2a), the extraction (summarize_cross_2a,summarize_curve_params_2a) with synthetic constant draws, structure checks (make_inits_2a,add_factor_priors,sim_case_data_2a), large-n covariance recovery (sim_params_2a), and a tiny-MCMC smoke test forrun_mod_2a+compare_mod_2a.validate_recovery_2a/validate_nesting_2a0% coverage is by design.
Summary
The strict-nesting story holds and the implementation is mathematically sound. Three small things worth fixing before merge:
- Either add a
length(biomarkers) == 2Lassert insim_case_data_2aor document the constraint clearly (R/sim_case_data_2a.R:33). probs <- sort(probs)insummarize_cross_2aso positional row indexing is safe (R/summarize_cross_2a.R:41).- Swap
try(..., silent = TRUE)fortryCatch(..., error = function(e) NULL)to honor the documentedNULL-on-failure return type incompare_mod_2a(R/compare_mod_2a.R:74-79).
Item 4 (... forwarding) and item 5 (CLAUDE.md note about character by) are documentation-only nudges.
· Branch
Summary
Adds Model 2a, an additive extension of
model.jagsthat introduces same-parameter cross-biomarker covariance into the subject-level antibody kinetic parameters while leaving every Chapter 1 within-biomarker block unchanged. Chapter 1 is recovered exactly when the new covariances are zero, so Model 2a strictly nests the current model. It also addscompare_mod_2a(), a head-to-head of Chapter 1 vs Model 2a on the same data.Nothing in the existing Chapter 1 pipeline is modified —
model.jags,prep_data(),prep_priors(),run_mod()are untouched; all code is new.What Model 2a is
Chapter 1 covariance is biomarker-separated (cross-biomarker = 0):
Model 2a keeps
Sigma_G,Sigma_Aand adds a diagonal cross-biomarkerblock
C = diag(c_1..c_P)(same-parameter only):c_pis the IgG~IgA covariance for the same kinetic parameterp; cross-parameter cross-biomarker terms stay 0;C = 0reproduces Chapter 1 (35 covariance parameters vs 30).Implementation (factor parameterization)
A Wishart precision cannot pin a sparse covariance's zero-pattern, so the exact pattern is generated with a shared latent factor per kinetic parameter:
For two biomarkers this gives
c_p = lambda[1,p] * lambda[2,p](derived from the loadings — this is a model quantity, independent of the sampler; it is the samec_pwhether implemented in JAGS or Stan), cross-parameter cross-biomarkercovariance 0, and — with all
lambda = 0— exactlymodel.jags. The first biomarker's loadings are constrained> 0for identifiability; the sign ofc_pstays free vialambda[2,p]. The model is general inn_antigen_isos: with 2 biomarkers it is exactly Model 2a; with more, the single per-parameter factor couples all of them (rank-1, same-parameter).inst/extdata/model_2a.jagsis identical tomodel.jagsexcept thisparconstruction and the loading priors.What's added
inst/extdata/model_2a.jags— the extended model.add_factor_priors(),prep_priors_2a()(reuse Chapter 1 priors).jags_data_2a(),make_inits_2a(),run_mod_2a()— a lean wrapper that reusesprep_data()unchanged and returns the MCMC plus a tidy cross-biomarker covariance/correlation summary (nosr_modelpost-processing).cross_cov_from_loadings(),marginal_var_2a(),cross_cor_from_draw_2a(),summarize_cross_2a(), shared-internaljags_node_utils_2a.R.build_sigma_2a(),sim_params_2a(),sim_case_data_2a(),validate_recovery_2a(),validate_nesting_2a().fit_chapter1_lean()(same posterior asrun_mod()),summarize_curve_params_2a(),compare_mod_2a().No new dependencies (all in Imports already); no new data (uses
nepal_sees).Chapter 1 vs Model 2a — what changes
compare_mod_2a()fits both models on the same data and reports:mu.parand the within-biomarker variances. Because Model 2a strictly nests Chapter 1, these should agree within MCMC error; a large difference would signal a problem, notan improvement.
c_p/ correlationrho_pper kinetic parameter, which Chapter 1 cannot represent (it is structurally 0 there).compare_mod_2a(dic = TRUE)offers a best-effort DIC as a first signal.Comparison results on
nepal_sees(full-length MCMC, run locally):(to be filled in from the local comparison run: convergence (max R-hat), the shared-parameter table with max |diff|, the cross-biomarker
c_p/rho_ptable and which CIs exclude 0, and themu.parCI-width summary)Verification
run_mod_2a()/compare_mod_2a()execute and return the expected structure (matching the existingtest-run_mod.Rstyle; CI installs JAGS).nepal_sees, are run locally at full MCMC length (validate_recovery_2a(),validate_nesting_2a(),compare_mod_2a()); results pasted above.How to try it
Scope
This PR is the "add the model + show it nests Chapter 1 + compare to Chapter 1" increment. Quantifying improvement (WAIC/LOO, joint posterior-predictive comparison, seroincidence coverage/width) is a follow-up, since WAIC needs a
model-file change and the seroincidence step needs the serocalculator integration.