Fix #588: Support left truncation via delay_min parameter#596
Fix #588: Support left truncation via delay_min parameter#596seabbs-bot wants to merge 6 commits intoepinowcast:mainfrom
Conversation
|
Important Review skippedAuto incremental reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
WalkthroughThe changes implement left-truncation support via a new Changes
Sequence DiagramsequenceDiagram
participant User
participant MarginModel as Marginal Model<br/>(R/marginal_model.R)
participant DataPrep as Data Preparation<br/>(utils.R, linelist_data.R)
participant RLikelihood as R Likelihood<br/>(gen.R)
participant Stan as Stan Functions
participant PCens as primarycensored
User->>MarginModel: as_epidist_marginal_model(..., delay_min=?)
MarginModel->>DataPrep: .add_delay_min(data, delay_min)
DataPrep->>DataPrep: Resolve delay_min<br/>(NULL/scalar/column)
DataPrep-->>MarginModel: data with delay_min
MarginModel->>MarginModel: Validate delay_lwr >= delay_min
MarginModel->>Stan: Pass delay_min → vreal5
Note over RLikelihood,PCens: During likelihood evaluation
RLikelihood->>RLikelihood: Extract prep$data$vreal5<br/>→ delay_min
RLikelihood->>PCens: dpcens(..., L=delay_min)
PCens-->>RLikelihood: log probability with<br/>left truncation
Note over Stan,PCens: During Stan execution
Stan->>Stan: marginal_family_lpmf<br/>(..., delay_min)
Stan->>PCens: primarycensored_lpmf<br/>(..., L=delay_min)
Estimated code review effort🎯 4 (Complex) | ⏱️ ~75 minutes Possibly related issues
Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
eab8e0b to
4519a77
Compare
There was a problem hiding this comment.
Actionable comments posted: 3
🧹 Nitpick comments (5)
tests/testthat/test-aggregate_data.R (1)
146-155: Strengthen this test with non-constantdelay_minvalues.Current assertions pass even if
delay_minwere accidentally not used as a grouping key. A mixed-value fixture would better catch regressions.Suggested enhancement
test_that( "as_epidist_aggregate_data.epidist_linelist_data preserves delay_min", # nolint { data_with_min <- sim_obs - data_with_min$delay_min <- 1 + data_with_min$delay_min <- rep(c(0, 1), length.out = nrow(data_with_min)) agg <- as_epidist_aggregate_data(data_with_min) expect_true("delay_min" %in% names(agg)) - expect_true(all(agg$delay_min == 1)) + expect_setequal(unique(agg$delay_min), c(0, 1)) } )🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/testthat/test-aggregate_data.R` around lines 146 - 155, The test uses a constant delay_min so it won't detect regressions where delay_min isn't treated as a grouping key; change the fixture (data_with_min derived from sim_obs) to assign non-constant delay_min values (e.g., a repeating or varied vector) across rows, call as_epidist_aggregate_data.epidist_linelist_data (as_epidist_aggregate_data) to produce agg, and replace the simple all-equals assertion with checks that agg contains the expected distinct delay_min values and that rows with different delay_min in the input map to separate groups in agg (use names: data_with_min, sim_obs, agg, and the function as_epidist_aggregate_data.epidist_linelist_data to locate the code).tests/testthat/test-gen.R (1)
54-55: Add one non-zerodelay_mincase in posterior draw tests.Nice update for default behavior. To better protect the new left-truncation path, add at least one case with
delay_min > 0and assert generated values respect that lower bound (e.g.,min(pred) >= delay_min).Also applies to: 114-114
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/testthat/test-gen.R` around lines 54 - 55, Add a posterior-draw test case that exercises the left-truncation path by passing a non-zero delay_min (e.g., delay_min = 1 or >0) along with the existing parameters (relative_obs_time, pwindow, swindow, delay_upr) and assert that the generated predictions satisfy the lower bound (use an assertion like checking min(pred) >= delay_min). Locate the posterior draw invocation in the test file (the block using relative_obs_time, pwindow, swindow, delay_upr, delay_min) and duplicate/extend it with a non-zero delay_min scenario and the corresponding assertion to ensure values respect the lower bound.R/aggregate_data.R (1)
178-187: Deduplicategroup_varsbefore aggregation.Now that optional columns are auto-included,
bycan re-add the same variables (e.g.,delay_min). Aunique()pass keeps grouping intent explicit and avoids duplicate selectors.♻️ Proposed patch
# Auto-include optional columns that exist in the data optional_cols <- intersect(.linelist_optional_cols(), names(data)) group_vars <- c(group_vars, optional_cols) # Combine required variables with user-specified ones if (!is.null(by)) { assert_character(by) assert_names(names(data), must.include = by) group_vars <- c(group_vars, by) } + group_vars <- unique(group_vars)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@R/aggregate_data.R` around lines 178 - 187, The group_vars vector can get duplicate entries because optional columns from .linelist_optional_cols() and the user-provided by may overlap (e.g., delay_min); update the code that builds group_vars (the variable group_vars in aggregate_data.R where optional_cols are appended and where by is added) to deduplicate by wrapping the combined vector with unique() (i.e., set group_vars <- unique(group_vars) after modifications) so subsequent aggregation/grouping uses a clean list of grouping variables.tests/testthat/test-marginal_model.R (1)
57-76: Strengthen column/inheritance tests with non-default values.At Line 61 and Line 73, using only
0can mask regressions because0is also the default fallback. Prefer non-default (and ideally varying) values plus exact equality checks.✅ Suggested test hardening
test_that( "as_epidist_marginal_model handles column name delay_min", { data_with_min <- sim_obs - data_with_min$my_min <- 0 + data_with_min$my_min <- rep_len(c(1, 2), nrow(data_with_min)) model <- as_epidist_marginal_model( data_with_min, delay_min = "my_min" ) - expect_true(all(model$delay_min == 0)) + expect_identical(model$delay_min, data_with_min$my_min) } ) test_that( "as_epidist_marginal_model inherits delay_min from data", { data_with_min <- sim_obs - data_with_min$delay_min <- 0 + data_with_min$delay_min <- rep_len(c(2, 3), nrow(data_with_min)) model <- as_epidist_marginal_model(data_with_min) - expect_true(all(model$delay_min == 0)) + expect_identical(model$delay_min, data_with_min$delay_min) } )🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/testthat/test-marginal_model.R` around lines 57 - 76, The tests use the default fallback value 0 which can mask regressions; update both tests for as_epidist_marginal_model to set non-default and varying values (e.g., a vector like c(1,2,3) or -1/1) in the column (data_with_min$my_min and data_with_min$delay_min), call as_epidist_marginal_model with and without the delay_min argument, and replace loose checks (expect_true(all(... == 0))) with strict equality assertions (e.g., expect_equal(model$delay_min, <expected_vector>)) to verify exact inheritance and mapping.man/dot-add_delay_min.Rd (1)
12-21: Consider documenting the non-negative numeric constraint explicitly.The docs currently say “numeric scalar” but not
>= 0; adding that detail would match runtime validation and reduce ambiguity.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@man/dot-add_delay_min.Rd` around lines 12 - 21, Update the documentation for the delay_min argument to state that when provided as a numeric scalar it must be non-negative (>= 0); mention this constraint alongside the existing descriptions for NULL and character values so the resolved behavior of delay_min (NULL uses existing column or defaults to 0, numeric must be >= 0, character looks up the named column) is explicit; reference the delay_min argument in dot-add_delay_min.Rd and ensure the value section still indicates the data frame will include a delay_min column.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@man/epidist_gen_log_lik.Rd`:
- Around line 25-26: Update the documentation for the parameter vreal5 to
reflect that it is optional and defaults to 0 (the implementation fallback in
R/gen.R sets delay_min <- 0 when vreal5 is missing); change the description for
vreal5 from a required "minimum delay (left truncation point)" to indicate it is
optional with default 0 and reference that it maps to delay_min in the code
(R/gen.R).
In `@R/marginal_model.R`:
- Around line 64-70: The validation for delay_min in .add_delay_min (R/utils.R)
currently only tests is.character(delay_min) and allows vectors like
c("col_a","col_b") which later cause a base subsetting error when used as
data[[delay_min]]; update .add_delay_min to explicitly reject non-scalar
character inputs by checking is.character(delay_min) && length(delay_min)==1
(and similarly ensure any numeric input is length 1), and raise a clear error
message that delay_min must be NULL, a single numeric scalar, or a single column
name string; apply the same scalar-character validation to the analogous
handling block referenced around lines 89-105 so both spots provide consistent,
user-friendly errors.
In `@R/utils.R`:
- Around line 414-417: When handling the character branch for delay_min
(condition using is.character(delay_min)), validate that delay_min is a single
scalar name before indexing: check length(delay_min) == 1 and, if not, raise a
clear error (e.g., "delay_min must be a single column name") so we don't hit a
cryptic [[ indexing failure later; then continue to call
assert_names(names(data), must.include = delay_min) and assign data$delay_min <-
data[[delay_min]] as before. Ensure you reference the existing symbols
delay_min, assert_names, and data$delay_min in the change.
---
Nitpick comments:
In `@man/dot-add_delay_min.Rd`:
- Around line 12-21: Update the documentation for the delay_min argument to
state that when provided as a numeric scalar it must be non-negative (>= 0);
mention this constraint alongside the existing descriptions for NULL and
character values so the resolved behavior of delay_min (NULL uses existing
column or defaults to 0, numeric must be >= 0, character looks up the named
column) is explicit; reference the delay_min argument in dot-add_delay_min.Rd
and ensure the value section still indicates the data frame will include a
delay_min column.
In `@R/aggregate_data.R`:
- Around line 178-187: The group_vars vector can get duplicate entries because
optional columns from .linelist_optional_cols() and the user-provided by may
overlap (e.g., delay_min); update the code that builds group_vars (the variable
group_vars in aggregate_data.R where optional_cols are appended and where by is
added) to deduplicate by wrapping the combined vector with unique() (i.e., set
group_vars <- unique(group_vars) after modifications) so subsequent
aggregation/grouping uses a clean list of grouping variables.
In `@tests/testthat/test-aggregate_data.R`:
- Around line 146-155: The test uses a constant delay_min so it won't detect
regressions where delay_min isn't treated as a grouping key; change the fixture
(data_with_min derived from sim_obs) to assign non-constant delay_min values
(e.g., a repeating or varied vector) across rows, call
as_epidist_aggregate_data.epidist_linelist_data (as_epidist_aggregate_data) to
produce agg, and replace the simple all-equals assertion with checks that agg
contains the expected distinct delay_min values and that rows with different
delay_min in the input map to separate groups in agg (use names: data_with_min,
sim_obs, agg, and the function as_epidist_aggregate_data.epidist_linelist_data
to locate the code).
In `@tests/testthat/test-gen.R`:
- Around line 54-55: Add a posterior-draw test case that exercises the
left-truncation path by passing a non-zero delay_min (e.g., delay_min = 1 or >0)
along with the existing parameters (relative_obs_time, pwindow, swindow,
delay_upr) and assert that the generated predictions satisfy the lower bound
(use an assertion like checking min(pred) >= delay_min). Locate the posterior
draw invocation in the test file (the block using relative_obs_time, pwindow,
swindow, delay_upr, delay_min) and duplicate/extend it with a non-zero delay_min
scenario and the corresponding assertion to ensure values respect the lower
bound.
In `@tests/testthat/test-marginal_model.R`:
- Around line 57-76: The tests use the default fallback value 0 which can mask
regressions; update both tests for as_epidist_marginal_model to set non-default
and varying values (e.g., a vector like c(1,2,3) or -1/1) in the column
(data_with_min$my_min and data_with_min$delay_min), call
as_epidist_marginal_model with and without the delay_min argument, and replace
loose checks (expect_true(all(... == 0))) with strict equality assertions (e.g.,
expect_equal(model$delay_min, <expected_vector>)) to verify exact inheritance
and mapping.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 0a33974c-1786-4b0b-b90c-5e3ed808c47c
📒 Files selected for processing (28)
DESCRIPTIONNEWS.mdR/aggregate_data.RR/gen.RR/globals.RR/linelist_data.RR/marginal_model.RR/utils.Rinst/stan/marginal_model/functions.stanman/as_epidist_marginal_model.epidist_aggregate_data.Rdman/as_epidist_marginal_model.epidist_linelist_data.Rdman/as_epidist_naive_model.epidist_linelist_data.Rdman/dot-add_delay_min.Rdman/dot-add_dpar_info.Rdman/dot-add_weights.Rdman/dot-get_brms_fn.Rdman/epidist.Rdman/epidist_family.Rdman/epidist_family_param.Rdman/epidist_family_prior.Rdman/epidist_family_prior.default.Rdman/epidist_family_prior.lognormal.Rdman/epidist_gen_log_lik.Rdman/epidist_gen_posterior_epred.Rdman/epidist_gen_posterior_predict.Rdtests/testthat/test-aggregate_data.Rtests/testthat/test-gen.Rtests/testthat/test-marginal_model.R
💤 Files with no reviewable changes (1)
- R/globals.R
bbdc64b to
4032f36
Compare
Add delay_min parameter to as_epidist_marginal_model() to support left truncation (L parameter) from primarycensored >= 1.4.0. This fixes the broken Stan template that was missing the L parameter in the primarycensored_lpmf call. Changes: - Add delay_min parameter to marginal model (NULL/scalar/column name) - Thread delay_min through vreal5 to Stan template and R-side gen functions (dpcens/rpcens) - Preserve delay_min through aggregation via optional column mechanism - Update tests for primarycensored 1.4.0 error handling changes - Require primarycensored >= 1.4.0 Closes epinowcast#588 Closes epinowcast#583 Co-authored-by: Sam Abbott <contact@samabbott.co.uk>
Co-authored-by: Sam Abbott <contact@samabbott.co.uk>
- Use rep_len() instead of rep(length.out=) per rep_len_linter - Avoid nested pipe inside suppressMessages per nested_pipe_linter - Move delay_min NEWS entry from 0.4.0 to 0.4.0.1000 (dev version)
- Add delay_min = 0 to all vignette helper functions that construct newdata for add_epred_draws/add_predicted_draws (ebola.Rmd, faq.Rmd) - Add scalar validation for character delay_min in .add_delay_min() - Update vreal5 docs to note it defaults to 0 if absent
4032f36 to
4549dfc
Compare
seabbs
left a comment
There was a problem hiding this comment.
I think this looks reasonable but going to sit on it before merging and make a vignette.
Simulates left-truncated delay data and compares models with and without the delay_min adjustment, showing parameter recovery and fitted distribution plots.
Use simulate_gillespie + simulate_secondary instead of raw data construction to fix as_epidist_linelist_data dispatch error in R CMD check.
This is entirely from an agent so do not review until I have pinged for review as I will do a first pass
Summary
delay_minparameter toas_epidist_marginal_model()to support left truncation (Lparameter from primarycensored >= 1.4.0)Lparameter in theprimarycensored_lpmfcall (also closes Update for primarycensored L parameter (left truncation support) #583)delay_minthrough the full pipeline: data → vreal5 → Stan → R-side gen functions (dpcens/rpcens)Closes #588, closes #584
Changes
Data pipeline
as_epidist_marginal_model()acceptsdelay_minasNULL(auto-detect/default 0), numeric scalar, or column name string.add_delay_min()helper inutils.Rfollows the.add_weights()pattern.linelist_optional_cols()delay_min >= 0anddelay_lwr >= delay_minStan code
data real delay_minparameter to marginal family lpmf functionprimarycensored_lpmfcall to match 1.4.0 signature:(d | dist_id, params, pwindow, d_upper, L, D, primary_id, primary_params)R-side gen functions
epidist_gen_log_lik(),epidist_gen_posterior_predict(): extractdelay_minfromvreal5(defaults to 0 if absent for latent model compatibility)L = delay_mintoprimarycensored::dpcens()andrpcens()Vignettes
ebola.Rmdandfaq.Rmdto includedelay_min = 0in newdata for predictionsTest plan
This was opened by a bot. Please ping @seabbs for any questions.