Fix #588: Support left truncation via delay_min parameter #596

Open
seabbs-bot wants to merge 6 commits into epinowcast:main from seabbs-bot:support-epidist

seabbs-bot commented Apr 9, 2026

This is entirely from an agent, so do not review until I have pinged for review; I will do a first pass beforehand.

Summary

  • Add delay_min parameter to as_epidist_marginal_model() to support left truncation (L parameter from primarycensored >= 1.4.0)
  • Fix broken Stan template that was missing the L parameter in the primarycensored_lpmf call (also closes #583, "Update for primarycensored L parameter (left truncation support)")
  • Thread delay_min through the full pipeline: data → vreal5 → Stan → R-side gen functions (dpcens/rpcens)

Closes #588, closes #584

Changes

Data pipeline

  • as_epidist_marginal_model() accepts delay_min as NULL (auto-detect/default 0), numeric scalar, or column name string
  • .add_delay_min() helper in utils.R follows the .add_weights() pattern
  • Optional column preservation during aggregation via .linelist_optional_cols()
  • Validation: delay_min >= 0 and delay_lwr >= delay_min
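
The three accepted forms of delay_min and the two validation rules listed above can be sketched in base R. This is a minimal illustration only, not the package's .add_delay_min(): the function name resolve_delay_min() and the toy data frame are assumptions made for the example.

```r
# Hypothetical helper, for illustration only (not epidist's .add_delay_min()).
resolve_delay_min <- function(data, delay_min = NULL) {
  if (is.null(delay_min)) {
    # Auto-detect: keep an existing column, otherwise default to 0
    if (!"delay_min" %in% names(data)) data$delay_min <- 0
  } else if (is.character(delay_min)) {
    # Column name: must be a single string naming an existing column
    stopifnot(length(delay_min) == 1, delay_min %in% names(data))
    data$delay_min <- data[[delay_min]]
  } else if (is.numeric(delay_min)) {
    # Numeric scalar applied to every row
    stopifnot(length(delay_min) == 1)
    data$delay_min <- delay_min
  } else {
    stop("delay_min must be NULL, a numeric scalar, or a column name")
  }
  # Validation: delay_min >= 0 and delay_lwr >= delay_min
  stopifnot(all(data$delay_min >= 0), all(data$delay_lwr >= data$delay_min))
  data
}

# Toy data (assumed columns): lower delay bound plus a custom minimum column
toy <- data.frame(delay_lwr = c(2, 3, 5), my_min = c(1, 1, 2))
resolve_delay_min(toy)$delay_min            # NULL: defaults to 0
resolve_delay_min(toy, 1)$delay_min         # numeric scalar
resolve_delay_min(toy, "my_min")$delay_min  # column name
```

Checking the scalar-ness of a character input up front avoids a less readable failure later when the value is used for column lookup.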

Stan code

  • Added data real delay_min parameter to marginal family lpmf function
  • Fixed primarycensored_lpmf call to match 1.4.0 signature: (d | dist_id, params, pwindow, d_upper, L, D, primary_id, primary_params)
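
For orientation, the effect of passing L alongside D can be written as a renormalisation of the interval probability. The notation here is introduced for illustration and is not taken from the PR: T is the observed delay and F is assumed to be the primary-event-censored delay CDF.

```latex
\[
  P\bigl(d \le T < d_{\mathrm{upper}} \,\big|\, L \le T < D\bigr)
  = \frac{F(d_{\mathrm{upper}}) - F(d)}{F(D) - F(L)},
  \qquad L \le d < d_{\mathrm{upper}} \le D .
\]
```

With L = 0 and F(0) = 0 the denominator reduces to F(D), which is the right-truncation-only case.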

R-side gen functions

  • epidist_gen_log_lik(), epidist_gen_posterior_predict(): extract delay_min from vreal5 (defaults to 0 if absent for latent model compatibility)
  • Pass L = delay_min to primarycensored::dpcens() and rpcens()
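
The fallback to 0 when vreal5 is absent can be sketched as below. The prep objects are mocked lists that contain only the field of interest, and get_delay_min() and trunc_prob() are hypothetical names. The toy probability uses a plain lognormal CDF and ignores primary-event censoring, so it is a stand-in for, not a reimplementation of, primarycensored::dpcens().

```r
# Mocked slices of a brms prep object (assumed structure, for illustration)
prep_marginal <- list(data = list(vreal5 = c(1, 1, 2)))
prep_latent <- list(data = list())  # latent model: no vreal5 present

# Hypothetical helper: fall back to 0 when vreal5 is absent
get_delay_min <- function(prep) {
  vreal5 <- prep$data$vreal5
  if (is.null(vreal5)) 0 else vreal5
}

get_delay_min(prep_marginal)
get_delay_min(prep_latent)

# Toy interval probability with left truncation at L and right truncation at D
trunc_prob <- function(d, d_upper, L, D, meanlog, sdlog) {
  cdf <- function(q) plnorm(q, meanlog, sdlog)
  (cdf(d_upper) - cdf(d)) / (cdf(D) - cdf(L))
}

# Probabilities over the daily intervals between L and D sum to one
L <- 1
D <- 10
d <- L:(D - 1)
sum(trunc_prob(d, d + 1, L, D, meanlog = 1.5, sdlog = 0.5))
```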

Vignettes

  • Updated ebola.Rmd and faq.Rmd to include delay_min = 0 in newdata for predictions

Test plan

  • Tests for default delay_min (0), scalar, column name, auto-detect from data
  • Tests for invalid inputs (nonexistent column, negative, wrong type)
  • Tests for assertion failure when delay_lwr < delay_min
  • Tests for delay_min preservation through aggregation
  • Lint checks pass
  • Stan syntax check passes (check-cmdstan)

This was opened by a bot. Please ping @seabbs for any questions.


coderabbitai Bot commented Apr 9, 2026

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: d6b3de3b-b771-4219-b80b-ccb3e7678503

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


Walkthrough

The changes implement left-truncation support via a new delay_min parameter throughout the epidist package to align with primarycensored's new L parameter. This involves updating marginal model conversion functions, linelist data handling, likelihood computations in both R and Stan, and comprehensive documentation and test coverage.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Package Metadata: DESCRIPTION, NEWS.md | Updated roxygen version to 7.3.3; added NEWS entries for left-truncation support in marginal models and updated primarycensored requirement to >= 1.4.0. |
| Data Preparation & Validation: R/linelist_data.R, R/aggregate_data.R, R/utils.R | Extended linelist data validation to support optional delay_min column; added .linelist_optional_cols() helper returning "delay_min"; auto-aggregation now includes optional columns; introduced .add_delay_min() utility to resolve delay_min from various input forms (NULL, numeric scalar, or column name). |
| Marginal Model Pipeline: R/marginal_model.R | Extended both as_epidist_marginal_model.epidist_linelist_data() and .epidist_aggregate_data() with delay_min = NULL parameter; added .add_delay_min() integration; strengthened assertion to validate delay_lwr >= delay_min; updated brms formula interface to pass delay_min into vreal(); extended Stan custom family to include vreal5[n] parameter. |
| Likelihood & Posterior Computation: R/gen.R | Extended .generic_gen_log_lik() and .analytical_gen_log_lik() to compute delay_min from prep$data$vreal5 and pass as L parameter to primarycensored::dpcens(); similarly updated epidist_gen_posterior_predict() to pass L = delay_min to primarycensored::rpcens(); added documentation for vreal4 and vreal5 prep variables. |
| Stan Implementation: inst/stan/marginal_model/functions.stan | Extended marginal_family_lpmf() function signature to accept data real delay_min parameter; updated primarycensored_lpmf call to pass delay_min as L argument instead of hardcoded 0. |
| Global Variables: R/globals.R | Removed unused "fix" from global variable declarations. |
| Function Documentation: man/as_epidist_marginal_model.epidist_linelist_data.Rd, man/as_epidist_marginal_model.epidist_aggregate_data.Rd, man/dot-add_delay_min.Rd | Added/updated documentation for new delay_min parameter with usage forms and behavior; expanded weight argument documentation for linelist method; created new documentation file for .add_delay_min() helper. |
| Cross-Reference Updates: man/dot-add_dpar_info.Rd, man/dot-add_weights.Rd, man/dot-get_brms_fn.Rd, man/epidist.Rd, man/epidist_family.Rd, man/epidist_family_param.Rd, man/epidist_family_prior.Rd, man/epidist_family_prior.default.Rd, man/epidist_family_prior.lognormal.Rd, man/epidist_gen_log_lik.Rd, man/epidist_gen_posterior_epred.Rd, man/epidist_gen_posterior_predict.Rd, man/as_epidist_naive_model.epidist_linelist_data.Rd | Updated hyperlink targets for brmsfamily() references from generic link to explicit brms::brmsfamily() namespace; expanded prep variable documentation in likelihood/posterior functions to include vreal4 and vreal5; clarified weight default behavior in method documentation. |
| Test Coverage: tests/testthat/test-marginal_model.R, tests/testthat/test-aggregate_data.R, tests/testthat/test-gen.R | Added comprehensive test coverage for delay_min parameter including: default behavior (0), scalar values, column name resolution, data inheritance; error cases for invalid inputs and constraint violations; updated existing posterior/epred tests to include delay_min = 0 in test fixtures. |

Sequence Diagram

sequenceDiagram
    participant User
    participant MarginModel as Marginal Model<br/>(R/marginal_model.R)
    participant DataPrep as Data Preparation<br/>(utils.R, linelist_data.R)
    participant RLikelihood as R Likelihood<br/>(gen.R)
    participant Stan as Stan Functions
    participant PCens as primarycensored
    
    User->>MarginModel: as_epidist_marginal_model(..., delay_min=?)
    MarginModel->>DataPrep: .add_delay_min(data, delay_min)
    DataPrep->>DataPrep: Resolve delay_min<br/>(NULL/scalar/column)
    DataPrep-->>MarginModel: data with delay_min
    MarginModel->>MarginModel: Validate delay_lwr >= delay_min
    MarginModel->>Stan: Pass delay_min → vreal5
    
    Note over RLikelihood,PCens: During likelihood evaluation
    RLikelihood->>RLikelihood: Extract prep$data$vreal5<br/>→ delay_min
    RLikelihood->>PCens: dpcens(..., L=delay_min)
    PCens-->>RLikelihood: log probability with<br/>left truncation
    
    Note over Stan,PCens: During Stan execution
    Stan->>Stan: marginal_family_lpmf<br/>(..., delay_min)
    Stan->>PCens: primarycensored_lpmf<br/>(..., L=delay_min)

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

Poem

🐰 A hop through delay's shadow realm so deep,
Where left truncation's secrets now we keep,
With delay_min whispers through the Stan,
The marginal model hops a better plan,
Primary censored sings at last in tune—
Left truncation blooms beneath the moon! 🌙✨

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The PR title 'Fix #588: Support left truncation via delay_min parameter' clearly and specifically describes the main change: adding left truncation support via a new delay_min parameter, directly addressing the linked issue. |
| Linked Issues check | ✅ Passed | All coding requirements from issue #583 are met: delay_min parameter added to as_epidist_marginal_model(), L parameter threading through data→vreal5→Stan→R functions (dpcens/rpcens), Stan signature updated with L argument, primarycensored dependency bumped to >=1.4.0, validation rules enforced (delay_min>=0, delay_lwr>=delay_min), tests added. |
| Out of Scope Changes check | ✅ Passed | Changes are well-scoped to left-truncation support: delay_min parameter threading, primarycensored integration, documentation updates, and helper functions. Incidental improvements (brmsfamily link updates, optional column aggregation, weight documentation) are minor and reasonably related to broader code quality. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Description Check | ✅ Passed | Check skipped - CodeRabbit's high-level summary is enabled. |


coderabbitai Bot left a comment

Actionable comments posted: 3

🧹 Nitpick comments (5)
tests/testthat/test-aggregate_data.R (1)

146-155: Strengthen this test with non-constant delay_min values.

Current assertions pass even if delay_min were accidentally not used as a grouping key. A mixed-value fixture would better catch regressions.

Suggested enhancement
 test_that(
   "as_epidist_aggregate_data.epidist_linelist_data preserves delay_min", # nolint
   {
     data_with_min <- sim_obs
-    data_with_min$delay_min <- 1
+    data_with_min$delay_min <- rep(c(0, 1), length.out = nrow(data_with_min))
     agg <- as_epidist_aggregate_data(data_with_min)
     expect_true("delay_min" %in% names(agg))
-    expect_true(all(agg$delay_min == 1))
+    expect_setequal(unique(agg$delay_min), c(0, 1))
   }
 )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/testthat/test-aggregate_data.R` around lines 146 - 155, The test uses a
constant delay_min so it won't detect regressions where delay_min isn't treated
as a grouping key; change the fixture (data_with_min derived from sim_obs) to
assign non-constant delay_min values (e.g., a repeating or varied vector) across
rows, call as_epidist_aggregate_data.epidist_linelist_data
(as_epidist_aggregate_data) to produce agg, and replace the simple all-equals
assertion with checks that agg contains the expected distinct delay_min values
and that rows with different delay_min in the input map to separate groups in
agg (use names: data_with_min, sim_obs, agg, and the function
as_epidist_aggregate_data.epidist_linelist_data to locate the code).
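
Why a constant delay_min cannot catch a missing grouping key can be seen in a small base R sketch. The column names and the use of aggregate() are assumptions for illustration; this is not how as_epidist_aggregate_data() is implemented.

```r
# Toy linelist (assumed columns) with mixed delay_min values
linelist <- data.frame(
  delay_lwr = c(2, 2, 2, 3),
  delay_upr = c(3, 3, 3, 4),
  delay_min = c(0, 1, 1, 0)
)
linelist$n <- 1

# Grouping with delay_min as a key versus accidentally dropping it
with_key <- aggregate(
  n ~ delay_lwr + delay_upr + delay_min, data = linelist, FUN = sum
)
without_key <- aggregate(n ~ delay_lwr + delay_upr, data = linelist, FUN = sum)
nrow(with_key)     # mixed values: the two groupings differ in row count
nrow(without_key)

# With a constant delay_min both groupings give the same number of rows,
# so a test built on a constant fixture cannot tell them apart
constant <- transform(linelist, delay_min = 1)
constant_with_key <- aggregate(
  n ~ delay_lwr + delay_upr + delay_min, data = constant, FUN = sum
)
nrow(constant_with_key) == nrow(without_key)
```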
tests/testthat/test-gen.R (1)

54-55: Add one non-zero delay_min case in posterior draw tests.

Nice update for default behavior. To better protect the new left-truncation path, add at least one case with delay_min > 0 and assert generated values respect that lower bound (e.g., min(pred) >= delay_min).

Also applies to: 114-114

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/testthat/test-gen.R` around lines 54 - 55, Add a posterior-draw test
case that exercises the left-truncation path by passing a non-zero delay_min
(e.g., delay_min = 1 or >0) along with the existing parameters
(relative_obs_time, pwindow, swindow, delay_upr) and assert that the generated
predictions satisfy the lower bound (use an assertion like checking min(pred) >=
delay_min). Locate the posterior draw invocation in the test file (the block
using relative_obs_time, pwindow, swindow, delay_upr, delay_min) and
duplicate/extend it with a non-zero delay_min scenario and the corresponding
assertion to ensure values respect the lower bound.
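
The shape of such a lower-bound assertion can be sketched with a simple rejection sampler. The function r_trunc_lnorm() is a hypothetical stand-in for a posterior predictive draw; it does not call primarycensored::rpcens() and ignores primary-event censoring.

```r
set.seed(1)

# Hypothetical sampler: keep lognormal draws that fall in [delay_min, D),
# then discretise to whole days (delay_min is assumed to be an integer)
r_trunc_lnorm <- function(n, meanlog, sdlog, delay_min, D) {
  out <- numeric(0)
  while (length(out) < n) {
    draw <- rlnorm(n, meanlog, sdlog)
    out <- c(out, draw[draw >= delay_min & draw < D])
  }
  floor(out[seq_len(n)])
}

delay_min <- 2
pred <- r_trunc_lnorm(
  1000, meanlog = 1.5, sdlog = 0.5, delay_min = delay_min, D = 20
)

# The kind of check the review comment asks for
stopifnot(min(pred) >= delay_min)
```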
R/aggregate_data.R (1)

178-187: Deduplicate group_vars before aggregation.

Now that optional columns are auto-included, by can re-add the same variables (e.g., delay_min). A unique() pass keeps grouping intent explicit and avoids duplicate selectors.

♻️ Proposed patch
   # Auto-include optional columns that exist in the data
   optional_cols <- intersect(.linelist_optional_cols(), names(data))
   group_vars <- c(group_vars, optional_cols)

   # Combine required variables with user-specified ones
   if (!is.null(by)) {
     assert_character(by)
     assert_names(names(data), must.include = by)
     group_vars <- c(group_vars, by)
   }
+  group_vars <- unique(group_vars)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@R/aggregate_data.R` around lines 178 - 187, The group_vars vector can get
duplicate entries because optional columns from .linelist_optional_cols() and
the user-provided by may overlap (e.g., delay_min); update the code that builds
group_vars (the variable group_vars in aggregate_data.R where optional_cols are
appended and where by is added) to deduplicate by wrapping the combined vector
with unique() (i.e., set group_vars <- unique(group_vars) after modifications)
so subsequent aggregation/grouping uses a clean list of grouping variables.
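
A minimal sketch of the overlap and the unique() fix follows. The required key names and the region column are hypothetical and chosen only to show the mechanics.

```r
# Hypothetical required keys, the optional column, and a user-supplied `by`
group_vars <- c("delay_lwr", "delay_upr")
optional_cols <- "delay_min"
by <- c("delay_min", "region")  # user also names delay_min

group_vars <- c(group_vars, optional_cols, by)
anyDuplicated(group_vars) > 0   # delay_min now appears twice

# Deduplicate while keeping first-occurrence order
group_vars <- unique(group_vars)
group_vars
```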
tests/testthat/test-marginal_model.R (1)

57-76: Strengthen column/inheritance tests with non-default values.

At Line 61 and Line 73, using only 0 can mask regressions because 0 is also the default fallback. Prefer non-default (and ideally varying) values plus exact equality checks.

✅ Suggested test hardening
 test_that(
   "as_epidist_marginal_model handles column name delay_min",
   {
     data_with_min <- sim_obs
-    data_with_min$my_min <- 0
+    data_with_min$my_min <- rep_len(c(1, 2), nrow(data_with_min))
     model <- as_epidist_marginal_model(
       data_with_min, delay_min = "my_min"
     )
-    expect_true(all(model$delay_min == 0))
+    expect_identical(model$delay_min, data_with_min$my_min)
   }
 )

 test_that(
   "as_epidist_marginal_model inherits delay_min from data",
   {
     data_with_min <- sim_obs
-    data_with_min$delay_min <- 0
+    data_with_min$delay_min <- rep_len(c(2, 3), nrow(data_with_min))
     model <- as_epidist_marginal_model(data_with_min)
-    expect_true(all(model$delay_min == 0))
+    expect_identical(model$delay_min, data_with_min$delay_min)
   }
 )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/testthat/test-marginal_model.R` around lines 57 - 76, The tests use the
default fallback value 0 which can mask regressions; update both tests for
as_epidist_marginal_model to set non-default and varying values (e.g.,
rep_len(c(1, 2), nrow(data_with_min)); values must stay non-negative and no
larger than delay_lwr so they pass validation) in the column
(data_with_min$my_min and data_with_min$delay_min), call
as_epidist_marginal_model with and without the delay_min argument, and replace
loose checks (expect_true(all(... == 0))) with strict equality assertions (e.g.,
expect_equal(model$delay_min, <expected_vector>)) to verify exact inheritance
and mapping.
man/dot-add_delay_min.Rd (1)

12-21: Consider documenting the non-negative numeric constraint explicitly.

The docs currently say “numeric scalar” but not >= 0; adding that detail would match runtime validation and reduce ambiguity.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@man/dot-add_delay_min.Rd` around lines 12 - 21, Update the documentation for
the delay_min argument to state that when provided as a numeric scalar it must
be non-negative (>= 0); mention this constraint alongside the existing
descriptions for NULL and character values so the resolved behavior of delay_min
(NULL uses existing column or defaults to 0, numeric must be >= 0, character
looks up the named column) is explicit; reference the delay_min argument in
dot-add_delay_min.Rd and ensure the value section still indicates the data frame
will include a delay_min column.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@man/epidist_gen_log_lik.Rd`:
- Around line 25-26: Update the documentation for the parameter vreal5 to
reflect that it is optional and defaults to 0 (the implementation fallback in
R/gen.R sets delay_min <- 0 when vreal5 is missing); change the description for
vreal5 from a required "minimum delay (left truncation point)" to indicate it is
optional with default 0 and reference that it maps to delay_min in the code
(R/gen.R).

In `@R/marginal_model.R`:
- Around line 64-70: The validation for delay_min in .add_delay_min (R/utils.R)
currently only tests is.character(delay_min) and allows vectors like
c("col_a","col_b") which later cause a base subsetting error when used as
data[[delay_min]]; update .add_delay_min to explicitly reject non-scalar
character inputs by checking is.character(delay_min) && length(delay_min)==1
(and similarly ensure any numeric input is length 1), and raise a clear error
message that delay_min must be NULL, a single numeric scalar, or a single column
name string; apply the same scalar-character validation to the analogous
handling block referenced around lines 89-105 so both spots provide consistent,
user-friendly errors.

In `@R/utils.R`:
- Around line 414-417: When handling the character branch for delay_min
(condition using is.character(delay_min)), validate that delay_min is a single
scalar name before indexing: check length(delay_min) == 1 and, if not, raise a
clear error (e.g., "delay_min must be a single column name") so we don't hit a
cryptic [[ indexing failure later; then continue to call
assert_names(names(data), must.include = delay_min) and assign data$delay_min <-
data[[delay_min]] as before. Ensure you reference the existing symbols
delay_min, assert_names, and data$delay_min in the change.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 0a33974c-1786-4b0b-b90c-5e3ed808c47c

📥 Commits

Reviewing files that changed from the base of the PR and between b8a9c4b and 4519a77.

📒 Files selected for processing (28)
  • DESCRIPTION
  • NEWS.md
  • R/aggregate_data.R
  • R/gen.R
  • R/globals.R
  • R/linelist_data.R
  • R/marginal_model.R
  • R/utils.R
  • inst/stan/marginal_model/functions.stan
  • man/as_epidist_marginal_model.epidist_aggregate_data.Rd
  • man/as_epidist_marginal_model.epidist_linelist_data.Rd
  • man/as_epidist_naive_model.epidist_linelist_data.Rd
  • man/dot-add_delay_min.Rd
  • man/dot-add_dpar_info.Rd
  • man/dot-add_weights.Rd
  • man/dot-get_brms_fn.Rd
  • man/epidist.Rd
  • man/epidist_family.Rd
  • man/epidist_family_param.Rd
  • man/epidist_family_prior.Rd
  • man/epidist_family_prior.default.Rd
  • man/epidist_family_prior.lognormal.Rd
  • man/epidist_gen_log_lik.Rd
  • man/epidist_gen_posterior_epred.Rd
  • man/epidist_gen_posterior_predict.Rd
  • tests/testthat/test-aggregate_data.R
  • tests/testthat/test-gen.R
  • tests/testthat/test-marginal_model.R
💤 Files with no reviewable changes (1)
  • R/globals.R

Comment thread man/epidist_gen_log_lik.Rd Outdated
Comment thread R/marginal_model.R
Comment thread R/utils.R
seabbs-bot and others added 4 commits April 9, 2026 11:45
Add delay_min parameter to as_epidist_marginal_model() to support left
truncation (L parameter) from primarycensored >= 1.4.0. This fixes the
broken Stan template that was missing the L parameter in the
primarycensored_lpmf call.

Changes:
- Add delay_min parameter to marginal model (NULL/scalar/column name)
- Thread delay_min through vreal5 to Stan template and R-side gen
  functions (dpcens/rpcens)
- Preserve delay_min through aggregation via optional column mechanism
- Update tests for primarycensored 1.4.0 error handling changes
- Require primarycensored >= 1.4.0

Closes epinowcast#588
Closes epinowcast#583

Co-authored-by: Sam Abbott <contact@samabbott.co.uk>
Co-authored-by: Sam Abbott <contact@samabbott.co.uk>
- Use rep_len() instead of rep(length.out=) per rep_len_linter
- Avoid nested pipe inside suppressMessages per nested_pipe_linter
- Move delay_min NEWS entry from 0.4.0 to 0.4.0.1000 (dev version)
- Add delay_min = 0 to all vignette helper functions that construct
  newdata for add_epred_draws/add_predicted_draws (ebola.Rmd, faq.Rmd)
- Add scalar validation for character delay_min in .add_delay_min()
- Update vreal5 docs to note it defaults to 0 if absent
seabbs self-requested a review April 9, 2026 10:56

seabbs left a comment

I think this looks reasonable but going to sit on it before merging and make a vignette.

Simulates left-truncated delay data and compares models with and
without the delay_min adjustment, showing parameter recovery and
fitted distribution plots.
Use simulate_gillespie + simulate_secondary instead of raw data
construction to fix as_epidist_linelist_data dispatch error in
R CMD check.