Improve simulation#254
Conversation
…n validation - Added support for variability formulas in the simulate_data function, allowing users to specify a formula for variability alongside the main formula. - Implemented dimension validation to ensure that simulated data does not exceed the dimensions of the original model, preventing potential errors during simulation. - Introduced a new utility function to normalize sum-to-zero vector parameters in posterior draws, addressing floating-point precision issues. - Updated Stan model to accommodate new design matrices for random effects and variability, ensuring compatibility with existing functionality. - Added comprehensive unit tests to validate the new features and ensure robustness against various model configurations.
- Introduced `sccomp_simulate` function for simulating data from fitted models, enhancing usability and flexibility. - Deprecated the `simulate_data` function, providing a warning to users to transition to `sccomp_simulate`. - Updated documentation to reflect changes and provide clear guidance on the new function's usage. - Enhanced unit tests to validate the new simulation functionality and ensure compatibility with existing workflows. - Adjusted Stan model to accommodate new parameters for user-provided coefficients, improving simulation accuracy and performance.
- Added validation to ensure user-provided coefficients sum to zero within a specified tolerance, crucial for compositional models. - Implemented checks using tidyverse to identify and report invalid coefficients, improving error handling and user feedback. - Updated comments to clarify the importance of coefficient normalization in the context of simulation.
- Added optional parameters `mean_dispersion_slope` and `mean_dispersion_intercept` to allow users to specify the slope and intercept for the mean-dispersion association, enhancing flexibility in simulations. - Updated the `simulate_data` function to reflect changes in parameter defaults, setting `variability_multiplier` to 1 for consistency. - Enhanced documentation to include new parameters and their usage, ensuring clarity for users. - Adjusted the Stan model to accommodate the new mean-dispersion parameters, improving simulation accuracy.
- Changed default `variability_multiplier` from 5 to 1 in `sccomp_simulate()` and `simulate_data()`, preserving model variability by default. - Enhanced simulation documentation for clarity on `mean_dispersion_slope` effects and updated vignette examples for consistency. - Clarified behavior of mean-dispersion association parameters in simulation, detailing their impact on intercept and factor-level variability.
There was a problem hiding this comment.
Pull request overview
This pull request improves the simulation functionality in the sccomp package by introducing a new sccomp_simulate() function with enhanced capabilities, deprecating the old simulate_data() function, and providing comprehensive documentation and tests.
Changes:
- Introduced new
sccomp_simulate()function with improved API design supporting separate coefficient tables, optional mean-dispersion association parameters, and better handling of new data scenarios - Changed default
variability_multiplierfrom 5 to 1 to preserve fitted model variability by default - Added comprehensive test suite with 11 test cases covering various simulation scenarios
- Updated Stan model to support user-provided coefficients and mean-dispersion parameters
- Created extensive vignette with practical examples and use cases
Reviewed changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 7 comments.
Show a summary per file
| File | Description |
|---|---|
| R/simulate_data.R | Major refactoring introducing sccomp_simulate() with new API; deprecated simulate_data() wrapper |
| inst/stan/glm_multi_beta_binomial_simulate_data.stan | Enhanced Stan model with user-provided coefficients and mean-dispersion parameters support |
| tests/testthat/test-simulate_data.R | New comprehensive test suite with 407 lines covering 11 different scenarios |
| man/sccomp_simulate.Rd | New documentation for sccomp_simulate() function |
| man/simulate_data.Rd | Updated to indicate deprecation |
| dev/sccomp_simulate.qmd | New 859-line vignette with detailed examples and use cases |
| R/utilities.R | Added normalize_sum_to_zero_params() utility function (not currently used) |
| inst/NEWS.rd | Release notes documenting simulation improvements |
| NAMESPACE | Updated exports for new function |
| DESCRIPTION | Version bump to 2.1.25 |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| mean_dispersion_slope = mean_dispersion_slope, | ||
| mean_dispersion_intercept = mean_dispersion_intercept, |
There was a problem hiding this comment.
The deprecated simulate_data() function references mean_dispersion_slope and mean_dispersion_intercept parameters on lines 730-731, but these parameters are not defined in the function signature. This will cause an error when users call simulate_data(). These parameters should either be added to the function signature or removed from the call to sccomp_simulate().
There was a problem hiding this comment.
removed from the call to sccomp_simulate()
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
- Updated unit tests to check for the presence of the `cache_stan_model` parameter in `sccomp_simulate`, reflecting the recent function introduction. - Maintained checks for the deprecated `simulate_data` function to ensure backward compatibility. - Verified that the default value of `cache_stan_model` is set correctly across relevant functions.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 11 out of 11 changed files in this pull request and generated 8 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
You can also share your feedback on Copilot code review. Take the survey.
| first_indices = gsub(paste0("^", param_pattern, "\\[(\\d+),.*"), "\\1", param_names) | ||
| second_indices = gsub(paste0("^", param_pattern, "\\[\\d+,(\\d+)\\]"), "\\1", param_names) |
| # Unique variability design (for compatibility) | ||
| XA = Xa %>% | ||
| as_tibble() %>% | ||
| distinct() | ||
|
|
| matrix[A_simulated, A_simulated] XA_simulated; | ||
| matrix[C_simulated,M_simulated] beta_simulated; | ||
| matrix[N_simulated, A_simulated] Xa_simulated; // Variability design matrix for simulated data | ||
| matrix[A_simulated, A_simulated] XA_simulated; // Unique variability design (for compatibility with old code) |
| \item **Simulation improvements:** Changed default \code{variability_multiplier} parameter from 5 to 1 in \code{sccomp_simulate()} and \code{simulate_data()}. The new default preserves the fitted model's variability by default, rather than artificially increasing it. Users can still specify custom values greater than 1 to increase variability for benchmarking or sensitivity analysis. | ||
| \item Enhanced simulation documentation. Improved clarity on how \code{mean_dispersion_slope} parameter affects both intercept and factor-level variability parameters in the variability formula. Updated vignette examples to use consistent \code{number_of_draws} values. | ||
| \item Clarified behavior of mean-dispersion association parameters in simulation. The \code{mean_dispersion_slope} affects the slope for both intercept columns (alpha1) and factor columns (alpha2, e.g., type), with intercept only added to intercept columns as per the model specification. |
| #' | ||
| #' # # Simulate data | ||
| #' # simulate_data(counts_obj, estimate, ~type, ~1, sample, cell_group, c(b_0, b_1)) | ||
| #' # sccomp_simulate(estimate, ~type, ~1, counts_obj, sample, cell_group, c(b_0, b_1)) |
| # Helper function to skip tests if cmdstan is not available | ||
| skip_cmdstan <- function() { | ||
| if (!instantiate::stan_cmdstan_exists()) { | ||
| skip("CmdStan not available") | ||
| } | ||
| } | ||
|
|
| \item{mean_dispersion_slope}{Optional numeric value for the slope parameter of the mean-dispersion association. If NULL, the slope from the fitted model (prec_coeff\link{2}) will be used.} | ||
|
|
||
| \item{mean_dispersion_intercept}{Optional numeric value for the intercept parameter of the mean-dispersion association. If NULL, the intercept from the fitted model (prec_coeff\link{1}) will be used. This ensures that when only the slope is changed, the intercept remains the same.} |
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
No description provided.