
Commit 529ee03

Merge pull request #76 from kgoldfeld/joss-submission
Joss submission
2 parents 75b5d8d + b2d3382 commit 529ee03

12 files changed (+794, -16 lines)

.Rbuildignore

+2

```diff
@@ -18,3 +18,5 @@
 ^tests/\.lintr$
 ^File_management$
 ^simstudy\.code-workspace$
+^codemeta\.json$
+^paper$
```

DESCRIPTION

+2 -2

```diff
@@ -1,8 +1,8 @@
 Type: Package
 Package: simstudy
 Title: Simulation of Study Data
-Version: 0.2.1.9000
-Date: 2020-10-07
+Version: 0.2.2
+Date: 2020-10-26
 Authors@R:
 c(person(given = "Keith",
 family = "Goldfeld",
```

NEWS.md

+2 -1

```diff
@@ -1,4 +1,5 @@
-# simstudy (development version)
+# simstudy 0.2.2
+* Improve documentation and vignettes.
 
 # simstudy 0.2.1
 * Add 'backports' for compatibility with R < 4.0
```

R/add_correlated_data.R

+3 -1

```diff
@@ -292,13 +292,15 @@ addCorFlex <- function(dt, defs, rho = 0, tau = NULL, corstr = "cs",
 #' @param method Two methods are available to generate correlated data. (1) "copula" uses
 #' the multivariate Gaussian copula method that is applied to all other distributions; this
 #' applies to all available distributions. (2) "ep" uses an algorithm developed by
-#' Emrich and Piedmonte.
+#' Emrich and Piedmonte (1991).
 #' @param formSpec The formula (as a string) that was used to generate the binary
 #' outcome in the `defDataAdd` statement. This is only necessary when method "ep" is
 #' requested.
 #' @param periodvar A string value that indicates the name of the field that indexes
 #' the repeated measurement for an individual unit. The value defaults to "period".
 #' @return Original data.table with added column(s) of correlated data
+#' @references Emrich LJ, Piedmonte MR. A Method for Generating High-Dimensional
+#' Multivariate Binary Variates. The American Statistician 1991;45:302-4.
 #' @examples
 #' # Wide example
 #'
```

R/generate_correlated_data.R

+3 -1

```diff
@@ -250,10 +250,12 @@ genCorFlex <- function(n, defs, rho = 0, tau = NULL, corstr = "cs", corMatrix =
 #' @param method Two methods are available to generate correlated data. (1) "copula" uses
 #' the multivariate Gaussian copula method that is applied to all other distributions; this
 #' applies to all available distributions. (2) "ep" uses an algorithm developed by
-#' Emrich and Piedmonte.
+#' Emrich and Piedmonte (1991).
 #' @param idname Character value that specifies the name of the id variable.
 #'
 #' @return data.table with added column(s) of correlated data
+#' @references Emrich LJ, Piedmonte MR. A Method for Generating High-Dimensional
+#' Multivariate Binary Variates. The American Statistician 1991;45:302-4.
 #' @examples
 #' set.seed(23432)
 #' l <- c(8, 10, 12)
```

README.Rmd

+3 -1

```diff
@@ -16,6 +16,7 @@ knitr::opts_chunk$set(
 <!-- badges: start -->
 [![R build status](https://github.com/kgoldfeld/simstudy/workflows/R-CMD-check/badge.svg?branch=main)](https://github.com/kgoldfeld/simstudy/actions){target="_blank"}
 [![CRAN status](https://www.r-pkg.org/badges/version/simstudy)](https://CRAN.R-project.org/package=simstudy){target="_blank"}
+[![status](https://joss.theoj.org/papers/640fd4333948933b2817343e86df3424/status.svg)](https://joss.theoj.org/papers/640fd4333948933b2817343e86df3424){target="_blank"}
 [![CRAN downloads](https://cranlogs.r-pkg.org/badges/grand-total/simstudy)](https://CRAN.R-project.org/package=simstudy){target="_blank"}
 [![codecov](https://codecov.io/gh/kgoldfeld/simstudy/branch/main/graph/badge.svg)](https://codecov.io/gh/kgoldfeld/simstudy){target="_blank"}
 [![Lifecycle: stable](https://img.shields.io/badge/lifecycle-stable-brightgreen.svg)](https://www.tidyverse.org/lifecycle/#stable){target="_blank"}
@@ -25,7 +26,8 @@ The `simstudy` package is a collection of functions that allow users to generate
 
 Simulation using `simstudy` has two fundamental steps. The user (1) **defines** the data elements of a data set and (2) **generates** the data based on these definitions. Additional functionality exists to simulate observed or randomized **treatment assignment/exposures**, to create **longitudinal/panel** data, to create **multi-level/hierarchical** data, to create datasets with **correlated variables** based on a specified covariance structure, to **merge** datasets, to create data sets with **missing** data, and to create non-linear relationships with underlying **spline** curves.
 
-The overarching philosophy of `simstudy` is to create data generating processes that mimic the typical models used to fit those types of data. So, the parameterization of some of the data generating processes may not follow the standard parameterizations for the specific distributions. For example, in `simstudy` *gamma*-distributed data are generated based on the specification of a mean &mu; (or log(&mu;)) and a dispersion $d$, rather than shape &alpha; and rate &beta; parameters that more typically characterize the *gamma* distribution. When we estimate the parameters, we are modeling &mu; (or some function of &mu;), so we should explicitly recover the `simstudy` parameters used to generate the model, thus illuminating the relationship between the underlying data generating processes and the models.
+The overarching philosophy of `simstudy` is to create data generating processes that mimic the typical models used to fit those types of data. So, the parameterization of some of the data generating processes may not follow the standard parameterizations for the specific distributions. For example, in `simstudy` *gamma*-distributed data are generated based on the specification of a mean &mu; (or log(&mu;)) and a dispersion $d$, rather than shape &alpha; and rate &beta; parameters that more typically characterize the *gamma* distribution. When we estimate the parameters, we are modeling &mu; (or some function of &mu;), so we should explicitly recover the `simstudy` parameters used to generate the model, thus illuminating the relationship between the underlying data generating processes and the models. For more details on the
+package, use cases, examples, and function reference see the [documentation page](https://kgoldfeld.github.io/simstudy/articles/simstudy.html).
 
 
 ## Installation
```

README.md

+11 -8

````diff
@@ -9,6 +9,7 @@ simstudy
 status](https://github.com/kgoldfeld/simstudy/workflows/R-CMD-check/badge.svg?branch=main)](https://github.com/kgoldfeld/simstudy/actions)
 [![CRAN
 status](https://www.r-pkg.org/badges/version/simstudy)](https://CRAN.R-project.org/package=simstudy)
+[![status](https://joss.theoj.org/papers/640fd4333948933b2817343e86df3424/status.svg)](https://joss.theoj.org/papers/640fd4333948933b2817343e86df3424)
 [![CRAN
 downloads](https://cranlogs.r-pkg.org/badges/grand-total/simstudy)](https://CRAN.R-project.org/package=simstudy)
 [![codecov](https://codecov.io/gh/kgoldfeld/simstudy/branch/main/graph/badge.svg)](https://codecov.io/gh/kgoldfeld/simstudy)
@@ -48,7 +49,9 @@ typically characterize the *gamma* distribution. When we estimate the
 parameters, we are modeling μ (or some function of μ), so we should
 explicitly recover the `simstudy` parameters used to generate the model,
 thus illuminating the relationship between the underlying data
-generating processes and the models.
+generating processes and the models. For more details on the package,
+use cases, examples, and function reference see the [documentation
+page](https://kgoldfeld.github.io/simstudy/articles/simstudy.html).
 
 ## Installation
 
@@ -83,16 +86,16 @@ dd <- trtAssign(dd, nTrt = 4, grpName = "grp", balanced = TRUE)
 dd
 #> id x y grp
 #> 1: 1 11.191960 8.949389 4
-#> 2: 2 10.418375 7.372060 2
-#> 3: 3 8.512109 6.925844 4
+#> 2: 2 10.418375 7.372060 4
+#> 3: 3 8.512109 6.925844 3
 #> 4: 4 11.361632 9.850340 4
-#> 5: 5 9.928811 6.515463 2
+#> 5: 5 9.928811 6.515463 4
 #> ---
-#> 246: 246 8.220609 7.898416 4
-#> 247: 247 8.531483 8.681783 4
-#> 248: 248 10.507370 8.552350 4
+#> 246: 246 8.220609 7.898416 2
+#> 247: 247 8.531483 8.681783 2
+#> 248: 248 10.507370 8.552350 3
 #> 249: 249 8.621339 6.652300 1
-#> 250: 250 9.508164 7.083845 4
+#> 250: 250 9.508164 7.083845 3
 ```
 
 ## Contributing & Support
````
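The README paragraph touched by this commit says that `simstudy` generates *gamma* data from a mean μ and a dispersion d rather than from shape α and rate β. As a sketch, assuming (this is not stated in the commit) that the dispersion follows the common gamma-GLM convention Var(Y) = dμ², the two parameterizations are related by matching the first two moments:

```latex
\begin{aligned}
E(Y) &= \frac{\alpha}{\beta} = \mu,
&\qquad
\operatorname{Var}(Y) &= \frac{\alpha}{\beta^{2}} = d\,\mu^{2} \\[4pt]
\Longrightarrow\quad
\alpha &= \frac{1}{d},
&\qquad
\beta &= \frac{1}{d\,\mu}
\end{aligned}
```

Substituting back confirms the match: α/β = (1/d)(dμ) = μ and α/β² = (1/d)(dμ)² = dμ². When the mean is specified on the log scale, as the README allows, μ is first recovered by exponentiating that value before applying the conversion.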
