pleiosim is an R package for simulating pleiotropic genetic data, including genotype generation, phenotype simulation, and summary statistics calculation. It provides flexible tools to model complex genetic architectures, including pleiotropy, cross-trait heterogeneity, and within-trait heterogeneity.
You can install the development version of pleiosim from GitHub:
# Install devtools if not already installed
if (!requireNamespace("devtools", quietly = TRUE)) {
install.packages("devtools")
}
# Install pleiosim from GitHub
devtools::install_github("Broccolito/pleiosim")- Simulate pleiotropic, non-pleiotropic, and null variants
- Customize heritable and non-heritable correlation matrices
- Model genetic architectures with cross-trait and within-trait heterogeneity
- Generate summary statistics for downstream analyses
# Load the package
library(pleiosim)
# Set a random seed
set.seed(492357816)
# Run a basic simulation
pleio_object = pleiosim(
n_phenotype = 3, # Number of phenotypes
heritable_correlation = 0.4, # Uniform heritable correlation
nonheritable_correlation = 0.2, # Uniform non-heritable correlation
n_participant = 10000, # Number of participants
eaf = 0.4, # Effect allele frequency
n_variant_pleiotropic = 10, # Number of pleiotropic variants
n_variant_nonpleiotropic = c(10, 10, 10), # Non-pleiotropic variants per phenotype
n_variant_null = 960, # Number of null variants
heritability = c(0.1, 0.2, 0.3), # Heritability for each phenotype
unique_cohorts = TRUE, # Whether participants are unique across cohorts
crosstrait_heterogeneity = TRUE, # Enable cross-trait heterogeneity
withintrait_heterogeneity = TRUE, # Enable within-trait heterogeneity
random_crosstrait_heterogeneity = FALSE, # Disable random cross-trait heterogeneity
random_withintrait_heterogeneity = FALSE, # Disable random within-trait heterogeneity
customized_heritable_correlation_matrix = NULL, # Optional: provide custom matrix
customized_nonheritable_correlation_matrix = NULL, # Optional: provide custom matrix
customized_cohort_makeup_matrix = NULL, # Optional: custom cohort structure
customized_eaf_matrix = NULL, # Optional: custom effective allele frequency matrix
customized_efs_matrix_template = NULL, # Optional: custom effect size template
customized_efs_matrix = NULL # Optional: custom effect size matrix
)
# Display the simulation summary
pleio_objectThe object summary shows the overview of the simulation:
[pleio object]
A simulation of:
- 3 phenotypes
- 1000 SNPs
- 10 pleiotropic SNPs
- 30 non-pleiotropic SNPs
- 960 null SNPs
- Total of 30000 participantsThe summary statistics are generated and stored under
pleio@summary_stats_matrix:
$phenotype1
beta se p_value eaf
pleio_variant1 0.8029288 0.1288620 4.823870e-10 0.39970
pleio_variant2 -0.7932201 0.1294434 9.238782e-10 0.40105
pleio_variant3 0.6277678 0.1285016 1.048779e-06 0.39785
pleio_variant4 -0.8729335 0.1289542 1.366411e-11 0.40625
pleio_variant5 0.6303885 0.1288759 1.016620e-06 0.40365
pleio_variant6 -0.8310830 0.1294584 1.427507e-10 0.39590
$phenotype2
beta se p_value eaf
pleio_variant1 0.8261121 0.09105841 1.384372e-19 0.40260
pleio_variant2 -0.8485431 0.09004163 5.311210e-21 0.40380
pleio_variant3 0.8513850 0.09042683 5.766663e-21 0.40310
pleio_variant4 -0.8104512 0.09014205 2.898476e-19 0.40805
pleio_variant5 0.8120208 0.09062762 3.827500e-19 0.39975
pleio_variant6 -0.6766290 0.09095173 1.094387e-13 0.40455
$phenotype3
beta se p_value eaf
pleio_variant1 0.8188787 0.07376879 1.826734e-28 0.39780
pleio_variant2 -0.7735299 0.07422958 2.685493e-25 0.40270
pleio_variant3 0.8084351 0.07357418 6.302031e-28 0.39975
pleio_variant4 -0.8567336 0.07418582 1.175349e-30 0.39805
pleio_variant5 0.7578847 0.07425607 2.443735e-24 0.39950
pleio_variant6 -0.7877619 0.07391664 2.229663e-26 0.40170n_phenotype: Number of phenotypes to simulate.heritable_correlation: Correlation between phenotypes due to genetic effects.nonheritable_correlation: Correlation between phenotypes due to non-genetic factors.n_participant: Total number of simulated participants.eaf: Effect allele frequency for SNPs.n_variant_pleiotropic: Number of SNPs affecting multiple traits.n_variant_nonpleiotropic: Number of SNPs specific to each phenotype.n_variant_null: Number of SNPs with no effect.heritability: Vector of heritability values for each phenotype.unique_cohorts: IfTRUE, each cohort has unique participants.crosstrait_heterogeneity: Introduces variability in effects across traits.withintrait_heterogeneity: Introduces variability within each trait.random_crosstrait_heterogeneity/random_withintrait_heterogeneity: Randomize heterogeneity patterns.- Custom Matrices (
customized_*): Provide custom matrices for advanced simulations, following the structure of default matrices.
You can swap out the automatically generated matrices with customized versions for more control over your simulation. Below are tutorials for each matrix type.
Description: The heritable correlation matrix captures the genetic correlation between phenotypes, reflecting how much the genetic factors influencing one trait also affect others. Customizing this matrix allows users to simulate different genetic architectures, from independent traits to highly correlated ones.
Structure: A square matrix with dimensions equal to the number of phenotypes, diagonal values of 1, and off-diagonal values representing correlation coefficients.
Example:
custom_heritable_corr = matrix(c(1, 0.3, 0.2,
0.3, 1, 0.4,
0.2, 0.4, 1), nrow = 3)
pleio_object = pleiosim(
n_phenotype = 3,
customized_heritable_correlation_matrix = custom_heritable_corr
)Description: This matrix models the correlation between phenotypes due to environmental or other non-genetic factors. Customizing it helps simulate different shared environmental influences or measurement errors across traits.
Structure: Similar to the heritable correlation matrix.
Example:
custom_nonheritable_corr = matrix(c(1, 0.1, 0.05,
0.1, 1, 0.2,
0.05, 0.2, 1), nrow = 3)
pleio_object = pleiosim(
n_phenotype = 3,
customized_nonheritable_correlation_matrix = custom_nonheritable_corr
)Description: Defines the number of participants in each cohort and their overlaps. Customizing this matrix allows simulation of diverse cohort structures, including overlapping cohorts, multi-ancestry designs, or longitudinal datasets.
Structure: A matrix specifying participant counts for each cohort and their intersections.
Example:
custom_cohort_makeup = matrix(c(5000, 5000, 5000, 2000, 1000, 1500, 800), nrow = 1)
colnames(custom_cohort_makeup) = c("cohort1", "cohort2", "cohort3", "cohort1_cohort2", "cohort1_cohort3", "cohort2_cohort3", "cohort1_cohort2_cohort3")
rownames(custom_cohort_makeup) = "n_participant"
pleio_object = pleiosim(
n_phenotype = 3,
customized_cohort_makeup_matrix = custom_cohort_makeup
)Description: Represents the frequency of effect alleles across cohorts. Customizing this matrix helps simulate populations with different genetic ancestries, reflecting real-world allele frequency differences among diverse groups.
Structure: Rows represent SNPs, columns represent cohorts. Values are allele frequencies.
Example:
custom_eaf = matrix(runif(21, 0.1, 0.9), nrow = 21, ncol = 3)
colnames(custom_eaf) = c("cohort1", "cohort2", "cohort3")
rownames(custom_eaf) = paste0("variant", 1:21)
pleio_object = pleiosim(
n_phenotype = 3,
customized_eaf_matrix = custom_eaf
)Description: Specifies which variants affect which phenotypes. Customizing this template allows simulation of different pleiotropic patterns, helping explore genetic architectures where certain variants influence specific traits.
Structure: Rows represent phenotypes, columns represent variants. Binary values (1 or 0) indicate presence or absence of effects.
Example:
custom_efs_template = matrix(c(1, 0, 0,
1, 1, 0,
0, 1, 1), nrow = 3)
rownames(custom_efs_template) = c("phenotype1", "phenotype2", "phenotype3")
colnames(custom_efs_template) = paste0("variant", 1:3)
pleio_object = pleiosim(
n_phenotype = 3,
customized_efs_matrix_template = custom_efs_template
)Description: Defines the actual effect sizes of variants on phenotypes. Customizing this matrix enables control over the magnitude and direction of genetic effects, allowing simulations of strong or weak associations, or specific genetic models.
Structure: Similar to the template, but with continuous effect sizes instead of binary values.
Example:
custom_efs = matrix(c(0.8, -0.5, 0.3, 0.4, 1.2, -0.7, -0.3, 0.6, 0.9), nrow = 3)
rownames(custom_efs) = c("phenotype1", "phenotype2", "phenotype3")
colnames(custom_efs) = paste0("variant", 1:3)
pleio_object = pleiosim(
n_phenotype = 3,
customized_efs_matrix = custom_efs
)The pleio object contains the following slots:
-
n_phenotype: Number of phenotypes simulated.
-
Example:
3 -
n_variant_pleiotropic: Number of pleiotropic variants.
-
Example:
10 -
n_variant_nonpleiotropic: Number of non-pleiotropic variants per phenotype.
-
Example:
c(10, 10, 10) -
n_variant_null: Number of null variants.
-
Example:
960 -
heritability: Vector of heritability values for each phenotype.
-
Example:
c(0.1, 0.2, 0.3) -
unique_cohorts: Indicates if participants are unique across cohorts.
-
Example:
TRUE -
crosstrait_heterogeneity / withintrait_heterogeneity: Flags for heterogeneity modeling.
-
Example:
TRUE -
heritable_correlation_matrix: Matrix showing heritable correlation between phenotypes.
-
Example:
matrix(c(1, 0.4, 0.3, 0.4, 1, 0.5, 0.3, 0.5, 1), nrow=3) -
nonheritable_correlation_matrix: Matrix for non-heritable correlations.
-
Example:
matrix(c(1, 0.2, 0.1, 0.2, 1, 0.15, 0.1, 0.15, 1), nrow=3) -
cohort_makeup_matrix: Defines the participant structure.
-
Example: See Cohort Makeup Matrix section.
-
eaf_matrix: Matrix of effect allele frequencies.
-
Example: See Effect Allele Frequency Matrix section.
-
efs_matrix_template / efs_matrix: Effect size templates and actual matrices.
-
Example: See Effect Size Matrix section.
-
summary_stats_matrix: Summary statistics for each phenotype.
-
Example: Matrix with
beta,se,p_value,eafcolumns.
This package is licensed under the MIT License. See the LICENSE file for details.
Developed by Wanjun Gu (wanjun for gu at ucsf.edu)
