This archive is distributed in association with the INFORMS Journal on Computing under the MIT License.
The software and data in this repository are a snapshot of the software and data that were used in the research reported in the paper Fast Association Recovery in High Dimensions by Parallel Learning by Ruipeng Dong, and Canhong Wen.
Important: This code is only be used to reproduce the results of the paper, and we will not maintain this project continuously. If you find any bugs, please email Ruipeng Dong ([email protected] or [email protected]).
To cite the contents of this repository, please cite both the paper and this repo, using their respective DOIs.
https://doi.org/10.1287/ijoc.2024.0691
https://doi.org/10.1287/ijoc.2024.0691.cd
Below is the BibTex for citing this snapshot of the repository.
@misc{cospa2025,
author = {Ruipeng Dong and Canhong Wen},
publisher = {INFORMS Journal on Computing},
title = {Fast Association Recovery in High Dimensions by Parallel Learning},
year = {2025},
doi = {10.1287/ijoc.2024.0691.cd},
url = {https://github.com/INFORMSJoC/2024.0691},
note = {Available for download at https://github.com/INFORMSJoC/2024.0691},
}
We use R language for the numerical simulations and real-world data analysis. To run this project, make sure you have the following R packages installed. You can install them using:
install.packages("devtools")
install.packages("glmnet")
install.packages("MASS")
install.packages("scalreg")
install.packages("rrpack")
install.packages("secure")
install.packages("Rcpp")
install.packages("RcppArmadillo")
install.packages("foreach")
install.packages("doParallel")
install.packages("egg")
install.packages("ggplot2")
Note: Besides the above packages, we also need Rtools installed to compile our R package
cospa
.
- folder
data
: an expression quantitative trait loci data for the real-world data analysis - folder
results
:- subfolder
figures
: including all figures of the paper - subfolder
raw
: including all raw results of simulations and real-world data analysis that can be converted to the figures and tables in our paper - subfolder
tables
: including all tables of the paper
- subfolder
- folder
scripts
: including all scripts for running the simulation and data analysis in this paper- subfolder
yeast
: including all scripts of the real-world data analysis
- subfolder
- folder
src
: the source codes of our R package named bycospa
Caution
We use parallel computing to speed up simulations. The number of cores is set by the variable cl.num
in each script, with a default value of 50. Please set the number of cores suitable for your computing platform.
To perform all experiments, please install our R package cospa
first:
devtools::install_github("INFORMSJoC/2024.0691/src")
- All numerical results can be generated by the "pipeline.R" in the scripts folder.
- All results are stored in the results folder.
- For each script, the variable
path
corresponds to the absolute path of the folder 2024.0691.- For details, we introduce the usage of each scripts in the scripts folder as follows.
(a) To generate all raw simulation results, run the following scripts from the scripts folder:
source("scripts/sim-table.R") # generate all table results source("scripts/sim-time.R") # generate the time comparison result source("scripts/sim-vary.R") # generate the boxplot results(b) After obtaining the raw simulation results, run the following scripts from the scripts folder to generate tables, line and box plots:
source("scripts/get-table.R") # generate latex files summarizing error tables source("scripts/get-line.R") # generate the time performance figures source("scripts/get-box.R") # generate the box plots
To obtain the real data result, run the following scripts from the subfolder "/yeast" in the scripts folder:
(a) First, run the script "data_clean.R" to obtain the screened data that translate "yeast.rda" into "/data/yeast_preprocess_data.RData".
source("scripts/yeast/data_clean.R") # clean the data and save the screened data as "yeast_preprocess_data.RData" into the data folder(b) After obtaining the screened data, run the script "/yeast/analysis.R" that estimate models by "yeast_preprocess_data.RData".
source("scripts/yeast/analysis.R") # compare the estimations of different methods, and save the result as table-real-screening.RData(c) Finally, to obtain the summary table in our paper, run the script "/yeast/get-table-real.R" that outputs a latex file including the summary table.
source("scripts/yeast/get-table-real.R")