Fast Association Recovery in High Dimensions by Parallel Learning

This archive is distributed in association with the INFORMS Journal on Computing under the MIT License.

The software and data in this repository are a snapshot of the software and data that were used in the research reported in the paper "Fast Association Recovery in High Dimensions by Parallel Learning" by Ruipeng Dong and Canhong Wen.

Important: This code is intended only for reproducing the results of the paper, and we will not maintain this project continuously. If you find any bugs, please email Ruipeng Dong ([email protected] or [email protected]).

Cite

To cite the contents of this repository, please cite both the paper and this repo, using their respective DOIs.

https://doi.org/10.1287/ijoc.2024.0691

https://doi.org/10.1287/ijoc.2024.0691.cd

Below is the BibTeX for citing this snapshot of the repository.

@misc{cospa2025,
  author    = {Ruipeng Dong and Canhong Wen},
  publisher = {INFORMS Journal on Computing},
  title     = {Fast Association Recovery in High Dimensions by Parallel Learning},
  year      = {2025},
  doi       = {10.1287/ijoc.2024.0691.cd},
  url       = {https://github.com/INFORMSJoC/2024.0691},
  note      = {Available for download at https://github.com/INFORMSJoC/2024.0691},
}

Required Packages and Tools

We use the R language for the numerical simulations and real-world data analysis. To run this project, make sure the following R packages are installed. You can install them using:

install.packages("devtools")
install.packages("glmnet")
install.packages("MASS")
install.packages("scalreg")
install.packages("rrpack")
install.packages("secure")
install.packages("Rcpp")
install.packages("RcppArmadillo")
install.packages("foreach")
install.packages("doParallel")
install.packages("egg")
install.packages("ggplot2")

Note: In addition to the above packages, Rtools must be installed to compile our R package cospa.
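
As a convenience, the individual install.packages calls above can be replaced by a check that installs only what is missing. The following is a minimal sketch rather than part of the repository; the helper name find_missing is hypothetical.

```r
# Packages listed in this README.
required <- c("devtools", "glmnet", "MASS", "scalreg", "rrpack", "secure",
              "Rcpp", "RcppArmadillo", "foreach", "doParallel", "egg", "ggplot2")

# Hypothetical helper: return the names in `pkgs` that are not yet installed.
find_missing <- function(pkgs, installed = rownames(installed.packages())) {
  setdiff(pkgs, installed)
}

to_install <- find_missing(required)
print(to_install)

# Uncomment to install the missing packages from your configured repository:
# if (length(to_install) > 0) install.packages(to_install)
```

Keeping the install call commented out lets you inspect the list before anything is downloaded.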

Folders Organization

folder data: expression quantitative trait loci (eQTL) data for the real-world data analysis
folder results:
- subfolder figures: all figures of the paper
- subfolder raw: all raw results of the simulations and real-world data analysis, which can be converted into the figures and tables in our paper
- subfolder tables: all tables of the paper
folder scripts: all scripts for running the simulations and data analysis in this paper
- subfolder yeast: all scripts for the real-world data analysis
folder src: the source code of our R package, named cospa

Running the Project

Caution

We use parallel computing to speed up simulations. The number of cores is set by the variable cl.num in each script, with a default value of 50. Please set cl.num to a number of cores suitable for your computing platform.
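
One possible way to pick a value for cl.num is sketched below. It uses the parallel package that ships with R; the rule of leaving one core free is our own assumption, not a recommendation from the paper, and the commented registration lines assume the scripts use a doParallel backend for foreach.

```r
# detectCores() may return NA on some platforms, so fall back to 1.
available <- parallel::detectCores(logical = FALSE)
if (is.na(available)) available <- 1L

# Leave one core free for the operating system, and never exceed the
# scripts' default of 50.
cl.num <- max(1L, min(50L, as.integer(available) - 1L))
print(cl.num)

# Assumption: a typical doParallel registration with this value would be
# cl <- parallel::makeCluster(cl.num)
# doParallel::registerDoParallel(cl)
# ... foreach(...) %dopar% { ... } ...
# parallel::stopCluster(cl)
```

The resulting value can then be assigned to cl.num at the top of each script in place of the default.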

Step 1: Installing R package

To perform all experiments, please install our R package cospa first:

devtools::install_github("INFORMSJoC/2024.0691/src")

Step 2: Running Scripts

All numerical results can be generated by running "pipeline.R" in the scripts folder.

All results are stored in the results folder.

In each script, the variable path must be set to the absolute path of the folder 2024.0691.
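
Before running anything, it can help to confirm that path really points at the repository root. The sketch below checks for the four top-level folders described under Folders Organization; the helper name missing_folders and the placeholder location are hypothetical.

```r
# Hypothetical helper: list the expected top-level folders that are
# missing under `path`.
missing_folders <- function(path,
                            expected = c("data", "results", "scripts", "src")) {
  expected[!dir.exists(file.path(path, expected))]
}

# Placeholder location; replace it with your own absolute path.
path <- "/absolute/path/to/2024.0691"
absent <- missing_folders(path)
if (length(absent) > 0) {
  message("Not found under ", path, ": ", paste(absent, collapse = ", "))
}
```

An empty result means all four folders were found under path.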

The usage of each script in the scripts folder is described below.

(1) Numerical simulations

(a) To generate all raw simulation results, run the following scripts from the scripts folder:
source("scripts/sim-table.R") # generate all table results
source("scripts/sim-time.R") # generate the time comparison result
source("scripts/sim-vary.R") # generate the boxplot results
(b) After obtaining the raw simulation results, run the following scripts from the scripts folder to generate the tables, line plots, and box plots:
source("scripts/get-table.R") # generate LaTeX files summarizing the error tables
source("scripts/get-line.R") # generate the time performance figures
source("scripts/get-box.R") # generate the box plots

(2) Real data analysis

To obtain the real data result, run the following scripts from the subfolder "/yeast" in the scripts folder:

(a) First, run the script "data_clean.R", which screens "yeast.rda" and saves the result as "/data/yeast_preprocess_data.RData".
source("scripts/yeast/data_clean.R") # clean the data and save the screened data as "yeast_preprocess_data.RData" in the data folder
(b) After obtaining the screened data, run the script "/yeast/analysis.R", which estimates the models using "yeast_preprocess_data.RData".
source("scripts/yeast/analysis.R") # compare the estimates of the different methods, and save the result as table-real-screening.RData
(c) Finally, to obtain the summary table in our paper, run the script "/yeast/get-table-real.R", which outputs a LaTeX file containing the summary table.
source("scripts/yeast/get-table-real.R")
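
Because each step depends on the output of the previous one, the three scripts must run in the order given. A minimal sketch of a runner that enforces this order is shown below; the helper run_in_order is hypothetical and not part of the repository, and it assumes path has been set to the absolute path of the folder 2024.0691.

```r
# Hypothetical helper: source each script in order, relative to `path`,
# and stop at the first one that cannot be found.
run_in_order <- function(path, scripts) {
  for (s in scripts) {
    f <- file.path(path, s)
    if (!file.exists(f)) stop("Script not found: ", f)
    message("Running ", s)
    source(f, local = FALSE)  # evaluate in the global environment
  }
  invisible(scripts)
}

yeast_steps <- c("scripts/yeast/data_clean.R",
                 "scripts/yeast/analysis.R",
                 "scripts/yeast/get-table-real.R")

# Uncomment after setting `path`:
# run_in_order(path, yeast_steps)
```

The same helper could be given the simulation scripts from part (1), again listing the scripts that generate raw results before the scripts that summarize them.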
