Noncoding variants and sulcal patterns in congenital heart disease: Machine learning to predict functional impact
This repository contains the code used in the study: Mondragon-Estrada et al., Noncoding variants and sulcal patterns in congenital heart disease: Machine learning to predict functional impact, iScience (2025), https://doi.org/10.1016/j.isci.2024.111707.
The main analyses were:
- Variant score prediction using deep learning models Basenji2 and Enformer
- Weighted correlation network analysis (WGCNA)
- Gene set enrichment analysis (clusterProfiler)
Deep learning predictions were carried out in GNU/Linux workstations with 128 GB of RAM and NVIDIA GPUs.
Scripts were executed on GNU/Linux Ubuntu 20.04.6 LTS (Focal Fossa)
. Deep learning models are implemented in Python3
and their corresponding usage and requirements can be seen in their repositories. WGCNA and GO enrichment analysis were performed in R 4.1.2
.
Basenji2's predictions were obtained using the script predict_scores_b.sh. Instructions for installing Basenji2 and creating a conda environment are available in the original Basenji2 GitHub repository.
Enformer's predictions were obtained using the script predict_scores_e.py. Instructions for installing Basenji2 and creating a virtual environment are available in the original Enformer GitHub repository.
For processing and statistical steps performed in Python3, we recommed creating a third virtual environment with this requirements.txt.
$ python3 -m venv venv
$ source venv/bin/activate
$ pip install -r requirements.txt
For analyses done in R, the following packages are required:
-
Bioconductor packages:
- WGCNA
- clusterProfiler
- biomaRt
- org.Hs.eg.db
- GenomicRanges
- ChIPpeakAnno
- TxDb.Hsapiens.UCSC.hg38.knownGene
-
CRAN packages:
- ggplot2
- dplyr
- RColorBrewer
- ppcor
They all can be installed using the following commands in R:
if (!require("BiocManager", quietly = TRUE))
install.packages("BiocManager")
BiocManager::install("WGCNA")
BiocManager::install("clusterProfiler")
BiocManager::install("biomaRt")
BiocManager::install("org.Hs.eg.db")
BiocManager::install("GenomicRanges")
BiocManager::install("ChIPpeakAnno")
BiocManager::install("TxDb.Hsapiens.UCSC.hg38.knownGene")
# For installing ggplot and dplyr, it is easier to install the whole tidyverse:
install.packages("tidyverse")
# Alternatively, each package can be installed individually with the following commands:
install.packages("ggplot2")
install.packages("dplyr")
install.packages("RColorBrewer")
install.packages("ppcor")
@article{MONDRAGONESTRADA2025111707,
author = {Enrique Mondragon-Estrada and Jane W. Newburger and Steven R. DePalma and Martina Brueckner and John Cleveland and Wendy K. Chung and Bruce D. Gelb and Elizabeth Goldmuntz and Donald J. Hagler and Hao Huang and Patrick McQuillen and Thomas A. Miller and Ashok Panigrahy and George A. Porter and Amy E. Roberts and Caitlin K. Rollins and Mark W. Russell and Martin Tristani-Firouzi and P. Ellen Grant and Kiho Im and Sarah U. Morton},
title = {Noncoding variants and sulcal patterns in congenital heart disease: Machine learning to predict functional impact},
journal = {iScience},
volume = {28},
number = {2},
pages = {111707},
year = {2025},
issn = {2589-0042},
doi = {https://doi.org/10.1016/j.isci.2024.111707},
url = {https://www.sciencedirect.com/science/article/pii/S2589004224029341}
}