This repository contains the code used to identify regulatory element modules (REMO) in the human hg38 reference genome.
For code used in the analysis of REMO modules presented in the manuscript, please see this repository: https://github.com/stuart-lab/remo-manuscript
The complete list of REMO modules and associated metadata can be found in the R package: https://github.com/stuart-lab/REMO.v1.GRCh38
# All steps to build the REMO modules
# First, install the peakcluster R package in this repo, which contains the functions used in REMO identification
Rscript -e "devtools::install('peakcluster')"
# 1. Build the combined set of CREs
## download ENCODE SCREEN v3
wget -O ENCODE/GRCh38-cCREs.bed https://downloads.wenglab.org/V3/GRCh38-cCREs.bed
## download cPeaks
wget -O cPeaks/cpeak_info.csv.gz "https://cloud.tsinghua.edu.cn/f/6460b32917224d32aef1/?dl=1"
## combine regions --> generates combined_cre.bed
Rscript code/combine_cre.R
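`wget -O` writes to the path it is given but does not create missing parent directories, so the two downloads above fail if `ENCODE/` or `cPeaks/` is absent from your checkout. A minimal sketch of a guard follows; `ensure_parent` is a hypothetical helper name, not part of this repository.

```shell
# Hypothetical helper (not part of this repo): create the parent directory
# of an output path so that `wget -O <path>` has somewhere to write.
ensure_parent() {
    mkdir -p "$(dirname "$1")"
}
```

For example, `ensure_parent ENCODE/GRCh38-cCREs.bed` could be run before the first `wget` call.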
# 2. Filter the ENCODE experiment metadata to get the list of experiments to include
## generates ENCODE/metadata_filtered.tsv
Rscript code/filter_encode_metadata.R
# 3. Download ENCODE data and build a matrix of experiment x CRE
sh code/quantify_encode.sh
python code/aggregate_encode.py ENCODE/data/combined/ ENCODE/data/combined/matrix/
python code/split_chr.py \
    --input_file ENCODE/data/combined/matrix/encode.csv \
    --input_colnames ENCODE/data/combined/matrix/encode_colnames.txt \
    --output_dir ENCODE/data/combined/chr/ \
    --regions combined_cre.tsv \
    --ncol 1520441
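The `--ncol` value above is hard-coded. If it is meant to equal the number of regions in `combined_cre.tsv` (an assumption; this README does not say so), it could be computed instead of typed. A rough sketch, where `count_regions` is a hypothetical helper that counts non-blank lines and assumes the file has no header row:

```shell
# Hypothetical helper (not part of this repo): print the number of
# non-blank lines in a file. Assumes one region per line and no header.
count_regions() {
    awk 'NF { n++ } END { print n + 0 }' "$1"
}
```

Under that assumption, the flag could be written as `--ncol "$(count_regions combined_cre.tsv)"`.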
# 4. Download the pseudobulk scATAC-seq data and build matrix of cluster x CRE
cd scATAC_atlas
snakemake
cd -
# 5. Download the ENCODE Hi-C datasets
mkdir -p hi-c/data
wget -i hi-c/encode_url.txt -P hi-c/data/
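Large downloads can be interrupted, so it may be worth checking that every URL in the list produced a file. A minimal sketch, assuming `wget` saved each file under the last path component of its URL (its default naming); `missing_downloads` is a hypothetical helper, and it only tests that a file exists, not that it is complete.

```shell
# Hypothetical helper (not part of this repo): print the file name of every
# URL in a list that has no matching file in the download directory.
# Usage: missing_downloads <url_file> <download_dir>
missing_downloads() {
    url_file=$1
    dir=$2
    # The `|| [ -n "$url" ]` keeps a final line that lacks a trailing newline.
    while IFS= read -r url || [ -n "$url" ]; do
        [ -n "$url" ] || continue          # skip blank lines
        name=${url##*/}                    # last path component of the URL
        [ -e "$dir/$name" ] || echo "$name"
    done < "$url_file"
}
```

Running `missing_downloads hi-c/encode_url.txt hi-c/data` after the `wget` call prints nothing when every listed file is present.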
# 6. Run CRE clustering for each chromosome
chroms=(chr{1..22} chrX)
for c in "${chroms[@]}"; do
    Rscript cluster_chromosome.R "$c"
done
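As written, the loop above continues when one chromosome fails, and the failure is easy to miss in the output. One way to record failures is sketched below, in POSIX shell so that it does not depend on bash arrays; `list_chroms` and `run_per_chrom` are hypothetical helper names, not scripts in this repository.

```shell
# Hypothetical helpers (not part of this repo).

# Print chr1..chr22 and chrX, one per line (the same set as the loop above).
list_chroms() {
    i=1
    while [ "$i" -le 22 ]; do
        echo "chr$i"
        i=$((i + 1))
    done
    echo "chrX"
}

# Run the given command once per chromosome, appending the chromosome name
# as the last argument. Afterwards, print the space-separated names of the
# chromosomes whose run failed; return non-zero if any of them failed.
run_per_chrom() {
    failed=""
    for c in $(list_chroms); do
        if ! "$@" "$c"; then
            failed="$failed $c"
        fi
    done
    echo "${failed# }"
    [ -z "$failed" ]
}
```

A call such as `run_per_chrom Rscript cluster_chromosome.R` would then run the same command as the loop for each chromosome. Any output from the command itself is passed through unchanged, so the list of failed chromosomes is the final line printed.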
# 7. Combine results from each chromosome into a single set of REMO modules
sh code/combined_chromosomes.sh
# 8. Rename modules with a unique ID
Rscript code/rename_modules.R remo/remo_all_chr.tsv remo/REMOv1_GRCh38.bed
# 9. Compress with bgzip
bgzip remo/REMOv1_GRCh38.bed
# 10. Annotate modules with cell type information
cd annotate
snakemake