Calculation of Transcribed Regions Within RDCs from GRO-seq data
This script is designed to calculate reads per kilobase per millions (RPKM) in RDC regions.
Calculates RPKM for genes overlapping with RDCs (considering gene strandness). For RDCs without any overlapping genes, the calculation is based on the number of reads within the regions, without considering strandness.
# bash
module load R/4.4.3-GCCcore-14.1.0
module load SAMtools/1.20-GCC-14.1.0
module load Subread/2.0.6-GCC-14.1.0
# R
install.packages("readr")
install.packages("dplyr")
install.packages("optparse")
-o / --work_dir: Working directory. Specifies the main working directory where output files will be saved or accessed.
-r / --genome: Genome reference file (Such as hg38_refGene.bed)
-b / --bam_files: BAM files folder. Path to the directory containing the input BAM files used for read mapping and quantification.
-i / --rdc_input: RDC BED files folder (type: character). Path to the directory containing the input BED files for RDC regions.RDC RPKM: For RDCs that overlap with genes, the file names start with “Gene_“. For RDCs that do not overlap with genes, the file names start with “RDC_“.
Rscript 0_RDC_rpkm_V2.R -o ~/u2os_res/ -r hg38_refGene.bed -b ~/gro-seq-u2os-hg38/alignment/ -i ~/u2os/