Commit 64aa0b0

add inputs
1 parent fd08f4c commit 64aa0b0

4 files changed, 6 additions and 12 deletions

README.md

Lines changed: 2 additions & 2 deletions
@@ -13,8 +13,8 @@ Contents:
 * `prepare.sh` - step 1 - prepare, clean, and bin the data
 * `summarize.R` - step 2 - combine and summarize binned data
 * Inputs:
-* `.bed.gz`
-* `.bed.gz`
+* `sv.1kg.bed.gz` - 1000 Genomes Project breakpoints
+* `sv.giab.bed.gz` - Genome in a Bottle breakpoints
 
 ---
 

prepare.sh

Lines changed: 4 additions & 10 deletions
@@ -23,12 +23,6 @@ wget ftp://hgdownload.cse.ucsc.edu/goldenPath/hg19/bigZips/hg19.chrom.sizes
 wget ftp://hgdownload.cse.ucsc.edu/goldenPath/hg19/encodeDCC/wgEncodeMapability/wgEncodeCrgMapabilityAlign50mer.bigWig
 wget ftp://hgdownload.cse.ucsc.edu/goldenPath/hg19/encodeDCC/wgEncodeMapability/wgEncodeCrgMapabilityAlign100mer.bigWig
 
-# add chr to GIAB_H002 BED
-cat GIAB_H002.bed | awk -F $'\t' 'BEGIN {OFS=FS} {print "chr"$1,$2,$2,$3}' > GIAB_H002.chr.bed
-
-# add chr to dbvar BED
-cat dbvar_estd219.uniq.bed | awk -F $'\t' 'BEGIN {OFS=FS} {print "chr"$1,$2,$2,$3}' > dbvar_estd219.uniq.chr.bed
-
 # download GRC issues (GRCh37.p13_issues.gff3)
 wget ftp://ftp.ncbi.nlm.nih.gov/pub/grc/human/GRC/Issue_Mapping/GRCh37.p13_issues.gff3
 
@@ -93,12 +87,12 @@ echo -e "#BIN\tENCODE_DAC_blacklisted" > summary.ENCODE_DAC_blacklisted.${bin_si
 bedtools coverage -a "$bin_bed" -b ENCODE_DAC_blacklisted.bed | cut -f 4,8 >> summary.ENCODE_DAC_blacklisted.${bin_size}.txt
 
 # GIAB break points
-echo -e "#BIN\tevents_GIAB" > summary.GIAB_H002.${bin_size}.txt
-bedtools coverage -a "$bin_bed" -b GIAB_H002.chr.bed | cut -f 4,5 >> summary.GIAB_H002.${bin_size}.txt
+echo -e "#BIN\tevents_GIAB" > summary.GIAB.${bin_size}.txt
+bedtools coverage -a "$bin_bed" -b sv.giab.bed | cut -f 4,5 >> summary.GIAB.${bin_size}.txt
 
 # 1KG break points
-echo -e "#BIN\tevents_1KG" > summary.dbvar_estd219.${bin_size}.txt
-bedtools coverage -a "$bin_bed" -b dbvar_estd219.uniq.chr.bed | cut -f 4,5 >> summary.dbvar_estd219.${bin_size}.txt
+echo -e "#BIN\tevents_1KG" > summary.1KG.${bin_size}.txt
+bedtools coverage -a "$bin_bed" -b sv.1kg.bed | cut -f 4,5 >> summary.1KG.${bin_size}.txt
 
 # average mappability per bin (using bigWigAverageOverBed from UCSC)
 bigWigAverageOverBed wgEncodeCrgMapabilityAlign50mer.bigWig $bin_bed wgEncodeCrgMapabilityAlign50mer.${bin_size}.txt
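
The `cut -f 4,5` and `cut -f 4,8` calls in this hunk depend on the column layout of `bedtools coverage`. The sketch below illustrates that selection without running bedtools. It assumes a 4-column bin BED (chrom, start, end, bin name) and the default output of recent bedtools releases, which appends the overlap count, covered bases, interval length, and covered fraction. The sample rows and the variable names `sample`, `counts`, and `fractions` are made up for illustration and do not come from the repository.

```shell
# Hypothetical rows shaped like `bedtools coverage -a bins.bed -b events.bed`
# output for a 4-column bin file:
# chrom, start, end, bin, count, covered bases, bin length, covered fraction.
sample=$(printf 'chr1\t0\t1000\tchr1:0-1000\t3\t250\t1000\t0.2500000\nchr1\t1000\t2000\tchr1:1000-2000\t0\t0\t1000\t0.0000000\n')

# Columns 4 and 5: bin name and number of overlapping events (break points).
counts=$(printf '%s\n' "$sample" | cut -f 4,5)

# Columns 4 and 8: bin name and fraction of the bin that is covered (blacklist).
fractions=$(printf '%s\n' "$sample" | cut -f 4,8)

# Write each selection under a header line, as the summary files do.
printf '#BIN\tevents\n%s\n' "$counts"
printf '#BIN\tfraction\n%s\n' "$fractions"
```

If the bin file had a different number of columns, the appended fields would shift and the `cut` indices would need to change accordingly.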

sv.1kg.bed.gz

718 KB
Binary file not shown.

sv.giab.bed.gz

1.67 MB
Binary file not shown.
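
The commit adds gzipped inputs, while the `prepare.sh` hunks shown here read `sv.giab.bed` and `sv.1kg.bed`, and the step that added a `chr` prefix is removed. Whether the script decompresses the files elsewhere is not visible in this diff. As a sketch only, one way to unpack such an input and check that its chromosome names already carry the `chr` prefix might look like this. The file `sv.sample.bed.gz` and its two records are made-up stand-ins, not the real data.

```shell
# Work in a scratch directory with a made-up stand-in for sv.giab.bed.gz.
workdir=$(mktemp -d)
printf 'chr1\t100\t100\tDEL\nchr2\t500\t500\tINS\n' | gzip -c > "$workdir/sv.sample.bed.gz"

# Decompress to a plain BED file while keeping the .gz archive.
gzip -dc "$workdir/sv.sample.bed.gz" > "$workdir/sv.sample.bed"

# Count records whose chromosome name lacks the "chr" prefix used by the hg19 files.
missing_chr=$(awk -F '\t' '$1 !~ /^chr/ {n++} END {print n+0}' "$workdir/sv.sample.bed")
echo "records without chr prefix: $missing_chr"
```

A non-zero count would suggest the input still needs the kind of prefixing the removed `awk` step performed.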
