
Commit 933f7d1

Version 3.2 (#30)
- Filter reads on rq (>=0.99), if rq is present in input bam
- Add a `targeted` option for targeted data to drop the assumption of uniform coverage across the genome
- Add two optional parameters for targeted data: `min-read-variant` and `min-read-haplotype`
- Update coordinates of some target regions to include full genes whenever possible
- Add TNXB as a region on its own so that the full gene can be genotyped
- Improve fusion calling in cases of homozygous deletion
- Add some homozygous sites to cover target regions evenly during phasing to improve read assignment to haplotypes and variant calling
- Update a few gene-specific callers: hba, smn1, ikbkg, ncf1, rccx and pms2
- Support cram as input
- Standardize haplotype naming across regions
1 parent f4630d2 commit 933f7d1

23 files changed

Lines changed: 1027 additions & 428 deletions
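The first item in the commit message filters reads on the `rq` tag, which in PacBio BAM files holds the predicted read accuracy. The sketch below illustrates that rule under two assumptions the commit does not state: the check is applied per read, and reads without an `rq` tag are kept. Reads are assumed to expose pysam-style `has_tag()`/`get_tag()` methods; `passes_rq_filter` and `FakeRead` are hypothetical names, not Paraphase code.

```python
from dataclasses import dataclass, field

RQ_THRESHOLD = 0.99  # cutoff stated in the commit message


def passes_rq_filter(read, threshold: float = RQ_THRESHOLD) -> bool:
    """Return True if the read should be kept.

    Assumption: reads lacking an `rq` tag are kept, since the commit
    applies the filter only when `rq` is present in the input BAM.
    `read` is anything with pysam-style has_tag()/get_tag() methods.
    """
    if not read.has_tag("rq"):
        return True
    return read.get_tag("rq") >= threshold


@dataclass
class FakeRead:
    """Hypothetical stand-in for pysam.AlignedSegment, for illustration only."""
    tags: dict = field(default_factory=dict)

    def has_tag(self, tag: str) -> bool:
        return tag in self.tags

    def get_tag(self, tag: str):
        return self.tags[tag]


reads = [FakeRead({"rq": 0.999}), FakeRead({"rq": 0.95}), FakeRead()]
kept = [r for r in reads if passes_rq_filter(r)]
print(len(kept))  # 2: the 0.95 read is dropped, the untagged read is kept
```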

README.md

Lines changed: 4 additions & 0 deletions
@@ -86,6 +86,10 @@ Optional parameters:
 - `--genome`: Genome reference build. Default is `38`. If `37` or `19` is specified, Paraphase will run the analysis for GRCh37 or hg19, respectively (note that only 11 medically relevant [regions](docs/regions.md) are supported now for GRCh37/hg19).
 - `--gene1only`: If specified, variants calls will be made against the main gene only for SMN1, PMS2, STRC, NCF1 and IKBKG, see more information [here](docs/vcf.md).
 - `--novcf`: If specified, no VCF files will be produced.
+- `--write-nocalls-in-vcf`: If specified, Paraphase will write no-call sites in the VCFs, marked with LowQual filter.
+- `--targeted`: If specified, paraphase will not assume depth is uniform across the genome. See more information on running targeted data [here](docs/targeted_data.md).
+- `--min-read-variant`: Partially controls the number of supporting reads for a variant to be used for phasing. The cutoff for variant-supporting reads is determined by min(this number, max(5, depth\*0.11)). Default is 20 (at standard WGS depth, it is overwritten by max(5, depth*0.11)).
+- `--min-read-haplotype`: Minimum number of unique supporting reads for a haplotype. Default is 4.
 - `--samtools`: path to samtools. If the paths to samtools or minimap2 are not already in the PATH environment variable, they can be provided through the `--samtools` and `--minimap2` parameters.
 - `--minimap2`: path to minimap2

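The README defines the variant-support cutoff as min(`--min-read-variant`, max(5, `depth * 0.11`)). Worked through in a few lines of Python, it shows why the default of 20 only takes effect at high depth. The function name is hypothetical, and `depth` stands for whatever depth Paraphase computes for the region, which the README does not define further.

```python
def variant_read_cutoff(depth: float, min_read_variant: int = 20) -> float:
    """Supporting-read cutoff for a phasing variant, per the README formula:
    min(min_read_variant, max(5, depth * 0.11)).
    """
    return min(min_read_variant, max(5, depth * 0.11))


# ~30x WGS: max(5, 3.3) = 5, which is below the default of 20
print(variant_read_cutoff(30))    # 5
# very high depth: depth * 0.11 exceeds 20, so the parameter caps the cutoff
print(variant_read_cutoff(1000))  # 20
```

At typical WGS depth the floor of 5 wins, which is what the README means by the default being "overwritten"; only at high targeted depth does `--min-read-variant` become the binding limit.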
docs/targeted_data.md

Lines changed: 14 additions & 0 deletions
@@ -0,0 +1,14 @@
+# Running Paraphase on targeted data
+
+Paraphase can work with targeted data, such as:
+- Hybrid capture based enrichment data
+- CRISPR-Cas9 targeted data
+- Amplicon data
+
+The config file may need to be modified based on the design of the target panel. Please reach out to Xiao Chen (xchen@pacificbiosciences.com) if you need assistance.
+
+Paraphase provides a few options for users to better work with targeted data:
+1) Use the `--targeted` option to drop the assumption of uniform coverage across the genome.
+2) Additionally there are two optional parameters one can tune for targeted data:
+- `--min-read-variant`: Partially controls the number of supporting reads for a variant for identifying variants used for phasing. The cutoff for variant-supporting reads is determined by min(this number, max(5, depth\*0.11)). Default is 20. At standard WGS depth, the default value is overwritten by max(5, depth*0.11). For targeted data with high coverage, set this number relatively high to avoid picking up sequencing errors and to reduce run time. For example, if you expect your per-haplotype depth is 200, you can set `--min-read-variant` to 40 or even higher.
+- `--min-read-haplotype`: Minimum number of unique supporting reads for a haplotype. Default is 4. For targeted data with high coverage, this cutoff can be increased to reduce errors and to reduce run time.
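Putting the targeted-data options together, a run might be assembled as below. Only `--targeted`, `--min-read-variant` and `--min-read-haplotype` come from this document; the `-b`/`-o`/`-r` flags for the input BAM, output directory and reference FASTA, the file names, and the helper function are assumptions for illustration, so check `paraphase --help` for the installed version.

```python
import shlex
from typing import List, Optional


def build_targeted_command(bam: str, out_dir: str, reference: str,
                           min_read_variant: Optional[int] = None,
                           min_read_haplotype: Optional[int] = None) -> List[str]:
    """Assemble a Paraphase argument list for targeted data.

    Assumption: input, output and reference are passed with -b, -o and -r.
    """
    cmd = ["paraphase", "-b", bam, "-o", out_dir, "-r", reference, "--targeted"]
    # Only pass the tuning options when overriding the defaults (20 and 4).
    if min_read_variant is not None:
        cmd += ["--min-read-variant", str(min_read_variant)]
    if min_read_haplotype is not None:
        cmd += ["--min-read-haplotype", str(min_read_haplotype)]
    return cmd


# Expected per-haplotype depth ~200, so raise --min-read-variant to 40 as the doc suggests.
cmd = build_targeted_command("sample.bam", "out", "ref.fa", min_read_variant=40)
print(shlex.join(cmd))
# paraphase -b sample.bam -o out -r ref.fa --targeted --min-read-variant 40
```

On a machine with Paraphase installed, the list can be passed directly to `subprocess.run(cmd, check=True)`; building it as a list avoids shell-quoting problems with unusual file paths.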

paraphase/__init__.py

Lines changed: 1 addition & 1 deletion
@@ -1 +1 @@
-__version__ = "3.1.2"
+__version__ = "3.2.0"
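This release bumps `__version__` to 3.2.0, the release that introduces `--targeted`, `--min-read-variant` and `--min-read-haplotype`. A minimal sketch for gating a pipeline on that version: it compares plain dotted numeric versions only (a pre-release suffix such as `rc1` would raise `ValueError`), and it assumes the package is installed under the distribution name `paraphase`. The helper names are hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version


def parse_version(v: str) -> tuple:
    """Turn a plain dotted version like "3.2.0" into (3, 2, 0) for comparison."""
    return tuple(int(part) for part in v.split("."))


def supports_targeted(installed: str, minimum: str = "3.2.0") -> bool:
    """True if `installed` is at least the release that added --targeted."""
    return parse_version(installed) >= parse_version(minimum)


print(supports_targeted("3.1.2"))  # False
print(supports_targeted("3.2.0"))  # True

try:
    # Assumes the distribution is named "paraphase".
    print(supports_targeted(version("paraphase")))
except PackageNotFoundError:
    print("paraphase is not installed")
```

Comparing integer tuples rather than raw strings keeps "3.10.0" correctly ordered after "3.2.0".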

paraphase/data/19/config.yaml

Lines changed: 15 additions & 14 deletions
@@ -29,15 +29,15 @@ smn1:
 pms2:
   genes: PMS2
   check_nm: 0.1
-  realign_region: chr7:6004631-6049631
-  extract_regions: chr7:6004631-6033041 chr7:6773174-6799427
+  use_r2k: True
+  realign_region: chr7:6004631-6049131
+  extract_regions: chr7:6004631-6049131 chr7:6773174-6799427
   gene2_region: chr7:6773174-6799427
-  right_boundary: 6033081
   pivot_site: 6026200
-  noisy_region: [[6020511, 6020611], [6019294, 6019300], [6015581, 6015711], [6028131, 6031981], [6032411, 6033031]]
+  noisy_region: [[6020511, 6020611], [6019294, 6019300], [6015581, 6015711], [6036207, 6036263], [6042921, 6042979], [6048361, 6048369]]
   is_reverse: True
   add_sites: ["6026200_G_A"]
-  clip_3p_positions: [6033041]
+  clip_3p_positions: [6028720]
 rccx:
   genes: CYP21A2,C4A,C4B,TNXB
   is_tandem: True
@@ -95,15 +95,14 @@ cfc1:
 ikbkg:
   genes: IKBKG
   use_supplementary: True
-  realign_region: chrX:153783815-153803263
-  extract_regions: chrX:153784098-153800231 chrX:153860696-153876829
+  realign_region: chrX:153775825-153800231
+  extract_regions: chrX:153775825-153800231 chrX:153860696-153876829
   gene2_region: chrX:153860252-153877274
   clip_3p_positions: [153797929]
-  clip_5p_positions: [153785351]
+  clip_5p_positions: [153783719, 153785351]
   add_sites: ["153784097_C_G", "153798031_T_G"]
   pivot_site: 153784097
-  left_boundary: 153783915
-  right_boundary: 153797929
+  right_boundary: 153798131
   deletion1_size: 11700
   deletion1_name: "153786229_del_11700"
   deletion1_in_gene1: "153786229_DEL_153797929"
@@ -112,8 +111,9 @@ ikbkg:
   del1_3p_pos2: 153786245
   del1_5p_pos1: 153797031
   del1_5p_pos2: 153797071
-  noisy_region: [[153795215, 153795402], [153784084, 153784091], [153793571, 153793618]]
+  noisy_region: [[153795215, 153795402], [153783765, 153783965], [153784084, 153784091], [153793571, 153793618]]
   is_reverse: True
+  check_nm: 0.08
 ncf1:
   genes: NCF1
   check_nm: 0.1
@@ -159,10 +159,11 @@ opn1lw:
   exon3_variants: [[["153418460_C_A"], "M", "L"], [["153418514_G_A", "153418516_G_T"], "I", "V"], [["153418524_C_T"], "V", "A"], [["153418535_A_G"], "V", "I"], [["153418541_T_G"], "A", "S"]]
 hba:
   genes: HBA1,HBA2
-  realign_region: chr16:221799-223899
+  realign_region: chr16:221799-225499
   extract_regions: chr16:218655-227540
-  clip_5p_positions: [221988]
-  clip_3p_positions: [223728]
+  check_nm: 0.25
+  clip_5p_positions: [221988, 222889]
+  clip_3p_positions: [223728, 225045]
   is_tandem: True
   depth_region: [[209999, 239999]]
   noisy_region: [[222416, 222417]]
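The hg19 config entries above use samtools-style region strings (`chrom:start-end`) and `noisy_region` lists of `[start, end]` pairs. A minimal sketch for reading them, for example to see how much the first PMS2 `extract_regions` interval grew in this commit or whether a site falls in the new IKBKG noisy interval. The helper names are hypothetical, and both the 1-based inclusive coordinates and the inclusive `noisy_region` bounds are assumptions, not documented Paraphase behavior.

```python
from typing import List, Tuple


def parse_region(region: str) -> Tuple[str, int, int]:
    """Split a samtools-style "chrom:start-end" string into its parts."""
    chrom, span = region.rsplit(":", 1)
    start, end = (int(x) for x in span.split("-"))
    return chrom, start, end


def region_length(region: str) -> int:
    # Assumes 1-based, inclusive coordinates (the samtools region convention).
    _, start, end = parse_region(region)
    return end - start + 1


def in_noisy_region(pos: int, noisy: List[List[int]]) -> bool:
    # Assumes each [start, end] pair in noisy_region is inclusive.
    return any(start <= pos <= end for start, end in noisy)


# PMS2: the first extract_regions interval before and after this commit
print(region_length("chr7:6004631-6033041"))  # 28411
print(region_length("chr7:6004631-6049131"))  # 44501

# IKBKG: the updated noisy_region list from this commit
ikbkg_noisy = [[153795215, 153795402], [153783765, 153783965],
               [153784084, 153784091], [153793571, 153793618]]
print(in_noisy_region(153783800, ikbkg_noisy))  # True (newly added interval)
print(in_noisy_region(153784097, ikbkg_noisy))  # False (the pivot_site)
```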
