Commit c43f93e

Starting to move to oncotator MAF instead of VCF (#3145)
* Starting to move to oncotator MAF instead of VCF.
  Additions to support MAF generation instead of VCF.
  Correcting typo
  Reducing requirements for running Oncotator
  Removing infer ONPs
  Adding TODO
  Put back infer-onps
  PR comments
* Simple doc change to induce another automated test run.
1 parent 466fdae commit c43f93e

5 files changed: +64 -22 lines changed

scripts/mutect2_wdl/README.md

Lines changed: 27 additions & 5 deletions
@@ -18,6 +18,14 @@ This file has reasonable default parameters.
 - "broadinstitute/gatk-protected:1.0.0.0-alpha1.2.4" (This is a private image! Recommended use ``gatk_jar`` as ``/root/gatk.jar``)
 - "broadinstitute/genomes-in-the-cloud:2.2.4-1469632282" (You must specify a ``gatk4_jar_override``)
 
+### Functional annotation (Oncotator)
+
+The M2 WDL can optionally run oncotator for functional annotation and produce a TCGA MAF from the M2 VCF. *Oncotator is not a GATK4 tool and is provided in the M2 WDL as a convenience.* There are several notes and caveats
+- Several parameters should be passed in to populate the TCGA MAF metadata fields. Default values are provided, though we recommend that you specify the values. These parameters are ignored if you do not run oncotator.
+- Several fields in a TCGA MAF cannot be generated by M2 and oncotator, such as all fields relating to validation alleles. These will need to be populated by a downstream process created by the user.
+- Oncotator does not enforce the TCGA MAF controlled vocabulary, since it is often too restrictive for general use. This is up to the user to specify correctly.
+*Therefore, we cannot guarantee that a TCGA MAF generated here will pass the TCGA Validator*. If you are unsure about the ramifications of this statement, then it probably does not concern you.
+- More information about Oncotator can be found at: http://archive.broadinstitute.org/cancer/cga/oncotator
 
 ### Parameter descriptions
 
@@ -44,13 +52,9 @@ Recommended default values (where possible) are found in ``mutect2_multi_sample_
 - ``Mutect2_Multi.gatk4_jar_override`` -- (optional) A GATK4 jar file to be used instead of the jar file in the docker image. (See ``Mutect2_Multi.gatk4_jar``) This can be very useful for developers. Please note that you need to be careful that the docker image you use is compatible with the GATK4 jar file given here -- no automated checks are made.
 - ``Mutect2_Multi.preemptible_attempts`` -- Number of times to attempt running a task on a preemptible VM. This is only used for cloud backends in cromwell and is ignored for local and SGE backends.
 - ``Mutect2_Multi.artifact_modes`` -- List of artifact modes to search for in the orientation bias filter. For example to filter the OxoG artifact, you would specify ``["G/T"]``. For both the FFPE artifact and the OxoG artifact, specify ``["G/T", "C/T"]``. If you do not wish to search for any artifacts, please set ``Mutect2_Multi.is_run_orientation_bias_filter`` to ``false``.
-- ``Mutect2_Multi.onco_ds_tar_gz`` -- (optional) A tar.gz file of the oncotator datasources -- often quite large (>15GB). This will be uncompressed as part of the oncotator task. Depending on backend used, this can be specified as a path on the local filesystem of a cloud storage container (e.g. gs://...). Typically the Oncotator default datasource can be downloaded at ``ftp://gsapubftp-anonymous@ftp.broadinstitute.org/bundle/oncotator/``. Do not put the FTP URL into the json file.
-- ``Mutect2_Multi.onco_ds_local_db_dir`` -- (optional) A direct path to the Oncotator datasource directory (uncompressed). While this is the fastest approach, it cannot be used with docker unless your docker image already has the datasources in it. For cromwell backends without docker, this can be a local filesystem path. *This cannot be a cloud storage location*
 - ``Mutect2_Multi.picard_jar`` -- A direct path to a picard jar for using ``CollectSequencingArtifactMetrics``. This parameter requirement will be eliminated in the future.
 - ``Mutect2_Multi.m2_extra_args`` -- (optional) a string of additional command line arguments of the form "-argument1 value1 -argument2 value2" for Mutect 2. Most users will not need this.
 - ``Mutect2_Multi.m2_extra_filtering_args`` -- (optional) a string of additional command line arguments of the form "-argument1 value1 -argument2 value2" for Mutect 2 filtering. Most users will not need this.
-Note: If neither ``Mutect2_Multi.onco_ds_tar_gz`` nor ``Mutect2_Multi.onco_ds_local_db_dir`` are specified, the Oncotator task will download and uncompress for each execution.
-
 - ``Mutect2_Multi.pair_list`` -- a tab-separated table with no header in the following formats. For tumor-normal mode:
 ```
 TUMOR_1_BAM</TAB>TUMOR_1_BAM_INDEX</TAB>TUMOR_1_SAMPLE</TAB>NORMAL_1_BAM</TAB>NORMAL_1_BAM_INDEX</TAB>NORMAL_1_SAMPLE</TAB>
@@ -64,6 +68,21 @@ TUMOR_2_BAM</TAB>TUMOR_2_BAM_INDEX</TAB>TUMOR_2_SAMPLE
 . . .
 ```
 
+- ``Mutect2_Multi.onco_ds_tar_gz`` -- (optional) A tar.gz file of the oncotator datasources -- often quite large (>15GB). This will be uncompressed as part of the oncotator task. Depending on backend used, this can be specified as a path on the local filesystem of a cloud storage container (e.g. gs://...). Typically the Oncotator default datasource can be downloaded at ``ftp://gsapubftp-anonymous@ftp.broadinstitute.org/bundle/oncotator/``. Do not put the FTP URL into the json file.
+- ``Mutect2_Multi.onco_ds_local_db_dir`` -- (optional) A direct path to the Oncotator datasource directory (uncompressed). While this is the fastest approach, it cannot be used with docker unless your docker image already has the datasources in it. For cromwell backends without docker, this can be a local filesystem path. *This cannot be a cloud storage location*
+
+Note: If neither ``Mutect2_Multi.onco_ds_tar_gz``, nor ``Mutect2_Multi.onco_ds_local_db_dir``, is specified, the Oncotator task will download and uncompress for each execution.
+
+The following three parameters are useful for rendering TCGA MAFs using oncotator. These parameters are ignored if ``is_run_oncotator`` is ``false``.
+- ``Mutect2_Multi.sequencing_center`` -- (optional) center reporting this variant. Please see ``https://wiki.nci.nih.gov/display/TCGA/Mutation+Annotation+Format+%28MAF%29+Specification+-+v2.4`` for more details.
+- ``Mutect2_Multi.sequence_source`` -- (optional) ``WGS`` or ``WXS`` for whole genome or whole exome sequencing, respectively. Please note that the controlled vocabulary of the TCGA MAF spec is *not* enforced. Please see ``https://wiki.nci.nih.gov/display/TCGA/Mutation+Annotation+Format+%28MAF%29+Specification+-+v2.4`` for more details.
+- ``Mutect2_Multi.default_config_file`` -- (optional) A configuration file that can direct oncotator to use default values for unspecified annotations in the TCGA MAF. This help prevents having MAF files with a lot of "__UNKNOWN__" values. An usable example is given below. Here is an example that should work for most users:
+
+```
+[manual_annotations]
+override:NCBI_Build=37,Strand=+,status=Somatic,phase=Phase_I,sequencer=Illumina,Tumor_Validation_Allele1=,Tumor_Validation_Allele2=,Match_Norm_Validation_Allele1=,Match_Norm_Validation_Allele2=,Verification_Status=,Validation_Status=,Validation_Method=,Score=,BAM_file=,Match_Norm_Seq_Allele1=,Match_Norm_Seq_Allele2=
+```
+
 #### mutect2 (single pair/sample)
 
 Recommended default values (where possible) are found in ``mutect2_template.json``
@@ -98,6 +117,9 @@ Recommended default values (where possible) are found in ``mutect2_template.json
 - ``Mutect2.picard_jar`` -- Please see parameter description above in the mutect2_multi_sample.
 - ``Mutect2.m2_extra_args`` -- Please see parameter description above in the mutect2_multi_sample.
 - ``Mutect2.m2_extra_filtering_args`` -- Please see parameter description above in the mutect2_multi_sample.
+- ``Mutect2.sequencing_center`` -- Please see parameter description above in the mutect2_multi_sample.
+- ``Mutect2.sequence_source`` -- Please see parameter description above in the mutect2_multi_sample.
+- ``Mutect2.default_config_file`` -- Please see parameter description above in the mutect2_multi_sample.
 
 #### mutect2-replicate-validation
 
@@ -174,7 +196,7 @@ gs://broad-dsde-methods/takuto/na12878-crsp-ice/SM-612V3.bam gs://broad-dsde-
 "Mutect2_Multi.is_run_orientation_bias_filter": true,
 "Mutect2_Multi.is_run_oncotator": true,
 "Mutect2_Multi.m2_docker": "broadinstitute/gatk:1.0.0.0-alpha1.2.4",
-"Mutect2_Multi.oncotator_docker": "broadinstitute/oncotator:1.9.2.0",
+"Mutect2_Multi.oncotator_docker": "broadinstitute/oncotator:1.9.3.0",
 "Mutect2_Multi.preemptible_attempts": 2,
 "Mutect2_Multi.onco_ds_tar_gz": "/data/onco_dir/oncotator_v1_ds_April052016.tar.gz",
 "Mutect2_Multi.m2_extra_args": "--maxNumHaplotypesInPopulation 50 --tumor_lod_to_emit 4.0",

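The README additions describe three new optional inputs and a default config file. A minimal sketch of how a user might prepare both is shown here. The file names `tcga_maf_defaults.config` and `oncotator_inputs_fragment.json` and the center value `example_center` are assumptions made for illustration; only the parameter names and the config contents come from the diff.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical file name; the README does not prescribe one.
config="tcga_maf_defaults.config"

# Write the default-annotation config exactly as given in the README hunk.
cat > "$config" <<'EOF'
[manual_annotations]
override:NCBI_Build=37,Strand=+,status=Somatic,phase=Phase_I,sequencer=Illumina,Tumor_Validation_Allele1=,Tumor_Validation_Allele2=,Match_Norm_Validation_Allele1=,Match_Norm_Validation_Allele2=,Verification_Status=,Validation_Status=,Validation_Method=,Score=,BAM_file=,Match_Norm_Seq_Allele1=,Match_Norm_Seq_Allele2=
EOF

# Fragment of an inputs JSON that sets the three new optional parameters.
# "example_center" is a placeholder, not a recommended value.
cat > oncotator_inputs_fragment.json <<EOF
{
  "Mutect2_Multi.is_run_oncotator": true,
  "Mutect2_Multi.sequencing_center": "example_center",
  "Mutect2_Multi.sequence_source": "WXS",
  "Mutect2_Multi.default_config_file": "$PWD/$config"
}
EOF

echo "Wrote $config and oncotator_inputs_fragment.json"
```

The fragment would still need to be merged into a complete inputs JSON such as the one excerpted above before being handed to cromwell.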
scripts/mutect2_wdl/mutect2.wdl

Lines changed: 25 additions & 13 deletions
@@ -11,7 +11,7 @@
 # gnomad, gnomad_index: optional database of known germline variants, obtainable from http://gnomad.broadinstitute.org/downloads
 # variants_for_contamination, variants_for_contamination_index: vcf of common variants with allele frequencies fo calculating contamination
 # is_run_orientation_bias_filter: if true, run the orientation bias filter post-processing step
-# is_run_oncotator: if true, annotate the M2 VCFs using oncotator. Important: This requires a docker image and should
+# is_run_oncotator: if true, annotate the M2 VCFs using oncotator (to produce a TCGA MAF). Important: This requires a docker image and should
 # not be run in environments where docker is unavailable (e.g. SGE cluster on a Broad on-prem VM). Access to docker
 # hub is also required, since the task will download a public docker image.
 #
@@ -54,6 +54,9 @@ workflow Mutect2 {
 File picard_jar
 String? m2_extra_args
 String? m2_extra_filtering_args
+String? sequencing_center
+String? sequence_source
+File? default_config_file
 
 call ProcessOptionalArguments {
 input:
@@ -157,7 +160,10 @@ workflow Mutect2 {
 preemptible_attempts = preemptible_attempts,
 oncotator_docker = oncotator_docker,
 onco_ds_tar_gz = onco_ds_tar_gz,
-onco_ds_local_db_dir = onco_ds_local_db_dir
+onco_ds_local_db_dir = onco_ds_local_db_dir,
+sequencing_center = sequencing_center,
+sequence_source = sequence_source,
+default_config_file = default_config_file
 }
 }
 
@@ -167,8 +173,8 @@ workflow Mutect2 {
 File filtered_vcf = Filter.filtered_vcf
 File filtered_vcf_index = Filter.filtered_vcf_index
 
-# select_first() fails if nothing resulve to non-null, so putting in "/dev/null" for now.
-File? oncotated_m2_vcf = select_first([oncotate_m2.oncotated_m2_vcf, "null"])
+# select_first() fails if nothing resolves to non-null, so putting in "null" for now.
+File? oncotated_m2_maf = select_first([oncotate_m2.oncotated_m2_maf, "null"])
 }
 }
 
@@ -426,6 +432,10 @@ task oncotate_m2 {
 String oncotator_docker
 File? onco_ds_tar_gz
 String? onco_ds_local_db_dir
+String? oncotator_exe
+String? sequencing_center
+String? sequence_source
+File? default_config_file
 command {
 
 # fail if *any* command below (not just the last) doesn't return 0, in particular if wget fails
@@ -439,8 +449,8 @@ task oncotate_m2 {
 
 elif [[ "${onco_ds_tar_gz}" == *.tar.gz ]]; then
 echo "Using given tar file: ${onco_ds_tar_gz}"
-tar zxvf ${onco_ds_tar_gz}
-ln -s oncotator_v1_ds_April052016 onco_dbdir
+mkdir onco_dbdir
+tar zxvf ${onco_ds_tar_gz} -C onco_dbdir --strip-components 1
 
 else
 echo "Downloading and installing oncotator datasources from Broad FTP site..."
@@ -450,20 +460,22 @@ task oncotate_m2 {
 ln -s oncotator_v1_ds_April052016 onco_dbdir
 fi
 
-
-/root/oncotator_venv/bin/oncotator --db-dir onco_dbdir/ -c $HOME/tx_exact_uniprot_matches.AKT1_CRLF2_FGFR1.txt \
--v ${m2_vcf} ${entity_id}.oncotated.vcf hg19 -i VCF -o VCF --infer-onps --collapse-number-annotations --log_name oncotator.log
+${default="/root/oncotator_venv/bin/oncotator" oncotator_exe} --db-dir onco_dbdir/ -c $HOME/tx_exact_uniprot_matches.AKT1_CRLF2_FGFR1.txt \
+-v ${m2_vcf} ${entity_id}.maf.annotated hg19 -i VCF -o TCGAMAF --skip-no-alt --infer-onps --collapse-number-annotations --log_name oncotator.log \
+-a Center:${default="Unknown" sequencing_center} \
+-a source:${default="Unknown" sequence_source} \
+${"--default_config " + default_config_file}
 }
 
 runtime {
 docker: "${oncotator_docker}"
-memory: "5 GB"
-bootDiskSizeGb: 10
-disks: "local-disk 150 SSD"
+memory: "3 GB"
+bootDiskSizeGb: 12
+disks: "local-disk 100 HDD"
 preemptible: "${preemptible_attempts}"
 }
 
 output {
-File oncotated_m2_vcf="${entity_id}.oncotated.vcf"
+File oncotated_m2_maf="${entity_id}.maf.annotated"
 }
 }
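
The tarball branch of `oncotate_m2` now extracts into a fixed `onco_dbdir` with `--strip-components 1` instead of symlinking a hard-coded `oncotator_v1_ds_April052016` directory. The sketch below illustrates why that appears to remove the dependency on the archive's top-level directory name. The archive name `demo_datasources.tar.gz` and the directory `some_datasource_v2` are made up for the demonstration, and a `tar` that supports `--strip-components` (GNU tar or bsdtar) is assumed.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Build a tiny stand-in archive whose top-level directory has an arbitrary name.
rm -rf demo_src onco_dbdir demo_datasources.tar.gz
mkdir -p demo_src/some_datasource_v2/gencode
echo "placeholder" > demo_src/some_datasource_v2/gencode/example.config
tar czf demo_datasources.tar.gz -C demo_src some_datasource_v2

# Same two steps as the new WDL command block: create the target directory,
# then extract while dropping the first path component of every entry.
mkdir onco_dbdir
tar zxf demo_datasources.tar.gz -C onco_dbdir --strip-components 1

# The datasource contents now sit directly under onco_dbdir/.
ls onco_dbdir
```

Note that the download branch in the same task still relies on `ln -s oncotator_v1_ds_April052016 onco_dbdir`, so only user-supplied tarballs benefit from this change.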

scripts/mutect2_wdl/mutect2_multi_sample.wdl

Lines changed: 9 additions & 1 deletion
@@ -76,6 +76,9 @@ workflow Mutect2_Multi {
 File picard_jar
 String? m2_extra_args
 String? m2_extra_filtering_args
+String? sequencing_center
+String? sequence_source
+File? default_config_file
 
 scatter( row in pairs ) {
 # If the condition is true, variables inside the 'if' block retain their values outside the block.
@@ -117,7 +120,10 @@ workflow Mutect2_Multi {
 artifact_modes = artifact_modes,
 picard_jar = picard_jar,
 m2_extra_args = m2_extra_args,
-m2_extra_filtering_args = m2_extra_filtering_args
+m2_extra_filtering_args = m2_extra_filtering_args,
+sequencing_center = sequencing_center,
+sequence_source = sequence_source,
+default_config_file = default_config_file
 }
 }
 
@@ -142,5 +148,7 @@ workflow Mutect2_Multi {
 Array[File] unfiltered_vcf_files = Mutect2.unfiltered_vcf
 Array[File] filtered_vcf_files = Mutect2.filtered_vcf
 Array[File] filtered_vcf_index_files = Mutect2.filtered_vcf_index
+
+Array[File?] oncotated_m2_mafs = Mutect2.oncotated_m2_maf
 }
 }
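
The three optional inputs passed through both workflows end up on the oncotator command line in `mutect2.wdl`, where unset values fall back to `Unknown` and `--default_config` is emitted only when a file is given. A rough shell analogue of that expansion is sketched below. The function `build_oncotator_args` is hypothetical and not part of the WDL, and shell treats an empty string like an unset value, which is a simplification of how WDL optionals behave.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper mirroring how oncotate_m2 expands its optional inputs.
build_oncotator_args() {
  local sequencing_center="${1:-}"
  local sequence_source="${2:-}"
  local default_config_file="${3:-}"

  # Analogue of ${default="Unknown" ...}: substitute a fallback when empty.
  local args=(-a "Center:${sequencing_center:-Unknown}" -a "source:${sequence_source:-Unknown}")

  # Analogue of ${"--default_config " + default_config_file}:
  # the flag is added only when a config file was supplied.
  if [[ -n "$default_config_file" ]]; then
    args+=(--default_config "$default_config_file")
  fi

  printf '%s\n' "${args[*]}"
}

# No optional inputs supplied.
build_oncotator_args "" "" ""
# prints: -a Center:Unknown -a source:Unknown

# All three supplied (placeholder values).
build_oncotator_args "example_center" "WXS" "tcga_maf_defaults.config"
# prints: -a Center:example_center -a source:WXS --default_config tcga_maf_defaults.config
```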

scripts/mutect2_wdl/mutect2_template.json

Lines changed: 1 addition & 1 deletion
@@ -20,7 +20,7 @@
 "Mutect2.is_run_orientation_bias_filter": "$__is_run_orientation_bias_filter__",
 "Mutect2.is_run_oncotator": "$__is_run_oncotator__",
 "Mutect2.m2_docker": "broadinstitute/gatk-protected:1.0.0.0-alpha1.2.4",
-"Mutect2.oncotator_docker": "broadinstitute/oncotator:1.9.2.0",
+"Mutect2.oncotator_docker": "broadinstitute/oncotator:1.9.3.0",
 "Mutect2.preemptible_attempts": 2,
 "Mutect2.artifact_modes": ["G/T", "C/T"],
 "Mutect2.picard_jar": "$__picard_jar__"

src/test/java/org/broadinstitute/hellbender/tools/walkers/mutect/Mutect2IntegrationTest.java

Lines changed: 2 additions & 2 deletions
@@ -40,8 +40,8 @@ public class Mutect2IntegrationTest extends CommandLineProgramTest {
 * DREAM challenge vcfs):
 *
 * Sample 1: pure monoclonal sample, SNVs only
-* Sample2: 80% pure monoclonal sample, SNVs only
-* Sample3: pure triclonal sample, subclone minor allele frequencies are 1/2, 1/3, and 1/5, SNVs and indels
+* Sample 2: 80% pure monoclonal sample, SNVs only
+* Sample 3: pure triclonal sample, subclone minor allele frequencies are 1/2, 1/3, and 1/5, SNVs and indels
 * Sample 4: 80% biclonal sample, subclone minor allele fractions are 50% and 35%, SNVs and indels
 *
 * @throws Exception
