Starting to move to oncotator MAF instead of VCF (#3145)
* Starting to move to oncotator MAF instead of VCF.
Additions to support MAF generation instead of VCF.
Correcting typo
Reducing requirements for running Oncotator
Removing infer ONPs
Adding TODO
Put back infer-onps
PR comments
* Simple doc change to induce another automated test run.
scripts/mutect2_wdl/README.md
27 additions & 5 deletions
@@ -18,6 +18,14 @@ This file has reasonable default parameters.
18
18
- "broadinstitute/gatk-protected:1.0.0.0-alpha1.2.4" (This is a private image! Recommended: set ``gatk_jar`` to ``/root/gatk.jar``)
19
19
- "broadinstitute/genomes-in-the-cloud:2.2.4-1469632282" (You must specify a ``gatk4_jar_override``)
20
20
21
+
### Functional annotation (Oncotator)
22
+
23
+
The M2 WDL can optionally run Oncotator for functional annotation and produce a TCGA MAF from the M2 VCF. *Oncotator is not a GATK4 tool and is provided in the M2 WDL as a convenience.* There are several notes and caveats:
24
+
- Several parameters should be passed in to populate the TCGA MAF metadata fields. Default values are provided, though we recommend that you specify the values. These parameters are ignored if you do not run oncotator.
25
+
- Several fields in a TCGA MAF cannot be generated by M2 and oncotator, such as all fields relating to validation alleles. These will need to be populated by a downstream process created by the user.
26
+
- Oncotator does not enforce the TCGA MAF controlled vocabulary, since it is often too restrictive for general use. It is up to the user to supply values that conform to it.
27
+
*Therefore, we cannot guarantee that a TCGA MAF generated here will pass the TCGA Validator*. If you are unsure about the ramifications of this statement, then it probably does not concern you.
28
+
- More information about Oncotator can be found at: http://archive.broadinstitute.org/cancer/cga/oncotator
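
Because Oncotator does not enforce the controlled vocabulary, it can help to check the metadata values before launching the workflow. The following is a minimal sketch, not part of the M2 WDL: the helper name `maf_metadata_inputs` is hypothetical, and the set of accepted values contains only the two ``sequence_source`` values this README names (the TCGA MAF specification may define others).

```python
import warnings

# The two values this README names for sequence_source. The TCGA MAF
# specification may allow others; this set is an assumption for illustration.
KNOWN_SEQUENCE_SOURCES = {"WGS", "WXS"}


def maf_metadata_inputs(sequencing_center, sequence_source):
    """Build the optional TCGA MAF metadata entries for the inputs JSON.

    Hypothetical helper: it only warns, mirroring the fact that the
    controlled vocabulary is not enforced by Oncotator itself.
    """
    if sequence_source not in KNOWN_SEQUENCE_SOURCES:
        warnings.warn(
            f"sequence_source {sequence_source!r} is not WGS or WXS; "
            "the value will be passed through unchecked"
        )
    return {
        "Mutect2_Multi.sequencing_center": sequencing_center,
        "Mutect2_Multi.sequence_source": sequence_source,
    }
```

The returned dictionary can be merged into the rest of the workflow inputs before they are written out as JSON.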
21
29
22
30
### Parameter descriptions
23
31
@@ -44,13 +52,9 @@ Recommended default values (where possible) are found in ``mutect2_multi_sample_
44
52
-``Mutect2_Multi.gatk4_jar_override`` -- (optional) A GATK4 jar file to be used instead of the jar file in the docker image. (See ``Mutect2_Multi.gatk4_jar``) This can be very useful for developers. Please note that you need to be careful that the docker image you use is compatible with the GATK4 jar file given here -- no automated checks are made.
45
53
-``Mutect2_Multi.preemptible_attempts`` -- Number of times to attempt running a task on a preemptible VM. This is only used for cloud backends in cromwell and is ignored for local and SGE backends.
46
54
-``Mutect2_Multi.artifact_modes`` -- List of artifact modes to search for in the orientation bias filter. For example to filter the OxoG artifact, you would specify ``["G/T"]``. For both the FFPE artifact and the OxoG artifact, specify ``["G/T", "C/T"]``. If you do not wish to search for any artifacts, please set ``Mutect2_Multi.is_run_orientation_bias_filter`` to ``false``.
47
-
-``Mutect2_Multi.onco_ds_tar_gz`` -- (optional) A tar.gz file of the oncotator datasources -- often quite large (>15GB). This will be uncompressed as part of the oncotator task. Depending on backend used, this can be specified as a path on the local filesystem of a cloud storage container (e.g. gs://...). Typically the Oncotator default datasource can be downloaded at ``ftp://gsapubftp-anonymous@ftp.broadinstitute.org/bundle/oncotator/``. Do not put the FTP URL into the json file.
48
-
-``Mutect2_Multi.onco_ds_local_db_dir`` -- (optional) A direct path to the Oncotator datasource directory (uncompressed). While this is the fastest approach, it cannot be used with docker unless your docker image already has the datasources in it. For cromwell backends without docker, this can be a local filesystem path. *This cannot be a cloud storage location*
49
55
-``Mutect2_Multi.picard_jar`` -- A direct path to a picard jar for using ``CollectSequencingArtifactMetrics``. This parameter requirement will be eliminated in the future.
50
56
-``Mutect2_Multi.m2_extra_args`` -- (optional) a string of additional command line arguments of the form "-argument1 value1 -argument2 value2" for Mutect 2. Most users will not need this.
51
57
-``Mutect2_Multi.m2_extra_filtering_args`` -- (optional) a string of additional command line arguments of the form "-argument1 value1 -argument2 value2" for Mutect 2 filtering. Most users will not need this.
52
-
Note: If neither ``Mutect2_Multi.onco_ds_tar_gz`` nor ``Mutect2_Multi.onco_ds_local_db_dir`` are specified, the Oncotator task will download and uncompress for each execution.
53
-
54
58
-``Mutect2_Multi.pair_list`` -- a tab-separated table with no header in the following formats. For tumor-normal mode:
-``Mutect2_Multi.onco_ds_tar_gz`` -- (optional) A tar.gz file of the oncotator datasources -- often quite large (>15GB). This will be uncompressed as part of the oncotator task. Depending on the backend used, this can be specified as a path on the local filesystem or in a cloud storage container (e.g. gs://...). Typically the Oncotator default datasource can be downloaded at ``ftp://gsapubftp-anonymous@ftp.broadinstitute.org/bundle/oncotator/``. Do not put the FTP URL into the json file.
72
+
-``Mutect2_Multi.onco_ds_local_db_dir`` -- (optional) A direct path to the Oncotator datasource directory (uncompressed). While this is the fastest approach, it cannot be used with docker unless your docker image already has the datasources in it. For cromwell backends without docker, this can be a local filesystem path. *This cannot be a cloud storage location.*
73
+
74
+
Note: If neither ``Mutect2_Multi.onco_ds_tar_gz`` nor ``Mutect2_Multi.onco_ds_local_db_dir`` is specified, the Oncotator task will download and uncompress the datasources on every execution.
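
The constraints on the two datasource parameters can be checked before the inputs JSON is written. This is a rough sketch under stated assumptions: the helper name and the example bucket path are hypothetical, and treating any value containing ``://`` as a non-local path is a simplification rather than a rule of the WDL.

```python
import json

# Key names are the ones documented in this README.
TAR_GZ_KEY = "Mutect2_Multi.onco_ds_tar_gz"
LOCAL_DIR_KEY = "Mutect2_Multi.onco_ds_local_db_dir"


def oncotator_datasource_inputs(tar_gz=None, local_db_dir=None):
    """Return the datasource entries for the inputs JSON (hypothetical helper).

    Raises ValueError for the two cases this README rules out.
    """
    inputs = {}
    if tar_gz is not None:
        # The README says not to put the FTP URL into the json file.
        if tar_gz.lower().startswith("ftp://"):
            raise ValueError("download the tar.gz first; do not use the FTP URL")
        inputs[TAR_GZ_KEY] = tar_gz
    if local_db_dir is not None:
        # The uncompressed directory cannot be a cloud storage location.
        # Assumption: any URL scheme ("://") means the path is not local.
        if "://" in local_db_dir:
            raise ValueError("onco_ds_local_db_dir must be a local filesystem path")
        inputs[LOCAL_DIR_KEY] = local_db_dir
    # With neither argument the dict is empty, which corresponds to the case
    # where the Oncotator task downloads the datasources itself.
    return inputs


if __name__ == "__main__":
    # Hypothetical bucket path, for illustration only.
    example = oncotator_datasource_inputs(tar_gz="gs://my-bucket/onco_ds.tar.gz")
    print(json.dumps(example, indent=2))
```

Leaving both arguments unset yields an empty dictionary, so nothing is added to the inputs file and the per-execution download described in the note applies.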
75
+
76
+
The following three parameters are useful for rendering TCGA MAFs using oncotator. These parameters are ignored if ``is_run_oncotator`` is ``false``.
77
+
-``Mutect2_Multi.sequencing_center`` -- (optional) center reporting this variant. Please see ``https://wiki.nci.nih.gov/display/TCGA/Mutation+Annotation+Format+%28MAF%29+Specification+-+v2.4`` for more details.
78
+
-``Mutect2_Multi.sequence_source`` -- (optional) ``WGS`` or ``WXS`` for whole genome or whole exome sequencing, respectively. Please note that the controlled vocabulary of the TCGA MAF spec is *not* enforced. Please see ``https://wiki.nci.nih.gov/display/TCGA/Mutation+Annotation+Format+%28MAF%29+Specification+-+v2.4`` for more details.
79
+
-``Mutect2_Multi.default_config_file`` -- (optional) A configuration file that can direct oncotator to use default values for unspecified annotations in the TCGA MAF. This helps prevent MAF files with many "__UNKNOWN__" values. Here is an example that should work for most users: