Enable DRAGEN-SV as a caller in cohort mode#803
Conversation
mwalker174
left a comment
There was a problem hiding this comment.
Thanks @kjaisingh this is very thorough and culmination of a lot of evaluation work. There are a couple of things I'll need to ask you to do below but they should be relatively easy. We should hold off on documenting this as a fully supported tool - there are some changes to genotype filtering that need to go in as well, as we will need to modify the GQRecalibrator to treat manta/dragen interchangeably.
| print("Took %f seconds to process" % delta) | ||
|
|
||
| # Filter out INS that are manta or melt only and are SR only, have GQ=0, and FILTER contains 'HIGH_SR_BACKGROUND' | ||
| # TODO: Do we also have to filter out DRAGEN-only records matching the Manta condition? |
There was a problem hiding this comment.
Yes we should add DRAGEN support here. @VJalili please take note that we are making dragen/manta calls equivalent for this condition.
There was a problem hiding this comment.
Updated accordingly.
|
|
||
| for batch in Phase1 Pilot; do | ||
| for source in delly lumpy manta wham depth; do | ||
| for source in delly dragen lumpy manta wham depth; do |
There was a problem hiding this comment.
Great that you made these changes but these scripts are no longer in use. Fine to keep the changes in.
There was a problem hiding this comment.
We should clean these out at some point
There was a problem hiding this comment.
Got it, thanks for letting me know.
| # Optional overrides | ||
| File? adjudicate_cutoffs | ||
| File? adjudicate_scores | ||
| File? adjudicate_rf_files |
There was a problem hiding this comment.
Can you elaborate on the comment so users understand what this is for? I think this will be useful in the future. Also please add it to the PR notes so that we can incorporate it in the next release notes.
There was a problem hiding this comment.
I did not add more documentation here as I did not originally intend to add this change to the PR, but given what you said, it seems like it's potentially useful.
With this in mind, I've updated the documentation and PR notes accordingly to now include this.
|
|
||
| # TODO: Do we also have to include Dragen? | ||
|
|
||
| workflow GATKSVPipelineBatch { |
There was a problem hiding this comment.
This workflow will be deprecated in the future, so let's not spend time on it. Same with the Phase1 wdl below.
There was a problem hiding this comment.
Updated accordingly.
| # Runs GatherSampleEvidence, EvidenceQC, GatherBatchEvidence, ClusterBatch, FilterBatch.MergePesrVcfs, GenotypeBatch, | ||
| # MakeCohortVcf (CombineBatches, ResolveComplexVariants, GenotypeComplexVariants, GenotypeComplexVariants), and AnnotateVcf | ||
|
|
||
| # TODO: Do we also have to include Dragen? |
There was a problem hiding this comment.
Let's leave that as a future PR. Make a note in the PR that we are only incorporating for joint calling for now (I just modified the title).
There was a problem hiding this comment.
Updated accordingly.
| if(defined(bincov_matrix_samples)) { | ||
| String bincov_matrix_header = read_lines(SetBins.bincov_matrix_header_file)[0] | ||
| } | ||
|
|
||
| Array[String]+ all_samples = flatten([samples, select_all([bincov_matrix_header])]) | ||
| Array[File]+ all_count_files = flatten([count_files, select_all([bincov_matrix])]) | ||
|
|
There was a problem hiding this comment.
Was this causing an error? Would be good to mention in PR notes.
There was a problem hiding this comment.
Sorry, I think this was just a one-time modification that spilled over into the PR - removing.
| File original_bam_or_cram_file | ||
| File original_bam_or_cram_index | ||
| File counts_file | ||
| # TODO: Do we also have to include Dragen? If so, we will need to update src/sv-pipeline/scripts/make_scramble_vcf.py too |
There was a problem hiding this comment.
Yes I overlooked this before but this is actually critical for filtering FP deletions called by Scramble. I don't think the changes are hard - just change the "manta" variables/inputs to something more general like "deletion" and rewire the GatherSampleEvidence to optionally take in dragen. If a dragen vcf is provided and manta is also run then dragen takes preference.
There was a problem hiding this comment.
Updated accordingly - thanks for the guidance here.
| workflow TinyResolve { | ||
| input { | ||
| Array[String] samples # Sample ID | ||
| # TODO: Do we also have to include Dragen calls? |
There was a problem hiding this comment.
Yes let's do this as well, again with dragen taking preference. Should just be a matter of WDL adjustments.
There was a problem hiding this comment.
Updated accordingly - haven't renamed the tasks within TinyResolve to not use Manta in their name, but let me know if this is preferred.
| maximizes the sensitivity of SV discovery by harmonizing output from six tools: | ||
| Dragen/Manta, Wham, Scramble, cn.MOPS, and GATK-gCNV. To minimize false positives, raw SVs |
There was a problem hiding this comment.
Let's not document this yet
There was a problem hiding this comment.
Updated accordingly.
| and annotates the calls from these tools to produce a final call set. | ||
|
|
||
| The SV calling tools, sometimes referred to as "PE/SR" tools, include: | ||
| - [DRAGEN-SV](https://help.dragen.illumina.com/product-guides/dragen-v4.3/dragen-dna-pipeline/sv-calling) |
There was a problem hiding this comment.
Move to end of list
| - [DRAGEN-SV](https://help.dragen.illumina.com/product-guides/dragen-v4.3/dragen-dna-pipeline/sv-calling) | |
| - [DRAGEN-SV](https://help.dragen.illumina.com/product-guides/dragen-v4.3/dragen-dna-pipeline/sv-calling) (not yet fully supported) |
There was a problem hiding this comment.
Updated accordingly.
Co-authored-by: Mark Walker <markw@broadinstitute.org>
@mwalker174 Thanks for all the feedback in this review - I believe I have made all requested changes, as well as some additional related ones. Not sure if I should plan re-run the pipeline end-to-end with these changes, (maybe just with a single trio?), let me know what you think. |
|
Also not sure if you suggest I run the pipeline end-to-end with these updated changes prior to merging (maybe on a subset of samples?), though there should not been any major functional changes since I ran it on the DRAGEN-SV cohort. |
|
Additional note - I've intentionally only updated the JSON templates in the As you suggested to not integrate DRAGEN-SV changes into this just yet, I did not update the JSON templates for it either. |
mwalker174
left a comment
There was a problem hiding this comment.
Looks good although there are a couple of merge conflicts you need to resolve. Just a reminder to revert the changes to the dockstore yaml as well.
Thanks for highlighting this - rebased and ran an additional EvidenceQc test with the changes, which have passed. |
Description
This PR is intended to integrate calls made by DRAGEN-SV into the GATK-SV pipeline for the joint calling mode. It introduces several changes relating to this, including but not limited to:
-Pparameter that can be used to add a padding window when generating depth plots withRdTestV2.R, invoking this inVisualizeCnvs.FilterBatchSites, which enables customized random forest training if desired.Testing
19-FilterGenotypesshow similar results to what we observed in our initial run of this pipeline during the formal evaluations.RdTestV2.Rscript by runningVisualizeCnvs, a supporting workflow in GATK-SV which invokes this script. For reference, this job shows the output of that same WDL but prior to any changes, whereas this job shows the output of the updated workflow albeit without including the padding parameter.womtool.Pre-Merge Changes Required
Remove automated syncing of WDLs to Dockstore.
Extensions & Follow-Up Work