Enable DRAGEN-SV as a caller in cohort mode by kjaisingh · Pull Request #803 · broadinstitute/gatk-sv

kjaisingh · 2025-04-25T18:31:54Z

Description

This PR integrates calls made by DRAGEN-SV into the GATK-SV pipeline in joint calling mode. The changes include, but are not limited to:

Updates all GATK-SV workflows to allow for DRAGEN-SV calls to be passed through the pipeline.
Introduces a DRAGEN-SV standardizer that processes raw DRAGEN-SV VCFs.
Adds DRAGEN-SV parameters to WDLs that process and/or analyze caller-specific VCFs.
Adds a -P parameter to RdTestV2.R that pads the plotted window when generating depth plots, and passes it through from VisualizeCnvs.
Allows cutoff and score generation in FilterBatchSites to be bypassed manually, which makes customized random forest training possible if desired.
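
To illustrate what a padding window does to a plotted region, here is a minimal Python sketch. It is not the RdTestV2.R implementation: the function name `pad_interval`, the choice to express padding as a fraction of the interval length, and the clamping to chromosome bounds are all assumptions made for illustration.

```python
from typing import Optional, Tuple


def pad_interval(start: int, end: int, padding: float,
                 chrom_length: Optional[int] = None) -> Tuple[int, int]:
    """Widen [start, end] by `padding` times the interval length on each side.

    Hypothetical helper: the real -P semantics in RdTestV2.R may differ
    (for example, padding could be an absolute number of bases).
    """
    if padding < 0:
        raise ValueError("padding must be non-negative")
    flank = int(round((end - start) * padding))
    # Clamp to the first base of the contig.
    padded_start = max(1, start - flank)
    padded_end = end + flank
    # Clamp to the contig end when its length is known.
    if chrom_length is not None:
        padded_end = min(chrom_length, padded_end)
    return padded_start, padded_end


if __name__ == "__main__":
    # A 1 kb CNV plotted with 50% flanking context on each side.
    print(pad_interval(1000, 2000, 0.5))
```

With a padding of 0 the interval is returned unchanged, which matches the idea of an optional parameter that leaves existing plots as they were.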

Testing

This Terra workspace shows an example run of the entire pipeline prior to this change, using Manta instead of DRAGEN-SV.
This Terra workspace shows an example run of the entire pipeline with this change, using an updated Docker image across all workflows. The output plots in 19-FilterGenotypes show results similar to those we observed in our initial run of this pipeline during the formal evaluations.
This job shows an example run of the updated RdTestV2.R script via VisualizeCnvs, a supporting GATK-SV workflow that invokes the script. For reference, this job shows the output of the same WDL prior to any changes, and this job shows the output of the updated workflow without the padding parameter.
Validated all WDLs with womtool.

Pre-Merge Changes Required

Remove automated syncing of WDLs to Dockstore.

Extensions & Follow-Up Work

Integrate DRAGEN-SV into the single-sample pipeline.
Benchmark and integrate DRAGEN-CNV into both the cohort and single-sample pipelines.

mwalker174

Thanks @kjaisingh, this is very thorough and the culmination of a lot of evaluation work. There are a couple of things I'll need to ask you to do below, but they should be relatively easy. We should hold off on documenting this as a fully supported tool: there are some changes to genotype filtering that need to go in as well, as we will need to modify the GQRecalibrator to treat manta/dragen interchangeably.

mwalker174 · 2025-04-28T16:48:29Z

    print("Took %f seconds to process" % delta)

    # Filter out INS that are manta or melt only and are SR only, have GQ=0, and FILTER contains 'HIGH_SR_BACKGROUND'
+    # TODO: Do we also have to filter out DRAGEN-only records matching the Manta condition?


Yes we should add DRAGEN support here. @VJalili please take note that we are making dragen/manta calls equivalent for this condition.

Updated accordingly.
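
For readers following this thread, the condition in the code comment, extended to treat DRAGEN like Manta, could be sketched roughly as below. This is not the pipeline's code: the `Record` type, its field names, and the reading of "manta or melt only" as "exactly one caller, drawn from this set" are assumptions for illustration, and the real script evaluates GQ per genotype rather than per record.

```python
from dataclasses import dataclass
from typing import FrozenSet

SR_BACKGROUND_FILTER = "HIGH_SR_BACKGROUND"
# Callers treated as equivalent for this condition (DRAGEN added alongside Manta).
INS_ONLY_CALLERS = frozenset({"manta", "melt", "dragen"})


@dataclass(frozen=True)
class Record:
    """Hypothetical, simplified view of one variant record."""
    svtype: str
    algorithms: FrozenSet[str]
    evidence: FrozenSet[str]
    gq: int
    filters: FrozenSet[str]


def should_filter_ins(rec: Record) -> bool:
    """True for SR-only insertions from a single listed caller with GQ=0
    that also carry the HIGH_SR_BACKGROUND filter."""
    return (
        rec.svtype == "INS"
        and len(rec.algorithms) == 1          # assumption: single-caller record
        and rec.algorithms <= INS_ONLY_CALLERS
        and rec.evidence == frozenset({"SR"})
        and rec.gq == 0
        and SR_BACKGROUND_FILTER in rec.filters
    )


if __name__ == "__main__":
    dragen_only = Record("INS", frozenset({"dragen"}), frozenset({"SR"}), 0,
                         frozenset({SR_BACKGROUND_FILTER}))
    print(should_filter_ins(dragen_only))
```

Keeping the caller names in one set means that making another caller equivalent for this condition is a one-line change.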

mwalker174 · 2025-04-28T16:49:03Z


 for batch in Phase1 Pilot; do
-  for source in delly lumpy manta wham depth; do
+  for source in delly dragen lumpy manta wham depth; do


Great that you made these changes but these scripts are no longer in use. Fine to keep the changes in.

We should clean these out at some point

Got it, thanks for letting me know.

mwalker174 · 2025-04-28T16:57:29Z

+    # Optional overrides
+    File? adjudicate_cutoffs
+    File? adjudicate_scores
+    File? adjudicate_rf_files


Can you elaborate on the comment so users understand what this is for? I think this will be useful in the future. Also please add it to the PR notes so that we can incorporate it in the next release notes.

I did not add more documentation here as I did not originally intend to add this change to the PR, but given what you said, it seems like it's potentially useful.

With this in mind, I've updated the documentation and PR notes accordingly to now include this.

mwalker174 · 2025-04-28T17:04:00Z


+# TODO: Do we also have to include Dragen?
+
 workflow GATKSVPipelineBatch {


This workflow will be deprecated in the future, so let's not spend time on it. Same with the Phase1 wdl below.

Updated accordingly.

mwalker174 · 2025-04-28T17:05:29Z

 # Runs GatherSampleEvidence, EvidenceQC, GatherBatchEvidence, ClusterBatch, FilterBatch.MergePesrVcfs, GenotypeBatch, 
 # MakeCohortVcf (CombineBatches, ResolveComplexVariants, GenotypeComplexVariants, GenotypeComplexVariants), and AnnotateVcf

+# TODO: Do we also have to include Dragen?


Let's leave that as a future PR. Make a note in the PR that we are only incorporating for joint calling for now (I just modified the title).

Updated accordingly.

mwalker174 · 2025-04-28T17:07:16Z

+  if(defined(bincov_matrix_samples)) {
+    String bincov_matrix_header = read_lines(SetBins.bincov_matrix_header_file)[0]
+  }
+
+  Array[String]+ all_samples = flatten([samples, select_all([bincov_matrix_header])])
+  Array[File]+ all_count_files = flatten([count_files, select_all([bincov_matrix])])
+


Was this causing an error? Would be good to mention in PR notes.

Sorry, I think this was just a one-time modification that spilled over into the PR - removing.

mwalker174 · 2025-04-28T17:10:44Z

    File original_bam_or_cram_file
    File original_bam_or_cram_index
    File counts_file
+    # TODO: Do we also have to include Dragen? If so, we will need to update src/sv-pipeline/scripts/make_scramble_vcf.py too


Yes I overlooked this before but this is actually critical for filtering FP deletions called by Scramble. I don't think the changes are hard - just change the "manta" variables/inputs to something more general like "deletion" and rewire the GatherSampleEvidence to optionally take in dragen. If a dragen vcf is provided and manta is also run then dragen takes preference.

Updated accordingly - thanks for the guidance here.
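
The preference rule described above (use the DRAGEN VCF when one is provided, otherwise fall back to Manta) can be sketched in Python as follows. The function and argument names are hypothetical; in WDL the same idea is usually written with `select_first` over the optional inputs.

```python
from typing import Optional


def select_deletion_vcf(dragen_vcf: Optional[str],
                        manta_vcf: Optional[str]) -> Optional[str]:
    """Pick the VCF used as the deletion source, preferring DRAGEN over Manta.

    Hypothetical helper: returns None when neither caller's VCF is available,
    whereas WDL's select_first would fail in that case.
    """
    for candidate in (dragen_vcf, manta_vcf):
        if candidate is not None:
            return candidate
    return None


if __name__ == "__main__":
    # Both provided: DRAGEN takes preference.
    print(select_deletion_vcf("sample.dragen.vcf.gz", "sample.manta.vcf.gz"))
    # Only Manta was run: fall back to it.
    print(select_deletion_vcf(None, "sample.manta.vcf.gz"))
```

Naming the result after its role ("deletion") rather than its caller follows the reviewer's suggestion to generalize the "manta" variables.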

mwalker174 · 2025-04-28T17:11:36Z

 workflow TinyResolve {
  input {
    Array[String] samples         # Sample ID
+    # TODO: Do we also have to include Dragen calls?


Yes let's do this as well, again with dragen taking preference. Should just be a matter of WDL adjustments.

Updated accordingly - haven't renamed the tasks within TinyResolve to not use Manta in their name, but let me know if this is preferred.

mwalker174 · 2025-04-28T17:23:13Z

+maximizes the sensitivity of SV discovery by harmonizing output from six tools: 
+Dragen/Manta, Wham, Scramble, cn.MOPS, and GATK-gCNV. To minimize false positives, raw SVs 


Let's not document this yet

Updated accordingly.

mwalker174 · 2025-04-28T17:23:50Z

 and annotates the calls from these tools to produce a final call set.

 The SV calling tools, sometimes referred to as "PE/SR" tools, include:
+- [DRAGEN-SV](https://help.dragen.illumina.com/product-guides/dragen-v4.3/dragen-dna-pipeline/sv-calling)


Move to end of list

Suggested change

-- [DRAGEN-SV](https://help.dragen.illumina.com/product-guides/dragen-v4.3/dragen-dna-pipeline/sv-calling)
+- [DRAGEN-SV](https://help.dragen.illumina.com/product-guides/dragen-v4.3/dragen-dna-pipeline/sv-calling) (not yet fully supported)

Updated accordingly.

Co-authored-by: Mark Walker <markw@broadinstitute.org>

kjaisingh · 2025-04-29T19:39:55Z

> Thanks @kjaisingh, this is very thorough and the culmination of a lot of evaluation work. There are a couple of things I'll need to ask you to do below, but they should be relatively easy. We should hold off on documenting this as a fully supported tool: there are some changes to genotype filtering that need to go in as well, as we will need to modify the GQRecalibrator to treat manta/dragen interchangeably.

@mwalker174 Thanks for all the feedback in this review - I believe I have made all requested changes, as well as some additional related ones. Not sure if I should plan to re-run the pipeline end-to-end with these changes (maybe just with a single trio?); let me know what you think.

kjaisingh · 2025-05-01T15:11:29Z

Also not sure whether you'd suggest I run the pipeline end-to-end with these updated changes prior to merging (maybe on a subset of samples?), though there should not have been any major functional changes since I ran it on the DRAGEN-SV cohort.

kjaisingh · 2025-05-09T00:39:21Z

Additional note - I've intentionally only updated the JSON templates in the terra directory, as my understanding is that the JSON templates in the test directory are intended to be used in tandem with the GATKSVPipelineBatch.wdl workflow. I deduced this after noticing that some of the named inputs in these test JSON templates differ from those in the terra directory and correspond to outputs as named in the GATKSVPipelineBatch.wdl workflow - not sure if this is 100% accurate though.

Since you suggested not integrating the DRAGEN-SV changes into that workflow just yet, I did not update its JSON templates either.

mwalker174

Looks good although there are a couple of merge conflicts you need to resolve. Just a reminder to revert the changes to the dockstore yaml as well.

kjaisingh · 2025-08-09T21:55:11Z

> Looks good although there are a couple of merge conflicts you need to resolve. Just a reminder to revert the changes to the dockstore yaml as well.

Thanks for highlighting this - I rebased and ran an additional EvidenceQC test with the changes, which passed.

Initial commit

67e829e

kjaisingh added the enhancement New feature or request label Apr 25, 2025

kjaisingh self-assigned this Apr 25, 2025

kjaisingh added 3 commits April 25, 2025 14:43

Modified dockstore sync to use correct branch name

0cd15dc

Added dragen-sv params to cohort mode json templates

5e1777d

Resolved linting errors

236dab6

kjaisingh requested a review from mwalker174 April 25, 2025 19:44

kjaisingh marked this pull request as ready for review April 25, 2025 19:44

kjaisingh added 2 commits April 25, 2025 16:06

Modified name of json param to pass womtool evaluation test

b2d40f3

Removed dragen-cnv standardizer

4c02fd3

mwalker174 changed the title ~~Integrate DRAGEN-SV into GATK-SV~~ Integrate DRAGEN-SV into GATK-SV joint calling Apr 28, 2025

mwalker174 requested changes Apr 28, 2025

kjaisingh and others added 5 commits April 29, 2025 12:57

Initial set of responses to PR comments

986763a

Responses to PR feedback pt. 2

4ac9a3d

Update website/docs/gs/sv_callers.md

58ae43f

Co-authored-by: Mark Walker <markw@broadinstitute.org>

Moving dragen to end of list of supported callers

b2d3787

More minor resolutions

a888a68

kjaisingh requested a review from mwalker174 April 29, 2025 19:40

Merge branch 'main' into kj_dragensv_integration

ac8dddc

kjaisingh added 5 commits May 8, 2025 11:55

Merge branch 'main' into kj_dragensv_integration

dc2ae71

Added padding parameter to visualizecnv json templates

20cfeb0

Updated padding to default to no-padding if failed

30321a7

Remove unnecessary changes

b1cc509

Formatting changes

ddecc28

kjaisingh added 3 commits July 2, 2025 20:01

Merge branch 'main' into kj_dragensv_integration

43b6eb3

Merged main into branch

3b76571

Undo inclusion of SR genotyping in PR

795f160

kjaisingh added 4 commits July 2, 2025 20:05

Moved main version into branch

3cfc858

Added branch to all GATK-SV wdls to ensure nothing goes missing

50cad0a

Added comment to trigger dockstore sync

4658173

Merge branch 'main' into kj_dragensv_integration

454a6f8

kjaisingh changed the title ~~Integrate DRAGEN-SV into GATK-SV joint calling~~ Integrate DRAGEN-SV into GATK-SV cohort mode Jul 28, 2025

kjaisingh marked this pull request as draft July 28, 2025 16:20

kjaisingh changed the title ~~Integrate DRAGEN-SV into GATK-SV cohort mode~~ Enable DRAGEN-SV as a caller in cohort mode Jul 31, 2025

Merge branch 'main' into kj_dragensv_integration

255bb1b

kjaisingh marked this pull request as ready for review July 31, 2025 15:41

mwalker174 approved these changes Aug 8, 2025

kjaisingh added 6 commits August 8, 2025 13:31

Resolve merge conflicts

3667357

Resolved flake8 error

05b08f5

Removed branch from dockstore sync

51eb058

Modified argument to use -h rather than -z, which had conflict

937b7b3

Changed argument name for incorrect input

f30f656

Updated param to -q

d9e5b67

kjaisingh merged commit c4f44ea into main Aug 11, 2025
12 checks passed

kjaisingh deleted the kj_dragensv_integration branch August 11, 2025 14:36

VJalili mentioned this pull request Aug 13, 2025

Refactor manta-vcf arg to input-vcf to match recent refactors #853

Merged

