Enable DRAGEN-SV as a caller in cohort mode #803

Merged
kjaisingh merged 31 commits into main from kj_dragensv_integration on Aug 11, 2025

@kjaisingh (Collaborator) commented Apr 25, 2025

Description

This PR integrates calls made by DRAGEN-SV into the GATK-SV pipeline for joint calling (cohort) mode. The changes include, but are not limited to:

  • Updates all GATK-SV workflows to allow for DRAGEN-SV calls to be passed through the pipeline.
  • Introduces a DRAGEN-SV standardizer that processes raw DRAGEN-SV VCFs.
  • Adds DRAGEN-SV parameters to WDLs that process and/or analyze caller-specific VCFs.
  • Adds a -P parameter to RdTestV2.R that applies a padding window when generating depth plots, and invokes it in VisualizeCnvs.
  • Allows the cutoff and score generation in FilterBatchSites to be bypassed manually, which enables customized random forest training if desired.

Testing

  • This Terra workspace shows an example run of the entire pipeline prior to this change, using Manta instead of DRAGEN-SV.
  • This Terra workspace shows an example run of the entire pipeline with this change, which used an updated docker image across all workflows run. The output plots in 19-FilterGenotypes show similar results to what we observed in our initial run of this pipeline during the formal evaluations.
  • This job shows an example run of the updated RdTestV2.R script by running VisualizeCnvs, a supporting workflow in GATK-SV which invokes this script. For reference, this job shows the output of that same WDL but prior to any changes, whereas this job shows the output of the updated workflow albeit without including the padding parameter.
  • Validated all WDLs with womtool.

Pre-Merge Changes Required

Remove automated syncing of WDLs to Dockstore.

Extensions & Follow-Up Work

  • Integrate DRAGEN-SV into the single-sample pipeline.
  • Benchmark and integrate DRAGEN-CNV into both the cohort and single-sample pipelines.

@kjaisingh added the enhancement label Apr 25, 2025
@kjaisingh self-assigned this Apr 25, 2025
@kjaisingh requested a review from mwalker174 April 25, 2025 19:44
@kjaisingh marked this pull request as ready for review April 25, 2025 19:44
@mwalker174 changed the title from "Integrate DRAGEN-SV into GATK-SV" to "Integrate DRAGEN-SV into GATK-SV joint calling" Apr 28, 2025
@mwalker174 (Collaborator) left a comment

Thanks @kjaisingh, this is very thorough and the culmination of a lot of evaluation work. There are a couple of things I'll need to ask you to do below, but they should be relatively easy. We should hold off on documenting this as a fully supported tool - there are some changes to genotype filtering that need to go in as well, as we will need to modify the GQRecalibrator to treat manta/dragen interchangeably.

Comment thread src/denovo/denovo_svs.py Outdated
print("Took %f seconds to process" % delta)

# Filter out INS that are manta or melt only and are SR only, have GQ=0, and FILTER contains 'HIGH_SR_BACKGROUND'
# TODO: Do we also have to filter out DRAGEN-only records matching the Manta condition?
Collaborator

Yes we should add DRAGEN support here. @VJalili please take note that we are making dragen/manta calls equivalent for this condition.

Collaborator Author

Updated accordingly.
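
For reference, the condition discussed in this thread can be sketched as a small predicate. This is only an illustration, not the code in denovo_svs.py: the function name, its arguments, and the lowercase caller labels are assumptions, and "manta or melt only" is read here as "supported by exactly one of these callers", with dragen treated the same as manta.

```python
# Hypothetical sketch of the INS filter discussed above; not the code in denovo_svs.py.
# Caller labels are assumed; DRAGEN is treated as equivalent to Manta.
SINGLE_CALLER_LABELS = {"manta", "dragen", "melt"}


def is_filtered_ins(svtype, algorithms, evidence, gq, filters):
    """Return True for an INS that is single-caller, SR-only, GQ=0 and HIGH_SR_BACKGROUND."""
    callers = set(algorithms)
    return (
        svtype == "INS"
        and len(callers) == 1                   # called by one algorithm only
        and callers <= SINGLE_CALLER_LABELS     # and that algorithm is manta, dragen or melt
        and set(evidence) == {"SR"}             # split-read evidence only
        and gq == 0
        and "HIGH_SR_BACKGROUND" in filters     # filters is a list of FILTER values
    )
```

With this reading, a DRAGEN-only, SR-only insertion with GQ=0 and HIGH_SR_BACKGROUND is dropped in the same way a Manta-only one would be, while a record also supported by another caller is kept.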


  for batch in Phase1 Pilot; do
-   for source in delly lumpy manta wham depth; do
+   for source in delly dragen lumpy manta wham depth; do
Collaborator

Great that you made these changes but these scripts are no longer in use. Fine to keep the changes in.

Collaborator

We should clean these out at some point

Collaborator Author

Got it, thanks for letting me know.

Comment thread wdl/FilterBatchSites.wdl
Comment on lines +22 to +25
# Optional overrides
File? adjudicate_cutoffs
File? adjudicate_scores
File? adjudicate_rf_files
Collaborator

Can you elaborate on the comment so users understand what this is for? I think this will be useful in the future. Also please add it to the PR notes so that we can incorporate it in the next release notes.

Collaborator Author

I did not add more documentation here as I did not originally intend to add this change to the PR, but given what you said, it seems like it's potentially useful.

With this in mind, I've updated the documentation and PR notes accordingly to now include this.
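
As a rough illustration of what these optional overrides imply, the bypass logic can be sketched in Python. This is not the WDL in FilterBatchSites: the function name and the `compute` callback are hypothetical, and it assumes that generation is skipped only when all three files are supplied, which the actual workflow may handle differently.

```python
from typing import Callable, Optional, Tuple

# (cutoffs, scores, rf_files) file paths
Adjudication = Tuple[str, str, str]


def resolve_adjudication(
    cutoffs: Optional[str],
    scores: Optional[str],
    rf_files: Optional[str],
    compute: Callable[[], Adjudication],
) -> Adjudication:
    """Use the user-supplied files when all three are given; otherwise generate them.

    Assumption: a partial set of overrides falls back to full generation.
    """
    if cutoffs is not None and scores is not None and rf_files is not None:
        return (cutoffs, scores, rf_files)
    return compute()
```

Passing pre-computed files this way would let a user substitute results from a custom random forest training run without the generation step being executed.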


# TODO: Do we also have to include Dragen?

workflow GATKSVPipelineBatch {
Collaborator

This workflow will be deprecated in the future, so let's not spend time on it. Same with the Phase1 wdl below.

Collaborator Author

Updated accordingly.

Comment thread wdl/GATKSVPipelineSingleSample.wdl Outdated
# Runs GatherSampleEvidence, EvidenceQC, GatherBatchEvidence, ClusterBatch, FilterBatch.MergePesrVcfs, GenotypeBatch,
# MakeCohortVcf (CombineBatches, ResolveComplexVariants, GenotypeComplexVariants, GenotypeComplexVariants), and AnnotateVcf

# TODO: Do we also have to include Dragen?
Collaborator

Let's leave that as a future PR. Make a note in the PR that we are only incorporating for joint calling for now (I just modified the title).

Collaborator Author

Updated accordingly.

Comment thread wdl/MakeBincovMatrix.wdl Outdated
Comment on lines +19 to +25
if(defined(bincov_matrix_samples)) {
String bincov_matrix_header = read_lines(SetBins.bincov_matrix_header_file)[0]
}

Array[String]+ all_samples = flatten([samples, select_all([bincov_matrix_header])])
Array[File]+ all_count_files = flatten([count_files, select_all([bincov_matrix])])

Collaborator

Was this causing an error? Would be good to mention in PR notes.

Collaborator Author

Sorry, I think this was just a one-time modification that spilled over into the PR - removing.

Comment thread wdl/Scramble.wdl Outdated
File original_bam_or_cram_file
File original_bam_or_cram_index
File counts_file
# TODO: Do we also have to include Dragen? If so, we will need to update src/sv-pipeline/scripts/make_scramble_vcf.py too
Collaborator

Yes, I overlooked this before, but this is actually critical for filtering FP deletions called by Scramble. I don't think the changes are hard - just change the "manta" variables/inputs to something more general like "deletion" and rewire GatherSampleEvidence to optionally take in dragen. If a dragen vcf is provided and manta is also run, then dragen takes precedence.

Collaborator Author

Updated accordingly - thanks for the guidance here.
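
The selection rule requested in this thread (use the dragen vcf when one is provided, otherwise fall back to manta) can be sketched as follows. The function and argument names are hypothetical; in WDL the same rule would typically be written with select_first over the two optional inputs, and the actual wiring in GatherSampleEvidence may differ.

```python
from typing import Optional


def select_deletion_vcf(dragen_vcf: Optional[str], manta_vcf: Optional[str]) -> Optional[str]:
    """Return the DRAGEN VCF if provided, otherwise the Manta VCF (which may be None)."""
    if dragen_vcf is not None:
        return dragen_vcf
    return manta_vcf
```

Naming the result after its role ("deletion") rather than its caller keeps downstream tasks such as the Scramble deletion filter independent of which caller produced the input.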

Comment thread wdl/TinyResolve.wdl Outdated
workflow TinyResolve {
input {
Array[String] samples # Sample ID
# TODO: Do we also have to include Dragen calls?
Collaborator

Yes, let's do this as well, again with dragen taking precedence. Should just be a matter of WDL adjustments.

Collaborator Author

Updated accordingly - haven't renamed the tasks within TinyResolve to not use Manta in their name, but let me know if this is preferred.

Comment thread website/docs/intro.md Outdated
Comment on lines +11 to +12
maximizes the sensitivity of SV discovery by harmonizing output from six tools:
Dragen/Manta, Wham, Scramble, cn.MOPS, and GATK-gCNV. To minimize false positives, raw SVs
Collaborator

Let's not document this yet

Collaborator Author

Updated accordingly.

Comment thread website/docs/gs/sv_callers.md Outdated
and annotates the calls from these tools to produce a final call set.

The SV calling tools, sometimes referred to as "PE/SR" tools, include:
- [DRAGEN-SV](https://help.dragen.illumina.com/product-guides/dragen-v4.3/dragen-dna-pipeline/sv-calling)
Collaborator

Move to end of list

Suggested change
- [DRAGEN-SV](https://help.dragen.illumina.com/product-guides/dragen-v4.3/dragen-dna-pipeline/sv-calling)
- [DRAGEN-SV](https://help.dragen.illumina.com/product-guides/dragen-v4.3/dragen-dna-pipeline/sv-calling) (not yet fully supported)

Collaborator Author

Updated accordingly.

@kjaisingh
Collaborator Author

> Thanks @kjaisingh this is very thorough and culmination of a lot of evaluation work. There are a couple of things I'll need to ask you to do below but they should be relatively easy. We should hold off on documenting this as a fully supported tool - there are some changes to genotype filtering that need to go in as well, as we will need to modify the GQRecalibrator to treat manta/dragen interchangeably.

@mwalker174 Thanks for all the feedback in this review - I believe I have made all requested changes, as well as some additional related ones. Not sure if I should plan to re-run the pipeline end-to-end with these changes (maybe just with a single trio?); let me know what you think.

@kjaisingh requested a review from mwalker174 April 29, 2025 19:40
@kjaisingh
Collaborator Author

Also, I'm not sure whether you'd suggest I run the pipeline end-to-end with these changes prior to merging (maybe on a subset of samples?), though there should not have been any major functional changes since I ran it on the DRAGEN-SV cohort.

@kjaisingh
Collaborator Author

Additional note - I've intentionally only updated the JSON templates in the terra directory, as my understanding is that the JSON templates in the test directory are intended to be used in tandem with the GATKSVPipelineBatch.wdl workflow. I deduced this after noticing that some of the named inputs in the test JSON templates differ from those in the terra directory, and instead correspond to outputs as named in the GATKSVPipelineBatch.wdl workflow - not sure if this is 100% accurate though.

As you suggested not integrating DRAGEN-SV changes into that workflow just yet, I did not update the JSON templates for it either.

@kjaisingh changed the title from "Integrate DRAGEN-SV into GATK-SV joint calling" to "Integrate DRAGEN-SV into GATK-SV cohort mode" Jul 28, 2025
@kjaisingh marked this pull request as draft July 28, 2025 16:20
@kjaisingh changed the title from "Integrate DRAGEN-SV into GATK-SV cohort mode" to "Enable DRAGEN-SV as a caller in cohort mode" Jul 31, 2025
@kjaisingh marked this pull request as ready for review July 31, 2025 15:41
@mwalker174 (Collaborator) left a comment

Looks good although there are a couple of merge conflicts you need to resolve. Just a reminder to revert the changes to the dockstore yaml as well.

@kjaisingh
Collaborator Author

> Looks good although there are a couple of merge conflicts you need to resolve. Just a reminder to revert the changes to the dockstore yaml as well.

Thanks for highlighting this - rebased and ran an additional EvidenceQc test with the changes, which passed.

@kjaisingh merged commit c4f44ea into main Aug 11, 2025
12 checks passed
@kjaisingh deleted the kj_dragensv_integration branch August 11, 2025 14:36