84 commits
68d2fa5
Update AlignedMetrics.wdl to expose runtime_attr_override
brosula Jan 30, 2024
791458d
Memory bump to ReadMetrics task in AlignedMetrics.wdl
brosula Jan 30, 2024
98e135c
Create SRDownsampleBam
brosula Feb 5, 2024
9fa3cf6
Update .dockstore.yml to include SRDownsampleBam
brosula Feb 5, 2024
3dd949e
Add SRDownsampleBam.wdl
brosula Feb 5, 2024
fff5b70
Update .dockstore.yml to include SRDownsampleBam.wdl
brosula Feb 5, 2024
4403b68
include Finalize in SRDownsampleBam.wdl
brosula Feb 5, 2024
6a2f2b3
Update .dockstore.yml
brosula Feb 5, 2024
02ed6ac
add bam size to SRDownsampleBam.wdl
brosula Feb 5, 2024
aef17c9
rename conflicting variables in SRDownsampleBam.wdl
brosula Feb 5, 2024
ae28bae
Update paths in SRDownsampleBam.wdl
brosula Feb 6, 2024
83306d5
Update paths in SRIndexBam.wdl
brosula Feb 6, 2024
5f4feef
Update paths in SRIndexBam.wdl
brosula Feb 6, 2024
3179cfd
Update parameter typo in SRIndexBam.wdl
brosula Feb 6, 2024
adf55b5
Added Bowtie2 alignment functionality to SRUtils.wdl
brosula Apr 1, 2024
32df65c
Added Bowtie2 as an option for decontamination
brosula Apr 1, 2024
911f2cb
Dockerfile for bowtie2
brosula Apr 1, 2024
9a4a32e
Added Bowtie2 as an option for decontamination
brosula Apr 1, 2024
f9c7e25
Added Bowtie2 decontamination to SRFlowcell
brosula Apr 1, 2024
cac9a27
Added Bowtie2 decontamination to SRFlowcell
brosula Apr 1, 2024
a731f57
Correct ref_basename type to String
brosula Apr 2, 2024
33dbe91
removed if statement in SRFlowcell
brosula Apr 2, 2024
907294f
corrected directory naming for bowtie2
brosula Apr 2, 2024
6e200ae
Change samtools view command
brosula Apr 2, 2024
16cc0f0
Update bowtie2 rg-id
brosula Apr 3, 2024
6c86925
Output sam for debugging
brosula Apr 3, 2024
45bf870
Redefine read group according to bowtie2 specs
brosula Apr 3, 2024
3f690b0
read in optional inputs correctly to Bowtie2
brosula Apr 3, 2024
df05e9f
corrected bowtie2 read group strings
brosula Apr 3, 2024
0470814
Added SRFlowcellDecontaminate.wdl workflow
brosula Apr 3, 2024
6a93d72
Add SRFlowcellDecontaminate to dockstore
brosula Apr 3, 2024
02d172f
Update bowtie2 command
brosula Apr 3, 2024
a85a05a
Update bowtie2 command
brosula Apr 3, 2024
a1192b5
Reintroduced `ReblockGVCF` step.
jonn-smith Apr 16, 2024
4ad79b7
Removed hard filtered output file.
jonn-smith Apr 17, 2024
0aa6881
Updating all GATK 4.3 tasks to GATK 4.5
jonn-smith Apr 17, 2024
51dfaab
Disabled QC when running on a single bam file input.
jonn-smith Apr 18, 2024
c4a52d7
Added a runtime_attr override for HaplotypeCaller subworkflow.
jonn-smith Apr 19, 2024
f45f368
Update Utils.wdl
shadizaheri Apr 19, 2024
d3fffee
Updating tasks to use `SSD` rather than `LOCAL` disk.
jonn-smith Apr 22, 2024
416d1c0
Merge remote-tracking branch 'origin/jts_updates_to_malaria_for_joint…
brosula Apr 22, 2024
7fd5406
Merge _Bootstrap wdls into rrb_malaria_from_main branch, add standalo…
brosula Apr 22, 2024
43e6cd0
Correct arguments of HaplotypeCallerBootstrap.wdl
brosula Apr 23, 2024
fa3c100
Add size reduction reporting to SRReblockGVCFs
brosula Apr 23, 2024
66cb6b2
Add ref_map_file to SRReblockGVCFs
brosula Apr 23, 2024
4bc929c
Finalize files to gcs path in SRReblockGVCFs
brosula Apr 23, 2024
fb98bed
Remove prefix option in favor of participant_name
brosula Apr 23, 2024
f47a933
Remove prefix option in favor of participant_name
brosula Apr 23, 2024
f98e856
Update HaplotypeCallerBootstrap.wdl to GATK v4.5.0
brosula Apr 26, 2024
66bc1d2
Update containers in HaplotypeCaller to recent nightly snapshot of GATK4
brosula May 2, 2024
fb22b06
Update to GATK 4.5
brosula May 2, 2024
61ac6e1
Add contaminated bam/metrics to SRFlowcell output
brosula May 6, 2024
d921497
Add workflows/tasks to extract genome-wide depth from mosdepth and me…
brosula May 14, 2024
4487328
correct spelling in .dockstore.yml
brosula May 14, 2024
a8955c0
Localize contig_files instead
brosula May 14, 2024
5a7b68b
Correct reference to localized file in ExtractCoverageFromMosDepth.wdl
brosula May 14, 2024
da45200
Use Mosdepth to calculate WGS coverage statistics (derived from PMICP…
brosula May 14, 2024
e3257c3
Add error catch for empty string produced from checking threshold
brosula May 15, 2024
a0305bb
Use mosdepth and samtools coverage tools
brosula May 15, 2024
fec2b06
Remove ref_map_file for ExtractCoverageStats.wdl
brosula May 15, 2024
2db1c40
Correct calculation to account for files with low threshold
brosula May 15, 2024
2dd00d4
Correct awk script
brosula May 15, 2024
42055a3
Correct samtools command
brosula May 15, 2024
dba28de
Make gcs_out_root_dir optional (prepare for merge with main)
brosula May 18, 2024
1174ba8
Merge remote-tracking branch 'origin/main' into rrb_malaria_from_main
brosula May 18, 2024
8e43291
Correct for optional gcs_out_root_dir in SRJointCallGVCFsWithGenomics…
brosula May 18, 2024
146f8b2
Change handling of optional decontamination/contamination outputs
brosula May 25, 2024
de03e26
Specify intervals for manual sharding in SRJointCallGVCFsWithGenomicsDB
brosula May 28, 2024
c21d28c
Correct index for scattering in SRJointCallGVCFsWithGenomicsDB
brosula May 29, 2024
a1e32da
Scatter contig_2 with manual indexes
brosula May 29, 2024
0c6e53f
Add DownloadFromFTP to Dockstore
brosula May 29, 2024
7c04d54
Add option to turn off scattering for MosDepth
brosula May 31, 2024
830d827
Add raw_chr_intervals for SRFlowcell keyfile
brosula May 31, 2024
04a691d
Surface per_chr_metrics Boolean in SRFlowcell
brosula May 31, 2024
f0fcb40
Update SRWholeGenomeBootstrap.wdl to default to False for pileup dete…
brosula Jun 26, 2024
4601306
Update MosDepthWGSAtThreshold to report correct metrics
brosula Oct 16, 2024
9b1db8f
Update MosDepth to report whole genome coverage with BED file
brosula Oct 16, 2024
4d4d7e0
Update SRFlowcellDecontaminate.wdl
brosula Mar 20, 2025
b2c4e4e
Update GetCurrentTimeStampString
brosula Jun 27, 2025
c87f74f
Expose `max-num-haplotypes-in-population` and `max-reads-per-alignmen…
brosular Oct 24, 2025
89c3f70
Merge branch 'rrb_malaria_from_main' of https://github.com/broadinsti…
brosular Oct 24, 2025
2ae16c3
Set default values for `max_reads_per_alignment_start` and `max_num_h…
brosula Oct 24, 2025
e96b809
Turn exposed parameters from to
brosula Oct 24, 2025
98267d2
Merge branch 'rrb_malaria_from_main' of https://github.com/broadinsti…
brosula Oct 24, 2025
21 changes: 21 additions & 0 deletions .dockstore.yml
@@ -150,6 +150,27 @@ workflows:
  - name: ExpandedDrugResistanceMarkerAggregation
    subclass: wdl
    primaryDescriptorPath: /wdl/pipelines/TechAgnostic/TertiaryAnalysis/ExpandedDrugResistanceMarkerAggregation.wdl
  - name: SRDownsampleBam
    subclass: wdl
    primaryDescriptorPath: /wdl/pipelines/TechAgnostic/Utility/SRDownsampleBam.wdl
  - name: SRFlowcellDecontaminate
    subclass: wdl
    primaryDescriptorPath: /wdl/pipelines/ILMN/Alignment/SRFlowcellDecontaminate.wdl
  - name: SvQCPlots
    subclass: wdl
    primaryDescriptorPath: /wdl/pipelines/TechAgnostic/Visualization/SvQCPlots.wdl
  - name: SRWholeGenomeBootstrap
    subclass: wdl
    primaryDescriptorPath: /wdl/pipelines/ILMN/VariantCalling/SRWholeGenomeBootstrap.wdl
  - name: SRJointCallGVCFsWithGenomicsDBBootstrap
    subclass: wdl
    primaryDescriptorPath: /wdl/pipelines/TechAgnostic/VariantCalling/SRJointCallGVCFsWithGenomicsDBBootstrap.wdl
  - name: SRReblockGVCFs
    subclass: wdl
    primaryDescriptorPath: /wdl/pipelines/ILMN/VariantCalling/SRReblockGVCFs.wdl
  - name: ExtractCoverageStats
    subclass: wdl
    primaryDescriptorPath: /wdl/pipelines/TechAgnostic/Utility/ExtractCoverageStats.wdl
  - name: DownloadFromFTP
    subclass: wdl
    primaryDescriptorPath: /wdl/pipelines/TechAgnostic/Utility/DownloadFromFTP.wdl
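Every entry in this hunk follows the same shape: `subclass: wdl`, an absolute `primaryDescriptorPath` ending in `.wdl`, and a `name` equal to the descriptor's file name without the extension. The Python sketch below checks a list of entries against those patterns. The checks are conventions inferred from this list, not Dockstore requirements, and `check_entries` is a hypothetical helper; entries are passed as tuples so the sketch needs no YAML library.

```python
from pathlib import PurePosixPath

# Entries from the .dockstore.yml hunk: (name, subclass, primaryDescriptorPath).
ENTRIES = [
    ("ExpandedDrugResistanceMarkerAggregation", "wdl",
     "/wdl/pipelines/TechAgnostic/TertiaryAnalysis/ExpandedDrugResistanceMarkerAggregation.wdl"),
    ("SRDownsampleBam", "wdl", "/wdl/pipelines/TechAgnostic/Utility/SRDownsampleBam.wdl"),
    ("SRFlowcellDecontaminate", "wdl", "/wdl/pipelines/ILMN/Alignment/SRFlowcellDecontaminate.wdl"),
    ("SvQCPlots", "wdl", "/wdl/pipelines/TechAgnostic/Visualization/SvQCPlots.wdl"),
    ("SRWholeGenomeBootstrap", "wdl", "/wdl/pipelines/ILMN/VariantCalling/SRWholeGenomeBootstrap.wdl"),
    ("SRJointCallGVCFsWithGenomicsDBBootstrap", "wdl",
     "/wdl/pipelines/TechAgnostic/VariantCalling/SRJointCallGVCFsWithGenomicsDBBootstrap.wdl"),
    ("SRReblockGVCFs", "wdl", "/wdl/pipelines/ILMN/VariantCalling/SRReblockGVCFs.wdl"),
    ("ExtractCoverageStats", "wdl", "/wdl/pipelines/TechAgnostic/Utility/ExtractCoverageStats.wdl"),
    ("DownloadFromFTP", "wdl", "/wdl/pipelines/TechAgnostic/Utility/DownloadFromFTP.wdl"),
]


def check_entries(entries):
    """Return a list of problems; an empty list means every entry passed."""
    problems = []
    seen = set()
    for name, subclass, path in entries:
        if name in seen:
            problems.append(f"duplicate name: {name}")
        seen.add(name)
        if subclass != "wdl":
            problems.append(f"{name}: unexpected subclass {subclass!r}")
        descriptor = PurePosixPath(path)
        if not descriptor.is_absolute() or descriptor.suffix != ".wdl":
            problems.append(f"{name}: expected an absolute .wdl path, got {path}")
        elif descriptor.stem != name:
            # Assumed house convention: entry name == descriptor file name minus ".wdl".
            problems.append(f"{name}: name differs from descriptor {descriptor.name}")
    return problems


if __name__ == "__main__":
    for problem in check_entries(ENTRIES):
        print(problem)
```

Running it against the entries above prints nothing; a renamed file or a copy-pasted entry with a stale name would show up as a problem line.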
54 changes: 54 additions & 0 deletions docker/sr-bowtie2/Dockerfile
@@ -0,0 +1,54 @@
# Modified Dockerfile script from other long-read-pipelines scripts to download bowtie2 and samtools

# Start from an Ubuntu 20.04 base image:
FROM ubuntu:20.04
LABEL maintainer="Raphael Brosula <brosula@broadinstitute.org>"

# Make sure we don't need to interact with any package installations:
ARG DEBIAN_FRONTEND=noninteractive

# Set the working directory to /
WORKDIR /

########################################################################################################################
# DEPENDENCIES
RUN apt-get --allow-releaseinfo-change update
RUN apt-get install -y build-essential

# Dependencies for samtools:
RUN apt-get install -y bzip2 curl gnupg2 libc-dev ncurses-dev libcurl4-openssl-dev libssl-dev libbz2-dev liblzma-dev zlib1g-dev

# Additional Dependencies:
RUN apt-get install -y curl wget pkg-config zip unzip default-jre
########################################################################################################################
# SOFTWARE:

RUN mkdir -p /usr/local/bin /usr/local/lib /usr/local/etc

# Samtools:
# Get samtools source:
RUN wget https://github.com/samtools/samtools/releases/download/1.11/samtools-1.11.tar.bz2 && \
    tar -xjf samtools-1.11.tar.bz2 && \
    cd samtools-1.11 && \
    ./configure && make install && \
    cd .. && \
    rm -rf samtools-1.11 samtools-1.11.tar.bz2

# bowtie2/2.5.3
RUN wget https://sourceforge.net/projects/bowtie-bio/files/bowtie2/2.5.3/bowtie2-2.5.3-linux-x86_64.zip && \
    unzip bowtie2-2.5.3-linux-x86_64.zip && \
    cd bowtie2-2.5.3-linux-x86_64 && \
    mv bowtie2* /usr/local/bin && \
    mkdir -p /usr/local/etc/bowtie2 && \
    mv AUTHORS BOWTIE2_VERSION LICENSE MANUAL MANUAL.markdown NEWS README.md /usr/local/etc/bowtie2 && \
    cd .. && \
    rm -rf bowtie2-2.5.3-linux-x86_64.zip bowtie2-2.5.3-linux-x86_64/

########################################################################################################################
########################################################################################################################
########################################################################################################################

# Other utilities:
RUN apt-get clean
RUN apt-get install -y vim emacs nano
RUN apt-get clean
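This image bundles bowtie2 and samtools for the bowtie2-based decontamination that SRFlowcell now enables (`aligner = "bowtie2"`). The filtering logic itself lives in SRUtils.wdl and is not part of this diff, so the Python sketch below only illustrates one common way to split reads after aligning them to a contaminant reference: by SAM FLAG bits, keeping a pair as clean only when both mates are unaligned (the condition `samtools view -f 12` selects). This is an assumption about the approach, not the pipeline's implementation; `is_clean` and `split_sam` are hypothetical helpers, and the sketch ignores secondary and supplementary records.

```python
# SAM FLAG bits (SAM format specification).
FLAG_PAIRED = 0x1
FLAG_UNMAPPED = 0x4
FLAG_MATE_UNMAPPED = 0x8


def is_clean(flag: int) -> bool:
    """True if the record did not align to the contaminant reference.

    For paired reads, both the read and its mate must be unaligned.
    """
    if flag & FLAG_PAIRED:
        needed = FLAG_UNMAPPED | FLAG_MATE_UNMAPPED
        return (flag & needed) == needed
    return bool(flag & FLAG_UNMAPPED)


def split_sam(lines):
    """Split SAM text lines into (clean_names, contaminated_names), skipping headers."""
    clean, contaminated = [], []
    for line in lines:
        if not line.strip() or line.startswith("@"):
            continue  # blank or header line
        fields = line.rstrip("\n").split("\t")
        qname, flag = fields[0], int(fields[1])
        (clean if is_clean(flag) else contaminated).append(qname)
    return clean, contaminated
```

Requiring both mates to be unaligned is the conservative choice: a pair where either mate hits the contaminant reference is routed to the contaminated output rather than kept.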
68 changes: 45 additions & 23 deletions wdl/pipelines/ILMN/Alignment/SRFlowcell.wdl
@@ -22,7 +22,7 @@ workflow SRFlowcell {
fq_end2: "FASTQ file containing end 2 of the short read data to process. `fq_end1` must be defined if this argument is. This argument and `fq_end1` are mutually exclusive with `bam` and `bai`"

SM: "Sample name for the given bam file."
LM: "Library name for the given bam file."
LB: "Library name for the given bam file."

ref_map_file: "Reference map file for the primary reference sequence and auxiliary file locations."
contaminant_ref_name: "Name for the contaminant reference."
@@ -36,6 +36,8 @@ workflow SRFlowcell {
DEBUG_MODE: "If true, will add extra logging and extra debugging outputs."

platform: "Platform on which the sample for the given bam file was sequenced."

per_chr_metrics: "Flag to scatter metrics by chromosome or to calculate them for the whole genome."
}

input {
@@ -55,12 +57,14 @@ workflow SRFlowcell {
String dir_prefix

String? gcs_out_root_dir

Boolean perform_BQSR = true

Boolean DEBUG_MODE = false

String platform = "illumina"

Boolean per_chr_metrics = true
}

####################################
@@ -96,11 +100,8 @@ workflow SRFlowcell {
call Utils.GetRawReadGroup as t_004_GetRawReadGroup { input: gcs_bam_path = select_first([bam]) }
}

# OK, this is inefficient, but let's NOW extract our contaminated reads if we have the info.
# TODO: Move this into the sections above to make it more efficient. Specifically where we convert bam -> fastq.
# TODO: Re-enable this section after decontamination is fixed. The alignment based method with BWA-MEM doesn't work. Not clear why, but this does seem somewhat inadequate (simplistic alignment-based strategies).
if (false && defined(contaminant_ref_map_file)) {

# Update: Enabled bowtie2-based decontamination of reads
if (defined(contaminant_ref_map_file)) {
# Call our sub-workflow for decontamination:
# NOTE: We don't need to be too concerned with the finalization info.
# This will be partially filled in by the WDL itself, so we can pass the same inputs for
@@ -113,6 +114,7 @@ workflow SRFlowcell {
SM = SM,
LB = LB,
platform = platform,
aligner = "bowtie2",

contaminant_ref_name = select_first([contaminant_ref_name]),
contaminant_ref_map_file = select_first([contaminant_ref_map_file]),
@@ -243,9 +245,18 @@ workflow SRFlowcell {
aligned_bai = final_bai,
ref_fasta = ref_map['fasta'],
ref_dict = ref_map['dict'],
scatter_by_chr = per_chr_metrics,
gcs_output_dir = metrics_dir
}

# Collect read stats on the contaminated reads and on the original reads
if(defined(contaminant_ref_map_file)) {
# Metrics on contaminated bam
call SRUTIL.ComputeBamStats as t_020_ComputeContaminatedBamStats { input: bam_file = select_first([DecontaminateSample.contaminated_bam]) }
# Metrics on original bam file
call SRUTIL.ComputeBamStats as t_021_ComputeUnalignedBamStats { input: bam_file = select_first([t_002_RevertSam.bam, DecontaminateSample.all_reads_bam])}
}

############################################
# _____ _ _ _
# | ___(_)_ __ __ _| (_)_______
@@ -268,7 +279,7 @@ workflow SRFlowcell {
File keyfile = t_014_ComputeBamStats.results_file

# Finalize our unaligned reads first:
call FF.FinalizeToDir as t_020_FinalizeUnalignedFastqReads {
call FF.FinalizeToDir as t_022_FinalizeUnalignedFastqReads {
input:
outdir = unaligned_reads_dir,
files =
@@ -279,7 +290,7 @@ workflow SRFlowcell {
keyfile = keyfile
}
if (defined(bam)) {
call FF.FinalizeToDir as t_021_FinalizeUnalignedReadsFromBam {
call FF.FinalizeToDir as t_023_FinalizeUnalignedReadsFromBam {
input:
outdir = unaligned_reads_dir,
files = select_all(
@@ -292,7 +303,7 @@ workflow SRFlowcell {
}
}

call FF.FinalizeToDir as t_022_FinalizeAlignedReads {
call FF.FinalizeToDir as t_024_FinalizeAlignedReads {
input:
outdir = aligned_reads_dir,
files =
@@ -306,22 +317,22 @@ workflow SRFlowcell {
keyfile = keyfile
}

call FF.FinalizeToFile as t_023_FinalizeAlignedBam {
call FF.FinalizeToFile as t_025_FinalizeAlignedBam {
input:
outdir = aligned_reads_dir,
file = final_bam,
keyfile = keyfile
}

call FF.FinalizeToFile as t_024_FinalizeAlignedBai {
call FF.FinalizeToFile as t_026_FinalizeAlignedBai {
input:
outdir = aligned_reads_dir,
file = final_bai,
keyfile = keyfile
}

# Finalize our metrics:
call FF.FinalizeToDir as t_025_FinalizeMetrics {
call FF.FinalizeToDir as t_027_FinalizeMetrics {
input:
outdir = metrics_dir,
files =
@@ -340,15 +351,15 @@ workflow SRFlowcell {

# Finalize BQSR Metrics if it was run:
if (perform_BQSR) {
call FF.FinalizeToDir as t_026_FinalizeBQSRMetrics {
call FF.FinalizeToDir as t_028_FinalizeBQSRMetrics {
input:
outdir = metrics_dir,
files = select_all([t_009_BaseRecalibrator.recalibration_report]),
keyfile = keyfile
}
}

call FF.FinalizeToFile as t_027_FinalizeFastQCReport {
call FF.FinalizeToFile as t_029_FinalizeFastQCReport {
input:
outdir = metrics_dir,
file = t_012_FastQC.report
@@ -357,19 +368,29 @@ workflow SRFlowcell {
# Prep a few files for output:
File fq1_o = unaligned_reads_dir + "/" + basename(fq_e1)
File fq2_o = unaligned_reads_dir + "/" + basename(fq_e2)

if (defined(bam)) {
File unaligned_bam_o = unaligned_reads_dir + "/" + basename(select_first([bam]))
File unaligned_bai_o = unaligned_reads_dir + "/" + basename(select_first([bai]))
File fqboup = unaligned_reads_dir + "/" + basename(select_first([DecontaminateSample.decontaminated_unpaired, t_003_Bam2Fastq.fq_unpaired]))
}

if (defined(contaminant_ref_map_file)) {
call FF.FinalizeToFile as t_030_FinalizeContaminatedBam {
input:
outdir = reads_dir,
file = select_first([DecontaminateSample.contaminated_bam])
}
Float num_contam_reads_o = select_first([t_020_ComputeContaminatedBamStats.results])['reads']
Float pct_contam_reads_o = select_first([t_020_ComputeContaminatedBamStats.results])['reads'] / select_first([t_021_ComputeUnalignedBamStats.results])['reads'] * 100.0
}
}

# Prep some output values before the output block:
Float raw_est_fold_cov_value = if t_013_ComputeGenomeLength.length != 0 then t_014_ComputeBamStats.results['bases']/t_013_ComputeGenomeLength.length else 0.0
Float aligned_frac_bases_value = if t_011_SamStats.stats_map['total_length'] != 0 then t_011_SamStats.stats_map['bases_mapped']/t_011_SamStats.stats_map['total_length'] else 0.0
Float aligned_est_fold_cov_value = if t_013_ComputeGenomeLength.length != 0 then t_011_SamStats.stats_map['bases_mapped']/t_013_ComputeGenomeLength.length else 0.0
Float average_identity_value = if t_011_SamStats.stats_map['bases_mapped'] != 0 then 100.0 - (100.0*t_011_SamStats.stats_map['mismatches']/t_011_SamStats.stats_map['bases_mapped']) else 0.0

############################################
# ___ _ _
# / _ \ _ _| |_ _ __ _ _| |_
@@ -379,7 +400,7 @@ workflow SRFlowcell {
# |_|
############################################
output {
# Unaligned reads
# Unaligned reads - if decontamination is run, these are the decontaminated reads
File fq1 = select_first([fq1_o, fq_e1])
File fq2 = select_first([fq2_o, fq_e2])
File? fq_unpaired = fqboup
@@ -388,13 +409,14 @@ workflow SRFlowcell {
File? unaligned_bam = unaligned_bam_o
File? unaligned_bai = unaligned_bai_o

# Contaminated BAM file:
# TODO: This will need to be fixed for optional finalization:
File? contaminated_bam = DecontaminateSample.contaminated_bam
# Contaminated BAM file and metrics
File? contaminated_bam = t_030_FinalizeContaminatedBam.gcs_path
Float? num_contam_reads = num_contam_reads_o
Float? pct_contam_reads = pct_contam_reads_o

# Aligned BAM file
File aligned_bam = select_first([t_023_FinalizeAlignedBam.gcs_path, final_bam])
File aligned_bai = select_first([t_024_FinalizeAlignedBai.gcs_path, final_bai])
File aligned_bam = select_first([t_025_FinalizeAlignedBam.gcs_path, final_bam])
File aligned_bai = select_first([t_026_FinalizeAlignedBai.gcs_path, final_bai])

# Unaligned read stats
Float num_reads = t_014_ComputeBamStats.results['reads']
@@ -426,6 +448,6 @@ workflow SRFlowcell {

Float average_identity = average_identity_value

File fastqc_report = select_first([t_027_FinalizeFastQCReport.gcs_path, t_012_FastQC.report])
File fastqc_report = select_first([t_029_FinalizeFastQCReport.gcs_path, t_012_FastQC.report])
}
}
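The summary values SRFlowcell prepares before its output block (`raw_est_fold_cov_value`, `aligned_frac_bases_value`, `aligned_est_fold_cov_value`, `average_identity_value`, plus the new `num_contam_reads_o` and `pct_contam_reads_o`) are simple ratios. The Python sketch below re-derives them from plain dictionaries standing in for `t_014_ComputeBamStats.results`, `t_011_SamStats.stats_map`, `t_013_ComputeGenomeLength.length`, and the two new `ComputeBamStats` calls. `safe_div` and `flowcell_summary` are hypothetical helpers, not part of the repository; the zero guard on the contamination percentage is an addition the WDL expression does not have.

```python
def safe_div(numerator: float, denominator: float) -> float:
    """Mirror the WDL guards: return 0.0 instead of dividing by zero."""
    return numerator / denominator if denominator != 0 else 0.0


def flowcell_summary(bam_stats, sam_stats, genome_length, contam_stats=None):
    """Recompute SRFlowcell's summary values from plain dictionaries.

    bam_stats    stands in for t_014_ComputeBamStats.results (needs 'bases')
    sam_stats    stands in for t_011_SamStats.stats_map
                 ('total_length', 'bases_mapped', 'mismatches')
    contam_stats optional (contaminated, original) pair of dicts with 'reads'
    """
    mapped = sam_stats["bases_mapped"]
    summary = {
        "raw_est_fold_cov": safe_div(bam_stats["bases"], genome_length),
        "aligned_frac_bases": safe_div(mapped, sam_stats["total_length"]),
        "aligned_est_fold_cov": safe_div(mapped, genome_length),
        # Same operation order as the WDL: 100 - (100 * mismatches / bases_mapped).
        "average_identity": (100.0 - 100.0 * sam_stats["mismatches"] / mapped) if mapped != 0 else 0.0,
    }
    if contam_stats is not None:
        contaminated, original = contam_stats
        summary["num_contam_reads"] = float(contaminated["reads"])
        # The WDL divides without a zero check; the sketch guards it.
        summary["pct_contam_reads"] = safe_div(contaminated["reads"], original["reads"]) * 100.0
    return summary
```

For example, 2,000,000 raw bases over a 1,000,000 bp genome gives a raw estimated fold coverage of 2.0, and 250 contaminated reads out of 1,000 original reads gives 25.0 percent contamination.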