Commit fd09bc9

feat: report mean gap-compressed identity for primary and supplementary alignments
docs: Update documentation for output of bam_statistics task.
feat: add read length n50 calculation
Rebase submodule.
1 parent 104f617 commit fd09bc9

9 files changed

Lines changed: 102 additions & 53 deletions

docs/bam_statistics.md

Lines changed: 37 additions & 0 deletions
@@ -0,0 +1,37 @@
+# bam_stats outputs
+
+## `bam_statistics`
+
+A compressed TSV file with a row for each record in the haplotagged BAM and the following columns:
+
+- movie name
+- read name
+- read length
+- Phred scaled read quality
+- alignment type (unmapped, primary, supplementary; because supplementary alignments are included, reads may appear on multiple rows)
+- mapping quality (`MAPQ`), if mapped
+- gap-compressed identity (`mg`), if mapped
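
The column list above can be read back with a short parser. This is only a sketch under stated assumptions: the file is gzip-compressed, has no header row, lists the columns in the documented order, and leaves `MAPQ` and `mg` empty or absent for unmapped records. None of those details are confirmed by the commit, and `BamStatRecord`, `read_bam_statistics`, and the placeholder values are hypothetical names chosen for illustration.

```python
import csv
import gzip
from typing import Iterator, NamedTuple, Optional


class BamStatRecord(NamedTuple):
    """One row of the bam_statistics TSV (column order assumed from the docs)."""
    movie: str
    read_name: str
    read_length: int
    read_quality: Optional[float]
    alignment_type: str
    mapq: Optional[int]
    mg: Optional[float]


def _opt(value: str, cast):
    """Convert a field, treating empty or placeholder values as missing."""
    if value in ("", "NA", "."):  # assumed placeholders, not confirmed
        return None
    return cast(value)


def read_bam_statistics(path: str) -> Iterator[BamStatRecord]:
    """Yield records from a gzip-compressed, header-less TSV (assumption)."""
    with gzip.open(path, "rt", newline="") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            if not row:
                continue  # skip blank lines
            # Pad rows that omit MAPQ/mg, as unmapped records might.
            row = row + [""] * (7 - len(row))
            yield BamStatRecord(
                movie=row[0],
                read_name=row[1],
                read_length=int(row[2]),
                read_quality=_opt(row[3], float),
                alignment_type=row[4],
                mapq=_opt(row[5], int),
                mg=_opt(row[6], float),
            )
```

If the real file does carry a header row, the first line would need to be skipped before the integer conversion.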
+
+## `read_length_plot`
+
+A histogram of read lengths, using only records marked `prim` or `unmapped`.
+
+## `read_quality_plot`
+
+A histogram of read qualities, using only records marked `prim` or `unmapped`. This output is only generated if the input BAMs contain the `rq` tag.
+
+## `mapq_distribution_plot`, `mg_distribution_plot`
+
+A histogram of mapping qualities and gap-compressed identities, respectively.
+
+## `stat_num_reads`, `stat_read_length_mean`, `stat_read_length_median`, `stat_read_length_n50`, `stat_read_quality_mean`, `stat_read_quality_median`
+
+Statistics computed using only records marked `prim` or `unmapped`.
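
For readers unfamiliar with the new `stat_read_length_n50` value, the sketch below uses the conventional N50 definition: the largest length L such that reads of length L or longer hold at least half of all bases. The commit does not show how the task computes it, so treat this as an illustration of the usual definition rather than the task's implementation; `read_length_n50` is a hypothetical helper name.

```python
from typing import Iterable


def read_length_n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of read lengths (standard definition)."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("no read lengths supplied")
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first length at which the cumulative sum reaches half of all
        # bases is the N50.
        if running >= half_total:
            return length
    return ordered[-1]  # unreachable for non-empty input; kept for clarity
```

Following the documentation above, the input would be the lengths of records marked `prim` or `unmapped`, so each read is counted once.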
+
+## `stat_mapped_read_count`, `stat_mapped_percent`
+
+Count of primary alignments, and primary alignments as a percentage of total reads.
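
A minimal sketch of the two values described above, assuming that "total reads" means the number of records marked `prim` or `unmapped` (the same population as `stat_num_reads`). The function name and the default labels are illustrative; any rounding applied by the task is not shown in the commit and is left out here.

```python
from typing import Iterable, Tuple


def mapped_summary(
    alignment_types: Iterable[str],
    primary_label: str = "prim",
    unmapped_label: str = "unmapped",
) -> Tuple[int, float]:
    """Return (primary alignment count, primary as a percentage of total reads)."""
    primary = 0
    unmapped = 0
    for alignment_type in alignment_types:
        if alignment_type == primary_label:
            primary += 1
        elif alignment_type == unmapped_label:
            unmapped += 1
        # Any other type (e.g. supplementary) is ignored so that each read
        # is counted once.
    total = primary + unmapped
    percent = 100.0 * primary / total if total else 0.0
    return primary, percent
```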
+
+## `stat_mean_gap_compressed_identity`
+
+Mean gap-compressed identity of primary and supplementary alignments.
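
The statistic above could be computed roughly as follows. This sketch assumes an unweighted mean over alignments (one `mg` value per row, not weighted by aligned length), which the commit does not confirm. The label `supp` for supplementary alignments is a guess by analogy with `prim`, and `mean_gap_compressed_identity` is a hypothetical helper name.

```python
from typing import Iterable, Optional, Sequence, Tuple


def mean_gap_compressed_identity(
    records: Iterable[Tuple[str, Optional[float]]],
    included: Sequence[str] = ("prim", "supp"),  # "supp" is an assumed label
) -> Optional[float]:
    """Average the mg values of primary and supplementary alignments.

    Each record is an (alignment_type, mg) pair; rows without an mg value,
    such as unmapped reads, are skipped. Returns None if nothing qualifies.
    """
    values = [
        mg
        for alignment_type, mg in records
        if alignment_type in included and mg is not None
    ]
    if not values:
        return None
    return sum(values) / len(values)
```

Unlike the read-level statistics, this mean can include several rows for one read, since supplementary alignments appear as separate records.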

docs/bam_stats.md

Lines changed: 0 additions & 14 deletions
This file was deleted.

docs/family.md

Lines changed: 2 additions & 0 deletions
@@ -169,10 +169,12 @@ The `Sample` struct contains sample specific data and metadata. The struct has t
 | Array\[String\] | stat_num_reads | Number of reads | |
 | Array\[String\] | stat_read_length_mean | Mean read length | |
 | Array\[String\] | stat_read_length_median | Median read length | |
+| Array\[String\] | stat_read_length_n50 | Read length N50 | |
 | Array\[String\] | stat_read_quality_mean | Mean read quality | |
 | Array\[String\] | stat_read_quality_median | Median read quality | |
 | Array\[String\] | stat_mapped_read_count | Count of reads mapped to reference | |
 | Array\[String\] | stat_mapped_percent | Percent of reads mapped to reference | |
+| Array\[String\] | stat_mean_gap_compressed_identity | Mean gap-compressed identity | |
 | Array\[String\] | inferred_sex | Inferred sex | Sex is inferred based on relative depth of chrY alignments. |
 | Array\[String\] | stat_mean_depth | Mean depth | |
 

docs/singleton.md

Lines changed: 2 additions & 0 deletions
@@ -125,10 +125,12 @@ flowchart TD
 | String | stat_num_reads | Number of reads | |
 | String | stat_read_length_mean | Mean read length | |
 | String | stat_read_length_median | Median read length | |
+| String | stat_read_length_n50 | Read length N50 | |
 | String | stat_read_quality_mean | Mean read quality | |
 | String | stat_read_quality_median | Median read quality | |
 | String | stat_mapped_read_count | Count of reads mapped to reference | |
 | String | stat_mapped_percent | Percent of reads mapped to reference | |
+| String | stat_mean_gap_compressed_identity | Mean gap-compressed identity | |
 | String | inferred_sex | Inferred sex | Sex is inferred based on relative depth of chrY alignments. |
 | String | stat_mean_depth | Mean depth | |
 

wdl-ci.config.json

Lines changed: 13 additions & 1 deletion
@@ -269,7 +269,7 @@
 "tasks": {
 "bam_stats": {
 "key": "bam_stats",
-"digest": "3gn3hurjmdifhucjdrnykji4w4cf7yjq",
+"digest": "n7yzgzllk24zqced7wsgijsmdjoot66t",
 "tests": [
 {
 "inputs": {
@@ -339,6 +339,12 @@
 "compare_string"
 ]
 },
+"stat_read_length_n50": {
+"value": "16945",
+"test_tasks": [
+"compare_string"
+]
+},
 "stat_read_quality_mean": {
 "value": "35.91",
 "test_tasks": [
@@ -362,6 +368,12 @@
 "test_tasks": [
 "compare_string"
 ]
+},
+"stat_mean_gap_compressed_identity": {
+"value": "99.77",
+"test_tasks": [
+"compare_string"
+]
 }
 }
 }

workflows/downstream/downstream.wdl

Lines changed: 15 additions & 13 deletions
@@ -212,19 +212,21 @@ workflow downstream {
 String stat_phase_block_ng50 = hiphase.stat_phase_block_ng50
 
 # bam stats
-File bam_statistics = bam_stats.bam_statistics
-File read_length_plot = bam_stats.read_length_plot
-File? read_quality_plot = bam_stats.read_quality_plot
-File mapq_distribution_plot = bam_stats.mapq_distribution_plot
-File mg_distribution_plot = bam_stats.mg_distribution_plot
-String stat_num_reads = bam_stats.stat_num_reads
-String stat_read_length_mean = bam_stats.stat_read_length_mean
-String stat_read_length_median = bam_stats.stat_read_length_median
-String stat_read_quality_mean = bam_stats.stat_read_quality_mean
-String stat_read_quality_median = bam_stats.stat_read_quality_median
-String stat_mapped_read_count = bam_stats.stat_mapped_read_count
-String stat_mapped_percent = bam_stats.stat_mapped_percent
-File trgt_coverage_dropouts = coverage_dropouts.dropouts
+File bam_statistics = bam_stats.bam_statistics
+File read_length_plot = bam_stats.read_length_plot
+File? read_quality_plot = bam_stats.read_quality_plot
+File mapq_distribution_plot = bam_stats.mapq_distribution_plot
+File mg_distribution_plot = bam_stats.mg_distribution_plot
+String stat_num_reads = bam_stats.stat_num_reads
+String stat_read_length_mean = bam_stats.stat_read_length_mean
+String stat_read_length_median = bam_stats.stat_read_length_median
+String stat_read_length_n50 = bam_stats.stat_read_length_n50
+String stat_read_quality_mean = bam_stats.stat_read_quality_mean
+String stat_read_quality_median = bam_stats.stat_read_quality_median
+String stat_mapped_read_count = bam_stats.stat_mapped_read_count
+String stat_mapped_percent = bam_stats.stat_mapped_percent
+String stat_mean_gap_compressed_identity = bam_stats.stat_mean_gap_compressed_identity
+File trgt_coverage_dropouts = coverage_dropouts.dropouts
 
 # small variant stats
 File small_variant_stats = bcftools_stats_roh_small_variants.stats

workflows/family.wdl

Lines changed: 16 additions & 12 deletions
@@ -224,10 +224,12 @@ workflow humanwgs_family {
 'num_reads': downstream.stat_num_reads,
 'read_length_mean': downstream.stat_read_length_mean,
 'read_length_median': downstream.stat_read_length_median,
+'read_length_n50': downstream.stat_read_length_n50,
 'read_quality_mean': downstream.stat_read_quality_mean,
 'read_quality_median': downstream.stat_read_quality_median,
 'mapped_read_count': downstream.stat_mapped_read_count,
 'mapped_percent': downstream.stat_mapped_percent,
+'mean_gap_compressed_identity': downstream.stat_mean_gap_compressed_identity,
 'mean_depth': upstream.stat_mean_depth,
 'inferred_sex': upstream.inferred_sex,
 'stat_phased_basepairs': downstream.stat_phased_basepairs,
@@ -267,18 +269,20 @@ workflow humanwgs_family {
 File msg_file = consolidate_stats.messages
 
 # bam stats
-Array[File] bam_statistics = downstream.bam_statistics
-Array[File] read_length_plot = downstream.read_length_plot
-Array[File?] read_quality_plot = downstream.read_quality_plot
-Array[File] mapq_distribution_plot = downstream.mapq_distribution_plot
-Array[File] mg_distribution_plot = downstream.mg_distribution_plot
-Array[String] stat_num_reads = downstream.stat_num_reads
-Array[String] stat_read_length_mean = downstream.stat_read_length_mean
-Array[String] stat_read_length_median = downstream.stat_read_length_median
-Array[String] stat_read_quality_mean = downstream.stat_read_quality_mean
-Array[String] stat_read_quality_median = downstream.stat_read_quality_median
-Array[String] stat_mapped_read_count = downstream.stat_mapped_read_count
-Array[String] stat_mapped_percent = downstream.stat_mapped_percent
+Array[File] bam_statistics = downstream.bam_statistics
+Array[File] read_length_plot = downstream.read_length_plot
+Array[File?] read_quality_plot = downstream.read_quality_plot
+Array[File] mapq_distribution_plot = downstream.mapq_distribution_plot
+Array[File] mg_distribution_plot = downstream.mg_distribution_plot
+Array[String] stat_num_reads = downstream.stat_num_reads
+Array[String] stat_read_length_mean = downstream.stat_read_length_mean
+Array[String] stat_read_length_median = downstream.stat_read_length_median
+Array[String] stat_read_length_n50 = downstream.stat_read_length_n50
+Array[String] stat_read_quality_mean = downstream.stat_read_quality_mean
+Array[String] stat_read_quality_median = downstream.stat_read_quality_median
+Array[String] stat_mapped_read_count = downstream.stat_mapped_read_count
+Array[String] stat_mapped_percent = downstream.stat_mapped_percent
+Array[String] stat_mean_gap_compressed_identity = downstream.stat_mean_gap_compressed_identity
 
 # merged, haplotagged alignments
 Array[File] merged_haplotagged_bam = downstream.merged_haplotagged_bam

workflows/singleton.wdl

Lines changed: 16 additions & 12 deletions
@@ -169,10 +169,12 @@ workflow humanwgs_singleton {
 'num_reads': [downstream.stat_num_reads],
 'read_length_mean': [downstream.stat_read_length_mean],
 'read_length_median': [downstream.stat_read_length_median],
+'read_length_n50': [downstream.stat_read_length_n50],
 'read_quality_mean': [downstream.stat_read_quality_mean],
 'read_quality_median': [downstream.stat_read_quality_median],
 'mapped_read_count': [downstream.stat_mapped_read_count],
 'mapped_percent': [downstream.stat_mapped_percent],
+'mean_gap_compressed_identity': [downstream.stat_mean_gap_compressed_identity],
 'mean_depth': [upstream.stat_mean_depth],
 'inferred_sex': [upstream.inferred_sex],
 'stat_phased_basepairs': [downstream.stat_phased_basepairs],
@@ -211,18 +213,20 @@ workflow humanwgs_singleton {
 File msg_file = consolidate_stats.messages
 
 # bam stats
-File bam_statistics = downstream.bam_statistics
-File read_length_plot = downstream.read_length_plot
-File? read_quality_plot = downstream.read_quality_plot
-File mapq_distribution_plot = downstream.mapq_distribution_plot
-File mg_distribution_plot = downstream.mg_distribution_plot
-String stat_num_reads = downstream.stat_num_reads
-String stat_read_length_mean = downstream.stat_read_length_mean
-String stat_read_length_median = downstream.stat_read_length_median
-String stat_read_quality_mean = downstream.stat_read_quality_mean
-String stat_read_quality_median = downstream.stat_read_quality_median
-String stat_mapped_read_count = downstream.stat_mapped_read_count
-String stat_mapped_percent = downstream.stat_mapped_percent
+File bam_statistics = downstream.bam_statistics
+File read_length_plot = downstream.read_length_plot
+File? read_quality_plot = downstream.read_quality_plot
+File mapq_distribution_plot = downstream.mapq_distribution_plot
+File mg_distribution_plot = downstream.mg_distribution_plot
+String stat_num_reads = downstream.stat_num_reads
+String stat_read_length_mean = downstream.stat_read_length_mean
+String stat_read_length_median = downstream.stat_read_length_median
+String stat_read_length_n50 = downstream.stat_read_length_n50
+String stat_read_quality_mean = downstream.stat_read_quality_mean
+String stat_read_quality_median = downstream.stat_read_quality_median
+String stat_mapped_read_count = downstream.stat_mapped_read_count
+String stat_mapped_percent = downstream.stat_mapped_percent
+String stat_mean_gap_compressed_identity = downstream.stat_mean_gap_compressed_identity
 
 # merged, haplotagged alignments
 File merged_haplotagged_bam = downstream.merged_haplotagged_bam
