.TH crumble 1 "13 April 2022" "crumble-0.9" "Bioinformatics tools"
.SH NAME
.PP
crumble \- Lossy compression of DNA sequence quality values
.SH SYNOPSIS
.PP
\fBcrumble\fR [\fIoptions\fR] [\fIinput_file\fR [\fIoutput_file\fR]]
.SH DESCRIPTION
.PP
\fBCrumble\fR reads an aligned SAM, BAM or CRAM file, performs lossy
compression of the quality values based on the consistency of the
multiple alignment, and writes a modified SAM, BAM or CRAM file.
Input and output may be stdin and stdout, either by omitting the
file name or by specifying it as "\fB-\fR".
.PP
The method of quality compression stems from observations of how
quality values affect variant calling algorithms. If we run a variant
caller on a file with and without quality values and get the same
result with comparable confidence both times, then we conclude the
quality values were not important and can be discarded.
.PP
We do not wish to know the internals of each possible variant caller,
so instead we use a deliberately simplistic but pessimistic consensus
algorithm, capable of making homozygous and heterozygous calls for a
single diploid sample with an associated consensus confidence. It
does this without the use of a reference sequence, using solely the
data within the aligned BAM. If we are unsure of the accuracy of our
consensus call then we keep the qualities intact or apply only a light
smoothing filter, otherwise we adjust the qualities to fixed values to
reduce the overall variability, and thus entropy, increasing
compression ratios.
.PP
\fBCrumble\fR can compute the consensus with and/or without using
mapping quality. By default it computes it only once (with mapping quality), but if
both consensus methods are requested then each will have its own set
of thresholds to use in validation. The minimum quality threshold
(\fB-q\fR and \fB-Q\fR) needs to be above zero in order for that
consensus method to be used.
.PP
\fBCrumble\fR uses the consensus confidence plus heuristics listed
below to classify positions into good and bad. A good region will
have the qualities modified; bases in that column that agree with the
homozygous or heterozygous consensus call will have their quality
values increased to a fixed value (specified by \fB-u\fR), while bases
that disagree have their values either decreased (specified by
\fB-l\fR) or heavily quantised (\fB-l\fR, \fB-c\fR and \fB-u\fR). In
all other regions, \fBcrumble\fR either keeps the quality values
intact or performs a simple smoothing (using the CSAM P-block
algorithm with level indicated by \fB-p\fR). A global upper-bound
quality value can be set using the \fB-U\fR option, which can be used
to set a cap on excessively high qualities such as seen in PacBio HiFi.
.PP
A series of heuristics are employed to further filter high confidence
consensus bases in case of systematic problems, to prevent seemingly
confident but incorrect consensus calls leading to inappropriate
quality loss. The possible heuristics that are applied depend on the
compression level used. Some are per column, such as consensus
confidence, while others are per read. In the latter case, if we
determine the whole read is potentially misplaced / misaligned then we
preserve qualities for the entire read and not just the column being
studied.
.TP 4
.B Per base: SNP / Indel confidence
This is the starting point for deciding whether to keep a quality. An
indel in this situation is considered to be any column where at least
one base is either absent or inserted with respect to the reference
coordinate system.
If the consensus confidence is below the minimum specified
confidence for a SNP (\fB-Q\fR and/or \fB-q\fR depending on whether
the consensus algorithm is using mapping quality) or an indel (\fB-D\fR
or \fB-d\fR), the quality values will be retained.
.TP 4
.B Per base: consensus discrepancy
The consensus will make either a homozygous or heterozygous call, but
there may be bases present that disagree. By computing a new consensus
solely from these discrepant bases and comparing this to the original
call we get a discrepancy factor. High discrepancy (above 1) may be
indicative of misaligned data. Specified using the \fB-X\fR and
\fB-x\fR parameters.
.TP 4
.B Per base: 'keep' bed file
Any base which overlaps the regions specified in the bed file
(\fB-R\fR) will be retained verbatim.
.TP 4
.B Per base: proximity to SNP / Indel
A short tandem repeat (STR) with indels will often lead to misaligned
bases. A variant caller may employ a local reassembly method to tidy
up alignments in such regions, but this has the effect of moving bases
between columns which in turn may invalidate our calculations of which
columns are high confidence. We can compensate for this by preserving
qualities spanning the entire STR plus a little bit more either side,
either as a proportion of the STR length or a fixed number of base pairs.
For any location where \fBCrumble\fR wants to preserve base
confidences using the above rules, it checks whether this base forms
part of an STR and if so computes the start and end of this STR. It
then expands the region to preserve qualities using \fBnew_start = pos - (pos -
start) * STR_mul - STR_add\fR and \fBnew_end = pos + (end - pos) * STR_mul +
STR_add\fR where \fIpos\fR is the current location.
We have SNP and Indel specific versions of these multiply and addition
parameters, specified using \fB-s\fR and \fB-i\fR. These are specified
as the multiplier and additive values separated with a comma. For
example \fB-i1.5,1\fR will preserve qualities for a region 50% larger
than the STR plus 1 extra bp on either side.
Platforms with an inherently large indel error rate may wish to use
the \fB-Y\fR \fIfraction\fR option to restrict the STR assessment to
columns where the fraction of reads containing indels is above a given
threshold. This can be a significant performance gain on such data.
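.IP
As a rough illustration of the expansion formula above, the following
Python sketch computes the preserved region for a single position. The
function name expand_str and the outward rounding to whole bases are
assumptions made for this example; they are not taken from the
\fBcrumble\fR source.
.EX
```python
import math

def expand_str(pos, start, end, str_mul, str_add):
    """Return (new_start, new_end) for an STR spanning start to end.

    Hypothetical helper mirroring the formula in the text. Rounding
    outward to whole bases is an assumption of this sketch.
    """
    new_start = pos - (pos - start) * str_mul - str_add
    new_end = pos + (end - pos) * str_mul + str_add
    return math.floor(new_start), math.ceil(new_end)

# STR covering 100 to 120, current position 110, using -i1.5,1
print(expand_str(110, 100, 120, 1.5, 1))  # prints (94, 126)
```
.EE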
.TP 4
.B Per base: presence of 'keep' quality values
Specific qualities may be marked as ones to keep / preserve using the
\fB-k\fR and \fB-K\fR options. If these are present then either those
specific bases are kept or with \fB-N\fR the entire column is kept.
The difference between \fB-k\fR and \fB-K\fR is whether we preserve
qualities only when we're not certain of a positive heterozygous call
or a homozygous call without any discrepant bases (\fB-k\fR) or for
all occurrences (\fB-K\fR).
The intention of these options is to retain high quality discrepant
bases and facilitate the possibility of somatic mutation detection.
This is probably best combined with a lower \fB-X\fR parameter too.
.TP 4
.B Per read: excessive depth
Regions of collapsed repeats or large insertions being aligned to the
next best reference match lead to high depth. Typically such
locations can yield overly confident heterozygous variant calls.
The average depth is computed predominantly from the previous Mb of
consensus. A column with depth more than \fB-P\fR \fIdepth\fR times the
average means that all qualities for reads overlapping that column will
be retained, not just those in that column alone.
.TP 4
.B Per read: high proportion of low mapping quality
Regions where most of the reads have low mapping quality are either
repeat copies or, potentially, incorrect reference. An updated
reference may move these reads and we may not wish to pre-judge
whether their quality values will be necessary after remapping.
Controlled by the \fB-M\fR option, which should be a fraction between
0.0 (keep qualities if any low mapping quality data is present) and
1.0 (do not use this rule).
Note that all reads in regions matching this rule will have their
qualities preserved, not just the low mapping quality ones.
.TP 4
.B Per read: high proportion of soft-clipping
Many reads having soft-clipped portions of their CIGAR strings is
indicative of either a missing insertion or a collapsed repeat.
Relying on the validity of the consensus call and confidence in this
case can be wrong. The \fB-C\fR option specifies the fraction of
soft-clipped reads required in order to retain quality values, with
1.0 disabling this rule.
.TP 4
.B Per read: variability in indel size
Given we are targeting a diploid single sample file, the length of
indels should be at most bi-modal in distribution. We find the two
most frequent lengths of indel and count the proportion of other
lengths over all reads spanning the indel at this region. If it is
too high (above the fraction given by the \fB-Z\fR option), then there may be
additional alignment problems causing bases to be in incorrect
columns.
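.IP
The check described above might be sketched in Python as follows. The
function name other_indel_fraction, the use of length 0 for reads
without an indel, and the 0.1 threshold in the usage line are
assumptions for illustration only.
.EX
```python
from collections import Counter

def other_indel_fraction(indel_lengths):
    """Fraction of reads whose indel length is not one of the two most
    frequent lengths (hypothetical helper, one length per read)."""
    if not indel_lengths:
        return 0.0
    counts = Counter(indel_lengths)
    top_two = sum(n for _, n in counts.most_common(2))
    return (len(indel_lengths) - top_two) / len(indel_lengths)

# Ten reads spanning the site: lengths 0 and -1 dominate, two outliers
lengths = [0, 0, 0, 0, -1, -1, -1, -1, -2, 3]
frac = other_indel_fraction(lengths)
print(frac, frac > 0.1)  # prints 0.2 True
```
.EE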
.TP 4
.B Per read: low proportion of reads spanning indel
A pairwise alignment naturally leads to some reads ending within an
indel, which in turn leads to the number of reads spanning the indel
being slightly lower than the number of reads flanking it. However
too large a drop is sometimes indicative of a large structural
variation or a misalignment. We keep qualities if the fraction of
reads overlapping the indel is lower than the amount specified by
\fB-V\fR, with zero disabling this filter.
.SH OPTIONS
.PP
All the compound options such as \fB-1\fR to \fB-9\fR should be
specified first, followed by instrument options (\fB-y\fR) and then
other options. This prevents \fB-9\fR from overwriting any option changes
already made.
.PP
.TP
\fB-1\fR
Short hand for \fB-p0 -Q75 -D150 -X1 -M0.5 -Z0.1 -V0.5 -P3.0 -s1.0,5
-i2.0,1 -m5\fR.
.PP
.TP
\fB-3\fR
Short hand for \fB-p0 -Q75 -D150 -X1 -M0.5 -Z0.1 -V0.5 -P3.0 -s1.0,0
-i1.1,2 -m0\fR.
.PP
.TP
\fB-5\fR
Short hand for \fB-p0 -Q75 -D150 -X1 -M0.5 -Z0.1 -V0.5 -P3.0 -s0.0,0
-i1.1,2 -m0\fR.
.PP
.TP
\fB-7\fR
Short hand for \fB-p0 -Q75 -D150 -X1 -M1 -Z1 -V0 -P999 -s0.0,0
-i1.1,2 -m0\fR.
.PP
.TP
\fB-8\fR
Short hand for \fB-p0 -Q70 -D125 -X1.5 -M1 -Z1 -V0 -P999 -s0.0,0
-i1.0,2 -m0\fR.
.PP
.TP
\fB-9\fR
Short hand for \fB-p8 -Q70 -D125 -X1.5 -M1 -Z1 -V0 -P999 -s0.0,0
-i1.0,2 -m0\fR.
.PP
.TP
\fB-y\fR \fIplatform\fR
Recommended options for a specific platform. This option should come
after any \fB-1\fR to \fB-9\fR option and before others.
The currently supported \fIplatform\fR strings are "illumina", which
does nothing as the defaults are for this instrument, and "pbccs",
which is equivalent to \fB-Y0.1 -u50 -U50 -p16\fR.
.PP
.TP
\fB-b\fR \fIout.bed\fR
Writes out a bed file containing regions that trigger the various
heuristics that \fBcrumble\fR uses to preserve quality values.
.PP
.TP
\fB-B\fR
Enables binary quality mode. If the \fB-L 1\fR option (enabled by
default) is used to reduce mismatching qualities, bases in good
regions that disagree with the called consensus will be modified.
Without the binary quality mode these will be set to a constant low
value, specified by the \fB-l\fR parameter. With binary quality mode,
these are instead quantised to two values; low and high as governed by
the \fB-l\fR, \fB-u\fR and \fB-c\fR parameters.
.PP
.TP
\fB-C\fR \fIfloat\fR
Keep all qualities for reads at this site if >= \fIfloat\fR proportion
of reads have soft-clipping.
.PP
.TP
\fB-c\fR \fIqual_cutoff\fR
.TQ
\fB-l\fR \fIqual_lower\fR
.TQ
\fB-u\fR \fIqual_upper\fR
In highly confident regions, qualities are quantised into those \fI>=
qual_cutoff\fR and those \fI< qual_cutoff\fR, being replaced by
\fIqual_upper\fR and \fIqual_lower\fR respectively.
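.IP
A minimal Python sketch of this binary quantisation is shown below.
The function name quantise and the parameter values in the example
call are illustrative assumptions, not \fBcrumble\fR defaults.
.EX
```python
def quantise(quals, qual_cutoff, qual_lower, qual_upper):
    """Map each quality to qual_upper if >= qual_cutoff, else qual_lower."""
    return [qual_upper if q >= qual_cutoff else qual_lower for q in quals]

# Arbitrary example values standing in for -c25 -l5 -u40
print(quantise([2, 12, 25, 30, 41], qual_cutoff=25, qual_lower=5, qual_upper=40))
# prints [5, 5, 40, 40, 40]
```
.EE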
.PP
.TP
\fB-U\fR \fIqual_cap\fR
An absolute upper limit on quality values. This is useful with PacBio
HiFi data, which has an unrealistically (and expensively) large range of qualities.
This is performed right at the start of the Crumble algorithm and
applies to all data, even those that are otherwise kept intact.
.PP
.TP
\fB-k\fR \fIqual_range\fR
.TQ
\fB-K\fR \fIqual_range\fR
.TQ
\fB-N\fR
These options mark specific quality values as ones we wish to keep.
The most basic option is \fB-K\fR which preserves all indicated
quality values. The purpose is to facilitate the possibility of somatic
variation detection, where the germline call may be obvious ("no
mutation"), but we do not wish to quantise any aberrant high-quality
bases.
However this can lead to larger data as most high quality bases match
the called consensus (either hom or het). The \fB-k\fR option is a
more relaxed definition of "keep" where only bases that disagree with
the most likely call and have the specific quality values are kept,
along with other "keep" qualities in that same column.
Either option may be combined with \fB-N\fR, which expands the list of
bases for which qualities are retained to include all other bases in
the same column. The intention of this is to not over-emphasise high
quality discrepant bases relative to the agreeing bases, which may
have been quantised or capped using other options.
The \fIqual_range\fR argument can be a single quality ("93"), a start
and end range ("90-99"), and also a comma separated list ("0,90-99").
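.IP
One way such a \fIqual_range\fR string could be expanded is sketched
below in Python. The function name parse_qual_range is hypothetical,
and treating ranges as inclusive of both ends is an assumption of this
sketch.
.EX
```python
def parse_qual_range(spec):
    """Expand a string such as "0,90-99" into a set of quality values."""
    keep = set()
    for part in spec.split(","):
        if "-" in part:
            # Assumed inclusive range, e.g. "90-99" covers 90 up to 99
            lo, hi = part.split("-", 1)
            keep.update(range(int(lo), int(hi) + 1))
        else:
            keep.add(int(part))
    return keep

print(sorted(parse_qual_range("0,90-99")))
```
.EE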
.PP
.TP
\fB-d\fR \fIqual\fR
Keep quality for bases at this position if the consensus indel
confidence when computed without mapping quality is < \fIqual\fR.
.PP
.TP
\fB-D\fR \fIqual\fR
Keep quality for bases at this position if the consensus indel
confidence when computed using mapping quality is < \fIqual\fR.
.PP
.TP
\fB-e\fR \fIBD_low\fR
See \fB-f\fR for more details.
.PP
.TP
\fB-E\fR \fIBI_low\fR
See \fB-F\fR for more details.
.PP
.TP
\fB-f\fR \fIBD_cutoff\fR
If set, BD:Z tags will be binary quantised into values >=
\fIBD_cutoff\fR and values < \fIBD_cutoff\fR, replacing these with the
\fIBD_high\fR and \fIBD_low\fR values specified using the \fB-g\fR and
\fB-e\fR options respectively.
.PP
.TP
\fB-F\fR \fIBI_cutoff\fR
If set, BI:Z tags will be binary quantised into values >=
\fIBI_cutoff\fR and values < \fIBI_cutoff\fR, replacing these with the
\fIBI_high\fR and \fIBI_low\fR values specified using the \fB-G\fR and
\fB-E\fR options respectively.
.PP
.TP
\fB-g\fR \fIBD_high\fR
See \fB-f\fR for more details.
.PP
.TP
\fB-G\fR \fIBI_high\fR
See \fB-F\fR for more details.
.PP
.TP
\fB-i\fR \fIi_mul,i_add\fR
Sets the multiplier and additive values when expanding the size of
short tandem repeats containing an indel.
.PP
.TP
\fB-I\fR \fIfmt[,opt...]\fR
Specifies the input format, with any format specific options specified
as key=value pairs. See the samtools man page for a description of
these format options.
.PP
.TP
\fB-l\fR \fIqual_lower\fR
See \fB-c\fR for a description.
.PP
.TP
\fB-L\fR \fIbool\fR
If \fIbool\fR is 1 (the default), quality values for bases overlapping
high confidence consensus locations that do not match the consensus
call will have their qualities adjusted. These will either be
quantised to \fIqual_lower\fR or \fIqual_upper\fR if binary
quantisation is enabled (see \fB-B\fR) or set to \fIqual_lower\fR if
no quantisation is happening. Also see \fB-l\fR, \fB-c\fR and
\fB-u\fR options.
.PP
.TP
\fB-m\fR \fImqual\fR
Keeps all quality values for reads with mapping quality < \fImqual\fR.
.PP
.TP
\fB-M\fR \fIfloat\fR
Keep all qualities for reads at this site if >= \fIfloat\fR proportion
of reads have low mapping quality.
.PP
.TP
\fB-O\fR \fIfmt[,opt...]\fR
Specifies the output format, with any format specific options specified
as key=value pairs. See the samtools man page for a description of
these format options.
.PP
.TP
\fB-p\fR \fIspan\fR
Applies the P-block algorithm from libCSAM. For qualities that we
wish to keep, we still have the option of reducing their fidelity
using a smoothing algorithm. For each run of quality values that have
a minimum to maximum range <= \fIspan\fR we replace them with the
midpoint of that span. Use \fB-p 0\fR to disable this feature.
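.IP
The smoothing described above can be approximated by the following
Python sketch. It is a greedy left-to-right reading of the description
in this manual page, not the libCSAM or \fBcrumble\fR implementation;
the function name pblock and the integer midpoint are assumptions.
.EX
```python
def pblock(quals, span):
    """Greedy P-block style smoothing (illustrative sketch only).

    Extends each run while max - min <= span, then replaces the run
    with the integer midpoint of its minimum and maximum.
    """
    if span <= 0:
        # Mirrors -p 0: smoothing disabled, values returned unchanged
        return list(quals)
    out = []
    i = 0
    while i < len(quals):
        lo = hi = quals[i]
        j = i + 1
        while j < len(quals):
            new_lo = min(lo, quals[j])
            new_hi = max(hi, quals[j])
            if new_hi - new_lo > span:
                break
            lo, hi = new_lo, new_hi
            j += 1
        out.extend([(lo + hi) // 2] * (j - i))
        i = j
    return out

print(pblock([30, 32, 35, 20, 22, 40], span=8))
# prints [32, 32, 32, 21, 21, 40]
```
.EE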
.PP
.TP
\fB-P\fR \fIfloat\fR
Keeps qualities if the depth locally is \fIfloat\fR times higher than
average.
.PP
.TP
\fB-q\fR \fIqual\fR
Keep quality for bases at this position if the consensus SNP
confidence when computed without mapping quality is < \fIqual\fR.
.PP
.TP
\fB-Q\fR \fIqual\fR
Keep quality for bases at this position if the consensus SNP
confidence when computed using mapping quality is < \fIqual\fR.
.PP
.TP
\fB-r\fR \fIregion\fR
Runs crumble only on a specific region specified in chr, chr:start or
chr:start-end syntax. Note the output will only cover this region.
If you wish to run crumble on an entire file but restrict which regions
are (not) modified, use the \fB-R\fR option instead.
.PP
.TP
\fB-R\fR \fIexclude.bed\fR
Keeps qualities for bases overlapping regions specified in
\fIexclude.bed\fR.
.PP
.TP
\fB-s\fR \fIs_mul,s_add\fR
Sets the multiplier and additive values when expanding the size of
short tandem repeats containing a SNP.
.PP
.TP
\fB-S\fR
Also quantises qualities in soft-clipped bases, using the parameters
specified via \fB-l\fR, \fB-c\fR and \fB-u\fR.
.PP
.TP
\fB-t\fR \fItag[,tag...]\fR
Specifies a comma separated list of tags to keep. All others are
discarded.
.PP
.TP
\fB-T\fR \fItag[,tag...]\fR
Specifies a comma separated list of tags to discard. If both \fB-t\fR
and \fB-T\fR are specified, the whitelist is applied first followed by
the blacklist.
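.IP
The ordering of the two filters can be pictured with this small Python
sketch. Representing a read's auxiliary tags as a dict, and the
function name filter_tags, are assumptions made for illustration.
.EX
```python
def filter_tags(tags, keep=None, discard=None):
    """Apply the -t whitelist first, then the -T blacklist.

    tags maps two-letter tag names to values; keep and discard are
    optional collections of tag names (hypothetical representation).
    """
    if keep is not None:
        tags = {k: v for k, v in tags.items() if k in keep}
    if discard is not None:
        tags = {k: v for k, v in tags.items() if k not in discard}
    return tags

read_tags = {"NM": 1, "MD": "50", "OQ": "IIII", "BD": "KKKK"}
print(filter_tags(read_tags, keep={"NM", "OQ"}, discard={"OQ"}))
# prints {'NM': 1}
```
.EE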
.PP
.TP
\fB-u\fR \fIqual_upper\fR
See \fB-c\fR for a description.
.PP
.TP
\fB-V\fR \fIfloat\fR
Keep all qualities for reads at this site if < \fIfloat\fR proportion
of reads span an indel.
.PP
.TP
\fB-v\fR
Increases verbosity of output. Can be specified more than once.
.PP
.TP
\fB-x\fR \fIqual\fR
Keep quality for bases at this position if the consensus discrepancy, computed without mapping quality, is >= \fIqual\fR.
.PP
.TP
\fB-X\fR \fIqual\fR
Keep quality for bases at this position if the consensus discrepancy, computed using mapping quality, is >= \fIqual\fR.
.PP
.TP
\fB-Z\fR \fIfloat\fR
Keep all qualities for reads at this site if >= \fIfloat\fR proportion
of indel sizes do not fit a bi-modal distribution.
.PP
.TP
\fB-z\fR
Do not add an @PG SAM header line.
.SH EXAMPLES
.PP
Using crumble to convert BAM to CRAM with lossy read names, dropping
the OQ, BD and BI auxiliary tags.
.PP
.EX
crumble -O cram,lossy_names -T OQ,BD,BI in.bam out.cram
.EE
.PP
An example mpileup alignment of a short tandem repeat before and after
running crumble with -i1.0,2.
.EX
samtools mpileup -Q0 -B -r 1:1488390-1488424 CHM1_CHM13_2.15x.chr1.cram
1 1488390 N 11 g$gCGggGGgGG =<#?=7>>#,.
1 1488391 N 10 aAAaaAAaAA >#?<.==#?1
1 1488392 N 11 cCCccCCcCC^IC >#<>2==#<<;
1 1488393 N 12 tTTttTTtTTT^]T ?#0.+??#@?>=
1 1488394 N 12 gAGggGGgGGGG <#75'>>#><>9
1 1488395 N 12 tGTttTTtTTTT ?#.8=>>#>9?=
1 1488396 N 12 cCCccCCcCCCC <#><8==#==>=
1 1488397 N 12 tTTtt-1nTTtTTTT @#07(??#9>2@
1 1488398 N 12 c$CCc-1n*C-1NC+1AaCC-1NCC-1N >#::2==#==>?
1 1488399 N 11 AA*a*AaA*A* #>=2>>>>==>
1 1488400 N 11 AAaaAAaAAAA #@==>>?===>
1 1488401 N 11 AAaaAAaAAAA #>==>>>=;=>
1 1488402 N 11 A$AaaAAaAAAA #7==>>7===>
1 1488403 N 10 AaaAAaAAAA <==>>;>>=>
1 1488404 N 10 AaaAAaAAAA @==>>?>===
1 1488405 N 10 AaaAAaAAAA @>=>>>>.=>
1 1488406 N 10 AaaAAaAAAA @==>>?>>==
1 1488407 N 10 AaaAAaAAAA ===?>>>===
1 1488408 N 10 AaaAAaAAAA -==?>==>==
1 1488409 N 10 AaaAAaAAAA >==?>?>>==
1 1488410 N 10 AaaAAaAAAA 8==?>:>===
1 1488411 N 10 AaaAAaAAAA 8==?>-====
1 1488412 N 10 AaaAAaAAAA 8==?>?;===
1 1488413 N 10 AaaAAaAAAA ;==?>?4===
1 1488414 N 10 A$aaAAaAAAA ?==?>=><==
1 1488415 N 9 aaAAaAAAA >=?><>===
1 1488416 N 9 aaAAaAAAA >=??6>>==
1 1488417 N 9 aaAAaAAAA >=?>=><:=
1 1488418 N 9 aaAAaAAAA >=??>>===
1 1488419 N 10 aaAAaA$AAA^]a >=??>4=:=?
1 1488420 N 9 ttTTtTT$Tt >=>?;=>89
1 1488421 N 8 ttTTtTTt @>>.95=;
1 1488422 N 8 aaAAaAAa @=@,:;=;
1 1488423 N 8 ccCCcCCc ?=>&?:=?
1 1488424 N 8 ttTTtTTt >=@;>=>?
crumble -9p8 -l5 -u40 -i1.0,2 CHM1_CHM13_2.15x.chr1.cram crumble.cram
samtools mpileup -Q0 -B -r 1:1488390-1488424 crumble.cram
1 1488390 N 11 g$gCGggGGgGG II&IIIIIIII
1 1488391 N 10 aAAaaAAaAA IIIIIIIIII
1 1488392 N 11 cCCccCCcCC^IC IIIIIIIIIII
1 1488393 N 12 tTTttTTtTTT^]T >#66)>>#:<8<
1 1488394 N 12 gAGggGGgGGGG >#66)>>#:<8<
1 1488395 N 12 tGTttTTtTTTT >#66:>>#:<8<
1 1488396 N 12 cCCccCCcCCCC >#66:>>#:<8<
1 1488397 N 12 tTTtt-1nTTtTTTT >#66->>#:<8<
1 1488398 N 12 c$CCc-1n*C-1NC+1AaCC-1NCC-1N >#66->>#:<8<
1 1488399 N 11 AA*a*AaA*A* #66->>;:<8<
1 1488400 N 11 AAaaAAaAAAA #;6=>>;:<8<
1 1488401 N 11 AAaaAAaAAAA #;6=>>;:<8<
1 1488402 N 11 A$AaaAAaAAAA #;6=>>;:<8<
1 1488403 N 10 AaaAAaAAAA ;6=>>;:<8<
1 1488404 N 10 AaaAAaAAAA ;6=>>;:<8<
1 1488405 N 10 AaaAAaAAAA ;6=>>;:68<
1 1488406 N 10 AaaAAaAAAA ;6=>>;:68<
1 1488407 N 10 AaaAAaAAAA ;6=>>;:68<
1 1488408 N 10 AaaAAaAAAA -6=>>;:68<
1 1488409 N 10 AaaAAaAAAA ;6=>>;:68<
1 1488410 N 10 AaaAAaAAAA ;6=>>;:68<
1 1488411 N 10 AaaAAaAAAA ;6=>>-:68<
1 1488412 N 10 AaaAAaAAAA ;6=>>::68<
1 1488413 N 10 AaaAAaAAAA ;6=>>::68<
1 1488414 N 10 A$aaAAaAAAA ;6=>>::68<
1 1488415 N 9 aaAAaAAAA 6=>>::68<
1 1488416 N 9 aaAAaAAAA 6=>>::68<
1 1488417 N 9 aaAAaAAAA 6=>>::68<
1 1488418 N 9 aaAAaAAAA 6=>>::68<
1 1488419 N 10 aaAAaA$AAA^]a 6=>>::68<<
1 1488420 N 9 ttTTtTT$Tt 6=>>:68<<
1 1488421 N 8 ttTTtTTt @=>.:6<<
1 1488422 N 8 aaAAaAAa IIIIIIII
1 1488423 N 8 ccCCcCCc IIIIIIII
1 1488424 N 8 ttTTtTTt IIIIIIII
.EE
.PP
The heterozygous deletion at 1488399 is neighboured by a TCTC STR to the
left and poly-A to the right, and the preserved region is extended by an
additional 2bp either side. Qualities outside this region are replaced
with Q40 while qualities inside are smoothed linearly along each read.
.SH LIMITATIONS
.PP
\fBCrumble\fR is designed to operate on files containing a single
sample with a diploid genome of approximately equal allelic frequency.
While there are some options which may improve the use on somatic
data, notably \fB-k\fR, \fB-N\fR and \fB-X\fR, it is strongly recommended that you
perform your own evaluation before using Crumble on such data sets.
.SH AUTHOR
.PP
The original idea came from discussions between James Bonfield, Shane
McCarthy and Richard Durbin while at the Sanger Institute. James
Bonfield wrote the implementation.
.SH SEE ALSO
.IR samtools (1)