WarpSTR traverses each directory given by `path` and `runs` (e.g. `/data/subjectXY/run_1`) to find .bam and .fast5 files.
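To check in advance which files a run directory will contribute, the search can be approximated in a few lines of Python. This is a minimal sketch, not WarpSTR's own code: it assumes the traversal is recursive and that files are matched purely by the `.bam` and `.fast5` extensions. The helper name `find_inputs` is hypothetical.

```python
from pathlib import Path


def find_inputs(run_dir):
    """Hypothetical helper: recursively collect .bam and .fast5 files under run_dir.

    Assumes discovery by file extension only; WarpSTR's actual traversal may differ.
    """
    root = Path(run_dir)
    # rglob walks all subdirectories; sort for a stable, reproducible order
    bams = sorted(p for p in root.rglob("*.bam") if p.is_file())
    fast5s = sorted(p for p in root.rglob("*.fast5") if p.is_file())
    return bams, fast5s
```

For example, `find_inputs("/data/subjectXY/run_1")` returns two sorted lists of paths that can be counted or inspected before starting a run.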
## Output
48
+
The upper path for output is given in the .yaml configuration file as `output` element. Outputs are separated for each locus as subdirectories of this upper path, where names of subdirectories are the same as the locus name.
48
49
49
-
... TBA
50
+
The output structure for one locus is as follows:
51
+
```
alignments/         # alignments of template flanks with reads
expected_signals/   # template flanks as sequences and their expected signals
fast5/              # signals extracted around the locus, stored as single .fast5 files
predictions/        # visualizations of automaton alignments and basecalled sequences (see below)
summaries/          # visualizations produced in the final summarizing phase (see below)
overview.csv        # per-read information and results
```
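Because every locus gets its own subdirectory under `output`, a quick sanity check after a run is to list which of the entries above each locus directory actually contains. The sketch below is a hypothetical helper, not part of WarpSTR; it assumes the entry names in the listing are fixed and only reports what is missing, since some outputs may be disabled in the configuration.

```python
from pathlib import Path

# Entries expected in each locus directory, following the layout listed above.
# Absence is reported rather than treated as an error, because some outputs
# are optional.
EXPECTED_ENTRIES = {
    "alignments": "dir",
    "expected_signals": "dir",
    "fast5": "dir",
    "predictions": "dir",
    "summaries": "dir",
    "overview.csv": "file",
}


def check_outputs(output_root):
    """Hypothetical helper: map each locus subdirectory to its missing entries."""
    report = {}
    for locus_dir in sorted(Path(output_root).iterdir()):
        if not locus_dir.is_dir():
            continue  # only subdirectories correspond to loci
        missing = []
        for name, kind in EXPECTED_ENTRIES.items():
            path = locus_dir / name
            present = path.is_dir() if kind == "dir" else path.is_file()
            if not present:
                missing.append(name)
        report[locus_dir.name] = missing
    return report
```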
Some output files are optional and can be controlled by the .yaml config file.
### Predictions
The `predictions` directory of each locus contains a wide variety of output files, organized into further subdirectories.

The **basecalls** subdirectory contains outputs related to basecalling: `all.fasta` holds the basecalled sequences of all reads encompassing the locus according to the SAM/BAM file, and `basecalls_all.fasta` holds only the reads in which the flanks were found. The latter is further split by strand into `basecalls_reverse.fasta` and `basecalls_template.fasta`. If multiple sequence alignment (MSA) with MUSCLE is enabled (controlled by `advanced_params` in the config), the MSA is written to `msa_all.fasta`. If the summarizing step is run, the basecalled sequences are also split into `group1.fasta` and `group2.fasta` according to the groups determined in the last step of WarpSTR; in that case, MSA output is additionally produced for the basecalled sequences of each group.
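When the summarizing step has produced `group1.fasta` and `group2.fasta`, a quick way to compare the groups is to count reads and average sequence lengths per file. This is a minimal sketch with a hand-written FASTA reader and a hypothetical `summarize_groups` helper; the lengths are those of the stored sequences, which is only a rough proxy and not WarpSTR's repeat-length estimate.

```python
from pathlib import Path


def read_fasta(path):
    """Minimal FASTA reader returning {read_name: sequence}."""
    records = {}
    name = None
    chunks = []
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)
            name = line[1:].split()[0]  # identifier up to the first whitespace
            chunks = []
        else:
            chunks.append(line)  # sequences may be wrapped over several lines
    if name is not None:
        records[name] = "".join(chunks)
    return records


def summarize_groups(basecalls_dir):
    """Hypothetical helper: (read count, mean sequence length) per group file.

    Group files that do not exist (e.g. summarizing was not run) are skipped.
    """
    summary = {}
    for group in ("group1", "group2"):
        path = Path(basecalls_dir) / f"{group}.fasta"
        if not path.is_file():
            continue
        seqs = read_fasta(path)
        lengths = [len(s) for s in seqs.values()]
        mean_len = sum(lengths) / len(lengths) if lengths else 0.0
        summary[group] = (len(seqs), mean_len)
    return summary
```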

The `complex_repeat_units.csv` file contains a counter for each repeat structure of the complex STR locus: each row denotes a read, and each column holds the count of one repeat structure.
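To get the typical count of each repeat structure across reads, the columns of this file can be aggregated. The sketch below assumes a header row and one row per read, and it skips any column that is not entirely numeric (for example a read identifier); the actual column names and layout are assumptions, and `median_unit_counts` is a hypothetical helper.

```python
import csv
from statistics import median


def median_unit_counts(csv_path):
    """Median count of each repeat structure across reads.

    Assumes a header row and one row per read; columns whose values are not
    all numeric (e.g. a read identifier) are skipped.
    """
    with open(csv_path, newline="") as handle:
        rows = list(csv.DictReader(handle))
    if not rows:
        return {}
    result = {}
    for column in rows[0]:
        try:
            values = [float(row[column]) for row in rows]
        except (TypeError, ValueError):
            continue  # non-numeric or missing values: not a count column
        result[column] = median(values)
    return result
```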

The **sequences** subdirectory contains information analogous to the **basecalls** subdirectory, but derived from the sequences predicted by WarpSTR rather than from the basecalled sequences.

The **DTW_alignments** subdirectory contains visualizations of the alignments of the STR signal with the automaton (in both stages). The visualizations are truncated to the first 2000 values.
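The directory name refers to dynamic time warping (DTW). For readers unfamiliar with the idea, the sketch below shows classic DTW between two numeric sequences, such as a measured signal and an expected signal. It is a generic textbook illustration, not WarpSTR's implementation: it does not model the automaton states that WarpSTR aligns against.

```python
def dtw_cost(x, y):
    """Classic dynamic time warping cost between two numeric sequences.

    Uses the absolute difference as the local cost and the standard
    three-step recurrence (diagonal, vertical, horizontal).
    """
    n, m = len(x), len(y)
    inf = float("inf")
    # cost[i][j] = minimal cumulative cost of aligning x[:i] with y[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(x[i - 1] - y[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j - 1],  # one-to-one step
                cost[i - 1][j],      # y[j-1] also matched to the next x value
                cost[i][j - 1],      # x[i-1] also matched to the next y value
            )
    return cost[n][m]
```

For instance, `dtw_cost([1, 2, 3], [1, 2, 2, 3])` is `0.0`, because the repeated `2` is absorbed by warping instead of being penalized.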
### Summaries
The `summaries` directory of each locus contains a number of optional visualizations:
```
alleles.svg                       - Summarized predictions of repeat lengths in 1 or 2 groups, for both WarpSTR and basecalled sequences.
collapsed_predictions.svg         - Complex repeat structure counts (WarpSTR only).
collapsed_predictions_strand.svg  - As above, further split by strand.
complex_genotypes.svg             - Summarized complex repeat structure counts in 1 or 2 groups.
predictions_cost.svg              - Scatter plot of state-wise cost and allele lengths.
predictions_phase.svg             - Violin plots of repeat lengths in the first and second phase.
predictions_strand.svg            - Violin plots of repeat lengths split by strand.
```
## Additional information
Newer .fast5 files are usually VBZ-compressed, so the VBZ plugin for HDF5 must be installed for WarpSTR to read them. See <https://github.com/nanoporetech/vbz_compression>.