assembler/src/projects/pathracer/README.md
Lines changed: 45 additions & 56 deletions
@@ -9,74 +9,74 @@ MANUAL
9
9
<!-- The tool finds all proper alignments rather than only the best one. -->
10
10
<!-- That allows extracting all genes satisfying HMM gene model from the assembly. -->
11
11
<!---->
12
-
**PathRacer** is a novel standalone tool that aligns profile HMM directly to the
12
+
**PathRacer** is a standalone tool that performs profile HMM alignment directly to the
13
13
assembly graph (performing the codon translation on the fly for amino acid pHMMs).
14
14
The tool provides the set of most probable paths traversed by an HMM through the
15
15
whole assembly graph, regardless of whether the sequence of interest is encoded
16
-
on the single contig or scattered across the set of edges, therefore
16
+
on a single contig or scattered across a set of edges, therefore
17
17
significantly improving the recovery of sequences of interest even from
18
18
fragmented metagenome assemblies.
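
To make the codon translation step concrete, here is a minimal sketch that translates a plain nucleotide string into amino acids. It is not PathRacer's implementation: the tool translates while traversing graph paths, whereas this example works on a single linear string. The use of the standard genetic code and the names `CODON_TABLE` and `translate` are assumptions made for illustration.

```python
from itertools import product

# Standard genetic code, listed in TCAG order (assumption: the source does
# not state which translation table the tool uses).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(nucleotides: str) -> str:
    """Translate a nucleotide string codon by codon, starting at offset 0.

    Trailing bases that do not form a full codon are ignored; codons with
    characters outside ACGT are reported as 'X'.
    """
    nucleotides = nucleotides.upper()
    usable = len(nucleotides) - len(nucleotides) % 3
    return "".join(
        CODON_TABLE.get(nucleotides[i:i + 3], "X")
        for i in range(0, usable, 3)
    )
```

For example, `translate("ATGAAATGGTAA")` yields `"MKW*"`, where `*` marks a stop codon.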
19
19
20
20
### Input
21
-
For this moment the tool supports only _de Bruijn_ graphs in GFA format produced by **SPAdes**.
21
+
Currently, the tool supports only _de Bruijn_ graphs in GFA format as produced by **SPAdes** or a compatible assembler (e.g. **MEGAHIT**).
22
22
Contact us if you need support for another format.
23
23
24
-
Profile HMM should be in **HMMer3** format, but one can pass nucleotide or amino acid sequence(s) to be converted to pHMM(s) that would be equivalent
25
-
to performing Levenshtein search for each input sequence.
26
-
27
-
### Output
28
-
For each pHMM (gene) the tool reports:
29
-
30
-
-**<gene\_name>.seqs.fa**: sequences correspondent to _N_ (parameter, see below) best score paths ordered by score along with their alignment in CIGAR format
31
-
-**<gene\_name>.nucs.fa**: _(for amino acids pHHMs only)_ the same sequences in nucleotides
32
-
-**<gene\_name>.edges.fa**: unique unitig (edge) paths correspondent to best score paths above
33
-
-**<gene\_name>.{domtblout, pfamtblout, tblout}**: _(optional)_ unitig paths realignment by **HMMer3**`hmmalign` in various formats
34
-
-**event\_graph\_<gene\_name>\_component\_<component\_id>\_size\_<component\_size>.cereal**: _(optional, debug output)_ connected components of the aligned graph
35
-
-**<component\_id>.dot**: _(optional, plot)_ connected component of matched neighborhood subgraph
36
-
-**<component\_id>\_<path\_index>.dot**: _(optional, plot)_ neighborhood of the found path
37
-
38
-
In addition:
39
-
40
-
-**all.edges.fa**: unique unitig paths for all pHMMs in one file
41
-
-**pathracer.log**: log file
42
-
-**graph\_with\_hmm\_paths.gfa**: _(optional)_ input graph with annotated unitig paths
24
+
Profile HMMs should be in **HMMer3** format, but one can pass nucleotide or amino acid sequences as well. These sequences will be converted to proxy pHMMs. Aligning these pHMMs is equivalent to performing alignment using Levenshtein distance for each input sequence.
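
As a reminder of what Levenshtein distance measures, here is a minimal sketch of the classic dynamic-programming computation for two plain strings. It only illustrates the scoring scheme (unit cost for each substitution, insertion, and deletion); it is not how PathRacer aligns sequences, since the tool works on proxy pHMMs and graph paths. The function name `levenshtein` is chosen for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the edit distance between a and b with unit costs."""
    # previous[j] holds the distance between the processed prefix of a
    # and the first j characters of b.
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty prefix of b
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # delete char_a
                current[j - 1] + 1,                    # insert char_b
                previous[j - 1] + (char_a != char_b),  # match or substitute
            ))
        previous = current
    return previous[-1]
```

For example, `levenshtein("kitten", "sitting")` is 3: two substitutions and one insertion.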
43
25
44
26
45
27
### Command line options
46
28
Required positional arguments:
47
29
48
-
1. Query gene models file (.hmm file or .fasta)
49
-
2.Graph in GFA format
50
-
3._k_ (_de Bruijn_ overlap size) for the input graph
30
+
1. Query file (.hmm file or .fasta)
31
+
2. Assembly graph in GFA format
32
+
3. _k_ (_de Bruijn_ vertex overlap size) for the input graph
51
33
52
34
Main options:
53
35
54
36
- `--output`, `-o` DIR: output directory
55
-
-`--hmm` | `--nt` | `--aa`: match against pHMM(s) [default] | nucleotide sequences | amino acid sequences
37
+
- `--hmm` | `--nt` | `--aa`: perform match against pHMM(s) [default] | nucleotide sequences | amino acid sequences
56
38
- `--queries` Q1 [Q2 [...]]: query names to look up [default: all queries from input query file]
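
The arguments and options listed above can be assembled into a command line as in the following sketch. It uses only the three positional arguments and the `--output`, `--hmm`/`--nt`/`--aa`, and `--queries` options described in this section. The executable name `pathracer`, the placement of options after the positional arguments, and the helper name `build_pathracer_command` are assumptions, not something this README states.

```python
from typing import List, Optional, Sequence


def build_pathracer_command(
    query_file: str,
    graph_gfa: str,
    k: int,
    output_dir: str,
    mode: str = "hmm",
    queries: Optional[Sequence[str]] = None,
    executable: str = "pathracer",  # assumed binary name
) -> List[str]:
    """Build an argument list from the options documented in the README."""
    if mode not in ("hmm", "nt", "aa"):
        raise ValueError("mode must be 'hmm', 'nt', or 'aa'")
    # Positional arguments in documented order: query file, graph, k.
    command = [executable, query_file, graph_gfa, str(k)]
    command += ["--output", output_dir, f"--{mode}"]
    if queries:
        # --queries takes one or more query names; keep it last.
        command += ["--queries", *queries]
    return command
```

The resulting list could be passed to `subprocess.run`, provided the binary is installed and accepts the arguments in this order.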