Commit b22ae4b

Author: Jon Palmer
Commit message: update to include some repeat filtering options
1 parent cf1799b, commit b22ae4b

1 file changed (+12, -0 lines changed)

docs/predict.rst

@@ -8,6 +8,9 @@ Gene prediction in funannotate is dynamic in the sense that it will adjust based
Note that as of funannotate v1.4.0, repeat masking is decoupled from :code:`funannotate predict`, so predict expects the genome input (:code:`-i`) to be a softmasked multi-FASTA file. RepeatModeler/RepeatMasker-mediated masking is now done with the :code:`funannotate mask` command. You can read more about repeat masking here: :ref:`repeatmasking`

11+
Explanation of steps in examples:
12+
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
13+
1114
**1. Genome fasta file, Trinity transcripts, RNAseq BAM file and PASA/transdecoder data.**

.. code-block:: none
@@ -70,6 +73,15 @@ Note that as of funannotate v1.4.0, repeat masking is decoupled from :code:`funa
7. Convert to GenBank format using tbl2asn
8. Parse NCBI error reports and alert user to invalid gene models

How are repeats used/dealt with:
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Repetitive regions are parsed from the softmasked genome FASTA file and written to a BED file. The softmasked genome is then passed to the *ab initio* predictors Augustus and GeneMark, each of which has its own internal way of handling softmasked sequence; according to the developers, this is preferable to hard masking the sequences.
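
As a rough illustration of the first step, the sketch below scans a softmasked FASTA for runs of lowercase bases and reports them as 0-based, half-open BED intervals. It is not funannotate's own code: the function name ``softmask_to_bed`` and the demo sequences are hypothetical, and it assumes that every lowercase character marks masked sequence.

```python
import io
import re


def softmask_to_bed(fasta_handle):
    """Yield (contig, start, end) for each lowercase run in a softmasked FASTA.

    Coordinates are 0-based and half-open, as in BED. Illustrative sketch only.
    """
    name, chunks = None, []

    def emit(contig, parts):
        # Join the wrapped sequence lines and report every lowercase run.
        seq = "".join(parts)
        for match in re.finditer(r"[a-z]+", seq):
            yield contig, match.start(), match.end()

    for line in fasta_handle:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield from emit(name, chunks)
            # Use the first whitespace-delimited token of the header as the contig name.
            fields = line[1:].split()
            name, chunks = (fields[0] if fields else ""), []
        elif line and name is not None:
            chunks.append(line)
    if name is not None:
        yield from emit(name, chunks)


# Hypothetical two-contig genome; lowercase marks softmasked repeats.
demo_fasta = io.StringIO(">ctg1 demo\nACGTacgtAC\nGTttTT\n>ctg2\nggggCCCC\n")
for contig, start, end in softmask_to_bed(demo_fasta):
    print(f"{contig}\t{start}\t{end}")
```

Note that a masked run spanning a line break in the FASTA is reported as one interval, because the sequence lines are joined before scanning.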

- :code:`--soft_mask` controls how GeneMark deals with repetitive regions. By default this is set to :code:`2000`, which means that GeneMark skips prediction on repeat regions shorter than 2 kb.

- :code:`--repeats2evm` passes the repeat GFF3 file to Evidence Modeler. This option is turned off by default because it can be too stringent for many fungal genomes that have high gene density. You might want to turn it on for larger genomes or those with a high repeat content.

- :code:`--repeat_filter` controls how funannotate filters out repetitive gene models. The default is to use both overlap and blast filtering. Overlap filtering uses the repeat BED file and drops gene models that are more than 90% contained within a repeat region, while blast filtering compares the amino acid sequences to a small database of known transposons.
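
The overlap filter described above can be pictured with a small sketch. This is an assumption-laden illustration, not funannotate's implementation: the names ``repeat_fraction`` and ``drop_repeat_models`` are hypothetical, intervals are 0-based and half-open, and overlapping repeats are merged so that coverage is counted once (the real tool may compare against individual repeat regions instead).

```python
def repeat_fraction(gene, repeats):
    """Return the fraction of a gene interval covered by repeats on the same contig.

    gene is (contig, start, end); repeats is an iterable of the same tuples.
    Assumes end > start for the gene. Illustrative sketch only.
    """
    contig, start, end = gene
    # Keep only repeats that overlap the gene, clipped to the gene boundaries.
    clipped = sorted(
        (max(r_start, start), min(r_end, end))
        for r_contig, r_start, r_end in repeats
        if r_contig == contig and r_start < end and r_end > start
    )
    covered, cursor = 0, start
    for r_start, r_end in clipped:
        # Skip any part already counted so overlapping repeats are not double counted.
        r_start = max(r_start, cursor)
        if r_end > r_start:
            covered += r_end - r_start
            cursor = r_end
    return covered / (end - start)


def drop_repeat_models(genes, repeats, threshold=0.9):
    """Keep gene models whose repeat coverage does not exceed the threshold."""
    return [gene for gene in genes if repeat_fraction(gene, repeats) <= threshold]


# Hypothetical repeat BED intervals and gene models.
demo_repeats = [("ctg1", 0, 100), ("ctg1", 50, 200), ("ctg2", 10, 20)]
demo_genes = [("ctg1", 10, 110), ("ctg1", 150, 400), ("ctg2", 0, 100)]
for kept in drop_repeat_models(demo_genes, demo_repeats):
    print(kept)
```

The 0.9 default mirrors the "more than 90% contained" rule stated in the text; whether the boundary case of exactly 90% is kept or dropped is an assumption of this sketch.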

Explanation of inputs and options:
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
