
Commit 671e7fe

Jon Palmer authored and committed
faqs
1 parent ac43348 commit 671e7fe


1 file changed: +4 -4 lines changed


docs/faqs.md +4 -4
@@ -11,10 +11,10 @@ Prokaryotes are not supported -> use [Prokka](https://github.com/tseemann/prokka
 
 ###4) How does funannotate train Augustus?
 Training Augustus is not very easy. There are several ways to do it in funannotate, and the script will automatically pick a training path based on your input data. For all of these training steps, the more evidence you can provide, the better your training will be (`--protein_evidence` and `--transcript_evidence`). This is how the "logic" in the script is set up.
-1) If you pass a valid pre-trained species to `--augustus_species` or there is already one trained (`--species "Aspergillus nidulans"` will essentially be turned into `--augustus_species aspergillus_nidulans`), then the scripts will NOT train Augustus and will use the pre-trained parameters. Note that you can check which species have been pre-trained with `funannotate species`.
-2) If you provide a coordinate-sorted BAM file via `--rna_bam`, Augustus and GeneMark will be trained using BRAKER1.
-3) If you provide a PASA GFF file via `--pasa_gff`, then Augustus will be trained using these PASA gene models.
-4) If you don't have PASA or an RNAseq BAM file, then Augustus will be trained using BUSCO2. The `--busco_seed_species` option passes the most closely related pre-trained Augustus species to BUSCO2 to improve its de novo prediction. Funannotate uses a modified training regime: it takes BUSCO2 'Complete' models, de novo GeneMark-ES models, and evidence in those regions, and runs EvidenceModeler to predict gene models. The models are then confirmed using BUSCO2, and a subset is used for training Augustus.
+* If you pass a valid pre-trained species to `--augustus_species` or there is already one trained (`--species "Aspergillus nidulans"` will essentially be turned into `--augustus_species aspergillus_nidulans`), then the scripts will NOT train Augustus and will use the pre-trained parameters. Note that you can check which species have been pre-trained with `funannotate species`.
+* If you provide a coordinate-sorted BAM file via `--rna_bam`, Augustus and GeneMark will be trained using BRAKER1.
+* If you provide a PASA GFF file via `--pasa_gff`, then Augustus will be trained using these PASA gene models.
+* If you don't have PASA or an RNAseq BAM file, then Augustus will be trained using BUSCO2. The `--busco_seed_species` option passes the most closely related pre-trained Augustus species to BUSCO2 to improve its de novo prediction. Funannotate uses a modified training regime: it takes BUSCO2 'Complete' models, de novo GeneMark-ES models, and evidence in those regions, and runs EvidenceModeler to predict gene models. The models are then confirmed using BUSCO2, and a subset is used for training Augustus.
 
 ###5) Funannotate said I should manually fix problematic gene models, how???
 In the 'predict_results' folder you will find the output from `funannotate predict`, which is composed of a GenBank flatfile, feature table file, GFF3, proteins, transcripts, as well as 3 error reports from tbl2asn. Gene models that show up as ERROR in the error.summary.txt file MUST be fixed prior to submission to NCBI. All errors listed as FATAL in the discrepency.report.txt must also be fixed (with the exception of FATAL: DISC_BACTERIAL_PARTIAL_NONEXTENDABLE_PROBLEMS). I try to parse the errors and, where I can, automatically provide fixes or remove the gene models; however, there are lots of tbl2asn errors I've either never seen before or don't know how to fix automatically. Here is how you can fix those problematic gene models:
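The Augustus training decision order in FAQ 4 can be summarized as a small Python sketch. This is not funannotate's actual code: `PredictInputs`, `species_to_augustus_name`, and `choose_training_path` are hypothetical names, the `pretrained` set stands in for the list printed by `funannotate species`, the name conversion only mirrors the FAQ's single example (lowercase, spaces to underscores), and checking `--rna_bam` before `--pasa_gff` when both are given simply follows the order of the FAQ's list and is an assumption.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass
class PredictInputs:
    """Hypothetical container for the `funannotate predict` inputs named in FAQ 4."""
    species: str                            # --species, e.g. "Aspergillus nidulans"
    augustus_species: Optional[str] = None  # --augustus_species
    rna_bam: Optional[str] = None           # --rna_bam (coordinate-sorted BAM)
    pasa_gff: Optional[str] = None          # --pasa_gff


def species_to_augustus_name(species: str) -> str:
    """Lowercase and join words with "_" (assumed rule, from the FAQ's one example)."""
    return "_".join(species.lower().split())


def choose_training_path(inputs: PredictInputs, pretrained: Set[str]) -> str:
    """Return the training route FAQ 4 describes (hypothetical helper).

    `pretrained` stands in for the species listed by `funannotate species`.
    """
    # 1) Use --augustus_species if given, else derive a name from --species;
    #    if those parameters already exist, skip training entirely.
    name = inputs.augustus_species or species_to_augustus_name(inputs.species)
    if name in pretrained:
        return f"use pre-trained parameters: {name}"
    # 2) RNA-seq BAM: BRAKER1 trains both Augustus and GeneMark.
    if inputs.rna_bam:
        return "train Augustus and GeneMark with BRAKER1"
    # 3) PASA models: train Augustus directly on them.
    if inputs.pasa_gff:
        return "train Augustus on PASA gene models"
    # 4) Fallback: BUSCO2 'Complete' models + GeneMark-ES + evidence go to
    #    EvidenceModeler; a BUSCO2-confirmed subset trains Augustus.
    return "train Augustus via BUSCO2 and EvidenceModeler"


if __name__ == "__main__":
    known = {"aspergillus_nidulans"}  # hypothetical pre-trained set
    print(choose_training_path(PredictInputs("Aspergillus nidulans"), known))
    # use pre-trained parameters: aspergillus_nidulans
    print(choose_training_path(PredictInputs("Novel fungus", pasa_gff="pasa.gff3"), known))
    # train Augustus on PASA gene models
```

Returning a plain description keeps the sketch focused on the ordering of the checks, which is the part of the FAQ most easily misread.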

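As a hypothetical illustration of the triage rule stated in FAQ 5 (which reported problems must be fixed before NCBI submission), the sketch below filters report lines. It is not a funannotate or tbl2asn tool, it does not fix any gene models, and the line formats it matches are assumptions: any line in error.summary.txt containing `ERROR`, and entries of the form `FATAL: CODE` in discrepency.report.txt. `errors_to_fix`, `fatal_codes_to_fix`, and the sample lines are illustrative names and data only.

```python
import re
from typing import Iterable, List

# Discrepancy code that FAQ 5 says may be left unfixed.
EXEMPT_FATAL = {"DISC_BACTERIAL_PARTIAL_NONEXTENDABLE_PROBLEMS"}

# Assumed format: a FATAL entry contains "FATAL: <CODE>" somewhere on the line.
FATAL_RE = re.compile(r"FATAL:\s*(\w+)")


def errors_to_fix(summary_lines: Iterable[str]) -> List[str]:
    """Return error.summary.txt lines mentioning ERROR (assumes one entry per line)."""
    return [line.rstrip("\n") for line in summary_lines if "ERROR" in line]


def fatal_codes_to_fix(discrepancy_lines: Iterable[str]) -> List[str]:
    """Return FATAL codes from discrepency.report.txt, minus the exempt one."""
    codes = []
    for line in discrepancy_lines:
        match = FATAL_RE.search(line)
        if match and match.group(1) not in EXEMPT_FATAL:
            codes.append(match.group(1))
    return codes


if __name__ == "__main__":
    # Hypothetical report lines, not real tbl2asn output.
    report = [
        "FATAL: DISC_BACTERIAL_PARTIAL_NONEXTENDABLE_PROBLEMS: 2 features",
        "FATAL: EXAMPLE_CODE: 1 feature",
        "DISC_OTHER_CODE: informational only",
    ]
    print(fatal_codes_to_fix(report))  # ['EXAMPLE_CODE']
```

Both functions accept any iterable of lines, so an open file handle (e.g. `with open(path) as fh: errors_to_fix(fh)`) can be passed directly.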