Releases: nextgenusfs/funannotate
funannotate v1.0.2
- update to GFF to TBL parser to catch some "common" errors in GFF files
- added funannotate iprscan, which will run InterProScan searches through Docker or locally. It will split the job into chunks and run those in parallel, which seems to be a faster way to run InterProScan. By default it will chunk the proteins into bins of 1000 proteins and then run each bin with 4 cpus, up to as many cpus as you give the script.
- fix to docker build (hopefully)
- bug fixes for parsing the ncbi error report, properly outputting which genes are causing errors
- fix for antiSMASH parsing of plantismash data
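The chunking scheme described for funannotate iprscan can be sketched roughly as follows. This is only an illustration under stated assumptions, not funannotate's actual code: the function names chunk_records and parallel_jobs are hypothetical, and the defaults (1000 proteins per bin, 4 cpus per job) are taken from the note above.

```python
def chunk_records(records, chunk_size=1000):
    """Split a list of protein records into consecutive bins of at most chunk_size."""
    return [records[i:i + chunk_size] for i in range(0, len(records), chunk_size)]


def parallel_jobs(total_cpus, cpus_per_job=4):
    """Number of chunk jobs that can run at the same time (always at least one)."""
    return max(1, total_cpus // cpus_per_job)


if __name__ == "__main__":
    # Hypothetical protein IDs standing in for real FASTA records.
    proteins = [f"protein_{n}" for n in range(2500)]
    bins = chunk_records(proteins)
    print("bins:", len(bins), "sizes:", [len(b) for b in bins])
    print("jobs at once with 16 cpus:", parallel_jobs(16))
```

With 2500 proteins and 16 cpus, this gives three bins and up to four bins searched at the same time.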
funannotate v1.0.1
- Wrote a new GFF to TBL parser to accommodate running funannotate annotate on a FASTA + GFF file.
- Added COGs output to funannotate compare; these annotations are parsed from eggnog-mapper data
- several minor bug fixes
funannotate v1.0.0
Major update to funannotate with new RNA-seq modules, new database download and management, new gene name/product definition module, many bug fixes.
RNA-seq modules:
funannotate train: Module will run RNA-seq mediated methods for training of GeneMark/Augustus in gene prediction. It will take single-end or paired-end (PE) RNA-seq FASTQ files, run Trimmomatic quality trimming, run Trinity-mediated read normalization, run Trinity genome-guided RNA-seq assembly, and run PASA alignment methods. Output is a BAM file, Trinity transcripts, and a PASA GFF3 for use in funannotate predict.
funannotate update: Module will run PASA-mediated gene model updates. It can be run as the final step of a train --> predict --> update workflow, where it will add UTR models and refine gene models. The script can also be run on a pre-existing GenBank assembly, where it will run the funannotate train methods (quality trimming, normalization, Trinity, PASA) followed by the update-specific methods to add UTRs, refine models, etc.
funannotate predict enhancements:
- Dropped use of GAG to write the NCBI tbl file, as it was making mistakes on some partial gene models; funannotate now writes the tbl file natively.
- Simplified NCBI tbl generation and gene model filtering --> only running tbl2asn a single time now as bad gene models are properly filtered (previously a regex search was not working perfectly resulting in some gene models being removed arbitrarily)
- tRNA gene length filter is now in compliance with NCBI rules (you can safely ignore tbl2asn tRNA gene length warnings --> they will eventually update tbl2asn source code)
- Numbers of gene models for each "source" are now printed to terminal prior to running Evidence Modeler.
- Script parses the NCBI error reports and shows the user which gene models need to be manually fixed. After the tbl file is updated, the GBK output files can be regenerated with the new funannotate fix command.
funannotate annotate enhancements:
- Diamond search has replaced Blast wherever possible, resulting in a large increase in speed.
- HMMer searches are now split across multiple CPUs, resulting in increased speed.
- Gene names and product definitions are now parsed from UniProtKB/SwissProt results and EggNog-Mapper results. The product definitions are cross-referenced to a community resource called gene2product, which will serve as a database of curated gene product definitions.
- Native NCBI tbl generation results in proper annotation of partial gene models.
- Script will parse tbl2asn errors and alert user of gene models that need to be fixed.
New Database Management modules:
- Environmental variable addition: FUNANNOTATE_DB allows the user to install databases locally, i.e. in a user's home directory on an HPC.
- funannotate setup script has been re-written from scratch to control the databases, keep track of versions, and allow the user to update databases.
- funannotate database is a new command that shows you currently installed databases.
- Databases have been trimmed down and now occupy ~ 4 GB of space.
I would recommend that all users upgrade. After upgrading, you will need to re-download the databases from scratch. As always, many bugs have been fixed and likely some new ones introduced. Please let me know if you encounter errors.
Docs/Manual/Tutorials will be available soon at http://funannotate.readthedocs.io
funannotate v0.7.2
- fix bug in funannotate compare: string conversion to int failed on a check for number of genes
- added better error message for duplicate locus_tag ids in funannotate compare
funannotate v0.7.1
- fix menu in funannotate annotate that still had --email as an option -> it is no longer an option, all remote searches moved to funannotate remote
- fix eggnog parsing issue where COG and Description are blank -> this happens if you run diamond search with eggnog-mapper. You should run HMM search with the appropriate EggNog database, i.e. for fungi that is the fuNOG database.
funannotate v0.7.0
Release v0.7.0 notes:
funannotate predict
- unified genbank conversion method
- added support for repeatmasker_species option
- added support for strain flag for genbank conversion
- improved filtering of problematic gene models
funannotate annotate
- removed all remote searches from script (now funannotate remote, see below)
- dropped EggNog search, instead the --eggnog option will parse the results from eggnog-mapper. Eggnog-mapper does a more comprehensive search and provides some more functional annotation information than the simple HMMer search of EggNog 4.5 database
- now outputs a tsv annotation file into the annotate_results output folder
- improved functional annotation for Gene and Product names
- added support for strain flag for genbank conversion
funannotate compare
- increased speed of parsing GBK files
- remove EggNog description mapping
- fix links to MEROPS database in html output
funannotate remote
- new sub command that will run remote searches
- currently supports Phobius, antiSMASH, and InterProScan
- Note: these searches are a free service, don't abuse them. If you can install this software locally it will significantly decrease your run time. They are included here as some are Linux only and/or setup is very difficult.
funannotate setup
- Eggnog 4.5 database no longer required
funannotate v0.6.2
- added support to funannotate predict for an --other_gff option that will pass annotation directly to EVM. You can control the weight for EVM like this: --other_gff my_predictions.gff3:10, which would give the gene models a weight of 10 in EVM
- better support for --pasa_gff passed to funannotate predict, where now input is not hardcoded to have transdecoder in column 2 of the gff file. You can also control the EVM weight like this: --pasa_gff my_pasa.gff:10 to give it a weight of 10
- BRAKER1 method now pulls out high quality Augustus models (HiQ) that have >90% exon supported by evidence, these are given a weight of 5 in EVM
- Added a few stats for repeat masking genome as well as number of transcripts mapped
- updated funannotate so it is compatible with new version of GAG v2.01.
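The file:weight syntax used by --other_gff and --pasa_gff could be parsed along these lines. This is a minimal sketch, not the parser funannotate actually uses: the function name parse_gff_weight is hypothetical, and the default weight of 1 when no weight is given is an assumption made only for this example.

```python
def parse_gff_weight(arg, default_weight=1):
    """Split a 'path[:weight]' argument into (path, weight).

    default_weight is an assumption for this sketch, not a documented
    funannotate default.
    """
    path, sep, weight = arg.rpartition(":")
    # Only treat the suffix as a weight if it is a plain non-negative integer.
    if sep and path and weight.isdigit():
        return path, int(weight)
    return arg, default_weight


if __name__ == "__main__":
    print(parse_gff_weight("my_predictions.gff3:10"))
    print(parse_gff_weight("my_pasa.gff"))
```

Splitting on the last colon keeps any earlier colons in the path intact.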
funannotate v0.6.1
- Numerous bug fixes
  - Strip asterisks from protein fasta files to avoid problems with InterProScan
  - logfiles folder was not being created if --genbank was passed to funannotate annotate
  - Linux bug where last step of funannotate predict was terminating prematurely resulting in partial output files
- Re-write of the InterProScan parsing scripts. Now script will parse IPR Domains and GO terms directly from XML file, instead of splitting XML file and then parsing 1 by 1.
- Great update by John Longinotto on his pybam native BAM parser which is integrated into funannotate predict to quickly check BAM headers to make sure they match FASTA headers for input into Braker
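As an illustration of the asterisk fix mentioned above, the following sketch removes asterisks from the sequence lines of a protein FASTA while leaving header lines untouched. The function name strip_asterisks is hypothetical and this is not the code used in funannotate. Note that it removes every asterisk in a sequence, not only a trailing one.

```python
def strip_asterisks(fasta_text):
    """Remove '*' characters from sequence lines; header lines are kept as-is."""
    cleaned = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            cleaned.append(line)  # header line, leave unchanged
        else:
            cleaned.append(line.replace("*", ""))
    return "\n".join(cleaned) + "\n"


if __name__ == "__main__":
    example = ">protein_1 hypothetical protein\nMKVLAAGIT*\n"
    print(strip_asterisks(example), end="")
```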
funannotate v0.6.0
- fix tRNA gene model filtering to deal with the tbl2asn >150 error
- improve XML parsing in funannotate compare
- add diamond alternative for exonerate pre-filtering in funannotate predict
- make funannotate docker compatible and create docker image
- EggNog and BUSCO2 databases are now not downloaded in the initial setup, but you can manage EggNog databases with funannotate eggnog. This was due to problems building the docker image when it downloaded the large databases. The scripts will download on the fly if default database is not available.
- added some external dependency versions in funannotate check
funannotate v0.5.7
- bug fixes for logging
- bug fix when multiple protein evidence files are passed
- add phobius to funannotate annotate to predict secreted proteins in combination with signalp
- add test data genome4.fasta that can be used to test the BUSCO2 augustus training method
- added support for checking BAM reference sequence headers if they match the genome FASTA headers, this only happens if BAM file passed to --rna_bam
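The header check described above can be sketched as follows. This is not how funannotate does it: the sketch works on the plain-text SAM header (for example, the output of samtools view -H) rather than reading the BAM file directly, and all function names are hypothetical. It also assumes that "match" means every reference name in the BAM header is present among the FASTA sequence names.

```python
def fasta_headers(fasta_text):
    """Sequence names: the first whitespace-delimited token after '>'."""
    names = set()
    for line in fasta_text.splitlines():
        if line.startswith(">") and line[1:].split():
            names.add(line[1:].split()[0])
    return names


def sam_reference_names(header_text):
    """Reference names from the SN: field of @SQ lines in a SAM header."""
    names = set()
    for line in header_text.splitlines():
        fields = line.split("\t")
        if fields[0] != "@SQ":
            continue
        for field in fields[1:]:
            if field.startswith("SN:"):
                names.add(field[3:])
    return names


def headers_match(fasta_text, header_text):
    """True if every BAM reference name is also a FASTA sequence name."""
    return sam_reference_names(header_text) <= fasta_headers(fasta_text)


if __name__ == "__main__":
    fasta = ">scaffold_1 len=4\nACGT\n>scaffold_2\nACGT\n"
    header = "@HD\tVN:1.6\n@SQ\tSN:scaffold_1\tLN:4\n@SQ\tSN:scaffold_2\tLN:4\n"
    print(headers_match(fasta, header))
```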