Releases: nextgenusfs/funannotate

funannotate v1.0.2

17 Jan 01:41
  • update to GFF to TBL parser to catch some "common" errors in GFF files
  • added funannotate iprscan, which runs InterProScan searches either through Docker or with a local installation. It splits the job into chunks and runs them in parallel, which seems to be a faster way to run InterProScan. By default it chunks the proteins into bins of 1000 proteins and runs each bin with 4 CPUs, using up to as many CPUs as you give the script.
  • fix to docker build (hopefully)
  • bug fixes for parsing the NCBI error report, which now properly outputs which genes are causing errors
  • fix for antiSMASH parsing of plantismash data
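
The chunking described for funannotate iprscan can be sketched roughly as below. This is only an illustration under stated assumptions, not the code funannotate uses: the helper names `chunk_records` and `parallel_jobs` are hypothetical, and the protein list is a stand-in for parsed FASTA records.

```python
def chunk_records(records, chunk_size=1000):
    """Split a list of records into consecutive bins of at most chunk_size."""
    return [records[i:i + chunk_size] for i in range(0, len(records), chunk_size)]


def parallel_jobs(total_cpus, cpus_per_job=4):
    """How many chunk jobs can run at once; always at least one job."""
    return max(1, total_cpus // cpus_per_job)


# Stand-in for protein records parsed from a FASTA file (hypothetical IDs).
proteins = ["prot%05d" % i for i in range(2500)]
chunks = chunk_records(proteins)

print([len(c) for c in chunks])  # sizes of the bins
print(parallel_jobs(16))         # concurrent 4-CPU jobs on 16 CPUs
```

With 2500 proteins and the default bin size this gives three bins, the last one partial, and 16 CPUs would allow four 4-CPU jobs at a time.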

funannotate v1.0.1

10 Jan 02:06
  • Wrote a new GFF to TBL parser to accommodate running funannotate annotate on a fasta + GFF file.
  • Added COGs output to funannotate compare, these annotations are parsed from eggnog-mapper data
  • several minor bug fixes

funannotate v1.0.0

02 Jan 02:49

Major update to funannotate with new RNA-seq modules, new database download and management, a new gene name/product definition module, and many bug fixes.

RNA-seq modules:

  1. funannotate train: Runs RNA-seq mediated methods for training GeneMark/Augustus for gene prediction. It takes single or paired-end (PE) RNA-seq FASTQ files and runs Trimmomatic quality trimming, Trinity-mediated read normalization, Trinity genome-guided RNA-seq assembly, and PASA alignment. The output is a BAM file, Trinity transcripts, and a PASA GFF3 file for use in funannotate predict.
  2. funannotate update: Runs PASA-mediated gene model updates. It can be run as the last step of a train --> predict --> update workflow, where it adds UTR models and refines gene models. The script can also be run on a pre-existing GenBank assembly, in which case it first runs the funannotate train methods (quality trimming, normalization, Trinity, PASA) and then the update-specific methods to add UTRs, refine models, etc.

funannotate predict enhancements:

  1. Dropped use of GAG to write the NCBI tbl file --> it was making mistakes on some partial gene models, so the tbl file is now written natively by functions in funannotate.
  2. Simplified NCBI tbl generation and gene model filtering --> tbl2asn now runs only a single time because bad gene models are properly filtered (previously a regex search was not working perfectly, resulting in some gene models being removed arbitrarily).
  3. tRNA gene length filter is now in compliance with NCBI rules (you can safely ignore tbl2asn tRNA gene length warnings --> they will eventually update tbl2asn source code)
  4. Numbers of gene models for each "source" are now printed to terminal prior to running Evidence Modeler.
  5. Script parses the NCBI error reports and shows the user which gene models need to be manually fixed. After the tbl file is updated, the GBK output files can be regenerated with the new funannotate fix command.

funannotate annotate enhancements:

  1. Diamond search has replaced Blast wherever possible, resulting in a large increase in speed.
  2. HMMer searches are now split across multiple CPUs, resulting in increased speed.
  3. Gene names and product definitions are now parsed from UniProtKB/SwissProt results and EggNog-Mapper results. The product definitions are cross-referenced to a community resource called gene2product, which will serve as a database of curated gene product definitions.
  4. Native NCBI tbl generation results in proper annotation of partial gene models.
  5. Script will parse tbl2asn errors and alert user of gene models that need to be fixed.

New Database Management modules:

  1. Environment variable addition: FUNANNOTATE_DB allows the user to install databases locally, e.g. in a user's home directory on an HPC.
  2. funannotate setup script has been re-written from scratch to control the databases, keep track of versions, and allow the user to update databases.
  3. funannotate database is a new command that shows you the currently installed databases.
  4. Databases have been trimmed down and now occupy ~4 GB of space.
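
For user scripts that need to find the same database folder, the FUNANNOTATE_DB variable can be read as in the sketch below. This is a minimal illustration, not how funannotate resolves the path internally: `resolve_db_dir` is a hypothetical helper and the example path is made up.

```python
import os


def resolve_db_dir(default=None):
    """Return the database folder named by FUNANNOTATE_DB, or a fallback path."""
    path = os.environ.get("FUNANNOTATE_DB", default)
    if path is None:
        raise RuntimeError("FUNANNOTATE_DB is not set and no default was given")
    # Expand "~" so a home-directory install on an HPC resolves correctly.
    return os.path.abspath(os.path.expanduser(path))


# Hypothetical location; normally this is exported in the shell profile.
os.environ["FUNANNOTATE_DB"] = "/tmp/funannotate_db"
print(resolve_db_dir())
```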

I would recommend that all users upgrade. After upgrading, you will need to re-download the databases from scratch. As always, many bugs have been fixed and likely some new ones introduced. Please let me know if you encounter errors.

Docs/Manual/Tutorials will be available soon at http://funannotate.readthedocs.io

funannotate v0.7.2

21 Jul 14:29
  • fix bug in funannotate compare where string conversion to int failed on a check for the number of genes
  • added better error message for duplicate locus_tag ids in funannotate compare

funannotate v0.7.1

17 Jul 14:39
  • fix menu in funannotate annotate that still had --email as an option -> it is no longer an option, all remote searches moved to funannotate remote
  • fix eggnog parsing issue where COG and Description are blank -> this happens if you run a diamond search with eggnog-mapper. You should run an HMM search with the appropriate EggNog database, e.g. for fungi that is the fuNOG database.

funannotate v0.7.0

30 Jun 15:21

Release v0.7.0 notes:

funannotate predict

  • unified genbank conversion method
  • added support for repeatmasker_species option
  • added support for strain flag for genbank conversion
  • improved filtering of problematic gene models

funannotate annotate

  • removed all remote searches from script (now in funannotate remote, see below)
  • dropped EggNog search; instead the --eggnog option will parse the results from eggnog-mapper. Eggnog-mapper does a more comprehensive search and provides more functional annotation information than the simple HMMer search of the EggNog 4.5 database
  • now outputs a tsv annotation file into the annotate_results output folder
  • improved functional annotation for Gene and Product names
  • added support for strain flag for genbank conversion

funannotate compare

  • increased speed of parsing GBK files
  • remove EggNog description mapping
  • fix links to MEROPS database in html output

funannotate remote

  • new sub command that will run remote searches
  • currently supports Phobius, antiSMASH, and InterProScan
  • Note: these searches are a free service, don't abuse them. If you can install this software locally, doing so will significantly decrease your run time. They are included here as some are Linux only and/or setup is very difficult.

funannotate setup

  • Eggnog 4.5 database no longer required

funannotate v0.6.2

16 May 20:40
  • added support to funannotate predict for an --other_gff option that will pass annotation directly to EVM. You can control the weight for EVM, like this --other_gff my_predictions.gff3:10, which would give the gene models a weight of 10 in EVM
  • better support for --pasa_gff passed to funannotate predict where now input is not hardcoded to have transdecoder in column 2 of the gff file. You can also control the EVM weight like this: --pasa_gff my_pasa.gff:10 to give it a weight of 10
  • BRAKER1 method now pulls out high quality Augustus models (HiQ) that have >90% exon supported by evidence, these are given a weight of 5 in EVM
  • Added a few stats for repeat masking genome as well as number of transcripts mapped
  • updated funannotate so it is compatible with new version of GAG v2.01.
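
The `file:weight` form accepted by --other_gff and --pasa_gff could be parsed along the lines of the sketch below. This is an illustration only, not funannotate's parser: `parse_weighted_gff` is a hypothetical helper, and the fallback weight of 1 is an assumption, since the notes above do not state the default.

```python
def parse_weighted_gff(arg, default_weight=1):
    """Split 'path:weight' into (path, int weight); default_weight is an assumed fallback."""
    path, sep, weight = arg.rpartition(":")
    # Only treat the suffix as a weight when it is a plain integer,
    # so a path that merely contains a colon is left untouched.
    if sep and weight.isdigit():
        return path, int(weight)
    return arg, default_weight


print(parse_weighted_gff("my_predictions.gff3:10"))
print(parse_weighted_gff("my_pasa.gff"))
```

Splitting from the right keeps any earlier colons in the path intact.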

funannotate v0.6.1

27 Apr 15:52
  • Numerous bug fixes
    • Strip asterisks from protein fasta files to avoid problems with InterProScan
    • logfiles folder was not being created if --genbank was passed to funannotate annotate
    • Linux bug where last step of funannotate predict was terminating prematurely resulting in partial output files
  • Re-write of the InterProScan parsing scripts. Now script will parse IPR Domains and GO terms directly from XML file, instead of splitting XML file and then parsing 1 by 1.
  • Great update by John Longinotto on his pybam native BAM parser which is integrated into funannotate predict to quickly check BAM headers to make sure they match FASTA headers for input into Braker
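
The asterisk stripping mentioned in the bug fixes can be illustrated with the short sketch below. It is an assumption about the general approach rather than the code in funannotate, and `strip_stop_asterisks` is a hypothetical name.

```python
def strip_stop_asterisks(fasta_text):
    """Remove '*' from sequence lines of protein FASTA text; header lines are kept as-is."""
    cleaned = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            cleaned.append(line)                    # keep headers untouched
        else:
            cleaned.append(line.replace("*", ""))   # drop stop-codon asterisks
    return "\n".join(cleaned) + "\n"


example = ">gene1-T1\nMKVLAA*\n>gene2-T1\nMSTT*PQ\n"
print(strip_stop_asterisks(example))
```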

funannotate v0.6.0

31 Mar 18:22
  • fix tRNA gene model filtering to deal with the tbl2asn >150 error
  • improve XML parsing in funannotate compare
  • add diamond alternative for exonerate pre-filtering in funannotate predict
  • make funannotate docker compatible and create docker image
  • EggNog and BUSCO2 databases are no longer downloaded in the initial setup, but you can manage EggNog databases with funannotate eggnog. This was due to problems downloading the large databases while building the docker image. The scripts will download on the fly if the default database is not available.
  • added some external dependency versions in funannotate check

funannotate v0.5.7

27 Feb 20:23
  • bug fixes for logging
  • bug fix when multiple protein evidence files are passed
  • add phobius to funannotate annotate to predict secreted proteins in combination with signalp
  • add test data genome4.fasta that can be used to test the BUSCO2 augustus training method
  • added support for checking whether the BAM reference sequence headers match the genome FASTA headers; this only happens if a BAM file is passed to --rna_bam
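
The header check can be pictured with the sketch below, which works on plain text: a SAM-style header (for example as printed by `samtools view -H`) and the FASTA contents. It is not the check funannotate performs. All function names are hypothetical, and requiring the two name sets to be equal is an assumed criterion.

```python
def sam_reference_names(header_text):
    """Collect SN: values from the @SQ lines of a SAM-style header."""
    names = []
    for line in header_text.splitlines():
        if line.startswith("@SQ"):
            for field in line.split("\t")[1:]:
                if field.startswith("SN:"):
                    names.append(field[3:])
    return names


def fasta_names(fasta_text):
    """Collect the first word of every FASTA header line."""
    names = []
    for line in fasta_text.splitlines():
        words = line[1:].split() if line.startswith(">") else []
        if words:
            names.append(words[0])
    return names


def headers_match(header_text, fasta_text):
    """True when both inputs name the same set of sequences (assumed criterion)."""
    return set(sam_reference_names(header_text)) == set(fasta_names(fasta_text))


header = "@HD\tVN:1.0\tSO:coordinate\n@SQ\tSN:scaffold_1\tLN:1000\n@SQ\tSN:scaffold_2\tLN:500\n"
genome = ">scaffold_1 some description\nACGT\n>scaffold_2\nGGCC\n"
print(headers_match(header, genome))
```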