GFFtk: genome annotation tool kit

GFFtk is a comprehensive toolkit for working with genome annotation files in GFF3, GTF, and TBL formats. It provides powerful conversion, filtering, and manipulation capabilities for genomic data.

Features

  • Format Conversion: Convert between GFF3, GTF, TBL, and GenBank formats
  • Combined GFF3+FASTA: Support for combined files containing both annotations and sequences
  • Sequence Extraction: Extract protein and transcript sequences from annotations
  • Advanced Filtering: Filter annotations using flexible regex patterns
  • Consensus Models: Generate consensus gene models from multiple sources
  • Non-Standard Features: Support for intron, noncoding_exon, five_prime_UTR_intron, and pseudogenic_exon features
  • File Manipulation: Sort, sanitize, and rename features in annotation files

Installation

To install a release version, use the pip package manager:

python -m pip install gfftk

To install the latest development code from the master branch, run:

python -m pip install git+https://github.com/nextgenusfs/gfftk.git

Quick Start

Basic Format Conversion

# Convert GFF3 to GTF
gfftk convert -i input.gff3 -f genome.fasta -o output.gtf

# Extract protein sequences
gfftk convert -i input.gff3 -f genome.fasta -o proteins.faa --output-format proteins

Combined GFF3+FASTA Format

# Create a combined file from separate GFF3 and FASTA files
gfftk convert -i input.gff3 -f genome.fasta -o combined.gff --output-format combined

# Read a combined file (no separate FASTA file needed)
gfftk convert -i combined.gff -o output.gff3 --output-format gff3
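
For orientation, the GFF3 specification allows a file to carry its sequences after a `##FASTA` directive, which is presumably the layout the combined format refers to. The Python sketch below is not part of GFFtk; the function name `split_combined` and the sample records are made up for illustration. It simply separates the annotation lines from the embedded sequences:

```python
def split_combined(text):
    """Split GFF3 text with an embedded ##FASTA section.

    Returns (annotation_lines, sequences), where sequences maps each
    FASTA record ID to its concatenated sequence string.
    Hypothetical helper for illustration; not GFFtk's own parser.
    """
    annotation_lines = []
    sequences = {}
    in_fasta = False
    current_id = None
    for line in text.splitlines():
        line = line.rstrip()
        if not line:
            continue
        if line.startswith("##FASTA"):
            # Everything after this directive is FASTA, not annotation.
            in_fasta = True
            continue
        if not in_fasta:
            annotation_lines.append(line)
        elif line.startswith(">"):
            # Record ID is the first whitespace-delimited token of the header.
            current_id = line[1:].split()[0]
            sequences[current_id] = ""
        elif current_id is not None:
            sequences[current_id] += line
    return annotation_lines, sequences


# Made-up example data: one gene on one short contig.
example = "\n".join([
    "##gff-version 3",
    "contig1\texample\tgene\t1\t9\t.\t+\t.\tID=gene1",
    "##FASTA",
    ">contig1",
    "ATGAAA",
    "TAA",
])
annotations, seqs = split_combined(example)
print(len(annotations), seqs)
```

Because the sequences travel with the annotations, a command reading such a file does not need a separate `-f genome.fasta` argument, as in the second command above.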

Advanced Filtering

# Keep only kinase genes
gfftk convert -i input.gff3 -f genome.fasta -o kinases.gff3 --grep product:kinase

# Remove augustus predictions
gfftk convert -i input.gff3 -f genome.fasta -o filtered.gff3 --grepv source:augustus

# Case-insensitive filtering with regex
gfftk convert -i input.gff3 -f genome.fasta -o results.gff3 --grep product:KINASE:i

# Combined filtering
gfftk convert -i input.gff3 -f genome.fasta -o filtered.gff3 \
    --grep product:kinase --grepv source:augustus

Filter Pattern Syntax

  • key:pattern - Basic string matching
  • key:pattern:i - Case-insensitive matching
  • key:regex - Regular expression patterns
  • Multiple --grep or --grepv flags for complex filtering

Common filter keys: product, source, name, note, contig, strand, type, db_xref, go_terms
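
To make the pattern syntax concrete, here is a small Python sketch of how a `key:pattern` or `key:pattern:i` specification could be interpreted with the standard `re` module. It is an illustration only: the helper names `parse_filter` and `matches`, the use of `re.search`, and the sample record are assumptions, not GFFtk's actual implementation.

```python
import re


def parse_filter(spec):
    """Parse 'key:pattern' or 'key:pattern:i' into (key, compiled regex).

    Hypothetical helper; a trailing ':i' is treated as the
    case-insensitive flag, mirroring the syntax listed above.
    """
    key, _, rest = spec.partition(":")
    flags = 0
    if rest.endswith(":i"):
        rest = rest[:-2]
        flags = re.IGNORECASE
    return key, re.compile(rest, flags)


def matches(record, spec):
    """Return True if the record's value for the filter key matches the pattern."""
    key, regex = parse_filter(spec)
    value = record.get(key, "")
    return regex.search(value) is not None


# Made-up annotation record for demonstration.
record = {"product": "Serine/threonine protein Kinase", "source": "augustus"}

print(matches(record, "product:kinase"))    # case-sensitive
print(matches(record, "product:kinase:i"))  # case-insensitive
print(matches(record, "source:^aug"))       # regular expression
```

Under these assumptions, a `--grep` filter would keep records for which `matches` returns True, and a `--grepv` filter would drop them.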

For more examples and detailed documentation, see the tutorial.

Development

Code Formatting

This project uses pre-commit to ensure code quality and consistency. The pre-commit hooks run Black (code formatter), isort (import sorter), and flake8 (linter).

To set up pre-commit:

  1. Install pre-commit:
pip install pre-commit
  2. Install the git hooks:
pre-commit install
  3. (Optional) Run against all files:
pre-commit run --all-files

After installation, the pre-commit hooks will run automatically on each commit to ensure your code follows the project's style guidelines.