GenomeAnnotationWorkshop2025

An introduction to Genome Annotation of non-model organisms

This is the GitHub repo for the Genome Annotation Workshop. The workshop focuses on annotating genomes of non-model organisms using a custom pipeline built from multiple tools. It implements different strategies, including homology-based, ab initio, and de novo approaches, using a combination of short and long reads (Iso-Seq) available on NCBI. As examples, a single chromosome from each of three different organisms is used for demonstration.

Installation

To set up the workshop locally, run the following commands:

Clone the repo and cd into it

git clone https://github.com/francicco/GenomeAnnotationWorkshop2024.git
cd GenomeAnnotationWorkshop2024

Install the following software and their dependencies:

STAR | Minimap2 | Samtools | Bedtools | Diamond | Miniprot | BRAKER | gffread | Cufflinks | BUSCO | compleasm | Stringtie | Scallop | IsoQuant | Trinity | TransDecoder | Portcullis | Mikado | Miniprot2SplicedNucl.py | IsoQuantGTF2BED12v0.1.py | sam2psl.py | AnalyseTopHitCoverage.py | Analyze_Diamond_topHit_coverage.R | UniProtDB ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/complete/uniprot_sprot.fasta.gz

You also need the ggplot2 and tidyverse libraries for R, which are dependencies of Analyze_Diamond_topHit_coverage.R.
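Before starting, it can help to confirm which of the tools listed above are actually reachable from your shell. The following is a minimal sketch, not part of the workshop repo: `check_tools` is a hypothetical helper name, and the executable names passed to it (for example `braker.pl`, `isoquant.py`, `TransDecoder.LongOrfs`) are assumptions that may differ depending on how each tool was installed.

```shell
#!/usr/bin/env bash
# Sketch: report which executables are found on PATH.
# check_tools is a hypothetical helper, not shipped with the workshop.
check_tools() {
    local missing=0
    local tool
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            printf 'ok       %s\n' "$tool"
        else
            printf 'MISSING  %s\n' "$tool"
            missing=$((missing + 1))
        fi
    done
    # Exit status is non-zero if at least one tool was not found
    [ "$missing" -eq 0 ]
}

# Executable names are assumptions; adjust them to match your installation
check_tools STAR minimap2 samtools bedtools diamond miniprot braker.pl gffread \
    cufflinks busco compleasm stringtie scallop isoquant.py Trinity \
    TransDecoder.LongOrfs portcullis mikado Rscript \
    || echo "Some tools were not found on PATH."
```

The helper scripts in the list (such as sam2psl.py) are not checked here, since where they live depends on where you placed them.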

The workshop

The workshop is divided into five sections:

  1. RNA-seq mapping to the reference genome (short reads & Iso-Seq)
  2. Homology- and evidence-based prediction of protein-coding genes (PCGs) using BRAKER2
  3. De novo annotation from short-read RNA-seq using Trinity
  4. Ab initio annotation using short-read and long-read RNA-seq
  5. Metrics to evaluate annotations, splice-site filtering, and annotation consensus using Mikado

All data is available at this link, but don't forget to set up your environment! A conda environment contains most of the tools; activate it with conda activate annotation25
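A common slip is running the workshop commands in the wrong environment. As a small sketch, a script can check which conda environment is active before doing any work. `require_env` is a hypothetical helper name; `CONDA_DEFAULT_ENV` is the variable conda sets when an environment is activated.

```shell
#!/usr/bin/env bash
# Sketch: fail early if the expected conda environment is not active.
# require_env is a hypothetical helper, not part of the workshop repo.
require_env() {
    local wanted="$1"
    if [ "${CONDA_DEFAULT_ENV:-}" = "$wanted" ]; then
        echo "conda environment '$wanted' is active"
    else
        echo "expected conda environment '$wanted', found '${CONDA_DEFAULT_ENV:-none}'" >&2
        return 1
    fi
}

# Intended to be run after: conda activate annotation25
require_env annotation25 || echo "Run: conda activate annotation25"
```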

I hope this will be useful. Have fun!

P.S. If you want to download the data directly to your machine from the link, you need the Python library gdown. If it's not installed, you can install it with pip install gdown --user.

Then simply execute gdown --folder https://drive.google.com/drive/folders/1IreMRHaOa1kvOomyjoEm8xFw1fmOR-oK to download the whole folder. To download a specific file, use the --fuzzy flag with that file's link.

For this workshop all the data should be in ~/Share.
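The download steps can be combined into one small script. This is a sketch under assumptions: `fetch_workshop_data`, `DATA_DIR`, and `DRY_RUN` are hypothetical names introduced here for illustration, and the script simply runs the gdown command given earlier from inside ~/Share. With `DRY_RUN=1` it only prints the command, so you can check it before downloading anything.

```shell
#!/usr/bin/env bash
# Sketch: download the workshop data folder into ~/Share using gdown.
# fetch_workshop_data, DATA_DIR and DRY_RUN are illustrative names only.
FOLDER_URL="https://drive.google.com/drive/folders/1IreMRHaOa1kvOomyjoEm8xFw1fmOR-oK"

fetch_workshop_data() {
    local data_dir="${DATA_DIR:-$HOME/Share}"
    # With DRY_RUN=1, print the command instead of running it
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "cd $data_dir && gdown --folder $FOLDER_URL"
        return 0
    fi
    mkdir -p "$data_dir"
    # Run gdown from inside the data directory, in a subshell
    (cd "$data_dir" && gdown --folder "$FOLDER_URL")
}

# Remove DRY_RUN=1 to perform the actual download
DRY_RUN=1 fetch_workshop_data
```

After a real download, check where the files landed inside ~/Share and move them if your commands expect them at the top level.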
