asadprodhan/Average-Nucleotide-Identity-ANI-analysis


Average Nucleotide Identity Analysis for Diagnosis

M. Asaduzzaman Prodhan*

DPIRD Diagnostics and Laboratory Services, Department of Primary Industries and Regional Development
3 Baron-Hay Court, South Perth, WA 6151, Australia
*Correspondence: Asad.Prodhan@dpird.wa.gov.au

License: GPL 3.0


Average Nucleotide Identity (ANI) analysis calculates the percentage of nucleotide identity among the supplied nucleotide sequences. It produces a square matrix of the calculated values. This matrix allows for pairwise comparisons among the nucleotide sequences and helps determine their similarities.



There are several methods to calculate the ANI:

  • ANIb (based on the BLAST algorithm)

  • ANIm (based on the MUMmer algorithm)

  • TETRA (based on tetranucleotide signature frequencies)


ANI tools

There are several tools available for ANI analysis (Figueras et al., 2014). For example:


How to run pyani

  • If you are working on an HPC cluster, load the required version of Python

    module load cray-python/3.10.10
    
  • Create a conda environment with compatible versions of Python, matplotlib and pyani

    conda create -n pyani_env python=3.10 "matplotlib<=3.7" "pyani>=0.2.12" -c bioconda
    
  • Activate the pyani environment

    conda activate pyani_env
    
  • Alternatively, you can use my conda environment file for pyani

  • Download it HERE

  • Then create the environment from the downloaded file as follows

    conda env create -f pyani_env.yml 
    
  • Check that it has been installed by running the following command

    average_nucleotide_identity.py --help
    

The above command will show the flags/options of the pyani program

  • Install dos2unix to convert text files from Windows to Unix line endings

    conda install conda-forge::dos2unix
    
  • Check that it has been installed by running the following command

    dos2unix --version
    
  • Create two metadata files named ‘classes.txt’ (Fig. 1) and ‘labels.txt’ (Fig. 2)


Figure 1. Classes


Figure 2. Labels

Note that the first column contains the names of the nucleotide sequences and the second column contains their labels.
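
The two metadata files can also be generated from the sequence file names instead of being typed by hand. The sketch below is a minimal illustration and not part of pyani. It assumes that both files are tab-separated, that the first column must match each sequence file name without its extension, and that the sequence files end in `.fasta`. The function name `write_metadata` and the placeholder class `unassigned` are hypothetical.

```shell
# Hypothetical helper: write labels.txt and classes.txt for every *.fasta file
# in a directory. Assumes tab-separated "name<TAB>value" lines.
write_metadata() {
  dir="$1"
  : > "$dir/labels.txt"     # start from empty files
  : > "$dir/classes.txt"
  for f in "$dir"/*.fasta; do
    [ -e "$f" ] || continue                  # nothing matched the glob
    stem="$(basename "$f" .fasta)"           # file name without extension
    printf '%s\t%s\n' "$stem" "$stem"      >> "$dir/labels.txt"    # label defaults to the name
    printf '%s\t%s\n' "$stem" "unassigned" >> "$dir/classes.txt"   # placeholder class
  done
}

# Usage (the argument is the directory that holds the sequences):
# write_metadata genomes
```

Edit the second column of each file afterwards so that it carries the labels and classes you want shown in the output.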


  • Make a directory named, for example, ‘ANI’

  • Within the ‘ANI’ directory, make another directory named, for example, ‘genomes’

  • Place all the nucleotide sequence files, ‘classes.txt’ and ‘labels.txt’ in the ‘genomes’ directory

  • Check the line terminators of the ‘classes.txt’ and ‘labels.txt’ files as follows (run from within the ‘genomes’ directory)

    file *.txt
    
  • If ‘classes.txt’ and ‘labels.txt’ have CRLF (Windows) line terminators, convert them to Unix format as follows:

    dos2unix *.txt
    
  • Run the following command from the ‘ANI’ directory

    average_nucleotide_identity.py -i genomes -o output_ANI --labels genomes/labels.txt --classes genomes/classes.txt -g --gmethod seaborn --gformat pdf,png -v -l ba_ANI.log
    
  • Note that you must not create the output directory beforehand. Otherwise, the command will exit with an ‘overwriting’ error

  • Command reference: widdowquinn/pyani#56
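
The checks in the steps above can be collected into a small pre-flight script. The sketch below is one possible arrangement and not part of pyani. The function names `has_crlf`, `to_unix` and `check_output_absent` are hypothetical, and the fallback to `sed` when `dos2unix` is not installed is an assumption added for illustration.

```shell
# Succeed if the file contains at least one carriage return (CRLF line ending).
has_crlf() {
  cr="$(printf '\r')"
  grep -q "$cr" "$1"
}

# Convert a file to Unix line endings in place; use dos2unix when it is available.
to_unix() {
  if command -v dos2unix >/dev/null 2>&1; then
    dos2unix "$1"
  else
    cr="$(printf '\r')"
    # Strip a trailing carriage return from every line, then replace the file.
    sed "s/${cr}\$//" "$1" > "$1.tmp" && mv "$1.tmp" "$1"
  fi
}

# Fail with a message if the output directory already exists.
check_output_absent() {
  if [ -e "$1" ]; then
    echo "Output directory '$1' already exists; remove or rename it first." >&2
    return 1
  fi
}

# Usage, from the 'ANI' directory:
# for f in genomes/labels.txt genomes/classes.txt; do
#   if has_crlf "$f"; then to_unix "$f"; fi
# done
# check_output_absent output_ANI && \
#   average_nucleotide_identity.py -i genomes -o output_ANI --labels genomes/labels.txt --classes genomes/classes.txt -g --gmethod seaborn --gformat pdf,png -v -l ba_ANI.log
```

Converting only the files that actually contain carriage returns leaves files that are already in Unix format untouched.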


Results

The final output of the ANI analysis looks like this (Fig. 3):


Figure 3. Results
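
In addition to the graphical output, the identity values can be inspected as a table from the command line. The sketch below is a minimal illustration. It assumes a tab-separated square matrix with the sequence names in the header row and in the first column, and it assumes that pyani writes this matrix to `output_ANI/ANIm_percentage_identity.tab` (check your own output directory for the actual file name). The function name `matrix_to_pairs` is hypothetical.

```shell
# Hypothetical helper: turn a square, tab-separated matrix into one line per
# pair of sequences (row name, column name, value). Only the cells to the
# right of the diagonal are printed, which assumes a symmetric matrix.
matrix_to_pairs() {
  awk -F '\t' '
    # Header row: the first field is empty, the rest are column names.
    NR == 1 { for (i = 2; i <= NF; i++) name[i] = $i; next }
    {
      # The diagonal cell of data row NR sits in field NR; start one past it.
      for (i = NR + 1; i <= NF; i++)
        print $1 "\t" name[i] "\t" $i
    }
  ' "$1"
}

# Usage (file name is an assumption, see above):
# matrix_to_pairs output_ANI/ANIm_percentage_identity.tab
```

If the matrix is not symmetric, loop over all columns (from field 2) instead of starting after the diagonal.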


References

Figueras, M.J., Beaz-Hidalgo, R., Hossain, M.J., Liles, M.R., 2014. Taxonomic Affiliation of New Genomes Should Be Verified Using Average Nucleotide Identity and Multilocus Phylogenetic Analysis. Genome Announc 2, e00927-14. https://doi.org/10.1128/genomeA.00927-14

Richter, M., Rosselló-Móra, R., 2009. Shifting the genomic gold standard for the prokaryotic species definition. PNAS 106, 19126–19131. https://doi.org/10.1073/pnas.0906412106
