sanger-tol/busco_painter

Lep busco painter

Paints chromosomes of lepidopteran genomes with BUSCOs.

This repo is forked from charlottewright/lep_busco_painter to enable use of the scripts and data in sanger-tol pipelines. The fork also allows us to automate container building based on scripts that derive from this original work.

Additions are:

  • Moved scripts to the src folder
  • Updated script formats and added versions
  • Added Python linting with Ruff as a GitHub Action
  • Added R linting with lintr

Installation

conda create -n buscopaint python=3.9
conda activate buscopaint
conda install samtools 
conda install -c conda-forge r-base
conda install -c r r-tidyverse
conda install -c bioconda r-optparse

Running the scripts

1. Assign each BUSCO to a chromosome

buscopainter.py takes the full_table.tsv output files generated by BUSCO for a "reference" genome and a "query" genome, along with an optional prefix (specified with -p, default "buscopainter"), and assigns each BUSCO to a chromosome. It also states whether each BUSCO belongs to the dominant group of BUSCOs on its chromosome ('self') or not.

buscopainter.py -r test_data/ilAglIoxx1_full_table.tsv -q test_data/ilApoTurb1_full_table.tsv
buscopainter.py -r test_data/Merian_elements_full_table.tsv -q test_data/ilApoTurb1_full_table.tsv

It will write three TSV files:

  • [PREFIX]_complete_summary.tsv which contains a summary of the chromosomal assignments
  • [PREFIX]_complete_location.tsv which contains the location and status of all shared complete BUSCOs.
  • [PREFIX]_duplicated_location.tsv which contains the location and status of all duplicated BUSCOs.
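The assignment step described above can be illustrated with a minimal Python sketch. This is not the code in buscopainter.py: the function names, the 'other' label and the toy tables are hypothetical, and the column order (BUSCO id, status, sequence, gene start, ...) is an assumption about the full_table.tsv layout. Here "dominant" is taken to mean the reference chromosome that contributes the most shared BUSCOs to a given query chromosome.

```python
import io
from collections import Counter, defaultdict


def parse_full_table(handle):
    """Return {busco_id: (sequence, start)} for BUSCOs with status 'Complete'."""
    complete = {}
    for line in handle:
        # Skip blank lines and the '#' header/comment lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        # Missing BUSCOs carry only id and status, so check the length first.
        if len(fields) >= 4 and fields[1] == "Complete":
            complete[fields[0]] = (fields[2], int(fields[3]))
    return complete


def assign_buscos(reference, query):
    """Label each shared BUSCO 'self' if its reference chromosome is the most
    common one on its query chromosome, otherwise 'other' (hypothetical label)."""
    shared = sorted(set(reference) & set(query))
    per_query_chrom = defaultdict(Counter)
    for busco in shared:
        per_query_chrom[query[busco][0]][reference[busco][0]] += 1
    dominant = {
        chrom: counts.most_common(1)[0][0]
        for chrom, counts in per_query_chrom.items()
    }
    rows = []
    for busco in shared:
        q_chrom, q_start = query[busco]
        ref_chrom = reference[busco][0]
        status = "self" if ref_chrom == dominant[q_chrom] else "other"
        rows.append((busco, q_chrom, q_start, ref_chrom, status))
    return rows


# Toy tables (made up for illustration, not taken from test_data/).
HEADER = "# Busco id\tStatus\tSequence\tGene Start\tGene End\tStrand\tScore\tLength\n"
REFERENCE_TABLE = HEADER + (
    "b1\tComplete\tR1\t100\t200\t+\t50.0\t100\n"
    "b2\tComplete\tR1\t300\t400\t+\t50.0\t100\n"
    "b3\tComplete\tR2\t100\t200\t-\t50.0\t100\n"
    "b4\tMissing\n"
)
QUERY_TABLE = HEADER + (
    "b1\tComplete\tQ1\t10\t110\t+\t50.0\t100\n"
    "b2\tComplete\tQ1\t500\t600\t+\t50.0\t100\n"
    "b3\tComplete\tQ1\t900\t1000\t-\t50.0\t100\n"
    "b4\tComplete\tQ1\t1200\t1300\t+\t50.0\t100\n"
)

reference = parse_full_table(io.StringIO(REFERENCE_TABLE))
query = parse_full_table(io.StringIO(QUERY_TABLE))
for row in assign_buscos(reference, query):
    print("\t".join(map(str, row)))
```

In this toy case two of the three shared BUSCOs on query chromosome Q1 come from reference chromosome R1, so R1 is dominant and the BUSCO from R2 is the one flagged as not 'self'. BUSCO b4 is missing from the reference and is therefore not shared.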

2. Plotting

The [PREFIX]_location.tsv files can be plotted using plot_buscopainter.R. This plots the chromosomes of the query genome as rectangles and paints the positions of complete/duplicated BUSCOs as lines, which are coloured by their assigned chromosome in the reference genome. This script has one required argument: the location.tsv file (-f). Optional arguments are:

  • Plot title (-p)
  • Index file (-i) - enables chromosomes to be drawn to size (rather than based on the position of the last ortholog)
  • Merian element mode (-m True) - paint chromosomes with Merian elements rather than query genome orthologs
  • Only plot differences mode (-d True) - only paint orthologs which do not belong to the dominant chromosome based on the reference
  • Custom threshold of orthologs (-n) - minimum number of orthologs on a given query chromosome for it to be displayed (this helps to filter out unplaced scaffolds). Default is >=3 orthologs.

plot_buscopainter.R -f ilAglIoxx1_complete_location.tsv -p 'ilAglIoxx1'
plot_buscopainter.R -f ilAglIoxx1_complete_location.tsv -p 'ilAglIoxx1' -i ilAglIoxx1.fai -m True -d True
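To make the effect of -n and -d concrete, here is a small Python sketch of the two filters. It is an illustration only, not the logic of plot_buscopainter.R: the dictionary keys ("query_chr", "status") and the function name are hypothetical. It also assumes that the ortholog threshold is applied to all orthologs on a chromosome before the differences filter.

```python
from collections import Counter


def filter_for_plot(rows, minimum=3, differences_only=False):
    """Keep orthologs on query chromosomes with at least `minimum` orthologs;
    optionally keep only those that are not 'self'."""
    counts = Counter(row["query_chr"] for row in rows)
    kept = [row for row in rows if counts[row["query_chr"]] >= minimum]
    if differences_only:
        kept = [row for row in kept if row["status"] != "self"]
    return kept


# Toy input: chr1 carries three orthologs, scaffold_9 only one.
rows = [
    {"busco": "b1", "query_chr": "chr1", "status": "self"},
    {"busco": "b2", "query_chr": "chr1", "status": "self"},
    {"busco": "b3", "query_chr": "chr1", "status": "other"},
    {"busco": "b4", "query_chr": "scaffold_9", "status": "self"},
]

print([r["busco"] for r in filter_for_plot(rows)])
print([r["busco"] for r in filter_for_plot(rows, differences_only=True)])
```

With the default threshold the single ortholog on scaffold_9 is dropped, which is how the option helps to hide unplaced scaffolds.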

Full usage:

Options:
	-f CHARACTER, --file=CHARACTER
		location.tsv file

	-p CHARACTER, --prefix=CHARACTER
		prefix for plot title

	-i CHARACTER, --index=CHARACTER
		genome index file

	-m CHARACTER, --merians=CHARACTER
		use this flag if you are comparing a genome to Merian elements

	-d CHARACTER, --differences=CHARACTER
		only colour orthologs that have moved from the dominant chromosome

	-n NUMBER, --minimum=NUMBER
		minimum number of orthologs 

	-h, --help
		Show this help message and exit

NB: the index file can be generated via samtools faidx <genome.fasta>, which writes <genome.fasta>.fai alongside the FASTA file.
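As a rough illustration of why the index file lets chromosomes be drawn to size, the Python sketch below reads sequence lengths from a .fai file (the first two tab-separated columns are the sequence name and its length) and falls back to the position of the last ortholog when no length is available. The function names and toy values are hypothetical and this is not the code used by plot_buscopainter.R.

```python
import io


def read_fai_lengths(handle):
    """Return {sequence_name: length} from a samtools faidx index."""
    lengths = {}
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 2:
            lengths[fields[0]] = int(fields[1])
    return lengths


def chromosome_extent(chrom, ortholog_positions, lengths=None):
    """Use the indexed length if known, else the last ortholog position."""
    if lengths and chrom in lengths:
        return lengths[chrom]
    return max(ortholog_positions)


# Toy index with two sequences (name, length, offset, linebases, linewidth).
FAI = "chr1\t1000\t6\t60\t61\nchr2\t500\t1030\t60\t61\n"
lengths = read_fai_lengths(io.StringIO(FAI))

print(chromosome_extent("chr1", [100, 900], lengths))
print(chromosome_extent("chr1", [100, 900]))
```

Without the index, the rectangle for chr1 in this toy case would end at the last ortholog rather than at the true end of the sequence.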

Example output

Comparison of two genomes - painting all shared single-copy orthologs.

Comparison of one genome to Merian elements - painting only single-copy orthologs that have moved relative to Merian elements.
