Ortholog Synteny Assignment Tool - A Python tool to refine orthologous groups using synteny information inferred from genome
annotation files (BED converted from GFF3/GTF). It is a Python re-implementation of OrthoRefine, which is written in C++
but had some issues with sample matching and memory usage. This tool is designed to be more efficient, easier to use, and more
flexible for custom analyses. To keep processing fast and memory usage low, the core synteny analysis is
implemented in Rust. We also provide a companion visualization tool, orthosynassign-vis, for users to verify the results.
Compared with the original OrthoRefine implementation, this version has improved memory management and a streamlined workflow
for assigning syntenic regions. Specifically, it uses optimized data structures for handling genomic ranges and more efficient
algorithms for finding overlaps between ortholog groups and genomic segments.
First, install the package following the instructions below.
orthosynassign is the main program for running the analysis. It takes an OrthoFinder-style Orthogroups.tsv file, or the N0.tsv
file under the Phylogenetic_Hierarchical_Orthogroups directory, and outputs refined orthogroups with synteny information
determined from the genome annotation files.
Most genome annotation files are distributed in GFF3 or GTF formats. However, the high degree of flexibility in the 9th attribute column often makes it challenging to parse specific protein IDs and match them to entries in an orthogroup file.
To simplify this process, we provide a utility script, misc/gff2bed.bash, which converts GFF3 files into a standardized BED format. This script ensures that genomic coordinates are correctly linked to the protein IDs used by OrthoFinder. If a gene contains multiple isoforms, the script collapses them into a single entry. In this case, the 4th column of the BED file will contain all associated protein IDs, concatenated using a semicolon (;) as a delimiter.
[!IMPORTANT] If you choose to prepare your own BED files manually, you must use a semicolon (;) to separate protein IDs for multiple isoforms. orthoSynAssign is specifically programmed to use this delimiter to resolve isoform-related mapping issues automatically.
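For users preparing BED files by hand, the following is a minimal sketch of how the semicolon-delimited 4th column can be read back into individual protein IDs. It is not orthoSynAssign's own parser; the function names and the sample IDs (`P1`, `P2`, `P3`) are hypothetical and only illustrate the convention described above.

```python
def parse_bed_line(line):
    """Parse one BED line into (chrom, start, end, protein_ids)."""
    fields = line.rstrip("\n").split("\t")
    chrom, start, end, name = fields[:4]
    # Isoforms of one gene share a single entry; their IDs are joined by ';'.
    protein_ids = [p for p in name.split(";") if p]
    return chrom, int(start), int(end), protein_ids


def build_protein_index(lines):
    """Map every protein ID to the index of the gene (BED row) that carries it."""
    index = {}
    rows = (l for l in lines if l.strip() and not l.startswith("#"))
    for gene_idx, line in enumerate(rows):
        _, _, _, protein_ids = parse_bed_line(line)
        for pid in protein_ids:
            index[pid] = gene_idx
    return index


# Hypothetical example: the first gene has two isoforms, the second has one.
example_bed = ["chr1\t100\t500\tP1;P2\n", "chr1\t800\t1200\tP3\n"]
protein_index = build_protein_index(example_bed)
```

With this layout, any isoform ID listed in the orthogroup file resolves to the same gene position, which is the point of the shared delimiter.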
The Orthogroups.tsv or N0.tsv file from OrthoFinder should be tab-separated with:
- First column: Orthogroup ID (e.g., OG0000001)
- Subsequent columns: Protein IDs for each species (column headers are species names)
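The layout above can be read with the standard library alone. This is a sketch rather than the tool's actual reader: `read_orthogroups` is a hypothetical helper, and it assumes that multiple proteins in one species cell are separated by commas, as in OrthoFinder's Orthogroups.tsv.

```python
import csv
import io


def read_orthogroups(handle):
    """Return {orthogroup_id: {species: [protein_id, ...]}} from a TSV handle."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    species = header[1:]  # column headers after the first are species names
    table = {}
    for row in reader:
        if not row:
            continue
        og_id, cells = row[0], row[1:]
        # Pad short rows so every species gets a (possibly empty) cell.
        cells = cells + [""] * (len(species) - len(cells))
        table[og_id] = {
            sp: [p.strip() for p in cell.split(",") if p.strip()]
            for sp, cell in zip(species, cells)
        }
    return table


# Hypothetical two-species table; OG0000002 has no member in spA.
example_tsv = "Orthogroup\tspA\tspB\nOG0000001\tA1, A2\tB1\nOG0000002\t\tB2\n"
orthogroups = read_orthogroups(io.StringIO(example_tsv))
```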
Please use orthosynassign --help to see all available options and arguments:
Required arguments:
  --og_file OG_FILE     Path to OrthoFinder Orthogroups.tsv file
  --bed file [files ...]
                        Path of BED formatted genome annotation files

Options:
  -w, --window WINDOW   Controls how many total genes are considered when determining synteny for a single gene (default: 8)
  -r, --ratio_threshold THRESHOLD
                        Controls how many genes within a window must provide synteny support to classify the genes being compared as syntenous (default: 0.5)
  -o, --output OUTPUT   Output of results (default: Refined_SOGs-[YYYYMMDD-HHMMSS].tsv (UTC timestamp))
  -t, --threads THREADS
                        Number of cpus to use (default: 4)
  -v, --verbose         Enable verbose logging
  -V, --version         show program's version number and exit
  -h, --help            show this help message and exit
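To make the interaction between --window and --ratio_threshold concrete, here is a simplified sketch of a window-based synteny test. It is an illustration under stated assumptions, not the Rust implementation shipped with the tool: it assumes the window is split evenly on both sides of the gene, that support is counted as neighbors whose orthogroup also occurs in the other genome's window, and that the comparison is `>=`. The function names and orthogroup labels are hypothetical.

```python
def neighbor_ogs(gene_ogs, pos, window):
    """Orthogroup labels of up to window/2 genes on each side of position pos."""
    half = window // 2
    left = gene_ogs[max(0, pos - half):pos]
    right = gene_ogs[pos + 1:pos + 1 + half]
    # Genes without an orthogroup are represented as None and ignored.
    return [og for og in left + right if og is not None]


def is_syntenic(ogs_a, pos_a, ogs_b, pos_b, window=8, ratio_threshold=0.5):
    """Assumed rule: supporting neighbors / window must reach the threshold."""
    near_a = neighbor_ogs(ogs_a, pos_a, window)
    near_b = set(neighbor_ogs(ogs_b, pos_b, window))
    matches = sum(1 for og in near_a if og in near_b)
    return matches / window >= ratio_threshold


# Each list is one genome's gene order, written as orthogroup labels.
genome_a = ["OG1", "OG2", "OG3", "OG4", "OGX", "OG5", "OG6", "OG7", "OG8"]
genome_b = ["OG1", "OG2", "OG9", "OG4", "OGX", "OG5", "OG10", "OG7", "OG11"]

# The two OGX genes share 5 of 8 neighboring orthogroups (ratio 0.625).
default_call = is_syntenic(genome_a, 4, genome_b, 4)
strict_call = is_syntenic(genome_a, 4, genome_b, 4, ratio_threshold=0.75)
```

Under these assumptions, a larger window tolerates more local rearrangement, while a higher ratio threshold demands more conserved neighbors before two genes are called syntenic.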
We provide example files in the example directory, which contains three BED annotations and an orthogroup file:
FungiDB-68_AfumigatusA1163.bed
FungiDB-68_AfumigatusAf293.bed
FungiDB-68_AnovofumigatusIBT16806.bed
orthogroups.tsv
Use the following command to run the refinement process:
orthosynassign --og_file orthogroups.tsv --bed *.bed -o Refined_SOGs.tsv
The refined results are written to Refined_SOGs.tsv.
orthosynassign-vis is a companion visualization script for verifying the refined results of orthosynassign. It uses
pyGenomeViz to plot orthogroups and their synteny relationships. It takes the original, unrefined orthogroups.tsv file along
with the refined orthogroup file and plots a chosen set of refined orthogroups, labelling each gene in the plot with its
previous orthogroup ID. Please use orthosynassign-vis --help to see all available options and arguments:
Required arguments:
  --og_file OG_FILE     Path to the original orthogroups.tsv file
  --sog_file SOG_FILE   Path to the refined orthogroups.tsv file
  --bed file [files ...]
                        Path of BED formatted genome annotation files
  --sog SOG [SOG ...]   Plot the SOG of the previous orthosynassign analysis

Options:
  -w, --window WINDOW   The window size applied to the previous orthosynassign analysis (default: 8)
  -o, --output OUTPUT   Output directory (default: visualize_[sog_file])
  -f, --fmt {png,jpg,svg,pdf}
                        Output image format. (default: png)
  -k, --keep_all_genes  Keep genes that are not assigned to any orthogroup
  -v, --verbose         Enable verbose logging
  -V, --version         show program's version number and exit
  -h, --help            show this help message and exit
The example directory also contains a refined orthogroup file, Refined_SOGs.tsv. For example, to verify one of the refined
orthogroups, SOG000039.OG0000040:
orthosynassign-vis --og_file orthogroups.tsv --sog_file Refined_SOGs.tsv --bed *.bed --sog SOG000039.OG0000040 -f svg
The figure is written to visualize_Refined_SOGs/SOG000039.OG0000040.svg. In this figure, the genes of the refined
orthogroup under inspection are labelled in yellow; genes assigned to the same orthogroup within the given window are labelled
in other chromatic colors; genes with orthologs in other genomes located outside the given window are labelled in gray.
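When scripting around orthosynassign-vis, it can help to predict where a figure will land. The helper below is a hypothetical convenience, not part of the package; it assumes the default output directory is `visualize_` followed by the SOG file name without its extension, and that each figure is named after the SOG ID, as inferred from the single example above.

```python
from pathlib import Path


def expected_figure_path(sog_file, sog_id, fmt="png", output=None):
    """Guess the figure path from the naming pattern shown in the example."""
    if output:
        out_dir = Path(output)
    else:
        # Assumed default: visualize_<stem of the refined orthogroup file>
        out_dir = Path(f"visualize_{Path(sog_file).stem}")
    return out_dir / f"{sog_id}.{fmt}"


figure = expected_figure_path("Refined_SOGs.tsv", "SOG000039.OG0000040", fmt="svg")
```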
- Python >= 3.9, < 3.14
- numpy >= 2.0.0
- pyGenomeViz >= 1.6.0
Clone through ssh
git clone git@github.com:stajichlab/orthoSynAssign.git
or https
git clone https://github.com/stajichlab/orthoSynAssign.git
Navigate to the project directory and install the package.
cd orthoSynAssign
pip install .
Developers should clone the project directly and install the package with the dev extra. Please also set up pre-commit before making any commits.
pip install -e ".[dev]"
pre-commit install
If you use orthoSynAssign in your research, please cite:
Contributions are welcome! Please feel free to submit a Pull Request.