Repository for "Host origin is a determinant of parallel evolution between influenza virus gene segments" by Jones & Lakdawala (bioRxiv 2022). It includes the raw data and source code used to analyze parallel evolution between gene segments of human and avian influenza viruses.
The code provided in this repository can be used to reconstruct and analyze convergence between phylogenetic trees of gene and protein sequences. Input FASTA files from H9 viruses are provided as examples. Source code for tree reconstruction and analysis is broadly generalizable to any set of alignments and is presented separately from sequence processing and selection.
The 'Data' folder contains the raw data that were processed and analyzed in this study, including H9 virus FASTA files sourced from the Influenza Research Database. Raw, unprocessed FASTA files are in the 'Pre-processed FASTA Files' sub-folder, and fully aligned sequences ready for analysis are in the 'Post-processing Alignments' sub-folder. Human H3N2 virus FASTA files were sourced from an earlier study.
The 'Sequence Selection' folder contains source code for the initial processing of raw FASTA files and the selection of sequences from different hosts. This code can be run directly on the files provided in the 'Pre-processed FASTA Files' sub-folder of the 'Data' folder.
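For readers who want a feel for host-based selection before opening the source, the following is a minimal Python sketch, not the code shipped in this repository. It assumes that each FASTA header contains a standard influenza strain name (for example `A/chicken/Hong Kong/G9/1997(H9N2)`), optionally as one of several `|`-delimited fields, and that human strains omit the host field. The function names `read_fasta`, `host_of`, and `group_by_host` are hypothetical.

```python
from collections import defaultdict


def read_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def host_of(header):
    """Infer the host from a strain name (assumed header layout).

    Assumption: non-human strains look like A/host/location/id/year
    (five fields) and human strains like A/location/id/year (four).
    """
    strain = next((f for f in header.split("|") if f.startswith("A/")), "")
    parts = strain.split("/")
    if len(parts) >= 5:
        return parts[1].strip().lower()
    if len(parts) == 4:
        return "human"
    return "unknown"


def group_by_host(records):
    """Group (header, sequence) records by inferred host."""
    groups = defaultdict(list)
    for header, seq in records:
        groups[host_of(header)].append((header, seq))
    return dict(groups)


# Toy sequences for illustration only; they are not real data.
example = """>A/chicken/Hong Kong/G9/1997(H9N2)|PB2
ATGC
GG
>A/Hong Kong/1/1968(H3N2)|PB2
ATGA
"""
groups = group_by_host(read_fasta(example))
print(sorted(groups))
```

If the headers in your own files follow a different layout, only `host_of` should need to change.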
This folder contains source code for tree reconstruction and tree distance calculation. This code can be run with avian H9 virus alignments generated using FASTA files provided in the 'Data' folder and source code included in the 'Sequence Selection' folder. Alternatively, this code can be run directly with alignments provided in the 'Post-processing Alignments' folder or with alignments generated by the user.
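To illustrate what a tree distance calculation involves, here is a small, dependency-free Python sketch that computes the unrooted Robinson-Foulds distance between two Newick trees. Robinson-Foulds is used here only because it is a common and simple metric; it is not necessarily the metric used in this study. The function names are hypothetical, and the parser assumes unquoted leaf names and identical leaf sets in both trees.

```python
def clades(newick):
    """Return (all leaf names, leaf set under each internal node)."""
    s = "".join(newick.split()).rstrip(";")
    pos = 0
    found = []

    def skip_label():
        # Advance past a name and/or branch length; return the raw text.
        nonlocal pos
        start = pos
        while pos < len(s) and s[pos] not in ",()":
            pos += 1
        return s[start:pos]

    def node():
        nonlocal pos
        leaves = set()
        if s[pos] == "(":
            pos += 1  # consume '('
            while True:
                leaves |= node()
                if s[pos] == ",":
                    pos += 1
                    continue
                pos += 1  # consume ')'
                break
            skip_label()  # ignore internal label and branch length
            found.append(frozenset(leaves))
        else:
            leaves.add(skip_label().split(":")[0])
        return leaves

    return frozenset(node()), found


def splits(newick):
    """Return (leaf set, set of non-trivial bipartitions) for a tree."""
    leaves, found = clades(newick)
    ref = min(leaves)  # each split is stored as the side without `ref`
    result = set()
    for clade in found:
        side = leaves - clade if ref in clade else clade
        if 2 <= len(side) <= len(leaves) - 2:
            result.add(side)
    return leaves, result


def rf_distance(newick_a, newick_b):
    """Count bipartitions present in exactly one of the two trees."""
    leaves_a, splits_a = splits(newick_a)
    leaves_b, splits_b = splits(newick_b)
    if leaves_a != leaves_b:
        raise ValueError("trees must share the same leaf names")
    return len(splits_a ^ splits_b)


# Toy trees for illustration; leaf names stand in for strain names.
tree_1 = "(((A,B),C),(D,E));"
tree_2 = "(((A,C),B),(D,E));"
print(rf_distance(tree_1, tree_1), rf_distance(tree_1, tree_2))
```

A distance of 0 means the two trees share every non-trivial bipartition; larger values indicate more topological disagreement. Branch lengths are parsed but ignored in this sketch.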