For more information, see: "Efficient detection and typing of phage-plasmids"
Here you can find the documentation, scripts, and data necessary for the identification of phage-plasmids (P-Ps):
Data and scripts required to reproduce the analyses and figures of our study are in 'Publication_related_data'.
tyPPing is a user-friendly, fast and accurate method to detect P-Ps. It currently detects P-Ps of the following types: AB_1, P1_1, P1_2, N15, SSU5, pMT1, pCAV, pSLy3, pKpn, and cp32. It uses protein profiles to search sequences for patterns (frequency and compositional sets) of conserved P-P proteins. If a match also falls within the size range typical of that type, the sequence is predicted to be a P-P and assigned a confidence level.
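The decision logic described above can be pictured with a minimal Python sketch. This is not the tyPPing implementation: the class `TypeCriteria`, the function `classify`, the profile names, the thresholds, the size range, and the rule that maps matched criteria to "high" or "medium" confidence are all hypothetical placeholders chosen for illustration. The criteria actually used are in the tyPPing criteria tables.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TypeCriteria:
    """Hypothetical criteria for one P-P type (illustrative values only)."""
    signature_profiles: frozenset  # profiles conserved in this P-P type
    min_hits: int                  # frequency criterion: distinct profiles that must be hit
    required_sets: tuple           # composition criterion: one set must be fully present
    size_range: tuple              # (min_bp, max_bp) considered typical for the type


def classify(hit_profiles, length_bp, criteria):
    """Return an assumed confidence label for one sequence and one P-P type."""
    hits = set(hit_profiles) & criteria.signature_profiles
    frequency_ok = len(hits) >= criteria.min_hits
    composition_ok = any(s <= hits for s in criteria.required_sets)
    min_bp, max_bp = criteria.size_range
    size_ok = min_bp <= length_bp <= max_bp

    # Assumption: a prediction needs a pattern match AND a typical size.
    if not size_ok or not (frequency_ok or composition_ok):
        return "not predicted"
    # Assumption: matching both patterns gives the higher confidence.
    return "high" if frequency_ok and composition_ok else "medium"


# Made-up criteria for a made-up type.
example_criteria = TypeCriteria(
    signature_profiles=frozenset({"p1", "p2", "p3", "p4"}),
    min_hits=3,
    required_sets=(frozenset({"p1", "p2"}),),
    size_range=(80_000, 120_000),
)

print(classify(["p1", "p2", "p3", "other"], 95_000, example_criteria))
print(classify(["p1", "p2"], 95_000, example_criteria))
print(classify(["p1", "p2", "p3"], 30_000, example_criteria))
```

Keeping the frequency, composition, and size checks as separate booleans makes it easy to see which criterion a sequence failed.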
MM-GRC (multi-model gene repertoire clustering) is our first method to classify P-Ps (see PMID: 33590101). It is an integrated approach that combines functional annotation (with phage- and plasmid-specific HMM profiles) and machine learning (random forest) models, complemented by an exhaustive literature review. It relies on gene repertoire relatedness to type P-Ps and detects a range of types, including diverse communities and unrelated putative P-Ps (singletons).
Here we describe how to use geNomad and vConTACT v2 for detecting and typing P-Ps.
geNomad classifies nucleotide sequences as phages, integrated prophages, or plasmids. In our study, we used it to analyze sequences from a plasmid database; sequences classified as phages were considered potential P-Ps.
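As a rough illustration of this filtering step, the Python sketch below selects putative P-Ps from a tab-separated classification summary. It rests on several assumptions that should be checked against the actual geNomad output: that the table has `seq_name` and `virus_score` columns, that integrated prophages can be recognised by a `|provirus` marker in the sequence name, and that a score cutoff is wanted at all. The function name `putative_pps` and the example rows are made up.

```python
import csv
import io


def putative_pps(virus_summary_tsv, min_score=0.0):
    """Return names of whole sequences classified as phage.

    Assumed input: TSV text with 'seq_name' and 'virus_score' columns.
    """
    reader = csv.DictReader(io.StringIO(virus_summary_tsv), delimiter="\t")
    selected = []
    for row in reader:
        name = row["seq_name"]
        # Assumption: integrated prophages are flagged in the name; skip them,
        # since only sequences classed as phages are of interest here.
        if "|provirus" in name:
            continue
        if float(row["virus_score"]) >= min_score:
            selected.append(name)
    return selected


# Made-up rows for plasmid sequences.
example = (
    "seq_name\tlength\tvirus_score\n"
    "plasmid_A\t94000\t0.97\n"
    "plasmid_B|provirus_100_30000\t29900\t0.95\n"
    "plasmid_C\t51000\t0.72\n"
)
print(putative_pps(example, min_score=0.8))
```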
vConTACT v2 clusters viral genomes with a reference dataset based on their shared gene content. Here, we used it to group the putative P-Ps identified by geNomad with 1416 P-Ps that we typed in previous work.
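One way to picture the typing step is to let each putative P-P inherit the type of the reference P-Ps that share its cluster. The Python sketch below does this under stated assumptions: the clustering result is a CSV with `Genome` and `VC` (viral cluster) columns, unclustered genomes have an empty `VC` field, and a mapping from reference genome to P-P type is available. The function name `types_by_shared_cluster` and all genome and cluster names are hypothetical, and this is not necessarily how typing was done in the study.

```python
import csv
import io
from collections import defaultdict


def types_by_shared_cluster(overview_csv, reference_types):
    """Map each non-reference genome to the P-P types found in its cluster."""
    cluster_types = defaultdict(set)  # cluster label -> types of references in it
    queries = []                      # (genome, cluster) for putative P-Ps
    for row in csv.DictReader(io.StringIO(overview_csv)):
        genome = row["Genome"]
        cluster = row["VC"].strip()
        if genome in reference_types:
            if cluster:
                cluster_types[cluster].add(reference_types[genome])
        else:
            queries.append((genome, cluster))
    # Genomes with no cluster, or a cluster without references, get an empty list.
    return {g: sorted(cluster_types.get(c, set())) for g, c in queries}


# Made-up clustering table and reference typing.
overview = (
    "Genome,VC\n"
    "ref_P1,VC_1_0\n"
    "ref_N15,VC_2_0\n"
    "query_1,VC_1_0\n"
    "query_2,\n"
    "query_3,VC_9_0\n"
)
references = {"ref_P1": "P1_1", "ref_N15": "N15"}
print(types_by_shared_cluster(overview, references))
```

Returning a list of types rather than a single type keeps ambiguous clusters (references of more than one type) visible instead of silently picking one.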
Scripts, data, and supplementary materials to reproduce the figures and analyses presented in the publication.
Further files are available at the Zenodo repository.
- tyPPing_signature_profiles.hmm – 763 HMM profiles (concatenated) specific to the 10 P-P types. Used for protein-to-profile comparison (needed for tyPPing).
- phage.hmm – phage-specific HMM profiles (required for MM-GRC).
- models/ – random forest models trained to detect P-Ps in plasmid datasets (used by MM-GRC).
- g2g_plot_tables/ and tyPPing_criteria_tables/ contain the tables required to produce figures of our study using the scripts all_g_to_g_plots_filtered.R and figures_methods.R (in Publication_related_data/).
- draft_genomes_analysis/ – 12 complete P-P genomes that we detected in 9 draft genomes of carbapenem-resistant Enterobacteriales species. We used a modified version of tyPPing, tyPPing_for_draft_genomes.R, on drafts assembled by short and long reads, and on hybrid assemblies.
If you like tyPPing and you use it for your work, please cite: "Efficient detection and typing of phage-plasmids"