
Add PhyloPhlAn tool #1639

Closed
neo417 wants to merge 4 commits into bgruening:master from neo417:phylophlan
@neo417 (Contributor) commented Jun 16, 2025

This is the other tool I have been working on with @Minamehr for my Bachelor's project.

Because of the unexpected complexity and time constraints, we agreed it would be enough for my project to wrap only the main phylophlan script, but for reference I have included all my progress on the other tools as well.

The main thing missing from all the tools is a way to access cached reference datasets, as I do not understand how to configure and test the .loc files. Your help with that would be much appreciated. Once I know how to add tool-data manually, I would be willing to at least finish up the parts I have started in my free time.
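For context, here is a minimal sketch of what a .loc file looks like when read as data. It assumes the common Galaxy convention of tab-separated rows with `#` comment lines; the column names (`value`, `name`, `path`), the example entry, and the `parse_loc` helper are hypothetical and only illustrate the layout. The real columns are whatever the tool's data table configuration declares.

```python
# Hypothetical column layout for illustration only; the actual columns
# are defined by the data table configuration that points at the .loc file.
COLUMNS = ("value", "name", "path")


def parse_loc(text, columns=COLUMNS, comment_char="#"):
    """Parse .loc-style text into a list of dicts, one per data row."""
    entries = []
    for line in text.splitlines():
        # Ignore blank lines and comment lines.
        if not line.strip() or line.startswith(comment_char):
            continue
        fields = line.split("\t")
        # This sketch simply skips rows with too few fields.
        if len(fields) < len(columns):
            continue
        entries.append(dict(zip(columns, fields)))
    return entries


# Made-up example content: one comment line and one database entry.
EXAMPLE_LOC = (
    "# value\tname\tpath\n"
    "example_db\tExample PhyloPhlAn database (hypothetical)\t/path/to/example_db\n"
)

entries = parse_loc(EXAMPLE_LOC)
for entry in entries:
    print(entry["value"], "->", entry["path"])
```

Each row would then correspond to one selectable cached dataset, with the path column pointing at the directory the wrapped script should read from.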

Relatedly, we decided early on that writing data managers for this tool would be out of scope of the project. PhyloPhlAn provides various scripts to download pre-identified core UniRef90 proteins, reference genomes from the GenBank repository, and custom SGB databases. I am not sure whether compatible references can already be downloaded by Galaxy, or whether these data sources are too large to cache with a data manager. I stopped writing wrappers for them once I realized that even the indices are hundreds of MB in size.

Do you think it would be useful to publish just the main script now and add the remaining tools and data managers later?

neo417 added 4 commits June 16, 2025 16:00
PhyloPhlAn is an integrated pipeline for large-scale phylogenetic profiling of genomes and metagenomes.
PhyloPhlAn GitHub: https://github.com/biobakery/phylophlan

The tool suite consists of multiple scripts which need to be wrapped into tools and data managers, but this branch
is only concerned with creating the tool wrappers. As of this commit only phylophlan.xml is largely complete.
This is the main script to run the Concatenation and Gene-trees pipelines, and allows the user to configure
which external tools they want to use at every analysis step.

- Support for some preconfigured external tools is missing (Opal, UPP and astrid).
- Using aligned markers from StrainPhlAn (--strainphlan) is not supported.
- Code and .loc files to access cached datasets are missing.

phylophlan_assign_sgbs and phylophlan_draw_metagenomic are used to report and visualize the closest species-level
genome bins for each bin from a metagenomic assembly analysis. My progress on them has recently stalled because the scripts
rely entirely on the presence of a cached database, which, as noted above, I have not implemented.
Besides that, they are missing an expanded help section, and the current release of the assign_sgbs script
has a bug limiting some of the functionality (pairwise mash distances of the input).

phylophlan_strain_finder would be a tool to perform analysis on trees and mutation rate tables built with PhyloPhlAn,
but I have not wrapped it (yet?).

The test data were created by cutting down example data from the tutorials in the PhyloPhlAn GitHub repository.

@SaimMomin12 (Collaborator)

@neo417 Great addition! Would you like to move this tool to IUC, as we also have MetaPhlAn in that repo?
