Add PhyloPhlAn tool by neo417 · Pull Request #1639 · bgruening/galaxytools

neo417 · 2025-06-16T16:43:22Z

This is the other tool I have been working on with @Minamehr for my Bachelor's project.

Because of the unexpected complexity and time constraints, we agreed it would be enough for my project to just wrap the main phylophlan script, but for reference I have included all my progress on the tool.

The main thing missing from all the tools is a way to access cached reference datasets, as I do not understand how to configure and test the .loc files. Your help with that would be much appreciated. Once I know how to add tool-data manually, I would be willing to at least finish up the parts I have started in my free time.

Relatedly, we decided early on that writing data managers for this tool would be out of scope of the project. PhyloPhlAn provides various scripts to download pre-identified core UniRef90 proteins, reference genomes from the GenBank repository and custom SGB databases. I am not sure if compatible references can already be downloaded by Galaxy or if these data sources are too large to cache with a data manager. I did not continue writing wrappers for them once I realized that even the indices are hundreds of MB in size.

Do you think it would be useful to publish just the main script now and add the remaining tools and data managers later?

PhyloPhlAn is an integrated pipeline for large-scale phylogenetic profiling of genomes and metagenomes.

PhyloPhlAn GitHub: https://github.com/biobakery/phylophlan

The tool suite consists of multiple scripts which need to be wrapped into tools and data managers, but this branch is only concerned with creating the tool wrappers.

As of this commit only phylophlan.xml is largely complete. This is the main script to run the Concatenation and Gene-trees pipelines, and it allows the user to configure which external tools they want to use at every analysis step.

- Support for some preconfigured external tools is missing (Opal, UPP and astrid).
- Using aligned markers from StrainPhlAn (--strainphlan) is not supported.
- Code and .loc files to access cached datasets are missing.

phylophlan_assign_sgbs and phylophlan_draw_metagenomic are used to report and visualize the closest species-level genome bins for each bin from a metagenomic assembly analysis. My progress on them has recently stalled, because the scripts rely entirely on the presence of a cached database, and access to cached datasets is not implemented (see the list above). Besides that, they are missing an expanded help section, and the current release of the assign_sgbs script has a bug limiting some of the functionality (pairwise mash distances of the input).

phylophlan_strain_finder would be a tool to perform analysis on trees and mutation rate tables built with phylophlan, but I have not wrapped it (yet?).

The test data have been created by cutting down example data from the tutorials on the PhyloPhlAn GitHub.
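
For orientation, the cached-dataset access that is still missing would typically follow the usual Galaxy data table pattern, roughly as sketched below. This is only a sketch under assumptions: the table name `phylophlan_databases`, the column layout, the .loc file name and the parameter name `database` are hypothetical and not part of this branch.

```xml
<!-- tool_data_table_conf.xml.test: picked up by planemo when running the tool tests -->
<tables>
    <!-- "phylophlan_databases" is a hypothetical table name -->
    <table name="phylophlan_databases" comment_char="#">
        <columns>value, name, path</columns>
        <!-- ${__HERE__} resolves to the directory containing this file -->
        <file path="${__HERE__}/test-data/phylophlan_databases.loc" />
    </table>
</tables>

<!-- in the tool XML: a select parameter filled from the data table -->
<param name="database" type="select" label="Cached PhyloPhlAn database">
    <options from_data_table="phylophlan_databases" />
</param>
```

The matching .loc file would hold one tab-separated line per dataset, with the columns in the declared order (value, name, path) and `#` starting a comment line. The command section could then read the location as `$database.fields.path`. A `tool_data_table_conf.xml.sample` with the same table, pointing at `tool-data/phylophlan_databases.loc`, would normally ship alongside the tool for real installations.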

SaimMomin12 · 2025-06-17T09:55:38Z

@neo417 Great addition! Would you like to move this tool to IUC, as we also have MetaPhlAn in that repo?

neo417 added 4 commits June 16, 2025 16:00

Fix version suffix token

8ab61ce

Fix tool shed categories

b47b589

Fix linting issue

638aabb

neo417 mentioned this pull request Jun 19, 2025

New tool: PhyloPhlAn an integrated pipeline for phylogenetic profiling of genomes and metagenomes galaxyproject/tools-iuc#7067

Open

5 tasks

neo417 closed this Jul 7, 2025

neo417 wants to merge 4 commits into bgruening:master from neo417:phylophlan