Enhancement improved snakemake functionality by adamcantor22 · Pull Request #474 · clemente-lab/mmeds-meta

adamcantor22 · 2025-03-12T16:24:51Z

Pull Request Template for MMEDS

This branch is currently in use for production analyses, which is not ideal given that it has not been reviewed. Better late than never.

What has changed

This PR primarily focuses on improving the quality of the LEfSe snakemake workflow, and has so far saved us enormous amounts of time in the process. Detection of possible pairwise comparisons given input tables and metadata categories is fully automatic and robust against categories with insufficient values. Also included in this PR are several new analyses that will be incorporated into snakemake workflows once the future feature table upload update is complete. These analyses include differential abundance testing using benchdamic, automatic taxonomic barplots, and formatting of humann3 outputs to use in the humann_barplot pipeline.

Also incorporates a small update made by Kevin that adds the picrust2 pipeline as a new workflow.

Checklist of pre-requisites

Does the code run?
Does the code follow the repository style?
Is the code tested?

How to use the feature

Start a new analysis of type 'lefse' on an existing study, then use the resulting config_file.yaml to specify the tables and variables for the analysis to use.

With the mmeds conda env active:
To generate a taxa barplot, use Rscript $REPO/mmeds/tools/plot_taxa_barplot.R.
To run benchdamic, use Rscript $REPO/mmeds/tools/run_benchdamic.R.
To format a humann3 table, use format_humann.py.

Additional notes:

The structure of the repo is getting slightly confusing with respect to what belongs in /scripts versus /tools. I would be open to suggestions for a better structure.


kbpi314 · 2026-02-09T18:23:06Z

 """

-metadata = pd.read_csv("tables/qiime_mapping_file.tsv", sep='\t', header=[0], skiprows=[1])
+metadata = pd.read_csv("tables/qiime_mapping_file.tsv", sep='\t', header=[0], skiprows=[1], dtype='str')


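I think you're OK to drop the brackets around [0] and use header=0. skiprows=[1] should stay a list, though: skiprows=1 would skip the first line of the file (the header) rather than the line at index 1.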
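For reference, the list and integer forms of skiprows are not interchangeable in pandas: a list names the 0-indexed lines to skip, while an integer gives the number of lines to skip from the top of the file. A minimal sketch follows, using a made-up two-column table as a stand-in for the mapping file; the contents of the second row are an assumption for illustration only.

```python
import io

import pandas as pd

# Toy stand-in for a mapping file: a header row, then one non-data row.
# The row contents are invented for illustration.
raw = "#SampleID\tBodySite\n#q2:types\tcategorical\nS1\tgut\nS2\tskin\n"

# List form: skip only the line at index 1 (the non-data row).
as_list = pd.read_csv(io.StringIO(raw), sep="\t", header=0, skiprows=[1], dtype="str")

# Integer form: skip the first 1 line, which is the real header row,
# so the non-data row is promoted to the header instead.
as_int = pd.read_csv(io.StringIO(raw), sep="\t", header=0, skiprows=1, dtype="str")

print(list(as_list.columns))
print(list(as_int.columns))
```

Dropping the brackets from header=[0] is the safer half of the change; the skiprows argument needs the list to keep its current behavior.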

kbpi314 · 2026-02-09T18:26:15Z

+        filtered_metadata = metadata.loc[metadata["#SampleID"].isin(table_df.columns)]
+        for var in vars:
+            categories = list(filtered_metadata[var].unique())
+            categories = [c for c in categories if str(c) != "nan"]


not sure if tables with this case can reach this stage of processing but will this be ok with other non-"nan" representations of NA?
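One way to look at this question: by default, pd.read_csv already converts a fixed set of strings (including the empty string, "NA" and "n/a") to NaN, even with dtype='str', so str(c) != "nan" catches those. Other spellings survive as ordinary strings unless they are passed via na_values. The sketch below is an illustration under those assumptions, not the PR's code; the table, the "unknown" marker and the helper name observed_categories are hypothetical.

```python
import io

import pandas as pd

# Hypothetical metadata with several spellings of "missing".
raw = (
    "#SampleID\tDiet\n"
    "S1\tvegan\n"
    "S2\tNA\n"
    "S3\t\n"
    "S4\tomnivore\n"
    "S5\tn/a\n"
    "S6\tunknown\n"
    "S7\tvegan\n"
)

# "unknown" is not a pandas default NA string, so it is declared explicitly.
metadata = pd.read_csv(io.StringIO(raw), sep="\t", dtype="str", na_values=["unknown"])


def observed_categories(frame, column):
    """Return the distinct non-missing values of a column, in order of appearance."""
    return list(frame[column].dropna().unique())


categories = observed_categories(metadata, "Diet")
print(categories)
```

Using dropna() makes the intent explicit and avoids depending on how a missing value happens to print.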

kbpi314 · 2026-02-09T18:30:59Z

+    splits = pairwise_splits(wildcards, "lefse", config["classes"])
    formatted_splits = []
    for s in splits:
-        # Replace occurrences where class==subclass with subclass="NA", which is the default behavior, this handles the issue at the DAG level


perhaps replace this comment with the string of what the expected file name/string being parsed as separated should look like - something like # separated = lefse_plot.{feature_table}.{var}-{cat1}-or-{cat2}.{var}.NA though I forget the exact string that belongs here

kbpi314 · 2026-02-09T18:36:24Z

+    """
+    Studies from MSQ past id 90 require no golay error correction, all other runs require
+        rev-comp mapping barcodes. This is a poor generalization and will need to be improved in the future.
+    """


This will eventually become a user-supplied parameter, set in the user-defined config file before the analysis step.
It was initially confusing because I wasn't sure where an MSQ-containing string was being parsed; it seems a folder is being read/scanned and these params are determined from that, rather than being centralized in a config.
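A rough sketch of what a config-first lookup could look like, with the MSQ-id rule kept only as a fallback. Everything here is hypothetical: the function name demux_flags, the "demux" config key, the returned keys, and the assumption that run names contain "MSQ" followed by a number. The fallback values reflect one reading of the docstring (runs past id 90 get neither option, all others get both) and would need checking against the real rule.

```python
import re


def demux_flags(run_name, config):
    """Return barcode-handling settings for a sequencing run.

    Hypothetical helper: explicit config values win; otherwise fall back
    to the MSQ-id heuristic described in the docstring under review.
    """
    # Assumed config layout: {"demux": {"<run name>": {...settings...}}}
    override = config.get("demux", {}).get(run_name)
    if override is not None:
        return dict(override)

    # Assumed naming scheme: "MSQ" followed by the run id, e.g. "MSQ95".
    match = re.search(r"MSQ[_-]?(\d+)", run_name)
    if match and int(match.group(1)) > 90:
        return {"golay_error_correction": False, "rev_comp_mapping_barcodes": False}
    return {"golay_error_correction": True, "rev_comp_mapping_barcodes": True}


print(demux_flags("MSQ95", {}))
print(demux_flags("MSQ12", {"demux": {"MSQ12": {"golay_error_correction": False}}}))
```

Keeping the heuristic behind an explicit override means existing runs keep working while new runs can state their settings in one place.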

kbpi314 · 2026-02-09T18:37:31Z

    conda:
        "qiime2-2020.8.0"


Is qiime2-2025.4 or qiime2-2023.9 etc available now?

kbpi314 · 2026-02-09T18:41:18Z

    conda:
        "qiime2-2023.9"


This is a different env, so qzas etc. will be a different version, which might cause compatibility issues down the road. Maybe move everything to 2023.9 or later?

kbpi314 · 2026-02-09T18:43:10Z

+        "--p-n-jobs {threads} "
        "--output-dir {output}"

 rule alpha_rarefaction_phylogenetic:


I take it this is without your q2-boots true rarefaction method :c

kbpi314 · 2026-02-09T18:51:35Z

+parser <- add_argument(parser, "output-file", nargs=1, help="Taxa barplot output")
+parser <- add_argument(parser, "--category", help="Optional metadata category by which to separate plot sections")
+parser <- add_argument(parser, "--sort", default="top", help="Options for sorting of columns. Choose from ['top' (default), 'all', 'dominant']")
+parser <- add_argument(parser, "--colors", default=1, help="Options for ordering of colorscheme. Choose from [1, 2, 3]")


[1,2,3] does not seem to match [1,2,4] shown in code below lines 58-67

kbpi314 · 2026-02-09T19:02:41Z

+            # Running strictly, remove more
+            while(i > 0) {
+                # Remove numeric components or short modifiers to species annotations
+                if (grepl("^[0-9]+$", lvl_split[i]) | (str_length(lvl_split[i]) < 4 & !lvl_split[i] %in% c("d", "k", "p", "c", "o", "f", "g"))) {


do we need 's' here for species if we're anticipating handling of t__ / strain level tables?

we do still need it, yeah, but this will probably break with a strain-level annotation... will have to think about how to handle

kbpi314 · 2026-02-09T19:51:45Z

+    return(fold_df)
+}
+
+get_qval_on_pval_scale <- function(plot_mat, qval_thresh) {


worth adding a comment or two for these functions to clarify or explain what they do

kbpi314 · 2026-02-09T19:52:38Z

+features_vec <- str_replace_all(features_vec, ";", "\\.")
+features_vec <- str_replace_all(features_vec, " - ", "-")
+features_vec <- str_replace_all(features_vec, " \\/ ", "_")
+features_vec <- str_replace_all(features_vec, " ", "_")
+features_vec <- str_replace_all(features_vec, "\\|", "\\.")
+features_vec <- str_replace_all(features_vec, ",|\\(|\\)|\\:", "")


Depending on how often we use this, it may be worth bundling into a 'remove_special_char' function in R_utils.
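To illustrate the bundling idea, here is a Python rendering of the same six substitutions, applied in the same order as the R lines in the diff. It is only a sketch: the real helper would presumably live in R, and the name remove_special_chars is hypothetical.

```python
import re

# (pattern, replacement) pairs mirroring the str_replace_all calls in the diff, in order.
_SUBSTITUTIONS = [
    (r";", "."),      # taxonomy separators become dots
    (r" - ", "-"),    # spaced hyphens collapse to a plain hyphen
    (r" / ", "_"),    # spaced slashes become underscores
    (r" ", "_"),      # remaining spaces become underscores
    (r"\|", "."),     # pipes become dots
    (r"[,():]", ""),  # commas, parentheses and colons are dropped
]


def remove_special_chars(name):
    """Apply each substitution in turn and return the cleaned feature name."""
    for pattern, replacement in _SUBSTITUTIONS:
        name = re.sub(pattern, replacement, name)
    return name


print(remove_special_chars("k__Bacteria;p__Firmicutes"))
print(remove_special_chars("Glycolysis / Gluconeogenesis (core): step 1, part|a"))
```

Order matters here: the spaced " - " and " / " rules have to run before the generic space rule, otherwise they never match.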

kbpi314 · 2026-02-09T20:00:17Z

+    conda:
+        "picrust2"
+    shell:
+        "picrust2_pipeline.py "


Where does picrust2_pipeline.py live? should it be in tools? Not sure I see it anywhere

hahaha you wrote this! picrust2_pipeline.py is in the /bin of the picrust2 env, it's a built-in

kbpi314 · 2026-02-09T20:02:14Z

-# Not testing if running on Web01
-testing = not (gethostname() == 'web01')
+sys.path.append("/sc/arion/projects/MMEDS/.modules/mmeds_test/lib/python3.9/site-packages")
+import pandas as pd


this import has to go after this sys.path call and the rest of the imports because of a pandas-lib-location-specific reason?

yes, this is related to the attempt to bring the website back online, so it is still in flux. but the concept here is that when on the web node, it needs to specifically be told where to look for python packages for some reason
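While this is still in flux, one option is to guard the path extension so it is a no-op on machines where the directory does not exist. A minimal sketch, assuming only that the path from the diff is the one needed on the web node; the helper name add_site_packages is hypothetical.

```python
import sys
from pathlib import Path


def add_site_packages(path):
    """Append `path` to sys.path if it is an existing directory not already listed.

    Returns True when the path was added, False otherwise.
    """
    resolved = str(path)
    if Path(resolved).is_dir() and resolved not in sys.path:
        sys.path.append(resolved)
        return True
    return False


# Path copied from the diff; on machines without this directory nothing happens.
add_site_packages("/sc/arion/projects/MMEDS/.modules/mmeds_test/lib/python3.9/site-packages")

# Third-party imports (e.g. pandas) would follow here, after the path is extended.
```

The ordering constraint raised in the question still applies: any import that depends on the extra directory has to come after the call.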

kbpi314 · 2026-02-09T20:04:13Z

@@ -0,0 +1,4 @@
+tables:


if pc2 rules don't require a qmf - what is the purpose of the qiime_mapping_file in picrust2/tables? is there a test of pc2 / lefse combined pipeline?

kbpi314

Much awaited!

adamcantor22 · 2026-02-09T21:29:59Z

@kbpi314 excellent review! will work on these comments and get back to you

adamcantor22 added 30 commits July 24, 2024 16:32

lefse readability

97b839c

Merge branch 'Enhancement-SnakemakeAnalysis' of https://github.com/cl…

778de4b

…emente-lab/mmeds-meta into Enhancement-SnakemakeAnalysis

testing analysis

48fac27

comment pdf summary setup

1ce36d4

testing sequencing runs in analysis

d1f868a

separate test users

df720d5

correct study name, wait for upload

4bca62d

reset permissions

2e85fc5

try to get concurrently running watcher process to be recorded in cod…

9bb6cfb

…ecov

test revert

168ae81

append to coverage

9397f48

not for server tests

657acd0

snakemake tests

8fb72b5

test

4c27630

remove snakemake logs

36d47b1

gitignore

b1a8ad2

add to pipeline

562a12a

install graphviz separately because it won't go into a /bin

8805679

add append to coverage run command

f2f2b03

try to get watcher to report its coverage

11ce90c

typo

096c26b

specify coverage file locations

d3d001f

run watcher as a subprocess on github actions

6ffca14

dont make an infinite loop

4eca2a4

get spawn to track

2e0c7f4

correct syntax

979423b

coveragerc path

93add91

folder breakdown by var

8872446

Merge branch 'Enhancement-SnakemakeAnalysis' of https://github.com/cl…

b855e00

…emente-lab/mmeds-meta into Enhancement-SnakemakeAnalysis

correct paths

9a474a9

adamcantor22 requested review from circlespie, cleme and kbpi314 February 4, 2026 19:04

adamcantor22 changed the title ~~Enhancement improved snakemake functionality TEST PR~~ Enhancement improved snakemake functionality Feb 5, 2026

adamcantor22 and others added 10 commits February 9, 2026 09:00

Merge branch 'Enhancement-ImprovedSnakemakeFunctionality' into Enhanc…

d19dec3

…ement-KB

PR modifications

27cce6a

add picrust pipeline to workflows

b8006d7

typo

4cea632

add basic test for new workflow

f9ca99d

remove curly braces from test version

722a690

add mapping file for picrust test

6349f7d

Merge pull request #475 from clemente-lab/Enhancement-KB

18c7c9c

Enhancement kb

add exclude string capabilities to plot lefse

a110c37

typo

575efba

kbpi314 reviewed Feb 9, 2026

kbpi314 requested changes Feb 9, 2026
