151 commits
97b839c
lefse readability
adamcantor22 Jul 24, 2024
778de4b
Merge branch 'Enhancement-SnakemakeAnalysis' of https://github.com/cl…
adamcantor22 Jul 24, 2024
48fac27
testing analysis
adamcantor22 Jul 25, 2024
1ce36d4
comment pdf summary setup
adamcantor22 Jul 25, 2024
d1f868a
testing sequencing runs in analysis
adamcantor22 Jul 25, 2024
df720d5
separate test users
adamcantor22 Jul 25, 2024
4bca62d
correct study name, wait for upload
adamcantor22 Jul 25, 2024
2e85fc5
reset permissions
adamcantor22 Aug 5, 2024
9bb6cfb
try to get concurrently running watcher process to be recorded in cod…
adamcantor22 Aug 5, 2024
168ae81
test revert
adamcantor22 Aug 5, 2024
9397f48
append to coverage
adamcantor22 Aug 5, 2024
657acd0
not for server tests
adamcantor22 Aug 5, 2024
8fb72b5
snakemake tests
adamcantor22 Aug 6, 2024
4c27630
test
adamcantor22 Aug 6, 2024
36d47b1
remove snakemake logs
adamcantor22 Aug 6, 2024
b1a8ad2
gitignore
adamcantor22 Aug 6, 2024
562a12a
add to pipeline
adamcantor22 Aug 6, 2024
8805679
install graphviz separately because it won't go into a /bin
adamcantor22 Aug 6, 2024
f2f2b03
add append to coverage run command
adamcantor22 Aug 6, 2024
11ce90c
try to get watcher to report its coverage
adamcantor22 Aug 6, 2024
096c26b
typo
adamcantor22 Aug 6, 2024
d3d001f
specify coverage file locations
adamcantor22 Aug 6, 2024
6ffca14
run watcher as a subprocess on github actions
adamcantor22 Aug 6, 2024
4eca2a4
dont make an infinite loop
adamcantor22 Aug 6, 2024
2e0c7f4
get spawn to track
adamcantor22 Aug 7, 2024
979423b
correct syntax
adamcantor22 Aug 7, 2024
93add91
coveragerc path
adamcantor22 Aug 7, 2024
8872446
folder breakdown by var
adamcantor22 Aug 7, 2024
b855e00
Merge branch 'Enhancement-SnakemakeAnalysis' of https://github.com/cl…
adamcantor22 Aug 7, 2024
9a474a9
correct paths
adamcantor22 Aug 7, 2024
eab32cf
test files
adamcantor22 Aug 7, 2024
1e45a90
pathing
adamcantor22 Aug 7, 2024
b6379c9
NAs in filenames
adamcantor22 Aug 7, 2024
1bd98e6
pathing in DAA
adamcantor22 Aug 7, 2024
23dbe02
remove tool class
adamcantor22 Aug 7, 2024
e4070c6
dont test something that doesnt exist
adamcantor22 Aug 7, 2024
6fa1859
cleanup
adamcantor22 Aug 7, 2024
5c4d9c9
job template
adamcantor22 Aug 7, 2024
61e8ffe
moving tool tests to analysis tests
adamcantor22 Aug 8, 2024
21174d5
moving files
adamcantor22 Aug 8, 2024
c51eb0e
fix object control
adamcantor22 Aug 8, 2024
e33eaa8
update workflow
adamcantor22 Aug 8, 2024
8894169
pip not apt
adamcantor22 Aug 8, 2024
976ccb3
adding lefse tests
adamcantor22 Aug 8, 2024
527a2ea
deprecating summaries for now
adamcantor22 Aug 8, 2024
faeb16e
cleaning up development stuff
adamcantor22 Aug 8, 2024
cc19791
changing pipeline name
adamcantor22 Aug 8, 2024
82b3919
changing pipeline name
adamcantor22 Aug 8, 2024
8d80f94
generalizing filter to two classes for both lefse and ancombc
adamcantor22 Aug 14, 2024
cd5490c
diversity output as directory
adamcantor22 Aug 14, 2024
44a3352
dashes for table filtering
adamcantor22 Aug 14, 2024
1ee83ec
limit ancombc filterings
adamcantor22 Aug 14, 2024
e847c91
handle ancombc edge case with two classes + nan, also make filtered t…
adamcantor22 Aug 14, 2024
c35d3f0
fix building phylo tree
adamcantor22 Aug 14, 2024
fcacec6
typos
adamcantor22 Aug 14, 2024
a51ff5a
fix table extraction ambiguity
adamcantor22 Aug 14, 2024
fbaa65a
typo
adamcantor22 Aug 14, 2024
45c0da7
give more complex process more memory
adamcantor22 Aug 14, 2024
70b7cab
fixing ancombc, did not need all those splits
adamcantor22 Aug 15, 2024
2831e0c
unneccesary resource management
adamcantor22 Aug 15, 2024
50af8ab
missing conda env spec
adamcantor22 Aug 16, 2024
33ce251
give ancombc more memory
adamcantor22 Aug 16, 2024
0149bf9
remove previous
adamcantor22 Aug 16, 2024
79ee4a6
verbose
adamcantor22 Aug 16, 2024
2d1dba6
unique() fix and template updates
adamcantor22 Aug 30, 2024
5125c48
dual barcodes updates
adamcantor22 Sep 11, 2024
c2a6317
small changes from issues that came up in real analysis scenarios
adamcantor22 Jan 15, 2025
3b9c707
make lefse results clearer by adding headers to results tables and pr…
adamcantor22 Jan 29, 2025
ab6d27e
minor plot
adamcantor22 Jan 31, 2025
025c171
minor plot
adamcantor22 Jan 31, 2025
b9ab343
no negative flip with only one group
adamcantor22 Jan 31, 2025
2dd427b
edits to snakemake rules, lefse plotting, and formatting to humann_ba…
adamcantor22 Mar 11, 2025
63e750c
modify workflows to allow extraction of qza's prior to running Snakem…
adamcantor22 Mar 11, 2025
5dbc401
keep running snakemake rules where possible after some fail
adamcantor22 Mar 11, 2025
7eb75fd
make cleaning taxa strings optional
adamcantor22 Mar 12, 2025
c3b8f14
update github actions version
adamcantor22 Mar 12, 2025
2f38961
checking for taxa string special cases
adamcantor22 Mar 12, 2025
f25cfdf
add options for lefse plotting
adamcantor22 Mar 21, 2025
a025d50
taxa string updates
adamcantor22 Mar 21, 2025
6ae99b5
special taxa cases
adamcantor22 Mar 26, 2025
1f47f11
fix upload sequencing run bug
adamcantor22 Mar 28, 2025
224c19f
allow for optional workflow parameters
adamcantor22 Apr 2, 2025
8036f95
taxa strings
adamcantor22 Apr 7, 2025
8e37a94
production fixes
adamcantor22 Apr 7, 2025
ad807e9
much more plotting functionality
adamcantor22 May 29, 2025
fb40c41
Merge branch 'master' into Enhancement-ImprovedSnakemakeFunctionality
adamcantor22 May 30, 2025
4e05473
testing picrust functionality
kbpi314 Jun 10, 2025
a61277e
removed duplicate extract tsv
kbpi314 Jun 11, 2025
16aee6c
added comma and dependency
kbpi314 Jun 11, 2025
b03a29d
simplified output
kbpi314 Jun 11, 2025
9283bdf
simplified output
kbpi314 Jun 11, 2025
5a2b13d
fixed quotes around pc2
kbpi314 Jun 11, 2025
76a99af
fixed syntax bash
kbpi314 Jun 11, 2025
f35756f
fixed var syntax
kbpi314 Jun 11, 2025
e87160d
fixed var syntax
kbpi314 Jun 11, 2025
06f04ae
fixed var syntax
kbpi314 Jun 11, 2025
4bd4912
added whitespace
kbpi314 Jun 11, 2025
9d05743
added directory()
kbpi314 Jun 11, 2025
2f5cd1d
added directory()
kbpi314 Jun 11, 2025
425de4c
adding tsv to biom to qza rules
kbpi314 Jun 11, 2025
50ec200
commented out circular rules
kbpi314 Jun 11, 2025
505811e
added comment hashtag
kbpi314 Jun 13, 2025
c387e01
adam is awesome
adamcantor22 Jun 13, 2025
e59d8e0
updating some bugs
adamcantor22 Jun 13, 2025
b01d635
added export line
kbpi314 Jun 13, 2025
d0b85cb
fixed import and spacing
kbpi314 Jun 13, 2025
3c818b2
pytest version
adamcantor22 Jun 13, 2025
476180c
added directory for pc2
kbpi314 Jun 13, 2025
86b65d9
remove ancombc runs from core pipeline
adamcantor22 Jun 13, 2025
419c076
added stratified option
kbpi314 Jun 16, 2025
dd81041
Merge branch 'Enhancement-KB' of https://github.com/clemente-lab/mmed…
kbpi314 Jun 16, 2025
53bba79
function improvement
adamcantor22 Jun 20, 2025
71286ad
function tweak
adamcantor22 Jun 20, 2025
c062c8b
replace NaN columns from dividing by 0 with 0s
adamcantor22 Jul 16, 2025
eff819b
pytest version
adamcantor22 Jul 16, 2025
d1d6dc6
pylinting fixes
adamcantor22 Jul 23, 2025
1541929
remove ancombc from core pipeline
adamcantor22 Jul 23, 2025
202f81d
Merge branch 'Enhancement-ImprovedSnakemakeFunctionality' into Enhanc…
adamcantor22 Jul 23, 2025
e87bf76
add taxa tables to core pipeline
adamcantor22 Jul 23, 2025
4380c8e
taxa plotting improvement
adamcantor22 Aug 27, 2025
52ccda2
fixed -n arg for non phylo diversity core metrics
kbpi314 Sep 15, 2025
177d99d
Merge branch 'Enhancement-ImprovedSnakemakeFunctionality' of https://…
kbpi314 Sep 15, 2025
37f17df
fixed R libs and added volcano plots
adamcantor22 Nov 14, 2025
17d455d
script
adamcantor22 Nov 14, 2025
954c8c5
host of differential abundance analysis changes
adamcantor22 Dec 9, 2025
d1a4f07
use pip:
adamcantor22 Dec 9, 2025
af9f8c0
enforce pip version
adamcantor22 Dec 9, 2025
0137641
enforce pip version part 2
adamcantor22 Dec 9, 2025
43dffb2
test lefse format
adamcantor22 Dec 9, 2025
f014f7f
no secrets
adamcantor22 Dec 9, 2025
bf9678b
add import
adamcantor22 Dec 9, 2025
422fd21
format human test and deep file comp
adamcantor22 Dec 9, 2025
b4ff44b
correct params
adamcantor22 Dec 9, 2025
d3571d6
some tests
adamcantor22 Dec 9, 2025
c12a32d
revert root pass
adamcantor22 Dec 10, 2025
baf9f64
optimize foldchange
adamcantor22 Feb 4, 2026
f3743db
Merge branch 'Enhancement-ImprovedSnakemakeFunctionality' of https://…
adamcantor22 Feb 4, 2026
d125016
Merge branch 'master' into Enhancement-ImprovedSnakemakeFunctionality
adamcantor22 Feb 4, 2026
bd065a0
update pipe to newer implementation
adamcantor22 Feb 4, 2026
369d0d3
Merge branch 'Enhancement-ImprovedSnakemakeFunctionality' of https://…
adamcantor22 Feb 4, 2026
1b02243
snakemake rule documentation
adamcantor22 Feb 4, 2026
d19dec3
Merge branch 'Enhancement-ImprovedSnakemakeFunctionality' into Enhanc…
adamcantor22 Feb 9, 2026
27cce6a
PR modifications
adamcantor22 Feb 9, 2026
b8006d7
add picrust pipeline to workflows
adamcantor22 Feb 9, 2026
4cea632
typo
adamcantor22 Feb 9, 2026
f9ca99d
add basic test for new workflow
adamcantor22 Feb 9, 2026
722a690
remove curly braces from test version
adamcantor22 Feb 9, 2026
6349f7d
add mapping file for picrust test
adamcantor22 Feb 9, 2026
18c7c9c
Merge pull request #475 from clemente-lab/Enhancement-KB
adamcantor22 Feb 9, 2026
a110c37
add exclude string capabilities to plot lefse
adamcantor22 Feb 9, 2026
575efba
typo
adamcantor22 Feb 9, 2026
4 changes: 2 additions & 2 deletions .github/workflows/tests.yml
@@ -20,7 +20,7 @@ jobs:
run: echo "PYTHONPATH=." >> $GITHUB_ENV

- name: Install packages
run: pip install -U pip; pip install cryptography; pip install jupyter_client==6.1.12; pip install ipython_genutils==0.2.0; pip install nbconvert==5.6.1; pip install rpy2; pip install ipykernel; pip install pandas==1.2.3; pip install pillow; pip install -U Jinja2==3.0; pip install coverage;
run: python -m pip install pip==24.0; pip install cryptography; pip install jupyter_client==6.1.12; pip install ipython_genutils==0.2.0; pip install nbconvert==5.6.1; pip install rpy2; pip install ipykernel; pip install pandas==1.2.3; pip install pillow; pip install -U Jinja2==3.0; pip install coverage;

- name: install pandoc
run: sudo apt-get install pandoc;
@@ -89,7 +89,7 @@ jobs:
with:
python-version: 3.9
- name: Install packages
run: pip install -U pip; sudo apt-get install tidy environment-modules -y; pip install cryptography; pip install coverage;
run: python -m pip install pip==24.0; sudo apt-get install tidy environment-modules -y; pip install cryptography; pip install coverage;

- name: Set PYTHONPATH
run: echo "PYTHONPATH=." >> $GITHUB_ENV
31 changes: 25 additions & 6 deletions mmeds/config.py
@@ -17,8 +17,8 @@
# Check where this code is being run
TESTING = not ('chimera' in getfqdn().split('.'))

# If not running on web01, can't connect to databases
IS_PRODUCTION = 'web01' in getfqdn().split('.')
# If not running on web03, can't connect to databases
IS_PRODUCTION = 'web03' in getfqdn().split('.')

# While this is false, users cannot be added, cannot upload, and cannot query from webpage
LIVE_PROD_ACCESS = True
@@ -49,7 +49,7 @@
IMAGE_PATH = str(CSS_DIR) + '/'

else:
# We're on web01 and using MMEDs out of its project directory
# We're on web03 and using MMEDs out of its project directory
# OR, we're in the folder /sc/arion/projects/MMEDS
DATABASE_DIR = Path('/sc/arion/projects/MMEDS/mmeds_server_data')

@@ -277,14 +277,27 @@
'taxonomic_database',
'sequencing_runs',
'taxa_levels'
]
],
"optional_parameters": []
},
"lefse": {
"parameters": [
"tables",
"classes",
"subclasses"
"classes"
],
"optional_parameters": [
"subclasses",
"clean_strings",
"plot_max_rows",
"include_string",
"exclude_string"
]
},
"picrust2": {
"parameters": [
"tables" # this is going to always be 'asv_table.qza' and 'rep_seqs_table.qza' TODO: default parameters?
],
"optional_parameters": []
}
}
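The split between `parameters` and `optional_parameters` suggests a simple validation step for a user-supplied workflow config. The following is only a sketch under assumptions: `WORKFLOWS` and `check_workflow_config` are hypothetical names that do not appear in the diff, and only the `lefse` entry is copied from it. How MMEDS actually consumes these lists is not shown here.

```python
# Hypothetical stand-in for the workflow table in mmeds/config.py.
# Only the "lefse" entry is reproduced, as it appears in the diff.
WORKFLOWS = {
    "lefse": {
        "parameters": ["tables", "classes"],
        "optional_parameters": [
            "subclasses",
            "clean_strings",
            "plot_max_rows",
            "include_string",
            "exclude_string",
        ],
    },
}


def check_workflow_config(tool, config):
    """Return (missing required keys, unrecognized keys) for a workflow config."""
    spec = WORKFLOWS[tool]
    # Required keys must all be present
    missing = [p for p in spec["parameters"] if p not in config]
    # Anything outside required + optional is flagged as unknown
    allowed = set(spec["parameters"]) | set(spec["optional_parameters"])
    unknown = sorted(k for k in config if k not in allowed)
    return missing, unknown


missing, unknown = check_workflow_config(
    "lefse", {"tables": ["taxa_table_L6"], "subclasses": ["Sex"], "foo": 1}
)
```

Here `missing` lists the absent required key and `unknown` lists the key that neither list recognizes, so optional keys such as `subclasses` can be omitted or supplied without being reported.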

@@ -429,6 +442,12 @@
TEST_CODE_MIXED = 'mixedstudy'
TEST_CODE_OTU = 'otutable'
TEST_CODE_LEFSE = 'lefsetable'
TEST_FORMAT_HUMANN_MAPPING = str(TEST_PATH / 'test_qiime_mapping_file_format_to_humann.tsv')
TEST_FORMAT_HUMANN_TABLE = str(TEST_PATH / 'test_BRITE_pathways_stratified_format_to_humann.tsv')
TEST_FORMAT_HUMANN_RESULT = str(TEST_PATH / 'test_formatted_humann_table.tsv')
TEST_FORMAT_LEFSE_MAPPING = str(TEST_PATH / 'test_qiime_mapping_file_format_to_lefse.tsv')
TEST_FORMAT_LEFSE_TABLE = str(TEST_PATH / 'test_taxa_table_L7_format_to_lefse.tsv')
TEST_FORMAT_LEFSE_RESULT = str(TEST_PATH / 'test_formatted_lefse_table.tsv')
TEST_MIXS = str(TEST_PATH / 'test_MIxS.tsv')
TEST_MIXS_MMEDS = str(TEST_PATH / 'MIxS_metadata.tsv')
TEST_OTU = str(TEST_PATH / 'test_otu_table.txt')
6 changes: 2 additions & 4 deletions mmeds/database/database.py
@@ -1032,10 +1032,8 @@ def get_sequencing_run_locations(self, metadata, user, column=("RawDataProtocol"
df = pd.read_csv(metadata, sep='\t', header=[0, 1], skiprows=[2, 3, 4])

# Store run names from metadata
runs = []
for run in df[column]:
if run not in runs:
runs.append(run)
runs = list(df[column].unique())

# Get paths, these should exist due to already checking during validation
run_paths = {}
for run in runs:
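The replacement of the manual de-duplication loop with `unique()` relies on pandas returning values in order of first appearance, which matches what the loop produced. A minimal check follows, assuming a two-level column header; the second level name `RunName` is made up for illustration because the real default is truncated in the diff.

```python
import pandas as pd

# Hypothetical column key; only "RawDataProtocol" comes from the diff.
column = ("RawDataProtocol", "RunName")
df = pd.DataFrame({column: ["run_2", "run_1", "run_2", "run_1"]})

# Old behavior: append each run name the first time it is seen
runs_loop = []
for run in df[column]:
    if run not in runs_loop:
        runs_loop.append(run)

# New behavior: Series.unique() keeps first-appearance order
runs_unique = list(df[column].unique())
```

Both lists come out in the same order, so downstream code that iterates over `runs` should see no change for data like this.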
1 change: 1 addition & 0 deletions mmeds/server.py
@@ -717,6 +717,7 @@ def upload_specimen_metadata(self, uploadType, studyName):
cp.session['metadata_type'] = 'specimen'
cp.session['study_name'] = studyName
cp.session['upload_type'] = uploadType
cp.log(cp.session["upload_type"])

with Database(path='.', testing=self.testing, owner=self.get_user()) as db:
db.check_study_name(studyName)
160 changes: 117 additions & 43 deletions mmeds/snakemake/rules/common.smk
@@ -2,85 +2,159 @@ import pandas as pd
from copy import deepcopy
from pathlib import Path
from mmeds.config import TOOLS_DIR
from subprocess import run

"""
This common.smk file, following snakemake conventions, contains all the python logic necessary for generating the snakemake rule DAG
This common.smk file, following snakemake conventions, contains all the python logic necessary
for generating the snakemake rule DAG
"""

metadata = pd.read_csv("tables/qiime_mapping_file.tsv", sep='\t', header=[0], skiprows=[1])
metadata = pd.read_csv("tables/qiime_mapping_file.tsv", sep='\t', header=[0], skiprows=[1], dtype='str')

Collaborator

I think you're ok to get rid of the brackets around [0] and use header=0, skiprows=1
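One caution on this suggestion: in pandas, `skiprows=[1]` and `skiprows=1` are not interchangeable. A list gives the zero-based line indices to skip, while an integer gives the number of lines to skip from the top of the file. The sketch below uses a made-up mapping file (the second row is a hypothetical QIIME-style type row) to show the difference.

```python
import io

import pandas as pd

# Hypothetical mapping file: header row, a second annotation row, then data
raw = "#SampleID\tGroup\n#q2:types\tcategorical\nS1\tA\nS2\tB\n"

# skiprows=[1] drops only the line at index 1, so the real header is kept
by_list = pd.read_csv(io.StringIO(raw), sep="\t", header=0, skiprows=[1], dtype=str)

# skiprows=1 drops the first line of the file, so the annotation row
# is promoted to the header instead
by_int = pd.read_csv(io.StringIO(raw), sep="\t", header=0, skiprows=1, dtype=str)

print(list(by_list.columns))
print(list(by_int.columns))
```

Under these assumptions, changing `header=[0]` to `header=0` looks safe, but `skiprows` would need to stay a list to keep `#SampleID` as a column name.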


def lefse_splits(wildcards):
""" Calculates all the pairwise splits that should be compared by LEfSe. Will not include groups with an insufficient number of comparisons """
splits = []
for lefse_class in config["classes"]:
# 'classes' in this case refer to metadata columns, whereas categories refer to the possible values of those columns
categories = list(metadata[lefse_class].unique())

# Discard samples with a 'nan' for the selected class. This will only work while the input has been run through MMEDS already
categories = [c for c in categories if str(c) != "nan"]
value_counts = metadata[lefse_class].value_counts()
def pairwise_splits(wildcards, tool, vars):
"""
When running differential analysis on any number of variables and tables, create all the possible pairwise splits
per-table and per-variable that have sufficient data to form a comparison
"""
if "tables" in config:
tables = config["tables"]
else:
tables = [f"taxa_table_L{x}" for x in config["taxa_levels"]]

subclasses = []
if "subclasses" in config and config["subclasses"]:
subclasses = deepcopy(config["subclasses"])
subclasses = False
if tool == "lefse" and "subclasses" in config and config["subclasses"]:
subclasses = deepcopy(config["subclasses"])

if len(categories) < 2:
# Only one value in the class, nothing to compare
continue
splits = []
for table in tables:
if not Path(f"tables/{table}.tsv").exists():
extract_feature_table_subprocess(table)
table_df = pd.read_csv(f"tables/{table}.tsv", sep='\t', header=[0], index_col=0)
filtered_metadata = metadata.loc[metadata["#SampleID"].isin(table_df.columns)]
for var in vars:
categories = list(filtered_metadata[var].unique())
categories = [c for c in categories if str(c) != "nan"]

@kbpi314 Feb 9, 2026
Collaborator

Not sure whether tables with this case can reach this stage of processing, but will this be OK with other, non-"nan" representations of NA?
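As a rough illustration of this question: `read_csv` converts its default NA tokens (for example an empty cell or `NA`) to NaN even with `dtype=str`, so the `str(c) != "nan"` filter catches those. A custom placeholder is not converted and would survive as its own category. The sketch below uses invented data; the `missing` token and the `extra_na` set are assumptions for illustration.

```python
import io

import pandas as pd

# Invented metadata: "NA" and the empty cell are default NA tokens, "missing" is not
raw = "#SampleID\tGroup\nS1\tA\nS2\tNA\nS3\t\nS4\tmissing\nS5\tB\n"
metadata = pd.read_csv(io.StringIO(raw), sep="\t", header=0, dtype=str)

# The filter used in the diff: drops NaN, keeps any other string
categories = [c for c in metadata["Group"].unique() if str(c) != "nan"]

# One possible stricter variant with an explicit list of extra NA tokens
extra_na = {"missing"}
cleaned = [c for c in metadata["Group"].unique() if pd.notna(c) and c not in extra_na]
```

An alternative would be passing such tokens through the `na_values` argument of `read_csv`, so they are normalized once when the mapping file is loaded.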

value_counts = filtered_metadata[var].value_counts()

if len(categories) < 3:
# Exactly two values in the class, no pairwise checks needed
if not sufficient_values(value_counts, categories[0], categories[1]):
if len(categories) < 2: # Only one value in the class, nothing to compare
continue
splits += expand("results/{lefse_class}/lefse_plot.{feature_table}.{lefse_class}.NA.pdf",
feature_table=config["tables"], lefse_class=lefse_class)
if subclasses:
splits += expand("results/{lefse_class}/lefse_plot.{feature_table}.{lefse_class}.{subclass}.pdf",
feature_table=config["tables"], lefse_class=lefse_class, subclass=subclasses)
continue


splits += expand("results/{lefse_class}/lefse_plot_strict.{feature_table}.{lefse_class}.{subclass}.pdf",
feature_table=config["tables"], lefse_class=lefse_class, subclass=subclasses)
for i in range(len(categories)-1):
for j in range(i+1, len(categories)):
# Perform pairwise checks
if not sufficient_values(value_counts, categories[i], categories[j]):

if len(categories) < 3: # Exactly two values in the class, no pairwise checks needed
if not sufficient_values(value_counts, categories[0], categories[1]):
continue
splits += expand("results/{lefse_class}/lefse_plot.{feature_table}_{lefse_class}_{cat1}_or_{cat2}.{lefse_class}.NA.pdf",
feature_table=config["tables"], lefse_class=lefse_class, cat1=categories[i], cat2=categories[j])
if subclasses:
splits += expand("results/{lefse_class}/lefse_plot.{feature_table}_{lefse_class}_{cat1}_or_{cat2}.{lefse_class}.{subclass}.pdf",
feature_table=config["tables"], lefse_class=lefse_class, cat1=categories[i], cat2=categories[j], subclass=subclasses)
if tool == "lefse":
splits += expand("results/{var}/lefse_plot.{feature_table}.{var}.NA.pdf",
feature_table=table, var=var)
if subclasses:
splits += expand("results/{var}/lefse_plot.{feature_table}.{var}.{subclass}.pdf",
feature_table=table, var=var, subclass=subclasses)
elif tool == "ancombc":
splits += expand("differential_abundance/{var}/ancom-bc_barplot.{feature_table}.{var}::{cat}.qzv",
feature_table=table, var=var, cat=categories[0])
continue

for i in range(len(categories)-1):
if tool == "ancombc": # Do not need a separate comparison for each pairwise split with ANCOM-BC
splits += expand("differential_abundance/{var}/ancom-bc_barplot.{feature_table}.{var}::{cat}.qzv",
feature_table=table, var=var, cat=categories[i])

else: # Perform LEfSe strict analyses using all variable classes
splits += expand("results/{var}/lefse_plot_strict.{feature_table}.{var}.NA.pdf",
feature_table=table, var=var)
if subclasses:
splits += expand("results/{var}/lefse_plot_strict.{feature_table}.{var}.{subclass}.pdf",
feature_table=table, var=var, subclass=subclasses)

for j in range(i+1, len(categories)): # Perform pairwise checks
if not sufficient_values(value_counts, categories[i], categories[j]):
continue
if tool == "lefse":
splits += expand("results/{var}/lefse_plot.{feature_table}.{var}-{cat1}-or-{cat2}.{var}.NA.pdf",
feature_table=table, var=var, cat1=categories[i], cat2=categories[j])
if subclasses:
splits += expand(
"results/{var}/lefse_plot.{feature_table}.{var}-{cat1}-or-{cat2}.{var}.{subclass}.pdf",
feature_table=table, var=var, cat1=categories[i], cat2=categories[j],
subclass=subclasses)
return splits


def ancombc_splits(wildcards):
""" Get pairwise splits prepared in ANCOM-BC format """
return pairwise_splits(wildcards, "ancombc", config["metadata"])


def lefse_splits(wildcards):
""" Get pairwise splits prepared in LEfSe format """
splits = pairwise_splits(wildcards, "lefse", config["classes"])
formatted_splits = []
for s in splits:
# Replace occurrences where class==subclass with subclass="NA", which is the default behavior, this handles the issue at the DAG level

Collaborator

Perhaps replace this comment with the expected form of the file name being parsed into `separated`, something like `# separated = lefse_plot.{feature_table}.{var}-{cat1}-or-{cat2}.{var}.NA`, though I forget the exact string that belongs here.
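To make the expected shape concrete, here is a small sketch of the split-and-replace step applied to a target path built from the new pairwise pattern. The table name `taxa_table_L6`, the variable `Group`, and the helper name `normalize_subclass` are illustrative assumptions, not names taken from the pipeline.

```python
def normalize_subclass(path):
    """Set the subclass field to "NA" when it repeats the class field."""
    separated = path.split(".")
    # separated[-1] is the extension, [-2] the subclass, [-3] the class
    if separated[-2] == separated[-3]:
        separated[-2] = "NA"
    return ".".join(separated)


# Pairwise target where class == subclass == "Group"
pairwise = "results/Group/lefse_plot.taxa_table_L6.Group-A-or-B.Group.Group.pdf"
parts = pairwise.split(".")
fixed = normalize_subclass(pairwise)
```

For this input, `parts` has six elements (path prefix, table, pairwise filter, class, subclass, extension), one more than the five shown in the older example comment. Note that the approach assumes no table name or category value contains a literal `.`.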

# e.g. separated: ["results/class/lefse_plot", "feature_table_class_cat1_or_cat2", "class", "subclass", "pdf"]
separated = s.split(".")
if separated[-2] == separated[-3]:
separated[-2] = "NA"
formatted_splits += [".".join(separated)]

return formatted_splits


def lefse_get_subclass(wildcards):
""" Handle class==subclass behavior at the rule level """
"""
Replace occurrences where class==subclass with subclass="NA", which is the default behavior,
this handles the issue at the DAG level e.g. separated:
["results/class/lefse_plot", "feature_table_class_cat1_or_cat2", "class", "subclass", "pdf"]
"""
subclass = wildcards["class"] if wildcards["subclass"] == "NA" else wildcards["subclass"]
return subclass


def sufficient_values(value_counts, cat1, cat2, threshold=2):
""" Check if two categories have enough samples for a comparison """
if value_counts[cat1] < threshold or value_counts[cat2] < threshold:
return False
return True
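The pairwise enumeration that `sufficient_values` supports can be sketched in isolation. This is a simplified stand-in, not the pipeline code: it uses `itertools.combinations` in place of the nested index loops (the pair order is the same) and an invented `groups` series.

```python
from itertools import combinations

import pandas as pd


def sufficient_values(value_counts, cat1, cat2, threshold=2):
    """Check if two categories have enough samples for a comparison."""
    return value_counts[cat1] >= threshold and value_counts[cat2] >= threshold


# Invented metadata column: three samples of A, two of B, one of C
groups = pd.Series(["A", "A", "B", "B", "C", "A"])
value_counts = groups.value_counts()
categories = list(groups.unique())

# Keep only pairs where both categories meet the threshold
valid_pairs = [
    (a, b)
    for a, b in combinations(categories, 2)
    if sufficient_values(value_counts, a, b)
]
```

With the default threshold of 2, only the A versus B comparison survives, because C has a single sample.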


def demux_single_option(wildcards):
""" Studies from MSQ past their 90th run require no golay error correction, all others require rev comp mapping barcodes """
"""
Studies from MSQ past id 90 require no Golay error correction; all other runs require
rev-comp mapping barcodes. This is a poor generalization and will need to be improved in the future.
"""
Comment on lines +119 to +122

Collaborator

This will eventually become a user-supplied parameter in the user-defined config file, set prior to the analysis step.
This was initially confusing because it wasn't clear where an MSQ-containing string was being parsed; it seems a folder is read/scanned and these params are determined from that, as opposed to being centralized in a config.
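One possible shape for that change, offered only as a sketch: look for an explicit per-run setting first and fall back to the current folder-name heuristic. The `demux_options` config key is hypothetical, and the function takes the run name and config as plain arguments here rather than Snakemake wildcards so that it can run standalone.

```python
def demux_single_option(sequencing_run, config):
    """Pick the extra demux flag for one sequencing run."""
    # Hypothetical config key: maps a run name to the flag it should use
    override = config.get("demux_options", {}).get(sequencing_run)
    if override:
        return override

    # Fallback: the folder-name heuristic from the diff
    components = sequencing_run.split("_")
    if "MSQ" in components and int(components[-1]) > 90:
        return "--p-no-golay-error-correction"
    return "--p-rev-comp-mapping-barcodes"


from_heuristic = demux_single_option("MSQ_91", {})
from_config = demux_single_option(
    "run_A", {"demux_options": {"run_A": "--p-no-golay-error-correction"}}
)
```

Keeping the heuristic as a fallback would let existing studies run unchanged while new ones state the option explicitly.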

components = wildcards.sequencing_run.split("_")
if "MSQ" in components and int(components[-1]) > 90:
return "--p-no-golay-error-correction"
return "--p-rev-comp-mapping-barcodes"


def get_lefse_plot_options():
""" Add various visualization options for LEfSe plot output """
opts = ""
if "clean_strings" in config and config["clean_strings"] is not None and not config["clean_strings"]:
opts += "--no-string-clean "
if "plot_max_rows" in config and type(config["plot_max_rows"]) is int and config["plot_max_rows"] > 0:
opts += f"--row-max {config['plot_max_rows']} "
if "include_string" in config and config["include_string"]:
opts += f"--include-string {config['include_string']} "
if "exclude_string" in config and config["exclude_string"]:
opts += f"--exclude-string {config['exclude_string']} "
return opts
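Because these options are interpolated into a shell command, an `include_string` or `exclude_string` containing spaces or shell metacharacters could be split or misread. A hedged variant that quotes user-supplied values with `shlex.quote` is sketched below. The name `build_lefse_plot_options` is hypothetical, and the function takes the config as an argument so it can be run on its own. The flags themselves are the ones used in the diff.

```python
import shlex


def build_lefse_plot_options(config):
    """Assemble LEfSe plot flags, quoting free-text values for the shell."""
    opts = []
    clean = config.get("clean_strings")
    if clean is not None and not clean:
        opts.append("--no-string-clean")
    rows = config.get("plot_max_rows")
    if type(rows) is int and rows > 0:
        opts.append(f"--row-max {rows}")
    # Free-text values are quoted so they reach the script as one argument
    if config.get("include_string"):
        opts.append(f"--include-string {shlex.quote(config['include_string'])}")
    if config.get("exclude_string"):
        opts.append(f"--exclude-string {shlex.quote(config['exclude_string'])}")
    return " ".join(opts)


example = build_lefse_plot_options(
    {"clean_strings": False, "plot_max_rows": 20, "include_string": "g__Bacteroides; s__"}
)
```

In this example the include string is wrapped in single quotes, so the `;` is not interpreted by the shell as a command separator.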


def get_tool_dir():
""" Get the location of needed scripts """
return TOOLS_DIR


def extract_feature_table_subprocess(table):
""" Equal to the 'extract_feature_table.sh' script but done without the external call """
qza_file = Path(f"tables/{table}.qza")
tsv_file = Path(f"tables/{table}.tsv")
tmp_dir = Path("tables/tmp_unzip")

if not qza_file.exists():
raise FileNotFoundError(f"{qza_file.name} not found in tables folder")

run(["unzip", "-qq", "-jo", str(qza_file), "-d", str(tmp_dir)])
run(["biom", "convert", "--to-tsv", "-i", str(tmp_dir / "feature-table.biom"), "-o", str(tsv_file)])
run(["rm", "-rf", str(tmp_dir)])
run(["sed", "-i", "1d;2s/^#//", str(tsv_file)])
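The final `sed` expression, `1d;2s/^#//`, deletes the first line of the converted table and strips a leading `#` from the second. A pure-Python equivalent of that step is sketched below for readers less familiar with `sed`. The helper name `strip_biom_tsv_header` and the sample text are assumptions for illustration.

```python
def strip_biom_tsv_header(text):
    """Drop the first line and remove a leading '#' from the next one."""
    lines = text.splitlines(keepends=True)
    body = lines[1:]  # equivalent of `1d`
    if body and body[0].startswith("#"):
        body[0] = body[0][1:]  # equivalent of `2s/^#//`
    return "".join(body)


# Invented sample resembling a biom-to-TSV conversion
sample = "# Constructed from biom file\n#OTU ID\tS1\tS2\nasv1\t3.0\t0.0\n"
converted = strip_biom_tsv_header(sample)
```

Separately, the `run(...)` calls in the diff do not pass `check=True`, so a failing `unzip` or `biom convert` would not raise. Whether that is intended is not clear from the diff.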
16 changes: 11 additions & 5 deletions mmeds/snakemake/rules/demux_denoise.smk
@@ -5,7 +5,8 @@ rule demux_single_barcodes:
barcodes = "section_{sequencing_run}/qiime_mapping_file_{sequencing_run}.tsv"
output:
error_correction = "section_{sequencing_run}/error_correction.qza",
demux_file = "section_{sequencing_run}/demux_file.qza"
demux_file = "section_{sequencing_run}/demux_file.qza",
demux_viz = "section_{sequencing_run}/demux_viz.qza"
conda:
"qiime2-2020.8.0"
Comment on lines 10 to 11

Collaborator

Is qiime2-2025.4 or qiime2-2023.9 etc available now?

params:
@@ -17,17 +18,21 @@ rule demux_single_barcodes:
"--m-barcodes-column BarcodeSequence "
"{params.option} "
"--o-error-correction-details {output.error_correction} "
"--o-per-sample-sequences {output.demux_file}"
"--o-per-sample-sequences {output.demux_file}; "
"qiime demux summarize "
"--i-data {output.demux_file} "
"--o-visualization {output.demux_viz}"

rule demux_dual_barcodes_pheniqs:
""" Demultiplex a paired-end dual-barcoded sequencing run with Pheniqs """
input:
"section_{sequencing_run}/pheniqs_config.json"
output:
"section_{sequencing_run}/pheniqs_output"
directory("section_{sequencing_run}/pheniqs_output")
conda:
"pheniqs"
shell:
"mkdir {output}; "
"pheniqs mux --config {input}"

rule strip_error_barcodes:
@@ -36,10 +41,11 @@ rule strip_error_barcodes:
dir = "section_{sequencing_run}/pheniqs_output",
mapping_file = "section_{sequencing_run}/qiime_mapping_file_{sequencing_run}.tsv",
output:
dir = "section_{sequencing_run}/stripped_output"
dir = directory("section_{sequencing_run}/stripped_output")
conda:
"mmeds"
"mmeds_test"
shell:
"mkdir {output}; "
"strip_error_barcodes.py "
"--num-allowed-errors 1 "
"--m-mapping-file {input.mapping_file} "