Commit e11c474

benchmark: wire trim-galore-rs into the pipeline (#4)
trim-galore-rs was listed in performance.config.yaml and appeared throughout RESULTS.md, but the supporting wiring had been lost between checkouts: tools/trim-galore-rs/render.py was missing, and config/tools.yaml had no entry. snakemake therefore expanded the trim matrix to include trim-galore-rs jobs but only discovered the missing render at job-dispatch time, after minutes of simulate / unrelated trim work had already burned.

This commit restores the wiring and adds a guard so the same kind of drift fails fast in the future:

- config/tools.yaml: add trim-galore-rs entry (display name, version 2.2.0, version_cmd, pixi_env: trim-galore-rs).
- tools/trim-galore-rs/render.py: argv + output-name moves for the Rust trim_galore. Drops the Perl render's 4-core cap since the Rust binary uses an in-process thread pool that scales past 4.
- pixi.toml: new trim-galore-rs feature pinning trim-galore = "2.2.0.*" in its own environment + solve group, mirroring the fastp-nfcore pattern. Scoped to platforms with a bioconda 2.x build (linux-64 / linux-aarch64 / osx-arm64); osx-64 stays on Perl 0.6.x.
- install.sh: pixi install --environment trim-galore-rs; skipped on osx-64 where no 2.x binary exists.
- config/smoke.config.yaml: add trim-galore-rs to the smoke tool list so render.py / pixi env drift is caught in the ~15 min smoke run.
- Snakefile: validate at config-load time that every tool in config['tools'] has a tools/<render_dir>/render.py, raising WorkflowError immediately instead of failing partway through a sweep.

Smoke run on osx-arm64 (M1 Max, 4 threads, 1M PE pairs/sample) shows trim-galore-rs landing in the expected position relative to chelae, cutadapt, and fastp on both insert-size shapes.
1 parent e87b0fc commit e11c474

7 files changed

Lines changed: 207 additions & 5 deletions

benchmark-pipeline/Snakefile

Lines changed: 14 additions & 0 deletions
@@ -105,6 +105,20 @@ TOOL_NAMES = config["tools"]
 THREAD_COUNTS = config["thread_counts"]
 REPLICATES = list(range(1, int(config["replicates"]) + 1))

+# Fail fast if any tool in the config lacks a render.py. Without this
+# guard the missing render only surfaces when `trim_one` actually runs
+# for that tool, which can be many minutes into a sweep (see issue #4).
+_missing_renders = [
+    t for t in TOOL_NAMES
+    if not (WORKDIR / "tools" / TOOLS.get(t, {}).get("render_dir", t) / "render.py").exists()
+]
+if _missing_renders:
+    raise WorkflowError(
+        "Tools listed in config['tools'] have no render.py: "
+        f"{_missing_renders}. Add tools/<name>/render.py and a "
+        "matching entry to config/tools.yaml."
+    )
+

 def all_trim_outputs():
     out = []
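
Reviewer note: the guard in this hunk can be exercised outside Snakemake. The following is a minimal sketch, assuming the same directory layout (`tools/<render_dir>/render.py`); the function name `find_missing_renders` and the temporary layout are hypothetical and not part of the commit.

```python
import tempfile
from pathlib import Path


def find_missing_renders(workdir: Path, tool_names: list[str], tools: dict) -> list[str]:
    """Return the tools whose tools/<render_dir>/render.py does not exist.

    Mirrors the list comprehension in the Snakefile hunk: `render_dir`
    falls back to the tool name when the tools mapping has no override.
    """
    return [
        t for t in tool_names
        if not (workdir / "tools" / tools.get(t, {}).get("render_dir", t) / "render.py").exists()
    ]


# Build a throwaway layout with one wired tool and one unwired tool.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "tools" / "cutadapt").mkdir(parents=True)
    (root / "tools" / "cutadapt" / "render.py").write_text("")
    missing = find_missing_renders(root, ["cutadapt", "trim-galore-rs"], {})
    print(missing)  # ['trim-galore-rs']
```

Keeping the check as a pure function of the work directory and the two config mappings would make it straightforward to unit-test separately from the workflow.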

benchmark-pipeline/config/smoke.config.yaml

Lines changed: 1 addition & 0 deletions
@@ -51,6 +51,7 @@ tools:
   - fastp-nfcore
   - cutadapt
   - trim-galore
+  - trim-galore-rs
   - trimmomatic
   - bbduk
   - adapterremoval

benchmark-pipeline/config/tools.yaml

Lines changed: 12 additions & 0 deletions
@@ -48,6 +48,18 @@ trim-galore:
   version_cmd: "trim_galore --version | head -n 5"
   source: bioconda-latest

+# Trim Galore 2.x (Rust rewrite, "Oxidized Edition"). Shares the
+# `trim-galore` bioconda package name with the Perl version pinned at
+# 0.6.11 in the default env, so it gets its own pixi env. The render
+# directory differs from the Perl render — the Rust binary scales past
+# 4 threads so we don't apply the Perl render's cores cap.
+trim-galore-rs:
+  display_name: "TrimGalore 2.x (Rust)"
+  version: "2.2.0"
+  version_cmd: "trim_galore --version | head -n 5"
+  pixi_env: trim-galore-rs
+  source: bioconda-latest
+
 trimmomatic:
   display_name: Trimmomatic
   version: "0.40"
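
Reviewer note: the diff does not show how `pixi_env` is consumed. One plausible reading is that the pipeline prefixes a tool's command with `pixi run --environment <env>` when the key is present. The sketch below illustrates that idea only; `wrap_with_pixi` is a hypothetical helper, not a function from this repository.

```python
def wrap_with_pixi(argv: list[str], tool_cfg: dict) -> list[str]:
    """Prefix argv with `pixi run --environment <env>` if the tool names an env.

    Assumption: tools without a `pixi_env` key run in whatever environment
    the caller is already in, so their argv is returned unchanged.
    """
    env = tool_cfg.get("pixi_env")
    if env is None:
        return list(argv)
    return ["pixi", "run", "--environment", env, *argv]


rust_cfg = {"version": "2.2.0", "pixi_env": "trim-galore-rs"}
perl_cfg = {"version": "0.6.11"}

print(wrap_with_pixi(["trim_galore", "--version"], rust_cfg))
print(wrap_with_pixi(["trim_galore", "--version"], perl_cfg))
```

Under this assumption both entries can share the `trim_galore` binary name, because the environment prefix decides which build is resolved.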

benchmark-pipeline/install.sh

Lines changed: 9 additions & 1 deletion
@@ -88,11 +88,19 @@ else
 fi

 # ---- pixi environments ----------------------------------------------------
-log "Materializing pixi environments (default, run, plot, fastp-nfcore)"
+log "Materializing pixi environments (default, run, plot, fastp-nfcore, trim-galore-rs)"
 pixi install  # default
 pixi install --environment run
 pixi install --environment plot
 pixi install --environment fastp-nfcore
+# trim-galore-rs is only built for linux-64 / linux-aarch64 / osx-arm64 on
+# bioconda; on osx-64 the feature's `platforms` restriction makes the env
+# unsolvable. Skip the install there with a warning rather than aborting.
+if [[ "$(uname -s)-$(uname -m)" == "Darwin-x86_64" ]]; then
+    log "Skipping pixi env 'trim-galore-rs' on osx-64 (no bioconda 2.x build)"
+else
+    pixi install --environment trim-galore-rs
+fi

 # ---- rust toolchain -------------------------------------------------------
 RUST_DIR="$PIPELINE_DIR/.rust"

benchmark-pipeline/pixi.lock

Lines changed: 100 additions & 0 deletions
Generated file; diff not rendered by default.

benchmark-pipeline/pixi.toml

Lines changed: 17 additions & 4 deletions
@@ -51,8 +51,21 @@ r-scales = "*"
 [feature.fastp-nfcore.dependencies]
 fastp = "1.1.0.*"

+# Trim Galore 2.x (Oxidized Edition): Rust rewrite of the Perl trim_galore,
+# shares the `trim-galore` bioconda package name with the Perl 0.6.x line.
+# Own environment so it doesn't collide with the trim-galore = "0.6.11.*"
+# pin in the default trimmers env. osx-64 has no 2.x build on bioconda
+# (latest there is 0.6.2), so the feature is scoped to the platforms that
+# actually have a 2.x package.
+[feature.trim-galore-rs]
+platforms = ["linux-64", "linux-aarch64", "osx-arm64"]
+
+[feature.trim-galore-rs.dependencies]
+trim-galore = "2.2.0.*"
+
 [environments]
-default = { features = ["trimmers", "plotting"], solve-group = "main" }
-run = { features = ["trimmers"], solve-group = "main" }
-plot = { features = ["plotting"], solve-group = "main" }
-fastp-nfcore = { features = ["fastp-nfcore"], no-default-feature = true, solve-group = "fastp-nfcore-solve" }
+default = { features = ["trimmers", "plotting"], solve-group = "main" }
+run = { features = ["trimmers"], solve-group = "main" }
+plot = { features = ["plotting"], solve-group = "main" }
+fastp-nfcore = { features = ["fastp-nfcore"], no-default-feature = true, solve-group = "fastp-nfcore-solve" }
+trim-galore-rs = { features = ["trim-galore-rs"], no-default-feature = true, solve-group = "trim-galore-rs-solve" }

benchmark-pipeline/tools/trim-galore-rs/render.py

Lines changed: 54 additions & 0 deletions
@@ -0,0 +1,54 @@
+"""Trim Galore Oxidized Edition (Rust rewrite). CLI is a near-superset of
+the Perl trim_galore: same `--paired`, `--adapter`, `--adapter2`,
+`--stringency`, `--quality`, `--length`, `--output_dir`, `--cores`,
+`--gzip`. Output filenames also follow the Perl convention
+(`<basename>_val_1.fq.gz` for paired, `<basename>_trimmed.fq.gz` for SE),
+so the moves table reuses the same logic as the Perl render.
+
+Difference from the Perl render: no `--cores` cap at 4. The Rust binary
+uses an in-process thread pool rather than the Perl version's multi-
+process spawn, so it scales cleanly past 4 threads."""
+
+from pathlib import Path
+
+
+def render(ctx: dict) -> dict:
+    cfg = ctx["trim_cfg"]
+    workdir = Path(ctx["workdir"])
+
+    stringency = str(cfg.get("min_adapter_overlap", 5))
+    argv = ["trim_galore",
+            "--cores", str(ctx["threads"]),
+            "--adapter", ctx["adapter_r1"],
+            "--stringency", stringency,
+            "--gzip",
+            "--output_dir", str(workdir)]
+
+    if ctx["paired"]:
+        argv += ["--paired", "--adapter2", ctx["adapter_r2"]]
+
+    argv += ["--quality", str(cfg.get("quality_threshold", 0)) if cfg.get("quality_trim") else "0"]
+    argv += ["--length", str(cfg["min_length"]) if cfg.get("min_length", 0) > 0 else "1"]
+
+    argv.append(ctx["input_r1"])
+    if ctx["paired"]:
+        argv.append(ctx["input_r2"])
+
+    def basename_no_fqgz(path: str) -> str:
+        n = Path(path).name
+        for suffix in (".fastq.gz", ".fq.gz", ".fastq", ".fq"):
+            if n.endswith(suffix):
+                return n[: -len(suffix)]
+        return Path(path).stem
+
+    moves: dict[str, str] = {}
+    if ctx["paired"]:
+        b1 = basename_no_fqgz(ctx["input_r1"])
+        b2 = basename_no_fqgz(ctx["input_r2"])
+        moves[str(workdir / f"{b1}_val_1.fq.gz")] = ctx["output_r1"]
+        moves[str(workdir / f"{b2}_val_2.fq.gz")] = ctx["output_r2"]
+    else:
+        b1 = basename_no_fqgz(ctx["input_r1"])
+        moves[str(workdir / f"{b1}_trimmed.fq.gz")] = ctx["output_r1"]
+
+    return {"argv": argv, "moves": moves}
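
Reviewer note: `render()` returns a `moves` table that maps the tool's native output names to the pipeline's expected paths. The consumer of that table is not shown in this commit, so the following is a minimal sketch of how such a table might be applied after the trimmer exits. `apply_moves` and the file names are hypothetical.

```python
import shutil
import tempfile
from pathlib import Path


def apply_moves(moves: dict[str, str]) -> None:
    """Move each produced file to its final path.

    Every source is checked before anything is moved, so a missing
    output is reported without leaving a half-renamed result set.
    """
    missing = [src for src in moves if not Path(src).exists()]
    if missing:
        raise FileNotFoundError(f"expected trimmer outputs not found: {missing}")
    for src, dst in moves.items():
        Path(dst).parent.mkdir(parents=True, exist_ok=True)
        shutil.move(src, dst)


# Simulate a single-end run: the tool wrote <basename>_trimmed.fq.gz.
with tempfile.TemporaryDirectory() as tmp:
    work = Path(tmp)
    produced = work / "sample_R1_trimmed.fq.gz"
    produced.write_bytes(b"")
    final = work / "out" / "sample.trimmed.R1.fq.gz"
    apply_moves({str(produced): str(final)})
    moved_ok = final.exists() and not produced.exists()
    print(moved_ok)  # True
```

Checking all sources up front means a naming mismatch between `render()` and the binary surfaces as one clear error.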
