fix!: remove parallel gzip, merge wrappers, added docs, and code refactoring. #4158
base: master
📝 Walkthrough
This change removes the previous modular single-end and paired-end Trimmomatic wrappers, their tests, and their environment files, consolidating them into a single, updated wrapper and metadata. A new test Snakefile covers all compression and mode combinations. Environment files are updated and pinned, and test logic is streamlined in the main test runner. The new wrapper directly invokes Trimmomatic with file names, addressing previous logging issues.
Changes
Sequence Diagram(s)
sequenceDiagram
participant Snakemake
participant Wrapper (wrapper.py)
participant Trimmomatic
Snakemake->>Wrapper (wrapper.py): Provide inputs, outputs, params, threads
Wrapper (wrapper.py)->>Trimmomatic: Invoke Trimmomatic with file names, params, threads
Trimmomatic-->>Wrapper (wrapper.py): Write logs with actual file names
Wrapper (wrapper.py)-->>Snakemake: Output trimmed files and logs
Actionable comments posted: 3
🔭 Outside diff range comments (1)
test_wrappers.py (1)
4552-4552: 🛠️ Refactor suggestion
Update function name to reflect its expanded scope.
The function name test_trimmomatic_pe is misleading since it now tests both paired-end (PE) and single-end (SE) modes. Consider renaming it to better reflect its unified testing approach.
-def test_trimmomatic_pe(run):
+def test_trimmomatic(run):
🧹 Nitpick comments (6)
bio/trimmomatic/meta.yaml (2)
12-12: Fix trailing spaces.
Remove trailing spaces to address the linting error.
-  - r1: trimmed R1 fastq(gz) file
+  - r1: trimmed R1 fastq(gz) file
🧰 Tools
🪛 YAMLlint (1.37.1)
[error] 12-12: trailing spaces (trailing-spaces)
1-1: Consider updating the name to reflect unified functionality.
The name "trimmomatic pe" suggests this wrapper is only for paired-end data, but this is now a unified wrapper that handles both PE and SE modes. Consider updating to something like "trimmomatic" or "trimmomatic unified" for clarity.
-name: "trimmomatic pe"
+name: "trimmomatic"
test_wrappers.py (1)
4559-4559: Consider reducing the core count for test environments.
Increasing from 2 to 10 cores might be excessive for CI/test environments and could cause resource contention. Consider using a more conservative value like 4 cores unless the unified Trimmomatic wrapper specifically requires higher parallelization.
-            "10",
+            "4",
bio/trimmomatic/test/Snakefile (3)
42-42: Document compression level variations.
The compression levels vary inconsistently across rules:
- PE rules use -9 (maximum compression)
- SE rules use -5 (moderate) and -9 (maximum)
While this may be intentional for testing different scenarios, it should be documented for clarity.
Add comments explaining the compression level choices:
-        compression_level="-9",
+        compression_level="-9",  # Test maximum compression
-        compression_level="-5",
+        compression_level="-5",  # Test moderate compression
Also applies to: 91-91, 129-129, 171-171
18-18: Consider parameterizing thread count for test flexibility.
All rules use a fixed 32 threads, which may not be appropriate for all testing environments. Consider making this configurable.
Consider using a configurable thread count:
-    threads: 32
+    threads: config.get("trimmomatic_threads", 4)
This allows users to override the thread count via config files while maintaining a reasonable default.
Also applies to: 43-43, 67-67, 92-92, 111-111, 132-132, 151-151, 172-172
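The fallback behaviour of the suggested config lookup can be sketched in plain Python. This is only an illustration: the `config` dict stands in for Snakemake's config object, `resolve_threads` is a hypothetical helper, and the key name `trimmomatic_threads` is taken from the suggestion above.

```python
def resolve_threads(config, default=4):
    """Return the thread count from config, falling back to a default.

    `config` is a plain dict standing in for Snakemake's config object
    (an assumption made for this sketch).
    """
    threads = int(config.get("trimmomatic_threads", default))
    if threads < 1:
        raise ValueError("thread count must be at least 1")
    return threads


# No override in the config: the default is used.
default_threads = resolve_threads({})
# An override supplied via the config wins over the default.
overridden_threads = resolve_threads({"trimmomatic_threads": 8})
```

Keeping the default small means a test run without any config file stays modest on CI machines, while users can still raise the value when needed.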
128-129: Clarify compression level comment.
The comment mentions compression level -11, which is not a standard gzip compression level (valid range is typically -1 to -9).
Correct the comment to reflect standard gzip compression levels:
-        # compression levels from -0 to -9 and -11
-        compression_level="-5",
+        # compression levels from -1 to -9 (1=fast, 9=best compression)
+        compression_level="-5",
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (3)
- bio/trimmomatic/test/reads/a.1.fastq.gz is excluded by !**/*.gz
- bio/trimmomatic/test/reads/a.2.fastq.gz is excluded by !**/*.gz
- bio/trimmomatic/test/reads/a.fastq.gz is excluded by !**/*.gz
📒 Files selected for processing (22)
- bio/trimmomatic/environment.linux-64.pin.txt (1 hunks)
- bio/trimmomatic/environment.yaml (1 hunks)
- bio/trimmomatic/meta.yaml (1 hunks)
- bio/trimmomatic/pe/environment.yaml (0 hunks)
- bio/trimmomatic/pe/meta.yaml (0 hunks)
- bio/trimmomatic/pe/test/Snakefile (0 hunks)
- bio/trimmomatic/pe/test/Snakefile_fq_fq (0 hunks)
- bio/trimmomatic/pe/test/Snakefile_fq_gz (0 hunks)
- bio/trimmomatic/pe/test/Snakefile_gz_fq (0 hunks)
- bio/trimmomatic/pe/test/Snakefile_gz_gz (0 hunks)
- bio/trimmomatic/pe/wrapper.py (0 hunks)
- bio/trimmomatic/se/environment.yaml (0 hunks)
- bio/trimmomatic/se/meta.yaml (0 hunks)
- bio/trimmomatic/se/test/Snakefile (0 hunks)
- bio/trimmomatic/se/test/Snakefile_fq_fq (0 hunks)
- bio/trimmomatic/se/test/Snakefile_fq_gz (0 hunks)
- bio/trimmomatic/se/test/Snakefile_gz_fq (0 hunks)
- bio/trimmomatic/se/test/Snakefile_gz_gz (0 hunks)
- bio/trimmomatic/se/wrapper.py (0 hunks)
- bio/trimmomatic/test/Snakefile (1 hunks)
- bio/trimmomatic/wrapper.py (1 hunks)
- test_wrappers.py (1 hunks)
💤 Files with no reviewable changes (16)
- bio/trimmomatic/se/meta.yaml
- bio/trimmomatic/pe/test/Snakefile
- bio/trimmomatic/se/environment.yaml
- bio/trimmomatic/pe/meta.yaml
- bio/trimmomatic/se/test/Snakefile_fq_fq
- bio/trimmomatic/se/test/Snakefile
- bio/trimmomatic/se/test/Snakefile_fq_gz
- bio/trimmomatic/pe/test/Snakefile_fq_gz
- bio/trimmomatic/pe/environment.yaml
- bio/trimmomatic/se/test/Snakefile_gz_gz
- bio/trimmomatic/pe/test/Snakefile_gz_gz
- bio/trimmomatic/se/wrapper.py
- bio/trimmomatic/pe/test/Snakefile_fq_fq
- bio/trimmomatic/pe/test/Snakefile_gz_fq
- bio/trimmomatic/se/test/Snakefile_gz_fq
- bio/trimmomatic/pe/wrapper.py
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py:
- Do not try to improve formatting.
- Do not suggest type annotations for functions that are defined inside of functions or methods.
- Do not suggest type annotation of the self argument of methods.
- Do not suggest type annotation of the cls argument of classmethods.
- Do not suggest return type annotation if a function or method does not contain a return statement.
bio/trimmomatic/wrapper.py
test_wrappers.py
**/wrapper.py:
- Do not complain about use of undefined variable called snakemake.
bio/trimmomatic/wrapper.py
🪛 Ruff (0.11.9)
bio/trimmomatic/wrapper.py
11-11: Undefined name snakemake
(F821)
12-12: Undefined name snakemake
(F821)
13-13: Undefined name snakemake
(F821)
14-14: Undefined name snakemake
(F821)
15-15: Undefined name snakemake
(F821)
18-18: Undefined name snakemake
(F821)
20-20: Undefined name snakemake
(F821)
20-20: Undefined name snakemake
(F821)
22-22: Undefined name snakemake
(F821)
23-23: Undefined name snakemake
(F821)
24-24: Undefined name snakemake
(F821)
25-25: Undefined name snakemake
(F821)
29-29: Undefined name snakemake
(F821)
30-30: Undefined name snakemake
(F821)
🪛 YAMLlint (1.37.1)
bio/trimmomatic/meta.yaml
[error] 12-12: trailing spaces
(trailing-spaces)
⏰ Context from checks skipped due to timeout of 90000ms (2)
- GitHub Check: testing
- GitHub Check: Summary
🔇 Additional comments (8)
bio/trimmomatic/environment.yaml (1)
1-7: LGTM! Clean environment specification.
The conda environment configuration follows best practices with proper channel ordering and includes the updated Trimmomatic version (0.39) along with the required snakemake-wrapper-utils dependency.
bio/trimmomatic/meta.yaml (1)
8-21: Well-structured metadata with clear conditional requirements.
The input/output specifications clearly indicate when certain files apply (for PE mode), and the parameters are well-documented. The conditional notation helps users understand the requirements for different modes.
bio/trimmomatic/environment.linux-64.pin.txt (1)
1-79: Excellent pinned environment for reproducibility.
The pinned environment file provides exact package versions with build hashes, ensuring reproducible builds. The comprehensive dependency list includes all necessary components including Trimmomatic 0.39 and snakemake-wrapper-utils 0.7.2, matching the main environment specification.
bio/trimmomatic/wrapper.py (3)
18-31: Well-designed mode detection and file assignment logic.
The automatic detection of PE vs SE mode based on the presence of r2 input is intuitive and correctly assigns the appropriate input/output files for each mode. The PE mode properly handles all four output files (paired and unpaired), while SE mode uses the simpler single input/output structure.
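The mode detection described in this comment can be sketched outside of Snakemake. This is a minimal illustration, not the wrapper's actual code: plain dicts stand in for snakemake.input and snakemake.output, and `detect_mode` is a hypothetical helper name.

```python
def detect_mode(inputs, outputs):
    """Pick PE when an "r2" input is present, otherwise SE.

    `inputs` and `outputs` are plain dicts standing in for
    snakemake.input / snakemake.output (an assumption for this sketch).
    """
    if inputs.get("r2"):
        in_files = [inputs["r1"], inputs["r2"]]
        out_files = [
            outputs["r1"],
            outputs["r1_unpaired"],
            outputs["r2"],
            outputs["r2_unpaired"],
        ]
        return "PE", in_files, out_files
    # Single-end: one input and one output, taken positionally.
    in_files = [next(iter(inputs.values()))]
    out_files = [next(iter(outputs.values()))]
    return "SE", in_files, out_files


# Paired-end example with illustrative file names.
mode, in_files, out_files = detect_mode(
    {"r1": "a.1.fastq.gz", "r2": "a.2.fastq.gz"},
    {
        "r1": "trimmed/a.1.fastq.gz",
        "r1_unpaired": "trimmed/a.1.unpaired.fastq.gz",
        "r2": "trimmed/a.2.fastq.gz",
        "r2_unpaired": "trimmed/a.2.unpaired.fastq.gz",
    },
)
```

Branching on the presence of r2 keeps a single code path for both modes, which is what allows the separate PE and SE wrappers to be merged.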
11-15: Good parameter extraction from snakemake context.
The parameter extraction correctly handles optional parameters with sensible defaults and properly constructs the trimmer command line arguments. The use of snakemake_wrapper_utils.java.get_java_opts is appropriate for Java-based tools.
1-42: Note: Static analysis warnings about undefined 'snakemake' are expected.
The static analysis tool correctly identifies that the snakemake variable is undefined, but this is expected behavior in Snakemake wrapper scripts where the snakemake object is provided by the Snakemake execution context.
test_wrappers.py (1)
4560-4567: LGTM! Comprehensive test coverage for the unified wrapper.
The test targets effectively cover all combinations of single-end/paired-end modes with different compression formats (fq/fq, gz/fq, fq/gz, gz/gz). This aligns well with the PR's objective of consolidating the separate PE and SE wrappers into a unified solution.
bio/trimmomatic/test/Snakefile (1)
1-177: LGTM: Comprehensive test coverage for compression combinations.
The Snakefile provides excellent test coverage for all input/output compression combinations across both paired-end and single-end modes. The rule structure is well-organized and the consistent parameter patterns make the tests maintainable.
Actionable comments posted: 1
🧹 Nitpick comments (1)
bio/trimmomatic/wrapper.py (1)
11-15: Unused parameter detected.
The compression_level parameter is extracted but never used in the shell command or elsewhere in the wrapper.
Consider removing the unused parameter or incorporating it into the Trimmomatic command if compression level control is intended:
-compression_level = snakemake.params.get("compression_level", "-5")
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- bio/trimmomatic/test/Snakefile (1 hunks)
- bio/trimmomatic/wrapper.py (1 hunks)
- test_wrappers.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- test_wrappers.py
- bio/trimmomatic/test/Snakefile
⏰ Context from checks skipped due to timeout of 90000ms (3)
- GitHub Check: testing
- GitHub Check: docs
- GitHub Check: Summary
🔇 Additional comments (2)
bio/trimmomatic/wrapper.py (2)
1-8: LGTM! Clean header and imports.
The author information, imports, and overall structure look good for a unified Trimmomatic wrapper.
33-42: ❓ Verification inconclusive
Verify shell command execution with current variable formatting.
The shell command template looks correct, but it depends on the fix for the file variable formatting issue identified above. Once the file variables are properly formatted as strings, this should execute correctly.
After applying the fix for file formatting, test that the generated shell command produces valid Trimmomatic syntax for both PE and SE modes.
🏁 Script executed:
#!/bin/bash
# Description: Search for any existing Trimmomatic wrapper tests to understand expected command format
# Look for test files that might show expected Trimmomatic command structure
fd -t f -e py test | xargs rg -l "trimmomatic" -i
Length of output: 66
🏁 Script executed:
#!/bin/bash
# Show context around 'trimmomatic' invocations in test_wrappers.py
rg -n "trimmomatic" -A5 test_wrappers.py
Length of output: 317
🏁 Script executed:
#!/bin/bash
# Display context for PE and SE tests to capture expected Trimmomatic invocations
echo "=== PE Test ==="
rg -n "def test_trimmomatic_pe" -A15 -B5 test_wrappers.py
echo "=== SE Test ==="
rg -n "def test_trimmomatic_se" -A15 -B5 test_wrappers.py
Length of output: 969
🏁 Script executed:
#!/bin/bash
# Display the Trimmomatic wrapper implementation around the shell invocation
rg -n -C5 "trimmomatic" -n bio/trimmomatic/wrapper.py
Length of output: 302
Verify Trimmomatic wrapper command generation in both SE and PE modes
- bio/trimmomatic/wrapper.py (lines 33–42): Ensure in_files and out_files are converted to plain strings (or joined lists) so that {in_files}/{out_files} expand to valid space-separated file paths.
- test_wrappers.py:
  - We have a test_trimmomatic_pe covering the paired-end invocation.
  - Add a corresponding test_trimmomatic_se to assert the single-end command (trimmomatic SE <in> <out> …) is formed correctly.
Once file formatting is fixed, manually run the wrapper for both modes to confirm the generated shell commands match Trimmomatic’s expected syntax.
Actionable comments posted: 0
♻️ Duplicate comments (1)
bio/trimmomatic/wrapper.py (1)
18-30: ⚠️ Potential issue
Critical issue: Lists cannot be directly interpolated in shell commands.
Both in_files and out_files are assigned as lists in both PE and SE modes, but the shell command at lines 38-39 tries to interpolate them directly. This will result in shell commands like trimmomatic PE ['file1', 'file2'] instead of the expected trimmomatic PE file1 file2.
Apply this fix to convert lists to space-separated strings:
 if snakemake.input.get("r2", False):
     trim_mode = "PE"
-    in_files = [snakemake.input.r1, snakemake.input.r2]
-    out_files = [
-        snakemake.output.r1,
-        snakemake.output.r1_unpaired,
-        snakemake.output.r2,
-        snakemake.output.r2_unpaired,
-    ]
+    in_files = f"{snakemake.input.r1} {snakemake.input.r2}"
+    out_files = f"{snakemake.output.r1} {snakemake.output.r1_unpaired} {snakemake.output.r2} {snakemake.output.r2_unpaired}"
 else:
     trim_mode = "SE"
-    in_files = [snakemake.input[0]]
-    out_files = [snakemake.output[0]]
+    in_files = str(snakemake.input[0])
+    out_files = str(snakemake.output[0])
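The explicit list-to-string conversion suggested above can be illustrated with a small standalone sketch. It assumes plain Python strings for the file paths and builds the command text by hand rather than through Snakemake's shell helper; `join_paths` and `build_command` are hypothetical names, and the TRAILING:3 trimmer is only an example value.

```python
import shlex


def join_paths(paths):
    """Join paths into one space-separated, shell-quoted string."""
    return " ".join(shlex.quote(str(p)) for p in paths)


def build_command(mode, in_files, out_files, trimmers, threads=1):
    """Assemble a Trimmomatic command line as a single string.

    Hypothetical helper for illustration; the real wrapper may
    construct its command differently.
    """
    parts = [
        "trimmomatic",
        mode,
        "-threads",
        str(threads),
        join_paths(in_files),
        join_paths(out_files),
        " ".join(trimmers),
    ]
    return " ".join(parts)


cmd = build_command(
    "SE", ["a.fastq.gz"], ["trimmed.fastq.gz"], ["TRAILING:3"], threads=4
)
print(cmd)  # trimmomatic SE -threads 4 a.fastq.gz trimmed.fastq.gz TRAILING:3
```

Quoting each path with shlex.quote also protects file names that contain spaces, which a bare string join would not.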
🧹 Nitpick comments (1)
bio/trimmomatic/wrapper.py (1)
11-15: Unused compression_level parameter.
The compression_level parameter is extracted but never used in the shell command. Since this PR removes parallel gzip functionality, this parameter might be obsolete.
Consider removing the unused parameter:
-compression_level = snakemake.params.get("compression_level", "-5")
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
- bio/trimmomatic/wrapper.py (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (3)
- GitHub Check: testing
- GitHub Check: docs
- GitHub Check: Summary
🔇 Additional comments (2)
bio/trimmomatic/wrapper.py (2)
1-9: LGTM!
The imports and file header are properly structured. The use of snakemake_wrapper_utils.java for Java options handling is a good practice.
33-42: Shell command structure looks correct.
The shell command properly uses snakemake.threads directly and includes all necessary parameters. Once the file list formatting issue is resolved, this should work correctly.
The current parallel compression/decompression with pigz is no longer supported in the latest versions of Trimmomatic, and it breaks MultiQC reporting. Since it seems that the next version of Trimmomatic will have parallel compression, this PR removes it, bumps versions, refactors code, merges the two PE/SE wrappers, and improves docs.
Closes #778
Closes #869
Fixes #961
QC
snakemake-wrappers.
While the contributions guidelines are more extensive, please particularly ensure that:
- test.py was updated to call any added or updated example rules in a Snakefile
- input: and output: file paths in the rules can be chosen arbitrarily
- input: or output:)
- tempfile.gettempdir() points to
- meta.yaml contains a link to the documentation of the respective tool or command under url:
Summary by CodeRabbit
New Features
Refactor
Tests