-
Notifications
You must be signed in to change notification settings - Fork 194
feat: bwameth mem and mem2 #3728
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
📝 Walkthrough""" WalkthroughThis pull request introduces multiple new configuration and test files to set up and validate the bioinformatics tool bwa-meth memx. It adds conda environment definitions, metadata, and a Snakemake wrapper with corresponding rules to run various alignment and sorting configurations. Additionally, several test files—including a sample FASTQ and related genome reference and auxiliary files—are provided to ensure reproducibility and correct functionality. A new test function in Changes
Sequence Diagram(s)sequenceDiagram
participant S as Snakemake
participant W as Wrapper
participant A as BWA-Meth
participant ST as Sorting Tool
S->>W: Trigger alignment rule (e.g., picard/samtools)
W->>W: Validate inputs & parameters
alt No Sorting
W->>A: Execute alignment command
else Sorting Requested
W->>A: Execute alignment command
A->>ST: Pipe output for sorting
ST->>W: Return sorted BAM
end
W->>S: Output final BAM file
Suggested reviewers
Note ⚡️ AI Code Reviews for VS Code, Cursor, WindsurfCodeRabbit now has a plugin for VS Code, Cursor and Windsurf. This brings AI code reviews directly in the code editor. Each commit is reviewed immediately, finding bugs before the PR is raised. Seamless context handoff to your AI code agent ensures that you can easily incorporate review feedback. Note ⚡️ Faster reviews with cachingCodeRabbit now supports caching for code and dependencies, helping speed up reviews. This means quicker feedback, reduced wait times, and a smoother review experience overall. Cached data is encrypted and stored securely. This feature will be automatically enabled for all accounts on May 16th. To opt out, configure ✨ Finishing Touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🧹 Nitpick comments (8)
bio/bwameth/memx/environment.linux-64.pin.txt (1)
1-92
: Comprehensive Environment Pinning File
This file lists explicit package URLs (with version hashes) to reproduce a Linux-64 conda environment. Please verify that each pinned dependency and its corresponding URL are exactly what your tests require and remain compatible with the rest of your workflow. Consider adding a brief note or script to help update these pins periodically.bio/bwameth/memx/test/mem/genome.fasta.bwameth.c2t.ann (1)
1-5
: Annotation File Documentation Enhancement
The file correctly contains numeric annotation data interleaved with identifiers and(null)
placeholders. For future maintainability and clarity (especially for collaborators or when revisiting this test data), consider adding a header or inline comment that explains the meaning and order of the fields.bio/bwameth/memx/test/mem2/genome.fasta.bwameth.c2t.ann (1)
1-5
: Annotation Data for mem2 – Consider Adding Context
The file provides the needed numerical annotations for the genome sequences in the mem2 test context. Similar to its counterpart in themem
directory, a brief comment or header describing the column structure would improve clarity and aid maintainers in understanding the structure at a glance.bio/bwameth/memx/environment.yaml (2)
1-4
: Channel Order ConsiderationThe channels are listed as “conda-forge”, “bioconda”, then “nodefaults”. While this ordering might be intentional, note that when using
nodefaults
the common practice is often to list it first (to entirely exclude the default channels) or confirm that the current order aligns with the intended precedence of packages. Please verify that this ordering meets your environment resolution strategy.
5-10
: Consistent Version Pinning SyntaxThe dependencies are clearly specified with pinned versions, which is excellent for reproducibility. As a nitpick, consider removing the spaces around the equals sign to conform with common conda conventions (e.g., using
bwameth=0.2.7
instead ofbwameth =0.2.7
).Below is a diff suggestion for consistency:
- - bwameth =0.2.7 - - bwa-mem2 =2.2.1 - - samtools =1.21 - - picard-slim =3.3.0 - - snakemake-wrapper-utils =0.7.0 + - bwameth=0.2.7 + - bwa-mem2=2.2.1 + - samtools=1.21 + - picard-slim=3.3.0 + - snakemake-wrapper-utils=0.7.0bio/bwameth/memx/wrapper.py (2)
14-14
: Remove unused importThe
time
module is imported but never used in the code.-import os.path -import time +import os.path🧰 Tools
🪛 Ruff (0.8.2)
14-14:
time
imported but unusedRemove unused import:
time
(F401)
127-130
: Consider using a ternary operator for more concise codeThe if-else block can be simplified using a ternary operator.
- if isinstance(fq, list): - fastq_files = ",".join(fq) - else: - fastq_files = fq + fastq_files = ",".join(fq) if isinstance(fq, list) else fq🧰 Tools
🪛 Ruff (0.8.2)
127-130: Use ternary operator
fastq_files = ",".join(fq) if isinstance(fq, list) else fq
instead ofif
-else
-blockReplace
if
-else
-block withfastq_files = ",".join(fq) if isinstance(fq, list) else fq
(SIM108)
bio/bwameth/memx/meta.yaml (1)
19-19
: Fix typo in notes sectionThere's a grammatical error in the notes section.
- Both SAM sorting and compression and handled by either samtools or picard. No duplication analysis is performed by this wrapper. + Both SAM sorting and compression are handled by either samtools or picard. No duplication analysis is performed by this wrapper.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (15)
bio/bwameth/memx/environment.linux-64.pin.txt
(1 hunks)bio/bwameth/memx/environment.yaml
(1 hunks)bio/bwameth/memx/meta.yaml
(1 hunks)bio/bwameth/memx/test/A.fastq
(1 hunks)bio/bwameth/memx/test/Snakefile
(1 hunks)bio/bwameth/memx/test/mem/genome.fasta
(1 hunks)bio/bwameth/memx/test/mem/genome.fasta.bwameth.c2t
(1 hunks)bio/bwameth/memx/test/mem/genome.fasta.bwameth.c2t.amb
(1 hunks)bio/bwameth/memx/test/mem/genome.fasta.bwameth.c2t.ann
(1 hunks)bio/bwameth/memx/test/mem2/genome.fasta
(1 hunks)bio/bwameth/memx/test/mem2/genome.fasta.bwameth.c2t
(1 hunks)bio/bwameth/memx/test/mem2/genome.fasta.bwameth.c2t.amb
(1 hunks)bio/bwameth/memx/test/mem2/genome.fasta.bwameth.c2t.ann
(1 hunks)bio/bwameth/memx/wrapper.py
(1 hunks)test_wrappers.py
(1 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
`**/*.py`: Do not try to improve formatting. Do not suggest type annotations for functions that are defined inside of functions or methods. Do not suggest type annotation of the `s...
**/*.py
: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of theself
argument of methods.
Do not suggest type annotation of thecls
argument of classmethods.
Do not suggest return type annotation if a function or method does not contain areturn
statement.
bio/bwameth/memx/wrapper.py
test_wrappers.py
`**/wrapper.py`: Do not complain about use of undefined variable called `snakemake`.
**/wrapper.py
: Do not complain about use of undefined variable calledsnakemake
.
bio/bwameth/memx/wrapper.py
🪛 Ruff (0.8.2)
bio/bwameth/memx/wrapper.py
14-14: time
imported but unused
Remove unused import: time
(F401)
22-22: Undefined name snakemake
(F821)
23-23: Undefined name snakemake
(F821)
25-25: Undefined name snakemake
(F821)
26-26: Undefined name snakemake
(F821)
27-27: Undefined name snakemake
(F821)
28-28: Undefined name snakemake
(F821)
29-29: Undefined name snakemake
(F821)
31-31: Undefined name snakemake
(F821)
32-32: Undefined name snakemake
(F821)
54-54: Undefined name snakemake
(F821)
55-55: Undefined name snakemake
(F821)
78-78: Undefined name snakemake
(F821)
121-121: Undefined name snakemake
(F821)
122-122: Undefined name snakemake
(F821)
123-123: Undefined name snakemake
(F821)
127-130: Use ternary operator fastq_files = ",".join(fq) if isinstance(fq, list) else fq
instead of if
-else
-block
Replace if
-else
-block with fastq_files = ",".join(fq) if isinstance(fq, list) else fq
(SIM108)
⏰ Context from checks skipped due to timeout of 90000ms (2)
- GitHub Check: testing
- GitHub Check: Summary
🔇 Additional comments (8)
bio/bwameth/memx/test/mem2/genome.fasta.bwameth.c2t.amb (1)
1-2
: Verify Auxiliary File Format and Newline Handling
The file contains a single non-header line"40 2 0"
, followed by an extra (possibly empty) line. Please confirm that this format (including the trailing newline or blank line) exactly matches the requirements expected by bwameth.bio/bwameth/memx/test/mem/genome.fasta.bwameth.c2t.amb (1)
1-2
: Confirm Duplication of Auxiliary File
This file’s content is identical to the one in themem2
directory ("40 2 0"
). Please ensure that duplicating this file in both locations is intentional and that your workflow correctly distinguishes between them if they serve different test cases.bio/bwameth/memx/test/mem/genome.fasta (1)
1-2
: Clarify FASTA File Structure with Embedded File Reference
The file starts with a relative path ("../mem2/genome.fasta"
) on the first line and then provides the nucleotide sequence ("GCTAGCTCAGAAAAAAAAAA"
) on the next line. Standard FASTA files typically begin with a header line that starts with a>
character. Please verify that this mixed-format is intentional as part of the test setup. If this structure is meant to point to an external reference while also including a fallback, consider documenting the rationale within the file or associated documentation.bio/bwameth/memx/test/mem2/genome.fasta (1)
1-2
: FASTA Format is Correct
The file correctly defines a FASTA entry with the header>Sheila
and its sequence. This adheres to the expected FASTA format for downstream tools.bio/bwameth/memx/test/A.fastq (1)
1-4
: FASTQ File Format Validity
The FASTQ file follows the correct minimal format with a header (line 1), sequence (line 2), a '+' separator (line 3), and quality scores (line 4). Ensure that this test input meets all expectations for downstream processing in the wrapper.bio/bwameth/memx/test/mem/genome.fasta.bwameth.c2t (1)
1-4
: FASTA Content Integrity
The FASTA file is correctly formatted with proper headers and sequence lines. The entries (>rSheila
and>fSheila
) and the respective sequences look consistent with test inputs for thebwameth
alignment functionality.bio/bwameth/memx/test/mem2/genome.fasta.bwameth.c2t (1)
1-4
: mem2 FASTA File Consistency
This FASTA file mirrors the data in the ‘mem’ directory. Its proper formatting and consistent entries confirm that it fulfills the purpose of providing a reference for thebwameth
tests in a separate test scenario.test_wrappers.py (1)
511-556
: Well-structured test function for the bwameth memx wrapperThe implementation follows the established testing pattern in this file, properly testing four different output scenarios of the bwameth memx wrapper: standard mem alignment, mem2 alignment, and two different sorting methods (picard and samtools).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🔭 Outside diff range comments (1)
bio/bwameth/memx/test/Snakefile (1)
1-107
: 💡 Verification agent🧩 Analysis chain
Verify memory and computational requirements
All test rules specify only 2 threads, which might be insufficient for real-world bwameth processing, especially with sorting operations. Consider documenting recommended resource requirements for production use.
🌐 Web query:
What are the recommended computational resources for running bwameth on real sequencing data?
💡 Result:
For running bwa-meth on real sequencing data, computational resources should be optimized based on workflow type (CPU- or GPU-based), input data size, and alignment efficiency. Here's a synthesis of recommendations from benchmarking studies and pipeline documentation:
1. CPU-Based Workflow (Standard bwa-meth)
CPU Cores:
- Utilize 16–32 threads for alignment acceleration ([7][8]). For example:
Multi-threading reduces runtime by parallelizing alignment steps.bwameth.py --threads 16 --reference $REF $FQ1 $FQ2Memory:
- 8–16 GB per thread is typically sufficient. Larger datasets (e.g., mammalian genomes) may require up to 72–100 GB total memory ([1][9]). The tool avoids writing temporary files to disk, streamlining memory use ([7][3]).
2. GPU-Accelerated Workflow (Parabricks fq2bam_meth)
- GPUs:
- Use 1–2 NVIDIA GPUs for significant speed improvements. Enable GPU acceleration with
--use_gpu
([4][12]).- CPU/GPU Configuration:
- Allocate 16 CPU threads per GPU (default) and adjust
--bwa-nstreams
for parallel streams per GPU ([4]). Example:fq2bam_meth --num-gpus 2 --bwa-cpu-thread-pool 16- Memory:
- GPU-based workflows reduce CPU memory pressure but require ≥32 GB system memory for large datasets.
3. Runtime and Scalability
- Performance:
- Bwa-meth is ~6–10x faster than Bismark on CPU-based workflows ([3][11]). For example:
- Human WGBS data: ~30–40 million reads/hour on 16 threads ([9][6]).
- GPU acceleration via Parabricks further reduces runtime by 50–70% ([4][14]).
4. Special Considerations
- Reference Genome Indexing:
- Requires a one-time conversion of the reference genome (CPU-bound). Allow 4–8 cores and 16–32 GB memory for indexing ([7][4]).
- Data Size:
- For paired-end datasets, storage requirements drop 3x due to on-the-fly read conversion ([3][7]).
- Pipeline Integration:
- When using nf-core/methylseq, resource defaults are auto-adjusted, but manual tuning may be needed for large-scale studies (e.g.,
--max_memory '100.GB'
) ([1][8]).
Benchmark Comparison
Metric Bwa-meth (CPU) Parabricks (GPU) Speed (reads/hour) ~30M ~50M+ Memory/Thread 8–16 GB 8–12 GB Typical Use Case Moderate-scale Large-scale For best practices, combine sufficient CPU threads with available GPU resources for optimal alignment efficiency ([4][9]). Always validate resource allocation using test datasets.
Citations:
- 1: https://nf-co.re/methylseq/latest/docs/usage/
- 2: https://www.biostars.org/p/240454/
- 3: https://github.com/brentp/bwa-meth/blob/master/paper/document.tex
- 4: https://docs.nvidia.com/clara/parabricks/4.3.0/documentation/tooldocs/man_fq2bam_meth.html
- 5: https://www.seqanswers.com/forum/bioinformatics/bioinformatics-aa/35157-bwa-meth-and-bs-seq-comparison
- 6: https://pmc.ncbi.nlm.nih.gov/articles/PMC8963483/
- 7: https://github.com/brentp/bwa-meth
- 8: https://nf-co.re/methylseq/1.3/docs/usage/
- 9: https://pmc.ncbi.nlm.nih.gov/articles/PMC8161933/
- 10: https://qcb.ucla.edu/wp-content/uploads/sites/14/2021/10/giab033.pdf
- 11: https://motbx.eatris.eu/resource/methylation-bisulfite-sequencing-analysis-pipeline-using-bismark-or-bwa-meth-methyldackel/
- 12: https://github.com/nf-core/methylseq/blob/master/docs/usage.md
- 13: https://nf-co.re/methylseq/1.4/docs/usage
- 14: https://academic.oup.com/biomethods/article/8/1/bpad012/7208866
- 15: https://academic.oup.com/bib/article-pdf/doi/10.1093/bib/bbab021/40261387/bbab021.pdf
- 16: https://hpc.nih.gov/apps/bwa-meth.html
- 17: https://resources.nvidia.com/en-us-hc-genomics/hc-solution-overview-4?lx=ssBGs2
- 18: https://arxiv.org/pdf/1401.1129
- 19: https://docs.nvidia.com/clara/parabricks/4.3.0/WhatsNew/NewTools_v4.3.0-1.html
Documentation Resource Recommendation Update
The test rules in
bio/bwameth/memx/test/Snakefile
are intentionally configured with only 2 threads to keep testing fast. However, for production use with real sequencing data, benchmarking and pipeline documentation indicate that:
- CPU-based workflows should leverage 16–32 threads with approximately 8–16 GB memory per thread (totaling up to 72–100 GB for large genomes).
- GPU-accelerated workflows require additional considerations (e.g., 1–2 NVIDIA GPUs with ~16 CPU threads per GPU).
Please update the pipeline documentation to clearly differentiate between the minimal settings used for tests and the recommended resource requirements for production environments.
♻️ Duplicate comments (1)
bio/bwameth/memx/wrapper.py (1)
149-150
:⚠️ Potential issueRemove duplicate fastq files parameter
The
fastq_files
variable appears twice in the command, which will cause bwameth to receive the same input files twice.- " {fastq_files} " + pipe_cmd + " ) {log}" + " " + pipe_cmd + " ) {log}"
🧹 Nitpick comments (5)
bio/bwameth/memx/wrapper.py (5)
14-14
: Remove unused importThe
time
module is imported but never used in the code.- import time
🧰 Tools
🪛 Ruff (0.8.2)
14-14:
time
imported but unusedRemove unused import:
time
(F401)
54-56
: Consider using a more robust method to extract index prefixThe current implementation assumes that indices follow a specific pattern. It might be better to handle edge cases where the file path doesn't end with the expected extension pattern.
- index = os.path.commonprefix(snakemake.input.idx) - index_extensions = set(idx[len(index) - 4 :] for idx in snakemake.input.idx) + # Extract the base name without the suffix + index = os.path.commonprefix(snakemake.input.idx) + # Make sure index does not end with a partial filename + if any(not os.path.exists(index + ext) for ext in {".c2t"}): + index = os.path.dirname(index) + "/" + index_extensions = set(idx[len(index):] for idx in snakemake.input.idx)🧰 Tools
🪛 Ruff (0.8.2)
54-54: Undefined name
snakemake
(F821)
55-55: Undefined name
snakemake
(F821)
58-65
: Improve error message clarityThe current error message logic is a bit confusing. It triggers when both sets of missing indices are non-empty, but the comment suggests it should trigger when neither set is empty (i.e., both aligners are missing indices).
- if (len(missing_mem2_idx) > 0) and (len(missing_mem_idx) > 0): - # If ONE of the sets is empty, then all the index for one of the aligners - # are present. - raise ValueError( - "Missing required indices for both bwa-mem and bwa-mem2. No aligner can " - f"be launched. bwa-mem misses {missing_mem_idx}, while bwa-mem2 misses " - f"{missing_mem2_idx}. Please provide missing files. {index_extensions}" - ) + if len(missing_mem2_idx) > 0 and len(missing_mem_idx) > 0: + # If both sets have missing indices, then neither aligner can be used + raise ValueError( + "Missing required indices for both bwa-mem and bwa-mem2. No aligner can " + f"be launched. bwa-mem misses {missing_mem_idx}, while bwa-mem2 misses " + f"{missing_mem2_idx}. Please provide complete indices for at least one aligner." + )
127-130
: Simplify code using ternary operatorThe if-else block for handling single-ended fastq files can be simplified using a ternary operator.
- if isinstance(fq, list): - fastq_files = ",".join(fq) - else: - fastq_files = fq + fastq_files = ",".join(fq) if isinstance(fq, list) else fq🧰 Tools
🪛 Ruff (0.8.2)
127-130: Use ternary operator
fastq_files = ",".join(fq) if isinstance(fq, list) else fq
instead ofif
-else
-blockReplace
if
-else
-block withfastq_files = ",".join(fq) if isinstance(fq, list) else fq
(SIM108)
134-140
: Simplify paired-end fastq file handlingSimilar to the single-ended case, the paired-end fastq file handling can also be simplified.
- if isinstance(fq1, list) and isinstance(fq2, list): - if len(fq1) == len(fq2): - fastq_files = f'{",".join(fq1)} {",".join(fq2)}' - else: - raise ValueError("Please provide as many R1 and R2 files") - else: - fastq_files = f"{fq1} {fq2}" + if isinstance(fq1, list) and isinstance(fq2, list): + if len(fq1) != len(fq2): + raise ValueError("Please provide the same number of R1 and R2 files") + fastq_files = f'{",".join(fq1)} {",".join(fq2)}' + else: + fastq_files = f"{fq1} {fq2}"
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
bio/bwameth/memx/test/Snakefile
(1 hunks)bio/bwameth/memx/wrapper.py
(1 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
`**/*.py`: Do not try to improve formatting. Do not suggest type annotations for functions that are defined inside of functions or methods. Do not suggest type annotation of the `s...
**/*.py
: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of theself
argument of methods.
Do not suggest type annotation of thecls
argument of classmethods.
Do not suggest return type annotation if a function or method does not contain areturn
statement.
bio/bwameth/memx/wrapper.py
`**/wrapper.py`: Do not complain about use of undefined variable called `snakemake`.
**/wrapper.py
: Do not complain about use of undefined variable calledsnakemake
.
bio/bwameth/memx/wrapper.py
🪛 Ruff (0.8.2)
bio/bwameth/memx/wrapper.py
14-14: time
imported but unused
Remove unused import: time
(F401)
22-22: Undefined name snakemake
(F821)
23-23: Undefined name snakemake
(F821)
25-25: Undefined name snakemake
(F821)
26-26: Undefined name snakemake
(F821)
27-27: Undefined name snakemake
(F821)
28-28: Undefined name snakemake
(F821)
29-29: Undefined name snakemake
(F821)
31-31: Undefined name snakemake
(F821)
32-32: Undefined name snakemake
(F821)
54-54: Undefined name snakemake
(F821)
55-55: Undefined name snakemake
(F821)
78-78: Undefined name snakemake
(F821)
121-121: Undefined name snakemake
(F821)
122-122: Undefined name snakemake
(F821)
123-123: Undefined name snakemake
(F821)
127-130: Use ternary operator fastq_files = ",".join(fq) if isinstance(fq, list) else fq
instead of if
-else
-block
Replace if
-else
-block with fastq_files = ",".join(fq) if isinstance(fq, list) else fq
(SIM108)
⏰ Context from checks skipped due to timeout of 90000ms (3)
- GitHub Check: testing
- GitHub Check: docs
- GitHub Check: Summary
🔇 Additional comments (4)
bio/bwameth/memx/test/Snakefile (4)
1-26
: LGTM - Test rule for bwameth with mem indicesThe test rule is well-structured with appropriate inputs, outputs, parameters, and logging. The rule correctly specifies the reference genome and its associated BWA-MEM indices.
28-53
: LGTM - Test rule for bwameth with mem2 indicesThe test rule is well-defined with the correct reference genome and BWA-MEM2 indices. It properly includes the additional
.0123
index file required by BWA-MEM2.
55-80
: LGTM - Test rule for bwameth with Picard sortingThe test configuration correctly sets up sorting with Picard, using the same reference and indices as the mem test.
82-107
: LGTM - Test rule for bwameth with Samtools sortingThe test configuration properly implements sorting with Samtools, using the BWA-MEM indices.
Last quality check failed with :
How do I solve this kind of issue ? |
Co-authored-by: Filipe G. Vieira <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
♻️ Duplicate comments (1)
bio/bwameth/memx/wrapper.py (1)
149-150
:⚠️ Potential issueRemove duplicate fastq files parameter.
The fastq files appear to be passed twice to bwameth.py - once explicitly and once through the pipe command.
- " {fastq_files} " + pipe_cmd + " ) {log}" + " " + pipe_cmd + " ) {log}"This was previously identified in a review and you confirmed it would be fixed.
🧹 Nitpick comments (6)
bio/bwameth/memx/wrapper.py (6)
14-14
: Remove unused import.The
time
module is imported but never used in the code.- import os.path - import time + import os.path🧰 Tools
🪛 Ruff (0.8.2)
14-14:
time
imported but unusedRemove unused import:
time
(F401)
34-66
: Consider simplifying index validation.The current approach for validating indices is complex and may be unnecessary, as bwameth itself could handle index validation. This complexity increases maintenance overhead and may make the wrapper less adaptable to future changes in the tool's indexing requirements.
You might consider simplifying this to only check if index files exist rather than validating specific extensions, or letting bwameth handle this validation itself.
🧰 Tools
🪛 Ruff (0.8.2)
54-54: Undefined name
snakemake
(F821)
55-55: Undefined name
snakemake
(F821)
75-76
: Redundant conditional thread assignment.The thread assignment logic is duplicated in both the "none" and "samtools" sort branches, which creates maintenance overhead.
Consider extracting this logic to a separate section before the sort type conditionals:
# Set thread options for samtools if available if samtools_threads >= 1: samtools_opts += " --threads {samtools_threads} "Also applies to: 88-89
127-130
: Simplify with ternary operator.The if-else block can be simplified using a ternary operator.
- if isinstance(fq, list): - fastq_files = ",".join(fq) - else: - fastq_files = fq + fastq_files = ",".join(fq) if isinstance(fq, list) else fq🧰 Tools
🪛 Ruff (0.8.2)
127-130: Use ternary operator
fastq_files = ",".join(fq) if isinstance(fq, list) else fq
instead ofif
-else
-blockReplace
if
-else
-block withfastq_files = ",".join(fq) if isinstance(fq, list) else fq
(SIM108)
134-140
: Simplify paired-end file handling.Similar to the single-end case, the code for handling paired-end files can be simplified.
- if isinstance(fq1, list) and isinstance(fq2, list): - if len(fq1) == len(fq2): - fastq_files = f'{",".join(fq1)} {",".join(fq2)}' - else: - raise ValueError("Please provide as many R1 and R2 files") - else: - fastq_files = f"{fq1} {fq2}" + if isinstance(fq1, list) and isinstance(fq2, list): + if len(fq1) != len(fq2): + raise ValueError("Please provide as many R1 and R2 files") + fq1_str = ",".join(fq1) + fq2_str = ",".join(fq2) + else: + fq1_str = fq1 + fq2_str = fq2 + fastq_files = f"{fq1_str} {fq2_str}"
23-23
: Use correct log format for standard output and error.The current log formatting may not capture all relevant output.
- log = snakemake.log_fmt_shell(stdout=False, stderr=True) + log = snakemake.log_fmt_shell(stdout=True, stderr=True)Consider capturing both stdout and stderr to ensure all relevant output is logged.
🧰 Tools
🪛 Ruff (0.8.2)
23-23: Undefined name
snakemake
(F821)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
bio/bwameth/memx/wrapper.py
(1 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
`**/*.py`: Do not try to improve formatting. Do not suggest type annotations for functions that are defined inside of functions or methods. Do not suggest type annotation of the `s...
**/*.py
: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of theself
argument of methods.
Do not suggest type annotation of thecls
argument of classmethods.
Do not suggest return type annotation if a function or method does not contain areturn
statement.
bio/bwameth/memx/wrapper.py
`**/wrapper.py`: Do not complain about use of undefined variable called `snakemake`.
**/wrapper.py
: Do not complain about use of undefined variable calledsnakemake
.
bio/bwameth/memx/wrapper.py
🪛 Ruff (0.8.2)
bio/bwameth/memx/wrapper.py
14-14: time
imported but unused
Remove unused import: time
(F401)
22-22: Undefined name snakemake
(F821)
23-23: Undefined name snakemake
(F821)
25-25: Undefined name snakemake
(F821)
26-26: Undefined name snakemake
(F821)
27-27: Undefined name snakemake
(F821)
28-28: Undefined name snakemake
(F821)
29-29: Undefined name snakemake
(F821)
31-31: Undefined name snakemake
(F821)
32-32: Undefined name snakemake
(F821)
54-54: Undefined name snakemake
(F821)
55-55: Undefined name snakemake
(F821)
78-78: Undefined name snakemake
(F821)
121-121: Undefined name snakemake
(F821)
122-122: Undefined name snakemake
(F821)
123-123: Undefined name snakemake
(F821)
127-130: Use ternary operator fastq_files = ",".join(fq) if isinstance(fq, list) else fq
instead of if
-else
-block
Replace if
-else
-block with fastq_files = ",".join(fq) if isinstance(fq, list) else fq
(SIM108)
⏰ Context from checks skipped due to timeout of 90000ms (3)
- GitHub Check: testing
- GitHub Check: docs
- GitHub Check: Summary
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (3)
bio/bwameth/memx/wrapper.py (3)
13-14
: Remove unused imports.The imports for
os.path
andtime
are not used anywhere in the code and can be safely removed to improve code cleanliness.-import os.path -import time🧰 Tools
🪛 Ruff (0.8.2)
13-13:
os.path
imported but unusedRemove unused import:
os.path
(F401)
14-14:
time
imported but unusedRemove unused import:
time
(F401)
93-96
: Consider using a ternary operator for better conciseness.The if-else block can be replaced with a more concise ternary operator expression for better readability.
- if isinstance(fq, list): - fastq_files = ",".join(fq) - else: - fastq_files = fq + fastq_files = ",".join(fq) if isinstance(fq, list) else fq🧰 Tools
🪛 Ruff (0.8.2)
93-96: Use ternary operator
fastq_files = ",".join(fq) if isinstance(fq, list) else fq
instead ofif
-else
-blockReplace
if
-else
-block withfastq_files = ",".join(fq) if isinstance(fq, list) else fq
(SIM108)
100-106
: Consider simplifying the paired-end fastq handling.Similar to the single-end case, you can use ternary operators here for more concise code. Also, the redundant type checking for both fq1 and fq2 can be simplified.
- if isinstance(fq1, list) and isinstance(fq2, list): - if len(fq1) == len(fq2): - fastq_files = f'{",".join(fq1)} {",".join(fq2)}' - else: - raise ValueError("Please provide as many R1 and R2 files") - else: - fastq_files = f"{fq1} {fq2}" + if isinstance(fq1, list) or isinstance(fq2, list): + if not (isinstance(fq1, list) and isinstance(fq2, list)): + # Convert the non-list into a single-item list + fq1 = [fq1] if not isinstance(fq1, list) else fq1 + fq2 = [fq2] if not isinstance(fq2, list) else fq2 + if len(fq1) != len(fq2): + raise ValueError("Please provide equal number of R1 and R2 files") + fastq_files = f'{",".join(fq1)} {",".join(fq2)}' + else: + fastq_files = f"{fq1} {fq2}"
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
bio/bwameth/memx/environment.linux-64.pin.txt
(1 hunks)bio/bwameth/memx/wrapper.py
(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- bio/bwameth/memx/environment.linux-64.pin.txt
🧰 Additional context used
📓 Path-based instructions (2)
`**/*.py`: Do not try to improve formatting. Do not suggest type annotations for functions that are defined inside of functions or methods. Do not suggest type annotation of the `s...
**/*.py
: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of theself
argument of methods.
Do not suggest type annotation of thecls
argument of classmethods.
Do not suggest return type annotation if a function or method does not contain areturn
statement.
bio/bwameth/memx/wrapper.py
`**/wrapper.py`: Do not complain about use of undefined variable called `snakemake`.
**/wrapper.py
: Do not complain about use of undefined variable calledsnakemake
.
bio/bwameth/memx/wrapper.py
🧠 Learnings (1)
bio/bwameth/memx/wrapper.py (1)
Learnt from: tdayris
PR: snakemake/snakemake-wrappers#3728
File: bio/bwameth/memx/wrapper.py:32-32
Timestamp: 2025-05-15T07:35:14.369Z
Learning: In the bwameth wrapper, when `snakemake.threads` equals 1, `samtools_threads` is intentionally set to 0. This design allows the wrapper to skip adding the `--threads` parameter to samtools, prioritizing the single available thread for the primary alignment task.
🪛 Ruff (0.8.2)
bio/bwameth/memx/wrapper.py
13-13: os.path
imported but unused
Remove unused import: os.path
(F401)
14-14: time
imported but unused
Remove unused import: time
(F401)
22-22: Undefined name snakemake
(F821)
23-23: Undefined name snakemake
(F821)
25-25: Undefined name snakemake
(F821)
26-26: Undefined name snakemake
(F821)
27-27: Undefined name snakemake
(F821)
28-28: Undefined name snakemake
(F821)
29-29: Undefined name snakemake
(F821)
31-31: Undefined name snakemake
(F821)
32-32: Undefined name snakemake
(F821)
44-44: Undefined name snakemake
(F821)
46-46: Undefined name snakemake
(F821)
49-49: Undefined name snakemake
(F821)
87-87: Undefined name snakemake
(F821)
88-88: Undefined name snakemake
(F821)
89-89: Undefined name snakemake
(F821)
93-96: Use ternary operator fastq_files = ",".join(fq) if isinstance(fq, list) else fq
instead of if
-else
-block
Replace if
-else
-block with fastq_files = ",".join(fq) if isinstance(fq, list) else fq
(SIM108)
⏰ Context from checks skipped due to timeout of 90000ms (1)
- GitHub Check: Summary
🔇 Additional comments (2)
bio/bwameth/memx/wrapper.py (2)
32-32
: Appropriate thread allocation for samtools.The current implementation intentionally allows
samtools_threads
to be 0 when there's only 1 thread available. This is correct as it prioritizes the single available thread for the primary alignment task.🧰 Tools
🪛 Ruff (0.8.2)
32-32: Undefined name
snakemake
(F821)
117-117
: Shell command looks good.The shell command correctly uses the fastq files and pipe command format from previous refactoring discussions.
🤖 I have created a release *beep* *boop* --- ## [6.2.0](v6.1.1...v6.2.0) (2025-05-19) ### Features * add aria2c wrapper ([#2725](#2725)) ([a45763b](a45763b)) * bwameth mem and mem2 ([#3728](#3728)) ([63f5e87](63f5e87)) --- This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).
Make
bwameth
available.QC
snakemake-wrappers
.While the contributions guidelines are more extensive, please particularly ensure that:
test_wrappers.py
was updated to call any added or updated example rules in aSnakefile
input:
andoutput:
file paths in the rules can be chosen arbitrarilyinput:
oroutput:
)tempfile.gettempdir()
points tometa.yaml
contains a link to the documentation of the respective tool or command underurl:
Summary by CodeRabbit
New Features
bwameth
tool, facilitating alignment with optional sorting and flexible input handling.bwameth
with different sorting options to validate diverse usage scenarios.Tests
bwameth
functionality across various configurations.