Skip to content

feat: bwameth mem and mem2 #3728

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 12 commits into from
May 16, 2025
Merged

feat: bwameth mem and mem2 #3728

merged 12 commits into from
May 16, 2025

Conversation

tdayris
Copy link
Contributor

@tdayris tdayris commented Mar 19, 2025

Make bwameth available.

QC

While the contributions guidelines are more extensive, please particularly ensure that:

  • test_wrappers.py was updated to call any added or updated example rules in a Snakefile
  • input: and output: file paths in the rules can be chosen arbitrarily
  • wherever possible, command line arguments are inferred and set automatically (e.g. based on file extensions in input: or output:)
  • temporary files are either written to a unique hidden folder in the working directory, or (better) stored where the Python function tempfile.gettempdir() points to
  • the meta.yaml contains a link to the documentation of the respective tool or command under url:
  • conda environments use a minimal amount of channels and packages, in recommended ordering

Summary by CodeRabbit

  • New Features

    • Introduced new conda environment configuration files for Linux and bioinformatics workflows, ensuring reproducible setups.
    • Released a Snakemake wrapper for the bwameth tool, facilitating alignment with optional sorting and flexible input handling.
    • Added sample genomic references and sequencing data for testing and demonstration purposes.
    • Provided multiple test rules for bwameth with different sorting options to validate diverse usage scenarios.
  • Tests

    • Expanded test workflows with new functions to validate the bwameth functionality across various configurations.

Copy link
Contributor

coderabbitai bot commented Mar 19, 2025

📝 Walkthrough

"""

Walkthrough

This pull request introduces multiple new configuration and test files to set up and validate the bioinformatics tool bwa-meth memx. It adds conda environment definitions, metadata, and a Snakemake wrapper with corresponding rules to run various alignment and sorting configurations. Additionally, several test files—including a sample FASTQ and related genome reference and auxiliary files—are provided to ensure reproducibility and correct functionality. A new test function in test_wrappers.py is also added to validate the wrapper execution with different sorting options.

Changes

File(s) Change Summary
bio/bwameth/memx/environment.linux-64.pin.txt
bio/bwameth/memx/environment.yaml
New conda environment files: one with explicit package URLs for Linux 64-bit and one specifying channels and dependencies for the bioinformatics workflow.
bio/bwameth/memx/meta.yaml New metadata definition for bwa-meth memx, detailing tool information, input/output specifications, parameters, and usage notes.
bio/bwameth/memx/test/A.fastq
bio/bwameth/memx/test/Snakefile
New test files including a sample FASTQ and a Snakefile with four rules to test alignment execution with different sorting options (none, Picard, or Samtools).
bio/bwameth/memx/test/mem/...
bio/bwameth/memx/test/mem2/...
New genome reference FASTA files and associated auxiliary files (*.c2t, *.c2t.amb, *.c2t.ann) providing test data and annotations for genomic alignments.
bio/bwameth/memx/wrapper.py New Snakemake wrapper script for executing bwa-meth alignments with optional sorting, handling input validation, threading, and command construction.
test_wrappers.py New test function test_bwameth_mem added to run Snakemake with the bio/bwameth/memx wrapper, testing multiple BAM output scenarios with different sorting configurations.

Sequence Diagram(s)

sequenceDiagram
  participant S as Snakemake
  participant W as Wrapper
  participant A as BWA-Meth
  participant ST as Sorting Tool

  S->>W: Trigger alignment rule (e.g., picard/samtools)
  W->>W: Validate inputs & parameters
  alt No Sorting
    W->>A: Execute alignment command
  else Sorting Requested
    W->>A: Execute alignment command
    A->>ST: Pipe output for sorting
    ST->>W: Return sorted BAM
  end
  W->>S: Output final BAM file
Loading

Suggested reviewers

  • fgvieira
    """

Note

⚡️ AI Code Reviews for VS Code, Cursor, Windsurf

CodeRabbit now has a plugin for VS Code, Cursor and Windsurf. This brings AI code reviews directly in the code editor. Each commit is reviewed immediately, finding bugs before the PR is raised. Seamless context handoff to your AI code agent ensures that you can easily incorporate review feedback.
Learn more here.


Note

⚡️ Faster reviews with caching

CodeRabbit now supports caching for code and dependencies, helping speed up reviews. This means quicker feedback, reduced wait times, and a smoother review experience overall. Cached data is encrypted and stored securely. This feature will be automatically enabled for all accounts on May 16th. To opt out, configure Review - Disable Cache at either the organization or repository level. If you prefer to disable all data retention across your organization, simply turn off the Data Retention setting under your Organization Settings.
Enjoy the performance boost—your workflow just got faster.

✨ Finishing Touches
  • 📝 Generate Docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Explain this complex logic.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai explain this code block.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and explain its main purpose.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai generate sequence diagram to generate a sequence diagram of the changes in this PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (8)
bio/bwameth/memx/environment.linux-64.pin.txt (1)

1-92: Comprehensive Environment Pinning File
This file lists explicit package URLs (with version hashes) to reproduce a Linux-64 conda environment. Please verify that each pinned dependency and its corresponding URL are exactly what your tests require and remain compatible with the rest of your workflow. Consider adding a brief note or script to help update these pins periodically.

bio/bwameth/memx/test/mem/genome.fasta.bwameth.c2t.ann (1)

1-5: Annotation File Documentation Enhancement
The file correctly contains numeric annotation data interleaved with identifiers and (null) placeholders. For future maintainability and clarity (especially for collaborators or when revisiting this test data), consider adding a header or inline comment that explains the meaning and order of the fields.

bio/bwameth/memx/test/mem2/genome.fasta.bwameth.c2t.ann (1)

1-5: Annotation Data for mem2 – Consider Adding Context
The file provides the needed numerical annotations for the genome sequences in the mem2 test context. Similar to its counterpart in the mem directory, a brief comment or header describing the column structure would improve clarity and aid maintainers in understanding the structure at a glance.

bio/bwameth/memx/environment.yaml (2)

1-4: Channel Order Consideration

The channels are listed as “conda-forge”, “bioconda”, then “nodefaults”. While this ordering might be intentional, note that when using nodefaults the common practice is often to list it first (to entirely exclude the default channels) or confirm that the current order aligns with the intended precedence of packages. Please verify that this ordering meets your environment resolution strategy.


5-10: Consistent Version Pinning Syntax

The dependencies are clearly specified with pinned versions, which is excellent for reproducibility. As a nitpick, consider removing the spaces around the equals sign to conform with common conda conventions (e.g., using bwameth=0.2.7 instead of bwameth =0.2.7).

Below is a diff suggestion for consistency:

-  - bwameth =0.2.7
-  - bwa-mem2 =2.2.1
-  - samtools =1.21
-  - picard-slim =3.3.0
-  - snakemake-wrapper-utils =0.7.0
+  - bwameth=0.2.7
+  - bwa-mem2=2.2.1
+  - samtools=1.21
+  - picard-slim=3.3.0
+  - snakemake-wrapper-utils=0.7.0
bio/bwameth/memx/wrapper.py (2)

14-14: Remove unused import

The time module is imported but never used in the code.

-import os.path
-import time
+import os.path
🧰 Tools
🪛 Ruff (0.8.2)

14-14: time imported but unused

Remove unused import: time

(F401)


127-130: Consider using a ternary operator for more concise code

The if-else block can be simplified using a ternary operator.

-    if isinstance(fq, list):
-        fastq_files = ",".join(fq)
-    else:
-        fastq_files = fq
+    fastq_files = ",".join(fq) if isinstance(fq, list) else fq
🧰 Tools
🪛 Ruff (0.8.2)

127-130: Use ternary operator fastq_files = ",".join(fq) if isinstance(fq, list) else fq instead of if-else-block

Replace if-else-block with fastq_files = ",".join(fq) if isinstance(fq, list) else fq

(SIM108)

bio/bwameth/memx/meta.yaml (1)

19-19: Fix typo in notes section

There's a grammatical error in the notes section.

-  Both SAM sorting and compression and handled by either samtools or picard. No duplication analysis is performed by this wrapper.
+  Both SAM sorting and compression are handled by either samtools or picard. No duplication analysis is performed by this wrapper.
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7cf5fe0 and 129d045.

📒 Files selected for processing (15)
  • bio/bwameth/memx/environment.linux-64.pin.txt (1 hunks)
  • bio/bwameth/memx/environment.yaml (1 hunks)
  • bio/bwameth/memx/meta.yaml (1 hunks)
  • bio/bwameth/memx/test/A.fastq (1 hunks)
  • bio/bwameth/memx/test/Snakefile (1 hunks)
  • bio/bwameth/memx/test/mem/genome.fasta (1 hunks)
  • bio/bwameth/memx/test/mem/genome.fasta.bwameth.c2t (1 hunks)
  • bio/bwameth/memx/test/mem/genome.fasta.bwameth.c2t.amb (1 hunks)
  • bio/bwameth/memx/test/mem/genome.fasta.bwameth.c2t.ann (1 hunks)
  • bio/bwameth/memx/test/mem2/genome.fasta (1 hunks)
  • bio/bwameth/memx/test/mem2/genome.fasta.bwameth.c2t (1 hunks)
  • bio/bwameth/memx/test/mem2/genome.fasta.bwameth.c2t.amb (1 hunks)
  • bio/bwameth/memx/test/mem2/genome.fasta.bwameth.c2t.ann (1 hunks)
  • bio/bwameth/memx/wrapper.py (1 hunks)
  • test_wrappers.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
`**/*.py`: Do not try to improve formatting. Do not suggest type annotations for functions that are defined inside of functions or methods. Do not suggest type annotation of the `s...

**/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of the self argument of methods.
Do not suggest type annotation of the cls argument of classmethods.
Do not suggest return type annotation if a function or method does not contain a return statement.

  • bio/bwameth/memx/wrapper.py
  • test_wrappers.py
`**/wrapper.py`: Do not complain about use of undefined variable called `snakemake`.

**/wrapper.py: Do not complain about use of undefined variable called snakemake.

  • bio/bwameth/memx/wrapper.py
🪛 Ruff (0.8.2)
bio/bwameth/memx/wrapper.py

14-14: time imported but unused

Remove unused import: time

(F401)


22-22: Undefined name snakemake

(F821)


23-23: Undefined name snakemake

(F821)


25-25: Undefined name snakemake

(F821)


26-26: Undefined name snakemake

(F821)


27-27: Undefined name snakemake

(F821)


28-28: Undefined name snakemake

(F821)


29-29: Undefined name snakemake

(F821)


31-31: Undefined name snakemake

(F821)


32-32: Undefined name snakemake

(F821)


54-54: Undefined name snakemake

(F821)


55-55: Undefined name snakemake

(F821)


78-78: Undefined name snakemake

(F821)


121-121: Undefined name snakemake

(F821)


122-122: Undefined name snakemake

(F821)


123-123: Undefined name snakemake

(F821)


127-130: Use ternary operator fastq_files = ",".join(fq) if isinstance(fq, list) else fq instead of if-else-block

Replace if-else-block with fastq_files = ",".join(fq) if isinstance(fq, list) else fq

(SIM108)

⏰ Context from checks skipped due to timeout of 90000ms (2)
  • GitHub Check: testing
  • GitHub Check: Summary
🔇 Additional comments (8)
bio/bwameth/memx/test/mem2/genome.fasta.bwameth.c2t.amb (1)

1-2: Verify Auxiliary File Format and Newline Handling
The file contains a single non-header line "40 2 0", followed by an extra (possibly empty) line. Please confirm that this format (including the trailing newline or blank line) exactly matches the requirements expected by bwameth.

bio/bwameth/memx/test/mem/genome.fasta.bwameth.c2t.amb (1)

1-2: Confirm Duplication of Auxiliary File
This file’s content is identical to the one in the mem2 directory ("40 2 0"). Please ensure that duplicating this file in both locations is intentional and that your workflow correctly distinguishes between them if they serve different test cases.

bio/bwameth/memx/test/mem/genome.fasta (1)

1-2: Clarify FASTA File Structure with Embedded File Reference
The file starts with a relative path ("../mem2/genome.fasta") on the first line and then provides the nucleotide sequence ("GCTAGCTCAGAAAAAAAAAA") on the next line. Standard FASTA files typically begin with a header line that starts with a > character. Please verify that this mixed-format is intentional as part of the test setup. If this structure is meant to point to an external reference while also including a fallback, consider documenting the rationale within the file or associated documentation.

bio/bwameth/memx/test/mem2/genome.fasta (1)

1-2: FASTA Format is Correct
The file correctly defines a FASTA entry with the header >Sheila and its sequence. This adheres to the expected FASTA format for downstream tools.

bio/bwameth/memx/test/A.fastq (1)

1-4: FASTQ File Format Validity
The FASTQ file follows the correct minimal format with a header (line 1), sequence (line 2), a '+' separator (line 3), and quality scores (line 4). Ensure that this test input meets all expectations for downstream processing in the wrapper.

bio/bwameth/memx/test/mem/genome.fasta.bwameth.c2t (1)

1-4: FASTA Content Integrity
The FASTA file is correctly formatted with proper headers and sequence lines. The entries (>rSheila and >fSheila) and the respective sequences look consistent with test inputs for the bwameth alignment functionality.

bio/bwameth/memx/test/mem2/genome.fasta.bwameth.c2t (1)

1-4: mem2 FASTA File Consistency
This FASTA file mirrors the data in the ‘mem’ directory. Its proper formatting and consistent entries confirm that it fulfills the purpose of providing a reference for the bwameth tests in a separate test scenario.

test_wrappers.py (1)

511-556: Well-structured test function for the bwameth memx wrapper

The implementation follows the established testing pattern in this file, properly testing four different output scenarios of the bwameth memx wrapper: standard mem alignment, mem2 alignment, and two different sorting methods (picard and samtools).

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🔭 Outside diff range comments (1)
bio/bwameth/memx/test/Snakefile (1)

1-107: 💡 Verification agent

🧩 Analysis chain

Verify memory and computational requirements

All test rules specify only 2 threads, which might be insufficient for real-world bwameth processing, especially with sorting operations. Consider documenting recommended resource requirements for production use.


🌐 Web query:

What are the recommended computational resources for running bwameth on real sequencing data?

💡 Result:

For running bwa-meth on real sequencing data, computational resources should be optimized based on workflow type (CPU- or GPU-based), input data size, and alignment efficiency. Here's a synthesis of recommendations from benchmarking studies and pipeline documentation:


1. CPU-Based Workflow (Standard bwa-meth)

  • CPU Cores:

    • Utilize 16–32 threads for alignment acceleration ([7][8]). For example:
      bwameth.py --threads 16 --reference $REF $FQ1 $FQ2
      Multi-threading reduces runtime by parallelizing alignment steps.
  • Memory:

    • 8–16 GB per thread is typically sufficient. Larger datasets (e.g., mammalian genomes) may require up to 72–100 GB total memory ([1][9]). The tool avoids writing temporary files to disk, streamlining memory use ([7][3]).

2. GPU-Accelerated Workflow (Parabricks fq2bam_meth)

  • GPUs:
    • Use 1–2 NVIDIA GPUs for significant speed improvements. Enable GPU acceleration with --use_gpu ([4][12]).
  • CPU/GPU Configuration:
    • Allocate 16 CPU threads per GPU (default) and adjust --bwa-nstreams for parallel streams per GPU ([4]). Example:
      fq2bam_meth --num-gpus 2 --bwa-cpu-thread-pool 16
  • Memory:
    • GPU-based workflows reduce CPU memory pressure but require ≥32 GB system memory for large datasets.

3. Runtime and Scalability

  • Performance:
    • Bwa-meth is ~6–10x faster than Bismark on CPU-based workflows ([3][11]). For example:
      • Human WGBS data: ~30–40 million reads/hour on 16 threads ([9][6]).
    • GPU acceleration via Parabricks further reduces runtime by 50–70% ([4][14]).

4. Special Considerations

  • Reference Genome Indexing:
    • Requires a one-time conversion of the reference genome (CPU-bound). Allow 4–8 cores and 16–32 GB memory for indexing ([7][4]).
  • Data Size:
    • For paired-end datasets, storage requirements drop 3x due to on-the-fly read conversion ([3][7]).
  • Pipeline Integration:
    • When using nf-core/methylseq, resource defaults are auto-adjusted, but manual tuning may be needed for large-scale studies (e.g., --max_memory '100.GB') ([1][8]).

Benchmark Comparison

Metric Bwa-meth (CPU) Parabricks (GPU)
Speed (reads/hour) ~30M ~50M+
Memory/Thread 8–16 GB 8–12 GB
Typical Use Case Moderate-scale Large-scale

For best practices, combine sufficient CPU threads with available GPU resources for optimal alignment efficiency ([4][9]). Always validate resource allocation using test datasets.

Citations:


Documentation Resource Recommendation Update

The test rules in bio/bwameth/memx/test/Snakefile are intentionally configured with only 2 threads to keep testing fast. However, for production use with real sequencing data, benchmarking and pipeline documentation indicate that:

  • CPU-based workflows should leverage 16–32 threads with approximately 8–16 GB memory per thread (totaling up to 72–100 GB for large genomes).
  • GPU-accelerated workflows require additional considerations (e.g., 1–2 NVIDIA GPUs with ~16 CPU threads per GPU).

Please update the pipeline documentation to clearly differentiate between the minimal settings used for tests and the recommended resource requirements for production environments.

♻️ Duplicate comments (1)
bio/bwameth/memx/wrapper.py (1)

149-150: ⚠️ Potential issue

Remove duplicate fastq files parameter

The fastq_files variable appears twice in the command, which will cause bwameth to receive the same input files twice.

- " {fastq_files} " + pipe_cmd + " ) {log}"
+ " " + pipe_cmd + " ) {log}"
🧹 Nitpick comments (5)
bio/bwameth/memx/wrapper.py (5)

14-14: Remove unused import

The time module is imported but never used in the code.

- import time
🧰 Tools
🪛 Ruff (0.8.2)

14-14: time imported but unused

Remove unused import: time

(F401)


54-56: Consider using a more robust method to extract index prefix

The current implementation assumes that indices follow a specific pattern. It might be better to handle edge cases where the file path doesn't end with the expected extension pattern.

- index = os.path.commonprefix(snakemake.input.idx)
- index_extensions = set(idx[len(index) - 4 :] for idx in snakemake.input.idx)
+ # Extract the base name without the suffix
+ index = os.path.commonprefix(snakemake.input.idx)
+ # Make sure index does not end with a partial filename
+ if any(not os.path.exists(index + ext) for ext in {".c2t"}):
+     index = os.path.dirname(index) + "/"
+ index_extensions = set(idx[len(index):] for idx in snakemake.input.idx)
🧰 Tools
🪛 Ruff (0.8.2)

54-54: Undefined name snakemake

(F821)


55-55: Undefined name snakemake

(F821)


58-65: Improve error message clarity

The current error message logic is a bit confusing. It triggers when both sets of missing indices are non-empty, but the comment suggests it should trigger when neither set is empty (i.e., both aligners are missing indices).

- if (len(missing_mem2_idx) > 0) and (len(missing_mem_idx) > 0):
-     # If ONE of the sets is empty, then all the index for one of the aligners
-     # are present.
-     raise ValueError(
-         "Missing required indices for both bwa-mem and bwa-mem2. No aligner can "
-         f"be launched. bwa-mem misses {missing_mem_idx}, while bwa-mem2 misses "
-         f"{missing_mem2_idx}. Please provide missing files. {index_extensions}"
-     )
+ if len(missing_mem2_idx) > 0 and len(missing_mem_idx) > 0:
+     # If both sets have missing indices, then neither aligner can be used
+     raise ValueError(
+         "Missing required indices for both bwa-mem and bwa-mem2. No aligner can "
+         f"be launched. bwa-mem misses {missing_mem_idx}, while bwa-mem2 misses "
+         f"{missing_mem2_idx}. Please provide complete indices for at least one aligner."
+     )

127-130: Simplify code using ternary operator

The if-else block for handling single-ended fastq files can be simplified using a ternary operator.

- if isinstance(fq, list):
-     fastq_files = ",".join(fq)
- else:
-     fastq_files = fq
+ fastq_files = ",".join(fq) if isinstance(fq, list) else fq
🧰 Tools
🪛 Ruff (0.8.2)

127-130: Use ternary operator fastq_files = ",".join(fq) if isinstance(fq, list) else fq instead of if-else-block

Replace if-else-block with fastq_files = ",".join(fq) if isinstance(fq, list) else fq

(SIM108)


134-140: Simplify paired-end fastq file handling

Similar to the single-ended case, the paired-end fastq file handling can also be simplified.

- if isinstance(fq1, list) and isinstance(fq2, list):
-     if len(fq1) == len(fq2):
-         fastq_files = f'{",".join(fq1)} {",".join(fq2)}'
-     else:
-         raise ValueError("Please provide as many R1 and R2 files")
- else:
-     fastq_files = f"{fq1} {fq2}"
+ if isinstance(fq1, list) and isinstance(fq2, list):
+     if len(fq1) != len(fq2):
+         raise ValueError("Please provide the same number of R1 and R2 files")
+     fastq_files = f'{",".join(fq1)} {",".join(fq2)}'
+ else:
+     fastq_files = f"{fq1} {fq2}"
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 129d045 and 115ed64.

📒 Files selected for processing (2)
  • bio/bwameth/memx/test/Snakefile (1 hunks)
  • bio/bwameth/memx/wrapper.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
`**/*.py`: Do not try to improve formatting. Do not suggest type annotations for functions that are defined inside of functions or methods. Do not suggest type annotation of the `s...

**/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of the self argument of methods.
Do not suggest type annotation of the cls argument of classmethods.
Do not suggest return type annotation if a function or method does not contain a return statement.

  • bio/bwameth/memx/wrapper.py
`**/wrapper.py`: Do not complain about use of undefined variable called `snakemake`.

**/wrapper.py: Do not complain about use of undefined variable called snakemake.

  • bio/bwameth/memx/wrapper.py
🪛 Ruff (0.8.2)
bio/bwameth/memx/wrapper.py

14-14: time imported but unused

Remove unused import: time

(F401)


22-22: Undefined name snakemake

(F821)


23-23: Undefined name snakemake

(F821)


25-25: Undefined name snakemake

(F821)


26-26: Undefined name snakemake

(F821)


27-27: Undefined name snakemake

(F821)


28-28: Undefined name snakemake

(F821)


29-29: Undefined name snakemake

(F821)


31-31: Undefined name snakemake

(F821)


32-32: Undefined name snakemake

(F821)


54-54: Undefined name snakemake

(F821)


55-55: Undefined name snakemake

(F821)


78-78: Undefined name snakemake

(F821)


121-121: Undefined name snakemake

(F821)


122-122: Undefined name snakemake

(F821)


123-123: Undefined name snakemake

(F821)


127-130: Use ternary operator fastq_files = ",".join(fq) if isinstance(fq, list) else fq instead of if-else-block

Replace if-else-block with fastq_files = ",".join(fq) if isinstance(fq, list) else fq

(SIM108)

⏰ Context from checks skipped due to timeout of 90000ms (3)
  • GitHub Check: testing
  • GitHub Check: docs
  • GitHub Check: Summary
🔇 Additional comments (4)
bio/bwameth/memx/test/Snakefile (4)

1-26: LGTM - Test rule for bwameth with mem indices

The test rule is well-structured with appropriate inputs, outputs, parameters, and logging. The rule correctly specifies the reference genome and its associated BWA-MEM indices.


28-53: LGTM - Test rule for bwameth with mem2 indices

The test rule is well-defined with the correct reference genome and BWA-MEM2 indices. It properly includes the additional .0123 index file required by BWA-MEM2.


55-80: LGTM - Test rule for bwameth with Picard sorting

The test configuration correctly sets up sorting with Picard, using the same reference and indices as the mem test.


82-107: LGTM - Test rule for bwameth with Samtools sorting

The test configuration properly implements sorting with Samtools, using the BWA-MEM indices.

@tdayris
Copy link
Contributor Author

tdayris commented Apr 30, 2025

Last quality check failed with :

2025-04-30T12:09:45.4026718Z Can we download a custom installer by URL?
2025-04-30T12:09:45.4027719Z Can we download Miniconda?
2025-04-30T12:09:45.4029568Z Can we download Miniforge?
2025-04-30T12:09:45.4031418Z ... will download Miniforge.
2025-04-30T12:09:45.4034004Z Will fetch Miniforge3 latest from https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-Linux-x86_64.sh
2025-04-30T12:09:45.4035003Z Ensuring Installer...
2025-04-30T12:09:45.4038546Z Checking for cached Miniforge3@latest...
2025-04-30T12:09:45.4044946Z Did not find Miniforge3-Linux-x86_64.sh latest in cache
2025-04-30T12:09:45.4885294Z ##[endgroup]
2025-04-30T12:09:45.4908482Z ##[error]Unexpected HTTP response: 400

How do I solve this kind of issue ?

Co-authored-by: Filipe G. Vieira <[email protected]>
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (1)
bio/bwameth/memx/wrapper.py (1)

149-150: ⚠️ Potential issue

Remove duplicate fastq files parameter.

The fastq files appear to be passed twice to bwameth.py - once explicitly and once through the pipe command.

- " {fastq_files} " + pipe_cmd + " ) {log}"
+ " " + pipe_cmd + " ) {log}"

This was previously identified in a review and you confirmed it would be fixed.

🧹 Nitpick comments (6)
bio/bwameth/memx/wrapper.py (6)

14-14: Remove unused import.

The time module is imported but never used in the code.

- import os.path
- import time
+ import os.path
🧰 Tools
🪛 Ruff (0.8.2)

14-14: time imported but unused

Remove unused import: time

(F401)


34-66: Consider simplifying index validation.

The current approach for validating indices is complex and may be unnecessary, as bwameth itself could handle index validation. This complexity increases maintenance overhead and may make the wrapper less adaptable to future changes in the tool's indexing requirements.

You might consider simplifying this to only check if index files exist rather than validating specific extensions, or letting bwameth handle this validation itself.

🧰 Tools
🪛 Ruff (0.8.2)

54-54: Undefined name snakemake

(F821)


55-55: Undefined name snakemake

(F821)


75-76: Redundant conditional thread assignment.

The thread assignment logic is duplicated in both the "none" and "samtools" sort branches, which creates maintenance overhead.

Consider extracting this logic to a separate section before the sort type conditionals:

# Set thread options for samtools if available
if samtools_threads >= 1:
    samtools_opts += " --threads {samtools_threads} "

Also applies to: 88-89


127-130: Simplify with ternary operator.

The if-else block can be simplified using a ternary operator.

- if isinstance(fq, list):
-     fastq_files = ",".join(fq)
- else:
-     fastq_files = fq
+ fastq_files = ",".join(fq) if isinstance(fq, list) else fq
🧰 Tools
🪛 Ruff (0.8.2)

127-130: Use ternary operator fastq_files = ",".join(fq) if isinstance(fq, list) else fq instead of if-else-block

Replace if-else-block with fastq_files = ",".join(fq) if isinstance(fq, list) else fq

(SIM108)


134-140: Simplify paired-end file handling.

Similar to the single-end case, the code for handling paired-end files can be simplified.

- if isinstance(fq1, list) and isinstance(fq2, list):
-     if len(fq1) == len(fq2):
-         fastq_files = f'{",".join(fq1)} {",".join(fq2)}'
-     else:
-         raise ValueError("Please provide as many R1 and R2 files")
- else:
-     fastq_files = f"{fq1} {fq2}"
+ if isinstance(fq1, list) and isinstance(fq2, list):
+     if len(fq1) != len(fq2):
+         raise ValueError("Please provide as many R1 and R2 files")
+     fq1_str = ",".join(fq1)
+     fq2_str = ",".join(fq2)
+ else:
+     fq1_str = fq1
+     fq2_str = fq2
+ fastq_files = f"{fq1_str} {fq2_str}"

23-23: Use correct log format for standard output and error.

The current log formatting may not capture all relevant output.

- log = snakemake.log_fmt_shell(stdout=False, stderr=True)
+ log = snakemake.log_fmt_shell(stdout=True, stderr=True)

Consider capturing both stdout and stderr to ensure all relevant output is logged.

🧰 Tools
🪛 Ruff (0.8.2)

23-23: Undefined name snakemake

(F821)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8905e67 and bea7982.

📒 Files selected for processing (1)
  • bio/bwameth/memx/wrapper.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
`**/*.py`: Do not try to improve formatting. Do not suggest type annotations for functions that are defined inside of functions or methods. Do not suggest type annotation of the `s...

**/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of the self argument of methods.
Do not suggest type annotation of the cls argument of classmethods.
Do not suggest return type annotation if a function or method does not contain a return statement.

  • bio/bwameth/memx/wrapper.py
`**/wrapper.py`: Do not complain about use of undefined variable called `snakemake`.

**/wrapper.py: Do not complain about use of undefined variable called snakemake.

  • bio/bwameth/memx/wrapper.py
🪛 Ruff (0.8.2)
bio/bwameth/memx/wrapper.py

14-14: time imported but unused

Remove unused import: time

(F401)


22-22: Undefined name snakemake

(F821)


23-23: Undefined name snakemake

(F821)


25-25: Undefined name snakemake

(F821)


26-26: Undefined name snakemake

(F821)


27-27: Undefined name snakemake

(F821)


28-28: Undefined name snakemake

(F821)


29-29: Undefined name snakemake

(F821)


31-31: Undefined name snakemake

(F821)


32-32: Undefined name snakemake

(F821)


54-54: Undefined name snakemake

(F821)


55-55: Undefined name snakemake

(F821)


78-78: Undefined name snakemake

(F821)


121-121: Undefined name snakemake

(F821)


122-122: Undefined name snakemake

(F821)


123-123: Undefined name snakemake

(F821)


127-130: Use ternary operator fastq_files = ",".join(fq) if isinstance(fq, list) else fq instead of if-else-block

Replace if-else-block with fastq_files = ",".join(fq) if isinstance(fq, list) else fq

(SIM108)

⏰ Context from checks skipped due to timeout of 90000ms (3)
  • GitHub Check: testing
  • GitHub Check: docs
  • GitHub Check: Summary

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (3)
bio/bwameth/memx/wrapper.py (3)

13-14: Remove unused imports.

The imports for os.path and time are not used anywhere in the code and can be safely removed to improve code cleanliness.

-import os.path
-import time
🧰 Tools
🪛 Ruff (0.8.2)

13-13: os.path imported but unused

Remove unused import: os.path

(F401)


14-14: time imported but unused

Remove unused import: time

(F401)


93-96: Consider using a ternary operator for better conciseness.

The if-else block can be replaced with a more concise ternary operator expression for better readability.

-    if isinstance(fq, list):
-        fastq_files = ",".join(fq)
-    else:
-        fastq_files = fq
+    fastq_files = ",".join(fq) if isinstance(fq, list) else fq
🧰 Tools
🪛 Ruff (0.8.2)

93-96: Use ternary operator fastq_files = ",".join(fq) if isinstance(fq, list) else fq instead of if-else-block

Replace if-else-block with fastq_files = ",".join(fq) if isinstance(fq, list) else fq

(SIM108)


100-106: Consider simplifying the paired-end fastq handling.

Similar to the single-end case, you can use ternary operators here for more concise code. Also, the redundant type checking for both fq1 and fq2 can be simplified.

-    if isinstance(fq1, list) and isinstance(fq2, list):
-        if len(fq1) == len(fq2):
-            fastq_files = f'{",".join(fq1)} {",".join(fq2)}'
-        else:
-            raise ValueError("Please provide as many R1 and R2 files")
-    else:
-        fastq_files = f"{fq1} {fq2}"
+    if isinstance(fq1, list) or isinstance(fq2, list):
+        if not (isinstance(fq1, list) and isinstance(fq2, list)):
+            # Convert the non-list into a single-item list
+            fq1 = [fq1] if not isinstance(fq1, list) else fq1
+            fq2 = [fq2] if not isinstance(fq2, list) else fq2
+        if len(fq1) != len(fq2):
+            raise ValueError("Please provide equal number of R1 and R2 files")
+        fastq_files = f'{",".join(fq1)} {",".join(fq2)}'
+    else:
+        fastq_files = f"{fq1} {fq2}"
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6cc63b0 and 6e955a6.

📒 Files selected for processing (2)
  • bio/bwameth/memx/environment.linux-64.pin.txt (1 hunks)
  • bio/bwameth/memx/wrapper.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • bio/bwameth/memx/environment.linux-64.pin.txt
🧰 Additional context used
📓 Path-based instructions (2)
`**/*.py`: Do not try to improve formatting. Do not suggest type annotations for functions that are defined inside of functions or methods. Do not suggest type annotation of the `s...

**/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of the self argument of methods.
Do not suggest type annotation of the cls argument of classmethods.
Do not suggest return type annotation if a function or method does not contain a return statement.

  • bio/bwameth/memx/wrapper.py
`**/wrapper.py`: Do not complain about use of undefined variable called `snakemake`.

**/wrapper.py: Do not complain about use of undefined variable called snakemake.

  • bio/bwameth/memx/wrapper.py
🧠 Learnings (1)
bio/bwameth/memx/wrapper.py (1)
Learnt from: tdayris
PR: snakemake/snakemake-wrappers#3728
File: bio/bwameth/memx/wrapper.py:32-32
Timestamp: 2025-05-15T07:35:14.369Z
Learning: In the bwameth wrapper, when `snakemake.threads` equals 1, `samtools_threads` is intentionally set to 0. This design allows the wrapper to skip adding the `--threads` parameter to samtools, prioritizing the single available thread for the primary alignment task.
🪛 Ruff (0.8.2)
bio/bwameth/memx/wrapper.py

13-13: os.path imported but unused

Remove unused import: os.path

(F401)


14-14: time imported but unused

Remove unused import: time

(F401)


22-22: Undefined name snakemake

(F821)


23-23: Undefined name snakemake

(F821)


25-25: Undefined name snakemake

(F821)


26-26: Undefined name snakemake

(F821)


27-27: Undefined name snakemake

(F821)


28-28: Undefined name snakemake

(F821)


29-29: Undefined name snakemake

(F821)


31-31: Undefined name snakemake

(F821)


32-32: Undefined name snakemake

(F821)


44-44: Undefined name snakemake

(F821)


46-46: Undefined name snakemake

(F821)


49-49: Undefined name snakemake

(F821)


87-87: Undefined name snakemake

(F821)


88-88: Undefined name snakemake

(F821)


89-89: Undefined name snakemake

(F821)


93-96: Use ternary operator fastq_files = ",".join(fq) if isinstance(fq, list) else fq instead of if-else-block

Replace if-else-block with fastq_files = ",".join(fq) if isinstance(fq, list) else fq

(SIM108)

⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: Summary
🔇 Additional comments (2)
bio/bwameth/memx/wrapper.py (2)

32-32: Appropriate thread allocation for samtools.

The current implementation intentionally allows samtools_threads to be 0 when there's only 1 thread available. This is correct as it prioritizes the single available thread for the primary alignment task.

🧰 Tools
🪛 Ruff (0.8.2)

32-32: Undefined name snakemake

(F821)


117-117: Shell command looks good.

The shell command correctly uses the fastq files and pipe command format from previous refactoring discussions.

@fgvieira fgvieira merged commit 63f5e87 into snakemake:master May 16, 2025
6 checks passed
johanneskoester pushed a commit that referenced this pull request May 19, 2025
🤖 I have created a release *beep* *boop*
---


##
[6.2.0](v6.1.1...v6.2.0)
(2025-05-19)


### Features

* add aria2c wrapper
([#2725](#2725))
([a45763b](a45763b))
* bwameth mem and mem2
([#3728](#3728))
([63f5e87](63f5e87))

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants