-
Notifications
You must be signed in to change notification settings - Fork 205
feat: samtools_opts for picard markduplicates #4788
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
📝 WalkthroughWalkthroughAdds passing of user-provided samtools options into the Picard markduplicates wrapper and a test Snakefile param that sets a samtools option for CRAM output. Changes
Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes
Suggested reviewers
Pre-merge checks and finishing touches❌ Failed checks (1 inconclusive)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (1)
bio/picard/markduplicates/wrapper.py (1)
19-19: Consider cleaner string joining to avoid potential leading spaces.The current concatenation could produce a leading space when
samtools_optsis empty (e.g.,"" + " " + "..."). While this won't cause functional issues in shell commands, a cleaner approach would be to filter out empty strings.Apply this diff for cleaner string handling:
-samtools_opts = snakemake.params.get("samtools_opts", "") + " " + get_samtools_opts(snakemake) +samtools_opts = " ".join(filter(None, [snakemake.params.get("samtools_opts", ""), get_samtools_opts(snakemake)]))
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
bio/picard/markduplicates/test/Snakefile(1 hunks)bio/picard/markduplicates/wrapper.py(1 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py
⚙️ CodeRabbit configuration file
**/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of theselfargument of methods.
Do not suggest type annotation of theclsargument of classmethods.
Do not suggest return type annotation if a function or method does not contain areturnstatement.
Files:
bio/picard/markduplicates/wrapper.py
**/wrapper.py
⚙️ CodeRabbit configuration file
Do not complain about use of undefined variable called
snakemake.
Files:
bio/picard/markduplicates/wrapper.py
🧠 Learnings (2)
📚 Learning: 2024-11-21T10:23:03.427Z
Learnt from: johanneskoester
Repo: snakemake/snakemake-wrappers PR: 3478
File: bio/varlociraptor/estimate-alignment-properties/wrapper.py:5-12
Timestamp: 2024-11-21T10:23:03.427Z
Learning: In the Snakemake wrappers project, avoid suggesting extensive error handling or temporary file management in simple wrapper scripts when it may be unnecessary, to prevent overcomplicating the code.
Applied to files:
bio/picard/markduplicates/wrapper.py
📚 Learning: 2025-05-15T07:35:14.369Z
Learnt from: tdayris
Repo: snakemake/snakemake-wrappers PR: 3728
File: bio/bwameth/memx/wrapper.py:32-32
Timestamp: 2025-05-15T07:35:14.369Z
Learning: In the bwameth wrapper, when `snakemake.threads` equals 1, `samtools_threads` is intentionally set to 0. This design allows the wrapper to skip adding the `--threads` parameter to samtools, prioritizing the single available thread for the primary alignment task.
Applied to files:
bio/picard/markduplicates/wrapper.py
🪛 GitHub Actions: Code quality
bio/picard/markduplicates/wrapper.py
[error] 1-1: Black check failed. 1 file would be reformatted. Run 'black' to format code.
🪛 Ruff (0.14.8)
bio/picard/markduplicates/wrapper.py
19-19: Undefined name snakemake
(F821)
19-19: Undefined name snakemake
(F821)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: docs
- GitHub Check: testing
- GitHub Check: Summary
🔇 Additional comments (2)
bio/picard/markduplicates/test/Snakefile (1)
59-59: LGTM! Appropriate test for the new functionality.The test correctly exercises the new
samtools_optsparameter with a valid samtools option for controlling CRAM output version. This aligns with the PR's objective of ensuring compatibility with tools that don't support CRAM 3.1.bio/picard/markduplicates/wrapper.py (1)
1-64: > Likely an incorrect or invalid review comment.
Remove debug print statement from wrapper.py
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
bio/picard/markduplicates/wrapper.py (1)
19-21: Consider stripping to avoid extra spaces.The unconditional space separator can produce a leading space when
samtools_optsis empty or a trailing space whenget_samtools_opts(snakemake)returns an empty string. While harmless, stripping the result improves cleanliness.Apply this diff to strip extra whitespace:
-samtools_opts = ( - snakemake.params.get("samtools_opts", "") + " " + get_samtools_opts(snakemake) -) +samtools_opts = ( + snakemake.params.get("samtools_opts", "") + " " + get_samtools_opts(snakemake) +).strip()
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
bio/picard/markduplicates/wrapper.py(1 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py
⚙️ CodeRabbit configuration file
**/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of theselfargument of methods.
Do not suggest type annotation of theclsargument of classmethods.
Do not suggest return type annotation if a function or method does not contain areturnstatement.
Files:
bio/picard/markduplicates/wrapper.py
**/wrapper.py
⚙️ CodeRabbit configuration file
Do not complain about use of undefined variable called
snakemake.
Files:
bio/picard/markduplicates/wrapper.py
🧠 Learnings (8)
📓 Common learnings
Learnt from: tdayris
Repo: snakemake/snakemake-wrappers PR: 3496
File: bio/mtnucratio/test/Snakefile:2-6
Timestamp: 2024-11-26T08:31:00.099Z
Learning: In test files for Snakemake wrappers, such as `bio/mtnucratio/test/Snakefile`, hard-coded input and output paths are acceptable as examples and do not need to use wildcards to make paths flexible.
📚 Learning: 2024-11-21T10:23:03.427Z
Learnt from: johanneskoester
Repo: snakemake/snakemake-wrappers PR: 3478
File: bio/varlociraptor/estimate-alignment-properties/wrapper.py:5-12
Timestamp: 2024-11-21T10:23:03.427Z
Learning: In the Snakemake wrappers project, avoid suggesting extensive error handling or temporary file management in simple wrapper scripts when it may be unnecessary, to prevent overcomplicating the code.
Applied to files:
bio/picard/markduplicates/wrapper.py
📚 Learning: 2024-10-08T17:41:54.542Z
Learnt from: johanneskoester
Repo: snakemake/snakemake-wrappers PR: 3123
File: utils/datavzrd/wrapper.py:31-32
Timestamp: 2024-10-08T17:41:54.542Z
Learning: In `wrapper.py` scripts, do not flag the use of an undefined variable called `snakemake`.
Applied to files:
bio/picard/markduplicates/wrapper.py
📚 Learning: 2025-05-15T07:35:14.369Z
Learnt from: tdayris
Repo: snakemake/snakemake-wrappers PR: 3728
File: bio/bwameth/memx/wrapper.py:32-32
Timestamp: 2025-05-15T07:35:14.369Z
Learning: In the bwameth wrapper, when `snakemake.threads` equals 1, `samtools_threads` is intentionally set to 0. This design allows the wrapper to skip adding the `--threads` parameter to samtools, prioritizing the single available thread for the primary alignment task.
Applied to files:
bio/picard/markduplicates/wrapper.py
📚 Learning: 2025-05-15T09:46:23.189Z
Learnt from: tdayris
Repo: snakemake/snakemake-wrappers PR: 3728
File: bio/bwameth/memx/wrapper.py:55-55
Timestamp: 2025-05-15T09:46:23.189Z
Learning: In the `bio/bwameth/memx/wrapper.py` file, string interpolation for variables like `{samtools_threads}` is intentionally delayed using string placeholders that are resolved later with a single `format(**locals())` call rather than using f-strings for immediate interpolation.
Applied to files:
bio/picard/markduplicates/wrapper.py
📚 Learning: 2024-11-26T08:35:42.140Z
Learnt from: tdayris
Repo: snakemake/snakemake-wrappers PR: 3497
File: bio/sexdeterrmine/wrapper.py:23-26
Timestamp: 2024-11-26T08:35:42.140Z
Learning: In the `bio/sexdeterrmine/wrapper.py` file (Python), we rely on Samtools to handle input validation for the depth file provided by the user, so additional file existence checks are not necessary.
Applied to files:
bio/picard/markduplicates/wrapper.py
📚 Learning: 2024-10-08T17:41:54.542Z
Learnt from: johanneskoester
Repo: snakemake/snakemake-wrappers PR: 3123
File: utils/datavzrd/wrapper.py:31-32
Timestamp: 2024-10-08T17:41:54.542Z
Learning: The `snakemake` variable is inserted via a preamble during execution in `wrapper.py` scripts, so it doesn't need to be explicitly defined.
Applied to files:
bio/picard/markduplicates/wrapper.py
📚 Learning: 2024-11-26T08:30:23.818Z
Learnt from: tdayris
Repo: snakemake/snakemake-wrappers PR: 3496
File: bio/mtnucratio/wrapper.py:27-28
Timestamp: 2024-11-26T08:30:23.818Z
Learning: In Snakemake wrappers (e.g., `wrapper.py` files), it's unnecessary to verify the availability of tools like `mtnucratio` within the code, because Snakemake with Conda ensures that the required tools are installed and available.
Applied to files:
bio/picard/markduplicates/wrapper.py
🪛 Ruff (0.14.8)
bio/picard/markduplicates/wrapper.py
20-20: Undefined name snakemake
(F821)
20-20: Undefined name snakemake
(F821)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: testing
- GitHub Check: docs
- GitHub Check: Summary
🔇 Additional comments (1)
bio/picard/markduplicates/wrapper.py (1)
19-21: [rewritten comment]
[classification tag]
fxwiegand
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should also update the docs with the additional parameters:
snakemake-wrappers/bio/picard/markduplicates/meta.yaml
Lines 13 to 17 in 21cbbdb
| params: | |
| - java_opts: allows for additional arguments to be passed to the java compiler, e.g. "-XX:ParallelGCThreads=10" (not for `-XmX` or `-Djava.io.tmpdir`, since they are handled automatically). | |
| - extra: allows for additional program arguments. | |
| - embed_ref: allows to embed the fasta reference into the cram | |
| - withmatecigar: allows to run `MarkDuplicatesWithMateCigar <https://broadinstitute.github.io/picard/command-line-overview.html#MarkDuplicatesWithMateCigar>`_ instead. |
This allows to pass samtools parameters to the wrapper allowing to e.g. specify the cram version created by samtools.
Doing so might be necessary as some tools like GATK and Picard do not yet support cram 3.1 as input files.
QC
snakemake-wrappers.While the contributions guidelines are more extensive, please particularly ensure that:
test.pywas updated to call any added or updated example rules in aSnakefileinput:andoutput:file paths in the rules can be chosen arbitrarilyinput:oroutput:)tempfile.gettempdir()points tometa.yamlcontains a link to the documentation of the respective tool or command underurl:Summary by CodeRabbit
✏️ Tip: You can customize this high-level summary in your review settings.