31 commits
b89759c
feat/add_pytrf: add basic working code without complete wrapper just …
rohan-ibn-tariq Nov 26, 2025
f01dfd2
feat/add_pytrf: delete unified wrapper approach
rohan-ibn-tariq Nov 26, 2025
4e1e6bb
feat/add_pytrf: add findstr basic wrapper
rohan-ibn-tariq Nov 26, 2025
70294eb
feat/add_pytrf: add findgtr basic wrapper
rohan-ibn-tariq Nov 27, 2025
c2260fa
feat/add_pytrf: add in meta discalaimer note
rohan-ibn-tariq Nov 27, 2025
cfefde7
feat/add_pytrf: fix output docs
rohan-ibn-tariq Nov 27, 2025
ba3ca6f
feat/add_pytrf: add end line
rohan-ibn-tariq Nov 27, 2025
dcbefdf
feat/add_pytrf: add pytrf subcommand findatr
rohan-ibn-tariq Nov 27, 2025
9daf5f1
feat/add_pytrf: black fmt for test_wrappers.py and basic pytrf tests …
rohan-ibn-tariq Nov 27, 2025
e4b5aa0
feat/add_pytrf: update with expected results test info and doc-comments
rohan-ibn-tariq Nov 27, 2025
4c87765
feat/add_pytrf: update with expected results test info and defaults test
rohan-ibn-tariq Nov 27, 2025
6e357ed
feat/add_pytrf: finalize findgtr with expected results very basic min…
rohan-ibn-tariq Nov 27, 2025
fd0b7f4
feat/add_pytrf: add comparison for findgtr minimal
rohan-ibn-tariq Nov 27, 2025
0dd30f6
feat/add_pytrf: refactor doc
rohan-ibn-tariq Nov 27, 2025
0fe268f
feat/add_pytrf: refactor doc
rohan-ibn-tariq Nov 27, 2025
86adfd6
feat/add_pytrf: add expected test for findatr + doc refactor
rohan-ibn-tariq Nov 27, 2025
653e9c5
feat/add_pytrf: remove python pins not required
rohan-ibn-tariq Nov 27, 2025
5b505fe
feat/add_pytrf: fix extract test
rohan-ibn-tariq Nov 27, 2025
55b31be
feat/add_pytrf: fix url and add additional note
rohan-ibn-tariq Nov 27, 2025
6cc90bd
feat/add_pytrf: black fmt wrapper.py
rohan-ibn-tariq Nov 27, 2025
8c2eb7d
feat/add_pytrf: snakefile fmt findatr findstr
rohan-ibn-tariq Nov 27, 2025
59e569f
feat/add_pytrf: pylint fixes for pytrf findstr
rohan-ibn-tariq Nov 27, 2025
8bbe890
feat/add_pytrf: pylint fixes for pytrf findgtr
rohan-ibn-tariq Nov 27, 2025
b29a28a
feat/add_pytrf: pylint fixes for pytrf findatr
rohan-ibn-tariq Nov 27, 2025
19530be
feat/add_pytrf: add extract but test failing
rohan-ibn-tariq Nov 27, 2025
718846c
feat/add_pytrf: add extract command issue in pytest skip and meta.yaml
rohan-ibn-tariq Nov 27, 2025
7b43c71
feat/add_pytrf: refactor meta.yaml's of 4 commands
rohan-ibn-tariq Nov 27, 2025
04b9263
feat/add_pytrf: refactor meta.yaml for findatr
rohan-ibn-tariq Nov 27, 2025
75421c4
feat/add_pytrf: pin envoirnments for four subcommands
rohan-ibn-tariq Nov 27, 2025
e93f085
feat/add-pytrf: merge branch master
rohan-ibn-tariq Nov 27, 2025
c0f2aaf
feat/add-pytrf: refactor meta.yaml
rohan-ibn-tariq Nov 27, 2025
37 changes: 37 additions & 0 deletions bio/pytrf/extract/environment.linux-64.pin.txt
@@ -0,0 +1,37 @@
# This file may be used to create an environment using:
# $ conda create --name <env> --file <this file>
# platform: linux-64
# created-by: conda 25.11.0
@EXPLICIT
https://conda.anaconda.org/conda-forge/linux-64/_libgcc_mutex-0.1-conda_forge.tar.bz2#d7c89558ba9fa0495403155b64376d81
https://conda.anaconda.org/conda-forge/noarch/ca-certificates-2025.11.12-hbd8a1cb_0.conda#f0991f0f84902f6b6009b4d2350a83aa
https://conda.anaconda.org/conda-forge/linux-64/libgomp-15.2.0-he0feb66_14.conda#91349c276f84f590487e4c7f6e90e077
https://conda.anaconda.org/conda-forge/noarch/python_abi-3.12-8_cp312.conda#c3efd25ac4d74b1584d2f7a57195ddf1
https://conda.anaconda.org/conda-forge/noarch/tzdata-2025b-h78e105d_0.conda#4222072737ccff51314b5ece9c7d6f5a
https://conda.anaconda.org/conda-forge/linux-64/_openmp_mutex-4.5-2_gnu.tar.bz2#73aaf86a425cc6e73fcf236a5a46396d
https://conda.anaconda.org/conda-forge/linux-64/libgcc-15.2.0-he0feb66_14.conda#550dceb769d23bcf0e2f97fd4062d720
https://conda.anaconda.org/conda-forge/linux-64/bzip2-1.0.8-hda65f42_8.conda#51a19bba1b8ebfb60df25cde030b7ebc
https://conda.anaconda.org/conda-forge/linux-64/libexpat-2.7.3-hecca717_0.conda#8b09ae86839581147ef2e5c5e229d164
https://conda.anaconda.org/conda-forge/linux-64/libffi-3.5.2-h9ec8514_0.conda#35f29eec58405aaf55e01cb470d8c26a
https://conda.anaconda.org/conda-forge/linux-64/libgcc-ng-15.2.0-h69a702a_14.conda#6c13aaae36d7514f28bd5544da1a7bb8
https://conda.anaconda.org/conda-forge/linux-64/liblzma-5.8.1-hb9d3cd8_2.conda#1a580f7796c7bf6393fddb8bbbde58dc
https://conda.anaconda.org/conda-forge/linux-64/libnsl-2.0.1-hb9d3cd8_1.conda#d864d34357c3b65a4b731f78c0801dc4
https://conda.anaconda.org/conda-forge/linux-64/libstdcxx-15.2.0-h934c35e_14.conda#8e96fe9b17d5871b5cf9d312cab832f6
https://conda.anaconda.org/conda-forge/linux-64/libuuid-2.41.2-he9a06e4_0.conda#80c07c68d2f6870250959dcc95b209d1
https://conda.anaconda.org/conda-forge/linux-64/libzlib-1.3.1-hb9d3cd8_2.conda#edb0dca6bc32e4f4789199455a1dbeb8
https://conda.anaconda.org/conda-forge/linux-64/ncurses-6.5-h2d0b736_3.conda#47e340acb35de30501a76c7c799c41d7
https://conda.anaconda.org/conda-forge/linux-64/openssl-3.6.0-h26f9b46_0.conda#9ee58d5c534af06558933af3c845a780
https://conda.anaconda.org/conda-forge/linux-64/libstdcxx-ng-15.2.0-hdf11a46_14.conda#9531f671a13eec0597941fa19e489b96
https://conda.anaconda.org/conda-forge/linux-64/libxcrypt-4.4.36-hd590300_1.conda#5aa797f8787fe7a17d1b0821485b5adc
https://conda.anaconda.org/conda-forge/linux-64/readline-8.2-h8c095d6_2.conda#283b96675859b20a825f8fa30f311446
https://conda.anaconda.org/conda-forge/linux-64/tk-8.6.13-noxft_ha0e22de_103.conda#86bc20552bf46075e3d92b67f089172d
https://conda.anaconda.org/conda-forge/linux-64/zstd-1.5.7-hb8e6e7a_2.conda#6432cb5d4ac0046c3ac0a8a0f95842f9
https://conda.anaconda.org/conda-forge/linux-64/icu-75.1-he02047a_0.conda#8b189310083baabfb622af68fd9d3ae3
https://conda.anaconda.org/conda-forge/linux-64/ld_impl_linux-64-2.45-default_hbd61a6d_104.conda#a6abd2796fc332536735f68ba23f7901
https://conda.anaconda.org/conda-forge/linux-64/libsqlite-3.51.0-hee844dc_0.conda#729a572a3ebb8c43933b30edcc628ceb
https://conda.anaconda.org/conda-forge/linux-64/python-3.12.12-hd63d673_1_cpython.conda#5c00c8cea14ee8d02941cab9121dce41
https://conda.anaconda.org/bioconda/linux-64/pyfastx-2.2.0-py312h4711d71_1.tar.bz2#0c029565f5abbf1c3349a4abc0b4c63c
https://conda.anaconda.org/bioconda/linux-64/pytrf-1.4.2-py312h0fa9677_0.tar.bz2#11c47fcb88ad7fe0ab94dcf11b8bebb9
https://conda.anaconda.org/conda-forge/noarch/setuptools-80.9.0-pyhff2d567_0.conda#4de79c071274a53dcaf2a8c749d1499e
https://conda.anaconda.org/conda-forge/noarch/wheel-0.45.1-pyhd8ed1ab_1.conda#75cb7132eb58d97896e173ef12ac9986
https://conda.anaconda.org/conda-forge/noarch/pip-25.3-pyh8b19718_0.conda#c55515ca43c6444d2572e0f0d93cb6b9
8 changes: 8 additions & 0 deletions bio/pytrf/extract/environment.yaml
@@ -0,0 +1,8 @@
channels:
- conda-forge
- bioconda
- nodefaults

dependencies:
- pytrf =1.4
- pyfastx =2.2
28 changes: 28 additions & 0 deletions bio/pytrf/extract/meta.yaml
@@ -0,0 +1,28 @@
name: pytrf extract (NOT WORKING, see notes below)
description: >
  Extract tandem repeat sequences with flanking regions from DNA sequences.
  Requires output from pytrf findstr, findgtr, or findatr as input.
url: https://pytrf.readthedocs.io/en/latest/usage.html#commandline-interface
authors:
  - Muhammad Rohan Ali Asmat
input:
  - FASTA or FASTQ file (supports gzip compression)
output:
  - Output file (default -> stdout, will be redirected to the log file).
params:
  repeat_file: >
    **Required.** Path to TSV or CSV file from pytrf findstr/findgtr/findatr.
  out_format: >
    Output format. Options: 'tsv' (default), 'csv', or 'fasta'.
    Note: Only the extract command supports FASTA output.
  flank_length: >
    Length of flanking sequence (default: 100).
notes: >
  **Bioconda package:** https://bioconda.github.io/recipes/pytrf/README.html |nl|
  **GitHub repository:** https://github.com/lmdu/pytrf |nl|
  **License:** MIT License |nl|
  **Disclaimer:** This is a minimal implementation supporting basic functionality.
  pytrf is not a Python binding to TRF - it's an independent tool. |nl|
  **Known issue:** PyTRF 1.4.2 has a bug in the `extract` command (delimiter error). |nl|
  See: https://github.com/lmdu/pytrf/issues/6 |nl|
  This wrapper skips extract tests until an upstream patch is released.
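To make the `flank_length` parameter concrete, here is a minimal Python sketch of what "repeat plus flanking regions" means for one row of a repeat file. It is not pytrf's implementation: `extract_with_flanks` is a hypothetical helper, and it assumes 1-based, inclusive coordinates and flanks that are clipped at the sequence ends.

```python
def extract_with_flanks(sequence, start, end, flank_length=100):
    """Split out a repeat and its flanks (illustrative, not pytrf's code).

    Assumes 1-based, inclusive start/end and clips flanks at the sequence ends.
    """
    if not 1 <= start <= end <= len(sequence):
        raise ValueError("coordinates fall outside the sequence")
    left = sequence[max(0, start - 1 - flank_length):start - 1]
    repeat = sequence[start - 1:end]
    right = sequence[end:end + flank_length]
    return left, repeat, right


# seq1 from demo_data/small_test.fasta and the row "seq1 10 12 ATC"
# from demo_data/small_test.tsv.
seq1 = "TCATCGGTCATCGGTCATCGGTCATCGGTCATCGG"
left, repeat, right = extract_with_flanks(seq1, 10, 12, flank_length=3)
print(left, repeat, right)
```

With a flank of 3, the slice for positions 10 to 12 returns the motif listed in the demo row, with the three bases on either side as flanks. Near a sequence end the corresponding flank is simply shorter.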
17 changes: 17 additions & 0 deletions bio/pytrf/extract/test/Snakefile
@@ -0,0 +1,17 @@
# SAMPLE RULE: Extract tandem repeat sequences with flanking regions
# The pytrf extract wrapper requires output from findstr, findgtr, or findatr.
#
# Output:
# - If output file is specified, results are written to that file
# - If output is omitted, pytrf writes to stdout (redirected to log file)
rule pytrf_extract:
    input:
        "demo_data/{sample}.fasta",
    output:
        "results/{sample}_extract.tsv",
    params:
        repeat_file="demo_data/{sample}.tsv",
    log:
        "logs/{sample}.log",
    wrapper:
        "master/bio/pytrf/extract"
6 changes: 6 additions & 0 deletions bio/pytrf/extract/test/demo_data/small_test.fasta
@@ -0,0 +1,6 @@
>seq1
TCATCGGTCATCGGTCATCGGTCATCGGTCATCGG
>seq2
ACCCCTCAGGGTACCCCTCAGGGTACCCCTCAGGGTACCCCTCAGGGTACCCCTCAGGGTACCCCTCAGGGTACCCCTCAGGGT
>seq3
TGACTATATCCGCAAATGAAGGCTGTTCTCTGACATGACTATATCCGCAAATGAAGGCTGTTCTCTGACATGACTATATCCGCAAATGAAGGCTGTTCTCTGACATGACTATATCCGCAAATGAAGGCTGTTCTCTGACA
85 changes: 85 additions & 0 deletions bio/pytrf/extract/test/demo_data/small_test.tsv
@@ -0,0 +1,85 @@
seq1 1 3 TCA 3 1 3
seq1 4 6 TCG 3 1 3
seq1 7 9 GTC 3 1 3
seq1 10 12 ATC 3 1 3
seq1 13 15 GGT 3 1 3
seq1 16 18 CAT 3 1 3
seq1 19 21 CGG 3 1 3
seq1 22 24 TCA 3 1 3
seq1 25 27 TCG 3 1 3
seq1 28 30 GTC 3 1 3
seq1 31 33 ATC 3 1 3
seq1 34 36 GG 3 1 3
seq2 1 3 ACC 3 1 3
seq2 4 6 CCT 3 1 3
seq2 7 9 CAG 3 1 3
seq2 10 12 GGT 3 1 3
seq2 13 15 ACC 3 1 3
seq2 16 18 CCT 3 1 3
seq2 19 21 CAG 3 1 3
seq2 22 24 GGT 3 1 3
seq2 25 27 ACC 3 1 3
seq2 28 30 CCT 3 1 3
seq2 31 33 CAG 3 1 3
seq2 34 36 GGT 3 1 3
seq2 37 39 ACC 3 1 3
seq2 40 42 CCT 3 1 3
seq2 43 45 CAG 3 1 3
seq2 46 48 GGT 3 1 3
seq2 49 51 ACC 3 1 3
seq2 52 54 CCT 3 1 3
seq2 55 57 CAG 3 1 3
seq2 58 60 GGT 3 1 3
seq2 61 63 ACC 3 1 3
seq2 64 66 CCT 3 1 3
seq2 67 69 CAG 3 1 3
seq2 70 72 GGT 3 1 3
seq2 73 75 ACC 3 1 3
seq2 76 78 CCT 3 1 3
seq2 79 81 CAG 3 1 3
seq2 82 84 GGT 3 1 3
seq3 1 3 TGA 3 1 3
seq3 4 6 CTA 3 1 3
seq3 7 9 TAT 3 1 3
seq3 10 12 CCG 3 1 3
seq3 13 15 CAA 3 1 3
seq3 16 18 ATG 3 1 3
seq3 19 21 AAG 3 1 3
seq3 22 24 GCT 3 1 3
seq3 25 27 GTT 3 1 3
seq3 28 31 CT 2 2 4
seq3 32 34 GAC 3 1 3
seq3 35 37 ATG 3 1 3
seq3 38 40 ACT 3 1 3
seq3 41 44 AT 2 2 4
seq3 45 47 CCG 3 1 3
seq3 48 50 CAA 3 1 3
seq3 51 53 ATG 3 1 3
seq3 54 56 AAG 3 1 3
seq3 57 59 GCT 3 1 3
seq3 60 62 GTT 3 1 3
seq3 63 66 CT 2 2 4
seq3 67 69 GAC 3 1 3
seq3 70 72 ATG 3 1 3
seq3 73 75 ACT 3 1 3
seq3 76 79 AT 2 2 4
seq3 80 82 CCG 3 1 3
seq3 83 85 CAA 3 1 3
seq3 86 88 ATG 3 1 3
seq3 89 91 AAG 3 1 3
seq3 92 94 GCT 3 1 3
seq3 95 97 GTT 3 1 3
seq3 98 101 CT 2 2 4
seq3 102 104 GAC 3 1 3
seq3 105 107 ATG 3 1 3
seq3 108 110 ACT 3 1 3
seq3 111 114 AT 2 2 4
seq3 115 117 CCG 3 1 3
seq3 118 120 CAA 3 1 3
seq3 121 123 ATG 3 1 3
seq3 124 126 AAG 3 1 3
seq3 127 129 GCT 3 1 3
seq3 130 132 GTT 3 1 3
seq3 133 136 CT 2 2 4
seq3 137 139 GAC 3 1 3
seq3 140 142 A 3 1 3
Empty file.
55 changes: 55 additions & 0 deletions bio/pytrf/extract/wrapper.py
@@ -0,0 +1,55 @@
"""
Snakemake Wrapper for PyTRF extract
------------------------------------------------------
Extract tandem repeat sequences with flanking regions.
"""

from pathlib import Path
from snakemake.shell import shell

# Logging
log = snakemake.log_fmt_shell(stdout=True, stderr=True)

# Get input file
try:
    input_file = Path(snakemake.input[0]).resolve()
except (IndexError, TypeError) as e:
    raise ValueError(f"Input specification error: {e}") from e

# Get output file if specified
OUTPUT_FILE = None
if snakemake.output:
    OUTPUT_FILE = Path(snakemake.output[0]).resolve()

# Get repeat_file (required)
try:
    if not hasattr(snakemake.params, "repeat_file"):
        raise ValueError("Parameter 'repeat_file' is required for extract")
    repeat_file = Path(snakemake.params.repeat_file).resolve()
except (AttributeError, ValueError) as e:
    raise RuntimeError(f"Parameter validation failed: {e}") from e

# Build parameters
params = [f"-r {repeat_file}"]

try:
    if hasattr(snakemake.params, "out_format"):
        params.append(f"-f {snakemake.params.out_format}")

    if hasattr(snakemake.params, "flank_length"):
        params.append(f"-l {snakemake.params.flank_length}")
Comment on lines +36 to +40
Collaborator: Are these parameters mandatory?
Contributor Author: Flank length and output format are not mandatory.
except (AttributeError, ValueError) as e:
    raise RuntimeError(f"Parameter processing failed: {e}") from e

# Build command
CMD = f"pytrf extract {input_file}"
if params:
    CMD += " " + " ".join(params)
if OUTPUT_FILE:
    CMD += f" -o {OUTPUT_FILE}"

# Execute
try:
    shell(f"{CMD} {log}")
except Exception as e:
    raise RuntimeError(f"PyTRF extract execution failed: {e}") from e
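Since `out_format` and `flank_length` are optional, the command assembly could also be written so that each flag is added only when a value is supplied. The following is a sketch under stated assumptions, not the wrapper's code: `build_extract_command` is a hypothetical helper, `params` is a plain dict standing in for `snakemake.params`, and only the flags already used above (`-r`, `-f`, `-l`, `-o`) appear.

```python
import shlex


def build_extract_command(input_file, repeat_file, output_file=None, params=None):
    """Assemble a `pytrf extract` command line (hypothetical helper).

    Only the flags already used by the wrapper (-r, -f, -l, -o) appear here.
    """
    params = params or {}
    parts = ["pytrf", "extract", str(input_file), "-r", str(repeat_file)]

    # Optional parameters: add the flag only when a value was supplied.
    if params.get("out_format") is not None:
        parts += ["-f", str(params["out_format"])]
    if params.get("flank_length") is not None:
        parts += ["-l", str(params["flank_length"])]
    if output_file is not None:
        parts += ["-o", str(output_file)]

    # Quote every token so paths containing spaces survive the shell.
    return " ".join(shlex.quote(part) for part in parts)


cmd = build_extract_command(
    "demo_data/small_test.fasta",
    "demo_data/small_test.tsv",
    output_file="results/small_test_extract.tsv",
    params={"flank_length": 50},
)
print(cmd)
```

Quoting each token with `shlex.quote` is a design choice of this sketch: the f-string interpolation in the wrapper would break on resolved paths that contain spaces.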
37 changes: 37 additions & 0 deletions bio/pytrf/findatr/environment.linux-64.pin.txt
@@ -0,0 +1,37 @@
# This file may be used to create an environment using:
# $ conda create --name <env> --file <this file>
# platform: linux-64
# created-by: conda 25.11.0
@EXPLICIT
https://conda.anaconda.org/conda-forge/linux-64/_libgcc_mutex-0.1-conda_forge.tar.bz2#d7c89558ba9fa0495403155b64376d81
https://conda.anaconda.org/conda-forge/noarch/ca-certificates-2025.11.12-hbd8a1cb_0.conda#f0991f0f84902f6b6009b4d2350a83aa
https://conda.anaconda.org/conda-forge/linux-64/libgomp-15.2.0-he0feb66_14.conda#91349c276f84f590487e4c7f6e90e077
https://conda.anaconda.org/conda-forge/noarch/python_abi-3.12-8_cp312.conda#c3efd25ac4d74b1584d2f7a57195ddf1
https://conda.anaconda.org/conda-forge/noarch/tzdata-2025b-h78e105d_0.conda#4222072737ccff51314b5ece9c7d6f5a
https://conda.anaconda.org/conda-forge/linux-64/_openmp_mutex-4.5-2_gnu.tar.bz2#73aaf86a425cc6e73fcf236a5a46396d
https://conda.anaconda.org/conda-forge/linux-64/libgcc-15.2.0-he0feb66_14.conda#550dceb769d23bcf0e2f97fd4062d720
https://conda.anaconda.org/conda-forge/linux-64/bzip2-1.0.8-hda65f42_8.conda#51a19bba1b8ebfb60df25cde030b7ebc
https://conda.anaconda.org/conda-forge/linux-64/libexpat-2.7.3-hecca717_0.conda#8b09ae86839581147ef2e5c5e229d164
https://conda.anaconda.org/conda-forge/linux-64/libffi-3.5.2-h9ec8514_0.conda#35f29eec58405aaf55e01cb470d8c26a
https://conda.anaconda.org/conda-forge/linux-64/libgcc-ng-15.2.0-h69a702a_14.conda#6c13aaae36d7514f28bd5544da1a7bb8
https://conda.anaconda.org/conda-forge/linux-64/liblzma-5.8.1-hb9d3cd8_2.conda#1a580f7796c7bf6393fddb8bbbde58dc
https://conda.anaconda.org/conda-forge/linux-64/libnsl-2.0.1-hb9d3cd8_1.conda#d864d34357c3b65a4b731f78c0801dc4
https://conda.anaconda.org/conda-forge/linux-64/libstdcxx-15.2.0-h934c35e_14.conda#8e96fe9b17d5871b5cf9d312cab832f6
https://conda.anaconda.org/conda-forge/linux-64/libuuid-2.41.2-he9a06e4_0.conda#80c07c68d2f6870250959dcc95b209d1
https://conda.anaconda.org/conda-forge/linux-64/libzlib-1.3.1-hb9d3cd8_2.conda#edb0dca6bc32e4f4789199455a1dbeb8
https://conda.anaconda.org/conda-forge/linux-64/ncurses-6.5-h2d0b736_3.conda#47e340acb35de30501a76c7c799c41d7
https://conda.anaconda.org/conda-forge/linux-64/openssl-3.6.0-h26f9b46_0.conda#9ee58d5c534af06558933af3c845a780
https://conda.anaconda.org/conda-forge/linux-64/libstdcxx-ng-15.2.0-hdf11a46_14.conda#9531f671a13eec0597941fa19e489b96
https://conda.anaconda.org/conda-forge/linux-64/libxcrypt-4.4.36-hd590300_1.conda#5aa797f8787fe7a17d1b0821485b5adc
https://conda.anaconda.org/conda-forge/linux-64/readline-8.2-h8c095d6_2.conda#283b96675859b20a825f8fa30f311446
https://conda.anaconda.org/conda-forge/linux-64/tk-8.6.13-noxft_ha0e22de_103.conda#86bc20552bf46075e3d92b67f089172d
https://conda.anaconda.org/conda-forge/linux-64/zstd-1.5.7-hb8e6e7a_2.conda#6432cb5d4ac0046c3ac0a8a0f95842f9
https://conda.anaconda.org/conda-forge/linux-64/icu-75.1-he02047a_0.conda#8b189310083baabfb622af68fd9d3ae3
https://conda.anaconda.org/conda-forge/linux-64/ld_impl_linux-64-2.45-default_hbd61a6d_104.conda#a6abd2796fc332536735f68ba23f7901
https://conda.anaconda.org/conda-forge/linux-64/libsqlite-3.51.0-hee844dc_0.conda#729a572a3ebb8c43933b30edcc628ceb
https://conda.anaconda.org/conda-forge/linux-64/python-3.12.12-hd63d673_1_cpython.conda#5c00c8cea14ee8d02941cab9121dce41
https://conda.anaconda.org/bioconda/linux-64/pyfastx-2.2.0-py312h4711d71_1.tar.bz2#0c029565f5abbf1c3349a4abc0b4c63c
https://conda.anaconda.org/bioconda/linux-64/pytrf-1.4.2-py312h0fa9677_0.tar.bz2#11c47fcb88ad7fe0ab94dcf11b8bebb9
https://conda.anaconda.org/conda-forge/noarch/setuptools-80.9.0-pyhff2d567_0.conda#4de79c071274a53dcaf2a8c749d1499e
https://conda.anaconda.org/conda-forge/noarch/wheel-0.45.1-pyhd8ed1ab_1.conda#75cb7132eb58d97896e173ef12ac9986
https://conda.anaconda.org/conda-forge/noarch/pip-25.3-pyh8b19718_0.conda#c55515ca43c6444d2572e0f0d93cb6b9
8 changes: 8 additions & 0 deletions bio/pytrf/findatr/environment.yaml
@@ -0,0 +1,8 @@
channels:
- conda-forge
- bioconda
- nodefaults

dependencies:
- pytrf =1.4
- pyfastx =2.2
49 changes: 49 additions & 0 deletions bio/pytrf/findatr/meta.yaml
@@ -0,0 +1,49 @@
name: pytrf findatr
description: >
  Find approximate/imperfect tandem repeats from DNA sequences.
url: https://pytrf.readthedocs.io/en/latest/usage.html#commandline-interface
authors:
  - Muhammad Rohan Ali Asmat
input:
  - FASTA or FASTQ file (supports gzip compression)
output:
  - Output file (default -> stdout, will be redirected to the log file).
params:
  out_format: >
    Output format. Options: 'tsv' (default), 'csv', 'bed', or 'gff'.
  min_motif: >
    Minimum motif size in bp (default: 1).
  max_motif: >
    Maximum motif size in bp (default: 6).
  min_seedrep: >
    Minimum repeat number for seed (default: 3).
  min_seedlen: >
    Minimum length for seed (default: 10).
  max_errors: >
    Maximum number of continuous alignment errors (default: 3).
  min_identity: >
    Minimum identity for extending, 0 to 100 (default: 70).
  max_extend: >
    Maximum length allowed to extend (default: 2000).
notes: >
  **Output columns (TSV/CSV/BED/GFF):** sequence or chromosome name, start position,
  end position, motif sequence, motif length, repeat number, repeat length, seed start
  position, seed end position, seed repeat number, seed length, number of matches,
  number of substitutions, number of insertions, number of deletions, extend alignment
  identity between imperfect repeat and its perfect counterpart.
  |nl| |nl|
  **Example:** |nl|
  Example row in record: 0 1 32 T 1 32.0 32 1 1 1 1 10 22 0 0 31.25 |nl|
  This indicates that in sequence '0', from position 1 to 32, there is a tandem repeat
  with motif 'T' (length 1) repeated 32 times, resulting in a repeat length of 32 bp.
  The seed repeat started at position 1 and ended at position 1, with a seed repeat number of 1
  and seed length of 1 bp. The alignment of the imperfect repeat to its perfect counterpart
  has 10 matches, 22 substitutions, 0 insertions, and 0 deletions, yielding an identity of 31.25%.
  |nl|
  |nl|
  **Bioconda package:** https://bioconda.github.io/recipes/pytrf/README.html |nl|
  **GitHub repository:** https://github.com/lmdu/pytrf |nl|
  **License:** MIT License |nl|
  **Disclaimer:** This is a minimal implementation supporting basic functionality.
  pytrf is not a Python binding to TRF - it's an independent tool.
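The sixteen output columns listed in the notes can be read into named fields downstream. This is a minimal sketch, assuming tab-separated rows in exactly the documented column order. `AtrRecord` and `parse_atr_row` are hypothetical names chosen for illustration, not part of pytrf or this wrapper.

```python
from typing import NamedTuple


# Field names are illustrative; their order follows the column list in meta.yaml.
class AtrRecord(NamedTuple):
    seq_name: str
    start: int
    end: int
    motif: str
    motif_length: int
    repeat_number: float
    repeat_length: int
    seed_start: int
    seed_end: int
    seed_repeat_number: int
    seed_length: int
    matches: int
    substitutions: int
    insertions: int
    deletions: int
    identity: float


# One converter per column, in the same order as the fields above.
_CONVERTERS = (str, int, int, str, int, float, int, int,
               int, int, int, int, int, int, int, float)


def parse_atr_row(line, sep="\t"):
    """Parse one findatr row into an AtrRecord."""
    fields = line.rstrip("\n").split(sep)
    if len(fields) != len(_CONVERTERS):
        raise ValueError(f"expected {len(_CONVERTERS)} columns, got {len(fields)}")
    return AtrRecord(*(convert(value) for convert, value in zip(_CONVERTERS, fields)))


# Example row quoted in the notes, joined with tabs.
row = "\t".join("0 1 32 T 1 32.0 32 1 1 1 1 10 22 0 0 31.25".split())
record = parse_atr_row(row)

aligned = record.matches + record.substitutions + record.insertions + record.deletions
print(record.motif, record.repeat_length, record.identity)
print(100 * record.matches / aligned)
```

For this example row, matches divided by the total of matches, substitutions, insertions and deletions reproduces the reported identity. That is an observation about this one row, not a documented formula.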

20 changes: 20 additions & 0 deletions bio/pytrf/findatr/test/Snakefile
@@ -0,0 +1,20 @@
# SAMPLE RULE: Find approximate/imperfect tandem repeats
#
# Output:
# - If output file is specified, results are written to that file
# - If output is omitted, PyTRF writes to stdout (redirected to log file)
#
# This example searches for approximate repeats with motif sizes from 3 to 10 bp,
# allowing detection of imperfect short/medium tandem repeats with mismatches.
rule pytrf_findatr:
    input:
        "demo_data/{sample}.fasta",
    output:
        "results/{sample}.tsv",
    log:
        "logs/{sample}.log",
    params:
        min_motif=3,
        max_motif=10,
    wrapper:
        "master/bio/pytrf/findatr"
2 changes: 2 additions & 0 deletions bio/pytrf/findatr/test/demo_data/small_test.fasta
@@ -0,0 +1,2 @@
>seq1
TCATCGGTCATCGGTCATCGGTCATCGGTCATCGG
1 change: 1 addition & 0 deletions bio/pytrf/findatr/test/expected/findatr_basic.tsv
@@ -0,0 +1 @@
seq1 1 35 TCATCGG 7 5.0 35 1 35 5 35 35 0 0 0 100.0
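The expected row can be sanity-checked against the demo FASTA by hand. The sketch below is a worked check rather than part of the test suite: `is_perfect_repeat` is a hypothetical helper, and coordinates are assumed to be 1-based and inclusive.

```python
def is_perfect_repeat(sequence, start, end, motif):
    """Return True if sequence[start..end] is the motif repeated end to end."""
    region = sequence[start - 1:end]
    copies, remainder = divmod(len(region), len(motif))
    return remainder == 0 and region == motif * copies


# seq1 from demo_data/small_test.fasta and the expected findatr row.
seq1 = "TCATCGGTCATCGGTCATCGGTCATCGGTCATCGG"
columns = "seq1 1 35 TCATCGG 7 5.0 35 1 35 5 35 35 0 0 0 100.0".split()
start, end, motif, repeats = int(columns[1]), int(columns[2]), columns[3], float(columns[5])

print(is_perfect_repeat(seq1, start, end, motif))
print(len(motif) * repeats == end - start + 1)
```

Both checks hold for this row: the 7 bp motif repeated 5 times spans all 35 bases of seq1, which matches the reported 35 matches and no substitutions, insertions or deletions.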