add wrapper for telogator2 #7481
Merged
Changes from 8 commits
Commits (10)
cc172e9 add wrapper for telogator2 (Smeds)
1386697 handle review comments and fix failing tests (Smeds)
cd552a1 add wrapper for telogator2 make ref command (Smeds)
8c49b59 add wrapper for telogator2 (Smeds)
6f2f9ec minor fixes (Smeds)
07c6434 set correct format for kmer file (Smeds)
1df89dd handle review comments (Smeds)
1b1fada Apply suggestions from code review (Smeds)
c2780e4 Apply suggestions from code review (bernt-matthias)
b016e91 Update tools/telogator/telogator.xml (bernt-matthias)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,15 @@ | ||
| categories: | ||
| - Sequence Analysis | ||
| description: Measure allele-specific telomere length from long-read sequencing data | ||
| long_description: | | ||
| Telogator measures allele-specific telomere length (ATL) and characterizes telomere | ||
| variant repeat (TVR) sequences from PacBio HiFi and Oxford Nanopore long-read sequencing | ||
| data. The tool identifies individual telomere alleles through TVR characterization, | ||
| providing chromosome-level resolution of telomere lengths. Supports multiple input | ||
| formats (FASTA, FASTQ, BAM, CRAM) and includes built-in T2T human reference with | ||
| support for custom references. | ||
| homepage_url: https://github.com/zstephens/telogator2 | ||
| name: telogator | ||
| owner: iuc | ||
| remote_repository_url: https://github.com/galaxyproject/tools-iuc/tree/main/tools/telogator2 | ||
| type: unrestricted |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,37 @@ | ||
| <macros> | ||
| <xml name="requirements"> | ||
| <requirements> | ||
| <requirement type="package" version="@VERSION@">telogator2</requirement> | ||
| <requirement type="package" version="2.28">minimap2</requirement> | ||
| <requirement type="package" version="2.03">winnowmap</requirement> | ||
| <requirement type="package" version="1.13.1">pbmm2</requirement> | ||
| <yield/> | ||
| </requirements> | ||
| </xml> | ||
| <xml name="version_command"> | ||
| <version_command><![CDATA[telogator2 --version]]></version_command> | ||
| </xml> | ||
| <token name="@VERSION@">2.2.3</token> | ||
| <token name="@PROFILE@">24.2</token> | ||
| <xml name="edam_ontology"> | ||
| <edam_topics> | ||
| <edam_topic>topic_0622</edam_topic> | ||
| <edam_topic>topic_0196</edam_topic> | ||
| <edam_topic>topic_3673</edam_topic> | ||
| </edam_topics> | ||
| <edam_operations> | ||
| <edam_operation>operation_3227</edam_operation> | ||
| <edam_operation>operation_3192</edam_operation> | ||
| </edam_operations> | ||
| </xml> | ||
| <xml name="xrefs"> | ||
| <xrefs> | ||
| <xref type="bio.tools">telogator2</xref> | ||
| </xrefs> | ||
| </xml> | ||
| <xml name="citations"> | ||
| <citations> | ||
| <citation type="doi">10.1186/s12859-024-05807-5</citation> | ||
| </citations> | ||
| </xml> | ||
| </macros> | ||
Large diffs are not rendered by default.
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,187 @@ | ||
| <tool id="telogator_make_ref" name="Telogator Make Reference" version="@VERSION@+galaxy0" profile="@PROFILE@" license="MIT"> | ||
| <description>Create custom telogator reference from a T2T assembly</description> | ||
| <macros> | ||
| <import>macros.xml</import> | ||
| </macros> | ||
| <expand macro="edam_ontology"/> | ||
| <expand macro="xrefs"/> | ||
| <expand macro="requirements"/> | ||
| <expand macro="version_command"/> | ||
| <command detect_errors="exit_code"><![CDATA[ | ||
| #import re | ||
| #set $identifier = str($input_fasta.element_identifier) | ||
| #set $safe_name = re.sub('[^\w\-\.]', '_', $identifier) | ||
| #if $input_fasta.is_of_type('fasta.gz') and not ($safe_name.endswith('.fa.gz') or $safe_name.endswith('.fasta.gz')) | ||
| #set $safe_name = $safe_name + '.fa.gz' | ||
| #elif $input_fasta.is_of_type('fasta') and not ($safe_name.endswith('.fa') or $safe_name.endswith('.fasta')) | ||
| #set $safe_name = $safe_name + '.fa' | ||
| #end if | ||
| mkdir -p output_dir && | ||
| ln -sf '${input_fasta}' '${safe_name}' && | ||
| make_telogator_ref | ||
| -i '${safe_name}' | ||
| -o output_dir/output_ref.fa | ||
| -s '${sample_name}' | ||
| -c '${contig_list}' | ||
| ## Optional kmer file | ||
| #if $kmer_file | ||
| -k '${kmer_file}' | ||
| #end if | ||
| ## Minimum telomere length | ||
| -m '${min_tel_length}' | ||
| ## Optional flags | ||
| ${add_tel} | ||
| ${plot} | ||
| ## Move outputs | ||
| && mv output_dir/output_ref.fa '${output_fasta}' | ||
| ]]></command> | ||
| <inputs> | ||
| <param name="input_fasta" type="data" format="fasta,fasta.gz" label="Input T2T reference FASTA" help="Telomere-to-telomere reference genome assembly in FASTA format (gzipped supported)"/> | ||
| <param name="sample_name" argument="-s" type="text" value="sample" label="Sample name" help="Sample name to prepend to contig identifiers in the output"> | ||
| <validator type="regex" message="Sample name must contain only alphanumeric characters and hyphens">^[a-zA-Z0-9-]+$</validator> | ||
| </param> | ||
| <param name="contig_list" argument="-c" type="text" value="chr1,chr2,chr3,chr4,chr5,chr6,chr7,chr8,chr9,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr17,chr18,chr19,chr20,chr21,chr22,chrX,chrY" label="List of contigs" help="Comma-delimited list of contigs to include. Default is all human chromosomes."> | ||
| <validator type="empty_field"/> | ||
| <sanitizer> | ||
| <valid initial="string.printable"> | ||
| <remove value="&quot;"/> | ||
| </valid> | ||
| </sanitizer> | ||
| </param> | ||
| <param name="kmer_file" argument="-k" type="data" format="tsv" optional="true" value="" label="Telomere kmers file" help="Optional telomere k-mers file. If omitted, a built-in human telomere k-mers file is used."/> | ||
| <param name="min_tel_length" argument="-m" type="integer" value="0" min="0" label="Minimum telomere length" help="Minimum telomere length required at contig ends (in base pairs)"/> | ||
| <param name="add_tel" type="boolean" truevalue="--add-tel" falsevalue="" checked="false" label="Include masked telomeres" help="Include masked telomeres as separate contigs in the output"/> | ||
| <param name="plot" type="boolean" truevalue="--plot" falsevalue="" checked="false" label="Generate telomere signal plots" help="Generate PNG plots showing telomere signals for each chromosome arm"/> | ||
| </inputs> | ||
| <outputs> | ||
| <data name="output_fasta" format="fasta" label="${tool.name} on ${on_string}: Reference FASTA"/> | ||
| <collection name="plots" type="list" label="${tool.name} on ${on_string}: Telomere signal plots"> | ||
| <discover_datasets pattern="(?P&lt;designation&gt;.+)\.png$" directory="output_dir" format="png"/> | ||
| <filter>plot</filter> | ||
| </collection> | ||
| </outputs> | ||
| <tests> | ||
| <!-- Test 1: Basic usage with minimal parameters --> | ||
| <test expect_num_outputs="1"> | ||
| <param name="input_fasta" value="t2t_subset_with_telomeres.fa.gz"/> | ||
| <param name="sample_name" value="test-sample1"/> | ||
| <param name="contig_list" value="t2t-i002c-mat_chr11p,t2t-i002c-mat_chr11q,t2t-i002c-mat_chr12p,t2t-i002c-mat_chr12q,t2t-i002c-mat_chr13p,t2t-i002c-mat_chr13q"/> | ||
| <output name="output_fasta"> | ||
| <assert_contents> | ||
| <has_text text=">test-sample"/> | ||
| <has_line_matching expression="^>.*"/> | ||
| <has_line_matching expression="^[ACGTN]+$"/> | ||
| <has_size value="6100428" delta="100000"/> | ||
| <not_has_text text=">test-sample1_tel-"/> | ||
| </assert_contents> | ||
| </output> | ||
| </test> | ||
| <!-- Test 2: With plot generation --> | ||
| <test expect_num_outputs="2"> | ||
| <param name="input_fasta" value="t2t_subset_with_telomeres.fa.gz"/> | ||
| <param name="sample_name" value="test-sample2"/> | ||
| <param name="plot" value="true"/> | ||
| <param name="contig_list" value="t2t-i002c-mat_chr11p,t2t-i002c-mat_chr11q,t2t-i002c-mat_chr12p,t2t-i002c-mat_chr12q,t2t-i002c-mat_chr13p,t2t-i002c-mat_chr13q"/> | ||
| <output name="output_fasta"> | ||
| <assert_contents> | ||
| <has_text text=">test-sample2"/> | ||
| </assert_contents> | ||
| </output> | ||
| <output_collection name="plots" type="list"> | ||
| <element name="test-sample2_telsignal_t2t-i002c-mat_chr11pp"> | ||
| <assert_contents> | ||
| <has_size min="10000"/> | ||
| </assert_contents> | ||
| </element> | ||
| <element name="test-sample2_telsignal_t2t-i002c-mat_chr11qq"> | ||
| <assert_contents> | ||
| <has_size min="10000"/> | ||
| </assert_contents> | ||
| </element> | ||
| <element name="test-sample2_telsignal_t2t-i002c-mat_chr12pp"> | ||
| <assert_contents> | ||
| <has_size min="10000"/> | ||
| </assert_contents> | ||
| </element> | ||
| <element name="test-sample2_telsignal_t2t-i002c-mat_chr12qq"> | ||
| <assert_contents> | ||
| <has_size min="10000"/> | ||
| </assert_contents> | ||
| </element> | ||
| <element name="test-sample2_telsignal_t2t-i002c-mat_chr13pp"> | ||
| <assert_contents> | ||
| <has_size min="10000"/> | ||
| </assert_contents> | ||
| </element> | ||
| <element name="test-sample2_telsignal_t2t-i002c-mat_chr13qq"> | ||
| <assert_contents> | ||
| <has_size min="10000"/> | ||
| </assert_contents> | ||
| </element> | ||
| </output_collection> | ||
| </test> | ||
| <!-- Test 3: use telomere parameters --> | ||
| <test expect_num_outputs="1"> | ||
| <param name="input_fasta" value="t2t_subset_with_telomeres.fa.gz" /> | ||
| <param name="sample_name" value="test-sample3"/> | ||
| <param name="min_tel_length" value="1000"/> | ||
| <param name="add_tel" value="true"/> | ||
| <param name="contig_list" value="t2t-i002c-mat_chr11p,t2t-i002c-mat_chr11q,t2t-i002c-mat_chr12p,t2t-i002c-mat_chr12q,t2t-i002c-mat_chr13p,t2t-i002c-mat_chr13q"/> | ||
| <output name="output_fasta"> | ||
| <assert_contents> | ||
| <has_text text=">test-sample3"/> | ||
| <has_line_matching expression="^>.*"/> | ||
| <has_line_matching expression="^[ACGTN]+$"/> | ||
| <has_size value="4066952" delta="100000"/> | ||
| <has_text text=">test-sample3_tel-"/> | ||
| </assert_contents> | ||
| </output> | ||
| </test> | ||
| </tests> | ||
| <help><![CDATA[ | ||
| **What it does** | ||
| | ||
| Telogator Make Reference creates a custom telogator reference database from a telomere-to-telomere (T2T) reference genome assembly. This tool is essential for analyzing telomeres in non-human organisms or custom genome assemblies. | ||
| | ||
| The tool performs the following steps: | ||
| | ||
| 1. Reads the input T2T reference FASTA file | ||
| 2. Identifies telomeric sequences at contig ends | ||
| 3. Optionally filters and remaps contigs | ||
| 4. Creates a processed reference suitable for telogator analysis | ||
| 5. Generates an index file (.fai) for the reference | ||
| 6. Optionally generates visualization plots of telomere signals | ||
| | ||
| **When to use this tool** | ||
| | ||
| Use this tool when you need to: | ||
| | ||
| - Analyze telomeres in non-human organisms (e.g., mouse, maize, other species) | ||
| - Work with custom or newly assembled T2T genomes | ||
| - Create a reference from alternative human T2T assemblies (T2T-yao, T2T-cn1, etc.) | ||
| - Prepare references with specific contig selections or naming conventions | ||
| | ||
| **Inputs** | ||
| | ||
| - **T2T reference FASTA**: A telomere-to-telomere reference genome assembly | ||
| - **Sample name**: Identifier prepended to contig names (use organism/assembly name) | ||
| - **Contig list**: Comma-delimited list of contigs to include (defaults to all human chromosomes) | ||
| - **Telomere kmers file** (optional): Custom telomere repeat patterns for non-human organisms | ||
| - **Minimum telomere length**: Filter contigs by minimum telomere length at ends | ||
| | ||
| **Outputs** | ||
| | ||
| 1. **Reference FASTA**: Processed telogator reference file ready for use with telogator | ||
| 2. **Reference index (.fai)**: Index file for the created reference FASTA (written by the tool, but not collected as a Galaxy output by this wrapper) | ||
| 3. **Telomere signal plots** (optional): PNG plots showing telomere signals for each chromosome arm | ||
| | ||
| **Important Notes** | ||
| | ||
| - The input FASTA should be a high-quality T2T assembly with telomeres at contig ends | ||
| - The sample name should be descriptive (e.g., organism name, assembly version) and may contain only letters, digits, and hyphens (no underscores) | ||
| - The contig list defaults to human chromosomes; modify it for other organisms or custom assemblies | ||
| - For non-human organisms, provide a telomere kmers file matching the species' telomere repeats | ||
| | ||
| ]]></help> | ||
| <expand macro="citations"/> | ||
| </tool> | ||
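The Cheetah preamble in the `command` block and the two regular expressions in this wrapper (the `sample_name` validator and the `discover_datasets` pattern) can be checked outside Galaxy. The following is a minimal Python sketch, not part of the PR. The function names `safe_link_name`, `is_valid_sample_name`, and `plot_designation` are hypothetical, the datatype is passed as a plain string instead of going through Galaxy's `is_of_type`, and the sketch assumes the patterns are applied with `re.match`.

```python
import re
from typing import Optional

# Patterns copied from the tool XML above.
SAMPLE_NAME_RE = re.compile(r"^[a-zA-Z0-9-]+$")
PLOT_RE = re.compile(r"(?P<designation>.+)\.png$")


def safe_link_name(identifier: str, datatype: str) -> str:
    """Mirror the Cheetah logic that builds the symlink name for the input FASTA.

    `datatype` is a plain string here ("fasta" or "fasta.gz"); this is an
    assumption standing in for Galaxy's `is_of_type` check.
    """
    # Replace every character that is not a word character, hyphen, or dot.
    name = re.sub(r"[^\w\-.]", "_", identifier)
    # Append an extension only when a recognised one is missing.
    if datatype == "fasta.gz" and not name.endswith((".fa.gz", ".fasta.gz")):
        name += ".fa.gz"
    elif datatype == "fasta" and not name.endswith((".fa", ".fasta")):
        name += ".fa"
    return name


def is_valid_sample_name(sample_name: str) -> bool:
    """Apply the regex from the `sample_name` validator."""
    return SAMPLE_NAME_RE.match(sample_name) is not None


def plot_designation(filename: str) -> Optional[str]:
    """Return the collection element name for a PNG file, or None if it does not match."""
    match = PLOT_RE.match(filename)
    return match.group("designation") if match else None
```

With these helpers, `safe_link_name("hg002", "fasta.gz")` gives `hg002.fa.gz`, `is_valid_sample_name("my_sample")` is `False` because underscores are rejected, and `plot_designation("output_ref.fa")` is `None`, so only PNG files in `output_dir` would end up in the plots collection.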
Binary files not shown (5 files).