Closed
72 commits
b0244cd
Update all
DLBPointon Feb 6, 2026
dedbff9
Merge branch 'dev' into subworkflow-update
DLBPointon Feb 6, 2026
74b6e81
Update all
DLBPointon Feb 6, 2026
ae31750
Update to ensure matching meta
DLBPointon Feb 10, 2026
2455ec8
fix(functions): fn_get_validated_channel should return a channel
prototaxites Feb 11, 2026
75cfe5a
Adding pairs file for the mapping and snapshot of the mapped reads
DLBPointon Feb 13, 2026
7e8d5cb
Update to add the mapping and snapshot
DLBPointon Feb 13, 2026
8d7a447
Modules included by sanger-tol subworkflow
DLBPointon Feb 13, 2026
f3fd480
Updating the conf for the new subworkflows
DLBPointon Feb 13, 2026
3dfbfff
updates for the new modules
DLBPointon Feb 13, 2026
7765f50
Update on the modules
DLBPointon Feb 13, 2026
3f3cb47
Update Everything!
DLBPointon Feb 16, 2026
8502ce7
Adding modules
DLBPointon Feb 16, 2026
2ccedd5
Adding modules
DLBPointon Feb 16, 2026
641a9ad
Update prettier linting
DLBPointon Feb 16, 2026
8feb38f
Linting
DLBPointon Feb 16, 2026
a4bdcd5
Update for linting
DLBPointon Feb 16, 2026
c172550
Update changelog
DLBPointon Feb 16, 2026
758a5fc
Better params use
DLBPointon Feb 16, 2026
3bc9f2d
Update Changelog
DLBPointon Feb 16, 2026
e27aeb6
Forgot to change 1 params
DLBPointon Feb 16, 2026
b82f53f
Update linting and params and moving before/after text
DLBPointon Feb 16, 2026
5f25939
Update tests
DLBPointon Feb 17, 2026
11ae136
Update tests to remove fake file
DLBPointon Feb 17, 2026
3f56618
Update Changelog and READM
DLBPointon Feb 17, 2026
7864dfe
Update Changelog and README
DLBPointon Feb 17, 2026
850690d
Update Changelog and README
DLBPointon Feb 17, 2026
c8ba9b2
Update modules and test-full
DLBPointon Feb 17, 2026
0e2b908
Update CHANGELOG
DLBPointon Feb 17, 2026
f9d1328
Add logic note
DLBPointon Feb 17, 2026
4741367
Update tests
DLBPointon Feb 19, 2026
058b5ba
Move to sanger-tol modules
DLBPointon Feb 24, 2026
f99e693
Update paths
DLBPointon Feb 24, 2026
456e4a9
Remove and replace images
DLBPointon Feb 24, 2026
ac61fcc
Update files
DLBPointon Feb 24, 2026
5cf8b43
Add python re-write of script
DLBPointon Feb 24, 2026
aad8b19
Move to Sanger-tol repo
DLBPointon Feb 24, 2026
a91674a
Include sanger-tol modules
DLBPointon Feb 24, 2026
136ebe8
Change telomere_windows mem
DLBPointon Feb 24, 2026
0ab27ab
Correcting assignment
DLBPointon Feb 25, 2026
707750e
Forgot to commit the modules.config for new TELOMERE_WINDOWS
DLBPointon Feb 25, 2026
d683584
update naming in test snapshot
DLBPointon Feb 25, 2026
f3e695c
update naming in test snapshot
DLBPointon Feb 25, 2026
2d4ec2f
Replace local/telo_finder with sanger-tol/telo_finder
DLBPointon Feb 26, 2026
aa15733
Adding sanger-tol/telomere_extract
DLBPointon Feb 26, 2026
d0c6a81
Update telomere modules
DLBPointon Feb 26, 2026
8360194
Update to subworkflow to validate output of telo_finder
DLBPointon Feb 26, 2026
593fdbf
Update workflow to move align_cram into if else statement
DLBPointon Feb 26, 2026
e47c132
Update tests for new modules/subworkflows and file outputs
DLBPointon Feb 26, 2026
25a8ea1
change pre_mapped to pre_mapped_bam
DLBPointon Feb 26, 2026
ec543ea
change pre_mapped to pre_mapped_bam and remove cram from required
DLBPointon Feb 26, 2026
b5d6fd8
Update
DLBPointon Feb 26, 2026
5ff9cb0
change pre_mapped to pre_mapped_bam
DLBPointon Feb 26, 2026
dcc014a
Again changed the align_cram section to support mapped_reads
DLBPointon Feb 26, 2026
7b25e61
Added warnings for params and field for mapped reads
DLBPointon Feb 26, 2026
7ecbd16
Add bam to file exclusion
DLBPointon Feb 26, 2026
911e06e
Update image
DLBPointon Feb 26, 2026
5828a83
Update all to remove quotes, support pre_mapped_bam and update conf f…
DLBPointon Feb 26, 2026
b3b765c
Linting!
DLBPointon Feb 26, 2026
751e695
Update logic to kill pipeline if both pre_mapped and cram
DLBPointon Feb 27, 2026
024cfde
Update based on comments
DLBPointon Feb 27, 2026
6a78b7a
Update expected values
DLBPointon Feb 27, 2026
3f13976
Change schema for the mapped bam file input
DLBPointon Feb 27, 2026
d35429f
Fix from PR
DLBPointon Feb 27, 2026
7d281fc
Update comment
DLBPointon Feb 27, 2026
a8eaba9
Add sanger-tol gap_finder
DLBPointon Feb 27, 2026
ea681da
Update accessory file subworkflow to use sanger-tol subworkflow
DLBPointon Feb 27, 2026
45e9761
Update files
DLBPointon Feb 27, 2026
4515561
Update to modules for gap_finder
DLBPointon Feb 27, 2026
96c8cde
Update for gap_finder
DLBPointon Feb 27, 2026
e4196de
REVERT
DLBPointon Feb 27, 2026
aa8f9ec
Revert the revert from the revert to fix the issue from gap_finder
DLBPointon Feb 27, 2026
2 changes: 1 addition & 1 deletion .nf-core.yml
Original file line number Diff line number Diff line change
@@ -48,4 +48,4 @@ template:
- seqera_platform
- multiqc
- rocrate
version: 1.5.1
version: 1.6.0
2 changes: 2 additions & 0 deletions .pre-commit-config.yaml
@@ -14,6 +14,7 @@ repos:
(?x)^(
.*ro-crate-metadata.json$|
modules/nf-core/.*|
modules/sanger-tol/.*|
subworkflows/nf-core/.*|
.*\.snap$
)$
@@ -22,6 +23,7 @@ repos:
(?x)^(
.*ro-crate-metadata.json$|
modules/nf-core/.*|
modules/sanger-tol/.*|
subworkflows/nf-core/.*|
.*\.snap$
)$
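As a quick sanity check of the exclusion pattern above, the sketch below compiles the same verbose regular expression with Python's `re` module (pre-commit patterns are Python regular expressions) and checks which paths it would skip. The helper `is_excluded` and the sample paths are hypothetical and exist only for illustration.

```python
import re

# The exclude pattern from .pre-commit-config.yaml after this change.
# (?x) enables verbose mode, so whitespace and line breaks in the pattern are ignored.
EXCLUDE = re.compile(
    r"""(?x)^(
        .*ro-crate-metadata.json$|
        modules/nf-core/.*|
        modules/sanger-tol/.*|
        subworkflows/nf-core/.*|
        .*\.snap$
    )$"""
)


def is_excluded(path: str) -> bool:
    """Return True when a hook using this pattern would skip the path."""
    return EXCLUDE.search(path) is not None


# Hypothetical paths, for illustration only.
for path in [
    "modules/sanger-tol/telomere/windows/main.nf",
    "modules/local/example/main.nf",
    "tests/main.nf.test.snap",
]:
    print(path, is_excluded(path))
```

Note that the pattern only adds `modules/sanger-tol/`; a path under `subworkflows/sanger-tol/` would still be linted.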
70 changes: 66 additions & 4 deletions CHANGELOG.md
@@ -3,7 +3,67 @@
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/)
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [[1.5.1]](https://github.com/sanger-tol/curationpretext/releases/tag/1.5.1)] - UNSC Punic (H1) - [2025-10-01]
## [[1.6.0](https://github.com/sanger-tol/curationpretext/releases/tag/1.6.0)] - UNSC Trafalgar - [2025-02-19]

### Added and Fixed

- Template update to 3.5.2.
- The previous `GENERATE_MAPS` subworkflow has been replaced with `ALIGN_CRAM` and `CREATE_MAPS_{STDRD,HIRES}` (renamed from `CRAM_MAP_ILLUMINA_HIC` and `PAIRS_CREATE_CONTACT_MAPS` respectively, both from the [`sanger-tol/nf-core-modules`](https://github.com/sanger-tol/nf-core-modules) repository).
- Files can now be given explicitly in the `--reads` parameter in the format `[<file1>, <file2>, ...]`; alternatively, it can accept a FOFN (file of file names).
- Files can now be given explicitly in the `--cram` parameter in the format `[<file1>, <file2>, ...]`; alternatively, it can accept a FOFN (file of file names).
- `--pre_mapped_bam` parameter added to supply 1 pre-mapped BAM file; in this case `--cram` would be empty.
- Warnings have been added to ensure:
  - Only 1 pre-mapped BAM file is provided if `--pre_mapped_bam` is used.
  - Only 1 of `--pre_mapped_bam` or `--cram` is used.
- `--cram_chunk_size` parameter added by `ALIGN_CRAM` to make cram chunking configurable, defaulting to 10000.
- `LONGREAD_COVERAGE` subworkflow has been updated to accept an array list of files.
- Major Update to modules coinciding with changes to use Nextflow topics
  - Update to move all modules/subworkflows to version topics.
  - Required a small change to the template topic collection, otherwise it would fail as there is no ch_versions channel.
- Update docs to include the features from the past few releases.
- Remove duplicated `selected_aligner` code from `PIPELINE_INITIALISATION`.
- Change install for `TELOMERE` modules so that we use the `SANGER-TOL` repository rather than local.
- Removed now unused `bin` files.
- Removed and replaced pipeline graph with new version.
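The input handling described in the list above can be sketched roughly as follows. This is only an illustration of the behaviour the changelog describes, not the pipeline's Groovy implementation: the function names `parse_file_param` and `check_mapping_inputs` are hypothetical, and treating a bracketed value as a list and anything else as a FOFN path is an assumption.

```python
from pathlib import Path


def parse_file_param(value: str) -> list[str]:
    """Expand a --reads/--cram style value into a list of file paths.

    Assumption: a value wrapped in square brackets is a comma-separated list;
    anything else is the path to a FOFN with one file path per line.
    """
    value = value.strip()
    if value.startswith("[") and value.endswith("]"):
        return [item.strip() for item in value[1:-1].split(",") if item.strip()]
    fofn_lines = Path(value).read_text(encoding="utf-8").splitlines()
    return [line.strip() for line in fofn_lines if line.strip()]


def check_mapping_inputs(cram: list[str], pre_mapped_bam: list[str]) -> None:
    """Mirror the two checks listed in the changelog (hypothetical helper)."""
    if cram and pre_mapped_bam:
        raise ValueError("Use only one of --cram or --pre_mapped_bam")
    if len(pre_mapped_bam) > 1:
        raise ValueError("--pre_mapped_bam accepts only 1 pre-mapped BAM file")


# Explicit list form; the file names are placeholders.
print(parse_file_param("[sample_1.cram, sample_2.cram]"))
check_mapping_inputs(cram=[], pre_mapped_bam=["mapped.bam"])
```

The sketch raises an error where the changelog speaks of warnings; which of the two the pipeline actually does is not something this example claims to reproduce.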

### Parameters

| Old Version | New Versions |
| ----------- | ----------------- |
| NA | --pre_mapped_bam |
| NA | --cram_chunk_size |

### Software Dependencies

Note, since the pipeline is using Nextflow DSL2, each process will be run with its own Biocontainer. This means that on occasion it is entirely possible for the pipeline to be using different versions of the same tool. However, the overall software dependency changes compared to the last release have been listed below for reference.

| Module | Old Version | New Versions |
| ------------------------------ | ---------------- | -------------------------------------------------------------- |
| `BEDTOOLS_BAMTOBED` | 2.30.0 | 2.31.1 |
| `BEDTOOLS_GENOMECOV` | 2.30.0 | 2.31.1 |
| `BEDTOOLS_INTERSECT` | 2.30.0 | 2.31.1 |
| `BEDTOOLS_MAKEWINDOWS` | 2.30.0 | 2.31.1 |
| `BEDTOOLS_MAP` | 2.30.0 | 2.31.1 |
| `CRAMALIGN_BWAMEM2ALIGNHIC` | NEW_ADDITION | bwamem2: 2.2.1, samtools: 1.22.1 |
| `GAWK` | 5.2.0 | 5.3.0 |
| `GNU_SORT` | 9.1 | 9.5 |
| `MINIMAP2_ALIGN` | 2.28--he4a0461_0 | 2.29-r1283 |
| `PRETEXTMAP` | 0.1.9 | 0.1.9 (Temporary Patch, to be updated to 0.2.4 once available) |
| `SAMTOOLS_FAIDX` | 1.21.2 | 1.22.1 |
| `SAMTOOLS_MERGE` | 1.21.2 | 1.22.1 |
| `SAMTOOLS_SORT` | 1.21.2 | 1.22.1 |
| `SAMTOOLS_SPLITHEADER` | 1.21.2 | 1.22.1 |
| `SAMTOOLS_VIEW_FILTER_PRIMARY` | 1.21.2 | 1.22.1 |
| `SAMTOOLS_MERGEDUP` | NEW_ADDITION | 1.23.0 |
| `FIND_TELOMERE_WINDOWS` | 1.0.0 | REMOVED |
| `TELOMERE_WINDOWS` | NEW_ADDITION | 1.0.0 |
| `FIND_TELOMERE_REGIONS` | 1.0.0 | REMOVED |
| `TELOMERE_REGIONS` | NEW_ADDITION | 1.0.0 |
| `EXTRACT_TELOMERE` | 1.0.0 | REMOVED |
| `TELOMERE_EXTRACT` | NEW_ADDITION | 1.0.0 |
| `UCSC_BEDGRAPHTOBIGWIG` | 447 | 469 |

## [[1.5.1](https://github.com/sanger-tol/curationpretext/releases/tag/1.5.1)] - UNSC Punic (H1) - [2025-10-01]

### Added and Fixed

@@ -114,8 +174,10 @@ Note, since the pipeline is using Nextflow DSL2, each process will be run with i
- Updated all modules, versions which are the same indicate that the nf-core modules `.nf` has been updated without updating the tool.
- Update modules and base config files for parity with TreeVal (large genome optimisations).
- Update the PretextGraph version.
- Change how params are used in the pipeline: they are now passed down from the main workflow rather than used whenever needed.
- Removed unnecessary schema file.

### Paramters
### Parameters

| Old Version | New Versions |
| ----------- | ------------- |
@@ -135,8 +197,8 @@ Note, since the pipeline is using Nextflow DSL2, each process will be run with i
| BAMTOBEDSORT | 2.31.1 + 1.17 | REMOVED |
| TABIX_BGZIPTABIX | 1.20--h5efdd21_2 | REMOVED |
| BWAMEM2_INDEX | 2.2.1 | 2.2.1 (samtools=1.2.1, htslib=1.2.1) |
| SAMTOOLS_FAIDX | 1.2.1 | 1.2.1 |
| SAMTOOLS_VIEW | 1.2.1 | 1.2.1 |
| SAMTOOLS_FAIDX | 1.22.1 | 1.22.1 |
| SAMTOOLS_VIEW | 1.22.1 | 1.22.1 |
| PRETEXT_GRAPH | 0.0.8 | 0.0.9 |

## [[1.3.2](https://github.com/sanger-tol/curationpretext/releases/tag/1.3.2)] - UNSC Pillar-of-Autumn (H2) - [2025-04-05]
26 changes: 21 additions & 5 deletions README.md
@@ -17,11 +17,15 @@

This is intended as a supplementary pipeline for the [treeval](https://github.com/sanger-tol/treeval) project. This pipeline can be used simply to generate pretext maps; information on how to run it can be found in the [usage documentation](https://pipelines.tol.sanger.ac.uk/curationpretext/usage).

![Workflow Diagram](./docs/images/CurationPretext_1_3_0.png)
![Workflow Diagram](./docs/images/CurationPretext-1.6.0.jpeg)

1. Generate Maps - Generates pretext maps as well as a static image.
The image above shows how this pipeline is used within the manual curation process; the pipeline follows the major steps below.

2. Accessory files - Generates the repeat density, gap, telomere, and coverage tracks.
1. CRAM_MAP_ILLUMINA_HIC (ALIGN_CRAM) + PAIRS_CREATE_CONTACT_MAPS (CREATE_MAPS) - Generates pretext maps as well as a static image.

2. ACCESSORY_FILES - Generates the repeat density, gap, telomere, and coverage tracks.

3. PRETEXT_INGEST - Imports the generated tracks into pretext for visualisation.

## Usage

@@ -44,7 +48,7 @@ Currently, the pipeline uses the following flags:
- The type of longread data you are utilising, e.g., ont, illumina, hifi.

- `--aligner`
- The aligner yopu wish to use for the coverage generation, defaults to bwamem2 but minimap2 is also supported.
- The aligner you wish to use for the coverage generation, defaults to `AUTO` but options include `bwamem2` and `minimap2`.

- `--cram`
- The directory of the cram _and_ cram.crai files, e.g., `/path/to/cram/`
@@ -61,6 +65,18 @@ Currently, the pipeline uses the following flags:
- `--all_output`
- An option to output all maps + accessory files, the default will only output the pretextmaps where ingestion has occurred.

- `--skip_tracks`
- A csv list of accessory tracks to skip, options are: `ALL`, `gap`, `coverage`, `telo`, `repeats`, `NONE`. Default is `NONE`. Please note that capitalization matters.

- `--split_telomere`
- A boolean to also generate the telomere track in 5Prime and 3Prime styles; this will also include the original telomere track.

- `--pre_mapped_bam`
- A boolean option to use `--cram` as input for _A_ pre-mapped bam file.

- `--cram_chunk_size`
- The size, in records, of the chunks that a cram file is split into; defaults to 10000.
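To illustrate the case-sensitive `--skip_tracks` values listed above, here is a minimal sketch of how such a comma-separated value could be validated. The function `parse_skip_tracks` is hypothetical, and the handling of `ALL` (expand to every track) and `NONE` (skip nothing) is an assumption rather than the pipeline's own logic.

```python
# Option names exactly as listed in the README; capitalisation matters.
VALID_OPTIONS = {"ALL", "gap", "coverage", "telo", "repeats", "NONE"}
TRACKS = {"gap", "coverage", "telo", "repeats"}


def parse_skip_tracks(value: str = "NONE") -> set[str]:
    """Return the set of accessory tracks to skip (hypothetical helper)."""
    requested = {item.strip() for item in value.split(",") if item.strip()}
    unknown = requested - VALID_OPTIONS
    if unknown:
        raise ValueError(f"Unknown --skip_tracks value(s): {sorted(unknown)}")
    if "ALL" in requested:
        # Assumption: ALL means every accessory track is skipped.
        return set(TRACKS)
    # Assumption: NONE skips nothing, so it is dropped from the result.
    return requested - {"NONE"}


print(sorted(parse_skip_tracks("gap,telo")))
print(sorted(parse_skip_tracks()))
```

A value such as `Gap` or `all` would be rejected by this sketch, which matches the README's note that capitalisation matters.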

Now, you can run the pipeline using:

```bash
@@ -72,7 +88,7 @@ nextflow run sanger-tol/curationpretext \
--sample { default is "pretext_rerun" } \
--teloseq { default is "TTAGGG" } \
--map_order { default is "unsorted" } \
--multi_mapping { default is "0" (for no mapping)} \
--multi_mapping { default is "0" (for no filtering of multi-mapping reads)} \
--all_output <true/false> \
--outdir { OUTDIR } \
-profile <docker/singularity/{institute}>
33 changes: 0 additions & 33 deletions assets/schema_input.json

This file was deleted.

1 change: 0 additions & 1 deletion bin/awk_filter_reads.sh

This file was deleted.

37 changes: 37 additions & 0 deletions bin/extract_repeat.py
@@ -0,0 +1,37 @@
#!/usr/bin/env python3

"""
A rewrite of the extract_repeat.pl (PERL) script.
Original script written by: Yumi Sims (yy5)
Rewritten by: Damon-Lee Pointon (dp24)

Move through repeats file, line by line, and extract repeat information.
"""

import re
import sys

def main() -> None:
    if len(sys.argv) < 2:
        sys.exit("Usage: extract_repeat.py <file>")

    file_path = sys.argv[1]
    last = None

    with open(file_path, "r", encoding="utf-8") as fh:
        for line in fh:
            line = line.rstrip("\n")
            matched = re.match(r">(\S+)", line)
            if matched:
                last = matched.group(1)
                continue

            matched = re.match(r"(\d+)\s+-\s+(\d+)", line)
            if matched:
                print(f"{last}\t{matched.group(1)}\t{matched.group(2)}")
                continue

            sys.exit(f"Error --> {line}")

if __name__ == "__main__":
    main()
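To make the script's behaviour concrete, the sketch below applies the same two regular expressions to an in-memory sample instead of a file. The function `extract_repeats` and the sample records are hypothetical; the sample simply assumes the input consists of `>name` header lines followed by `start - end` interval lines, which is what the patterns in the script accept.

```python
import re


def extract_repeats(lines):
    """Return (sequence, start, end) tuples using the script's matching rules."""
    last = None
    rows = []
    for line in lines:
        line = line.rstrip("\n")
        header = re.match(r">(\S+)", line)
        if header:
            # Remember the most recent sequence name for the intervals that follow.
            last = header.group(1)
            continue
        interval = re.match(r"(\d+)\s+-\s+(\d+)", line)
        if interval:
            rows.append((last, interval.group(1), interval.group(2)))
            continue
        # The script exits on any other line; the sketch raises instead.
        raise ValueError(f"Error --> {line}")
    return rows


# Hypothetical input, for illustration only.
sample = [">scaffold_1\n", "10 - 250\n", "400 - 975\n", ">scaffold_2\n", "5 - 60\n"]
for name, start, end in extract_repeats(sample):
    print(f"{name}\t{start}\t{end}")
```

As in the script, a blank or otherwise unrecognised line is treated as an error rather than skipped.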
101 changes: 0 additions & 101 deletions bin/generate_cram_csv.sh

This file was deleted.

10 changes: 0 additions & 10 deletions bin/grep_pg.sh

This file was deleted.
