Commit b005e19

Merge pull request #111 from sanger-tol/split_telo
Split telo
2 parents 863135e + f4b52e7 commit b005e19

37 files changed

+872
-503
lines changed

.github/workflows/ci.yml

Lines changed: 0 additions & 5 deletions
@@ -67,11 +67,6 @@ jobs:
            mkdir -p $NXF_SINGULARITY_CACHEDIR
            mkdir -p $NXF_SINGULARITY_LIBRARYDIR

-       - name: Download test data
-         # Download A fungal test data set that is full enough to show some real output.
-         run: |
-           curl https://tolit.cog.sanger.ac.uk/test-data/resources/treeval/TreeValTinyData.tar.gz | tar xzf -
-
        - name: Install nf-test
          uses: nf-core/setup-nf-test@v1

.nf-core.yml

Lines changed: 1 addition & 1 deletion
@@ -48,4 +48,4 @@ template:
      - seqera_platform
      - multiqc
      - rocrate
-   version: 1.4.2
+   version: 1.5.0

CHANGELOG.md

Lines changed: 45 additions & 0 deletions
@@ -3,6 +3,51 @@
  The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/)
  and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

+ ## [[1.5.0](https://github.com/sanger-tol/curationpretext/releases/tag/1.5.0)] - UNSC Punic - [2025-08-04]
+
+ ### Added and Fixed
+
+ - Template update to 3.3.3. <TODO in next PR>.
+ - Addition of the `--split_telomere` boolean flag, this is false by default.
+   - When `true` the pipeline will split the telomere file into a 5 and 3 prime file.
+ - Update `ACCESSORY_FILES` subworkflow:
+   - Remove `GET_LARGEST_SCAFFOLD` as we no longer need it, this was needed for TABIX so that the correct index file was used. This was used by the `TELO_FINDER` and `GAP_FINDER` subworkflows.
+ - Update `TELO_FINDER` subworkflow:
+   - Remove `GAWK_MAP_TELO` as it is no longer needed.
+   - Remove `GAWK_CLEAN_TELOMERE` as it is no longer needed. The reason for its inclusion has been fixed.
+   - Update `EXTRACT_TELO` to `EXTRACT_TELOMERE` which also removed the use of the `cat {file} | awk` pattern, replacing it with just `awk`. This was supposed to happen in `1.4.0`, but was forgotten with the files lying dormant in the repo.
+   - Refactor of the `TELO_FINDER` subworkflow, introducing the `TELO_EXTRACTION` subworkflow which is run per telo file. With the introduction of `split_telomere` this can be 3 files.
+ - Update `LONGREAD_COVERAGE` subworkflow:
+   - Remove `GRAPH_OVERALL_COVERAGE` as it is not in use.
+ - Better formatting in some files.
+ - Moved `GAWK_UPPER_SEQUENCE` from the `TELO_FINDER` subworkflow to the first step of the main `curationpretext` workflow, this simply makes more sense.
+ - Removed no longer needed scripts from bin.
+ - Added the `GAWK_SPLIT_DIRECTIONS` module, a local copy of the nf-core `GAWK` module.
+ - Added the `gawk_split_directions.awk` script for split telomere.
+ - Addition of GUNZIP for the input reference genome.
+ - Update tests.
+
+ ### Parameters
+
+ | Old Version | New Versions     |
+ | ----------- | ---------------- |
+ | NA          | --split_telomere |
+
+ ### Software Dependencies
+
+ Note, since the pipeline is using Nextflow DSL2, each process will be run with its own Biocontainer. This means that on occasion it is entirely possible for the pipeline to be using different versions of the same tool. However, the overall software dependency changes compared to the last release have been listed below for reference.
+
+ | Module                   | Old Version   | New Versions  |
+ | ------------------------ | ------------- | ------------- |
+ | `GRAPH_OVERALL_COVERAGE` | perl=5.26.2   | REMOVED       |
+ | `EXTRACT_TELO`           | coreutils=9.1 | REMOVED       |
+ | `EXTRACT_TELOMERE`       | NA            | coreutils=9.1 |
+ | `GAWK_CLEAN_TELOMERE`    | 5.3.0         | REMOVED       |
+ | `GAWK_MAP_TELO`          | 5.3.0         | REMOVED       |
+ | `GET_LARGEST_SCAFF`      | coreutils=9.1 | REMOVED       |
+ | `GUNZIP`                 | NA            | 1.13          |
+ | `GAWK_SPLIT_DIRECTIONS`  | NA            | 5.3.0         |
+
  ## [[1.4.2](https://github.com/sanger-tol/curationpretext/releases/tag/1.4.2)] - UNSC Nereid (H2) - [2025-07-28]

  ### Added and Fixed
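
Note on the `cat {file} | awk` change listed in the changelog hunk: the sketch below only illustrates why dropping the `cat` is safe, since `awk` reads a file named on its command line directly. The awk program and the sample FASTA-like input are made up for illustration and are not taken from `EXTRACT_TELOMERE`.

```shell
set -eu

# Hypothetical input file; the real module operates on pipeline data.
input="$(mktemp)"
printf '>scaffold_1\nACGT\n>scaffold_2\nTTAA\n' > "$input"

# Old pattern: an extra process just to feed awk through a pipe.
piped="$(cat "$input" | awk '/^>/ { print substr($0, 2) }')"

# New pattern: awk opens the file itself.
direct="$(awk '/^>/ { print substr($0, 2) }' "$input")"

# Both variables hold the same two header names.
echo "$direct"

rm -f "$input"
```

Both invocations yield identical output; the direct form simply avoids one process and one pipe.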

CITATION.cff

Lines changed: 2 additions & 2 deletions
@@ -30,6 +30,6 @@ identifiers:
      value: 10.5281/zenodo.12773958
  repository-code: "https://github.com/sanger-tol/curationpretext"
  license: MIT
- version: 1.4.2
- date-released: "2025-07-28"
+ version: 1.5.0
+ date-released: "2025-08-04"
  url: "https://pipelines.tol.sanger.ac.uk/curationpretext"

bin/findHalfcoverage.py

Lines changed: 0 additions & 177 deletions
This file was deleted.

bin/gawk_split_directions.awk

Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,8 @@
+ ## Split telomere file based on column 4 contents
+ ## Date: 03/07/2025
+
+ BEGIN {
+     FS="\t"; OFS="\t"
+ } {
+     print > "direction."$3".telomere"
+ }
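
A small sketch of how a script like this behaves may help when reviewing it. The input rows are invented, and the assumption that the third tab-separated field holds a direction label (`5` or `3` here) is a guess based on the changelog's "5 and 3 prime" wording. Note that the script's header comment mentions column 4 while its body keys on `$3`; the sketch follows the body. The redirect target is parenthesised because unparenthesised concatenation after `>` is not portable across awk implementations.

```shell
set -eu

# Work in a scratch directory so the generated files do not clutter the cwd.
workdir="$(mktemp -d)"
cd "$workdir"

# Hypothetical telomere rows; only the third field matters to the split.
printf 'scaffold_1\t0\t5\t100\n'     > sample.telomere
printf 'scaffold_1\t900\t3\t1000\n' >> sample.telomere
printf 'scaffold_2\t0\t5\t80\n'     >> sample.telomere

# Same logic as the script body: one output file per distinct value of $3.
awk 'BEGIN { FS = "\t"; OFS = "\t" } { print > ("direction." $3 ".telomere") }' sample.telomere

five_prime_rows="$(wc -l < direction.5.telomere | tr -d ' ')"
three_prime_rows="$(wc -l < direction.3.telomere | tr -d ' ')"

# prints: 5 prime rows: 2, 3 prime rows: 1
echo "5 prime rows: ${five_prime_rows}, 3 prime rows: ${three_prime_rows}"
```

Any other value in the third field would produce a further `direction.<value>.telomere` file, so the number of outputs depends entirely on the input data.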

bin/get_avgcov.sh

Lines changed: 0 additions & 17 deletions
This file was deleted.

bin/graph_overall_coverage.pl

Lines changed: 0 additions & 34 deletions
This file was deleted.

bin/longread_cov_log.py

Lines changed: 0 additions & 43 deletions
This file was deleted.

conf/modules.config

Lines changed: 15 additions & 7 deletions
@@ -17,9 +17,18 @@ process {
      //
      withName: 'PRETEXT_INGEST_SNDRD|PRETEXT_INGEST_HIRES' {
          publishDir = [
-             path: { "${params.outdir}/pretext_maps_processed" },
-             mode: params.publish_dir_mode,
-             saveAs: { filename -> filename.equals('versions.yml') ? null : filename },
+             [
+                 path: { "${params.outdir}/pretext_maps_processed" },
+                 pattern: "*normal.pretext",
+                 mode: params.publish_dir_mode,
+                 saveAs: { filename -> filename.equals('versions.yml') ? null : filename },
+             ],
+             [
+                 path: { "${params.outdir}/pretext_maps_processed" },
+                 pattern: "*hr.pretext",
+                 mode: params.publish_dir_mode,
+                 saveAs: { filename -> filename.equals('versions.yml') ? null : filename },
+             ],
          ]
      }

@@ -85,10 +94,9 @@ process {
          ext.suffix = 'fasta'
      }

-     withName: 'GAWK_CLEAN_TELOMERE' {
-         ext.args2 = "'/^>/'"
-         ext.prefix = { "${meta.id}_CLEAN" }
-         ext.suffix = 'telomere'
+     withName: 'GAWK_SPLIT_DIRECTIONS' {
+         ext.prefix = { "${input}_telo" }
+         ext.suffix = 'telomere'
      }

      //
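
To see which outputs the two new `publishDir` entries would select, the sketch below checks a few file names against the globs `*normal.pretext` and `*hr.pretext`. It uses shell `case` globbing as a stand-in for Nextflow's glob matching, which is an approximation, and the file names are hypothetical.

```shell
set -eu

# Report which of the two patterns from the diff a file name matches, if any.
match_pattern() {
    case "$1" in
        *normal.pretext) echo "*normal.pretext" ;;
        *hr.pretext)     echo "*hr.pretext" ;;
        *)               echo "none" ;;
    esac
}

# Hypothetical outputs of the PRETEXT_INGEST_* processes.
for name in sample_normal.pretext sample_hr.pretext versions.yml; do
    printf '%s -> %s\n' "$name" "$(match_pattern "$name")"
done
```

Under this approximation, a file that matches neither glob, such as `versions.yml`, is not picked up by either entry.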
