Update to use bioawk by DLBPointon · Pull Request #216 · sanger-tol/nf-core-modules

DLBPointon · 2026-03-26T12:52:04Z

The input is now a FASTA-formatted file.
BIOAWK returns the reverse complement if G% is greater than 30%.
The result is pulled from the output file and fed into downstream processes.
The implementation has not required any changes to files already produced.
Also fixes a strict syntax issue.

TreeVal and CurationPretext will need to dump params.telomotif into a FASTA-formatted file with a fake >seq header.
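For anyone wanting to sanity-check the rule without bioawk installed, here is a minimal Python sketch of the behaviour described above: wrap a motif in a FASTA record with a fake `>seq` header, count G, and reverse-complement when G% exceeds 30. The function names (`motif_to_fasta`, `read_single_fasta`, `orient_motif`, `format_row`) are hypothetical and not part of the subworkflow; the column order follows the five-column bioawk output discussed in this PR.

```python
# Illustrative only: a Python restatement of the bioawk rule, not the pipeline code.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def motif_to_fasta(motif: str, header: str = "seq") -> str:
    """Wrap a bare motif in a single-record FASTA string with a fake header."""
    return f">{header}\n{motif}\n"


def read_single_fasta(text: str) -> str:
    """Return the sequence of the first (and only expected) FASTA record."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("expected a FASTA record starting with '>'")
    return "".join(lines[1:])


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def orient_motif(seq: str, threshold: float = 30.0):
    """Return (output, g_count, g_pct, reversed, original_upper)."""
    if not seq:
        raise ValueError("empty telomere motif")
    upper = seq.upper()
    g_count = upper.count("G")
    g_pct = 100 * g_count / len(upper)
    reverse = g_pct > threshold
    out = revcomp(seq) if reverse else seq
    return out, g_count, g_pct, reverse, upper


def format_row(out, g_count, g_pct, reverse, original) -> str:
    """Tab-separated row in the same column order as the bioawk printf."""
    flag = "true" if reverse else "false"
    return f"{out}\t{g_count}\t{g_pct:.2f}\t{flag}\t{original}"


if __name__ == "__main__":
    fasta = motif_to_fasta("TTAGG")
    print(format_row(*orient_motif(read_single_fasta(fasta))))
```

With `TTAGG` as input this yields the same row as the bioawk run reported later in the thread (CCTAA, 2, 40.00, true, TTAGG).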

prototaxites · 2026-03-26T15:43:04Z

subworkflows/sanger-tol/telo_finder/main.nf

+    corrected_telomere = BIOAWK.out.output
+        .map { _meta, file ->
+            def lines = file.toFile().readLines()
+            // Lines from bioawk are:
+            // corrected_sequence  G_count  G_percentage  reversed?  original_sequence
+            lines[0].split('\t')[0]
+        }
+        .filter { it != null }
+


Should this keep meta in the output? When would it return null and what should the behaviour be in this case?

In all fairness, there should never be a null. If there is one, you've forgotten to include a telomere and will have other issues.
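To make the two options in this exchange concrete, here is a small Python sketch (not the subworkflow's Groovy) of a parser that keeps `meta` alongside the corrected motif and fails loudly on empty output instead of silently filtering it out. The function name `corrected_telomere` and the error message are assumptions for illustration only.

```python
# Illustrative only: shows "keep meta" plus "fail loudly" as one possible behaviour.
def corrected_telomere(meta: dict, tsv_text: str) -> tuple:
    """Return (meta, corrected_sequence) from the first non-empty TSV row."""
    rows = [line for line in tsv_text.splitlines() if line.strip()]
    if not rows:
        # An empty file most likely means no telomere motif was supplied upstream.
        raise ValueError("bioawk produced no output; was a telomere motif supplied?")
    # Column 0 is the corrected sequence in the five-column bioawk output.
    return meta, rows[0].split("\t")[0]


if __name__ == "__main__":
    print(corrected_telomere({"id": "sample"}, "CCTAA\t2\t40.00\ttrue\tTTAGG\n"))
```

Whether an error or a filtered-out channel item is preferable depends on how downstream processes should react to a missing motif.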

subworkflows/sanger-tol/telo_finder/main.nf

yumisims

this looks ok, I am just wondering if the output can be in json or yaml?

DLBPointon · 2026-03-26T15:52:27Z

> this looks ok, I am just wondering if the output can be in json or yaml?

Why? What benefits would that offer? Especially when it would have such little data.

prototaxites · 2026-03-26T16:03:19Z

subworkflows/sanger-tol/telo_finder/nextflow.config

+    withName: BIOAWK {
+        ext.args = { "-c fastx \'{s = toupper(\$seq); copy_s = s; g = gsub(/G/, \"\", s); pct = 100*g/length(copy_s); rev = (pct > 30); out = rev ? revcomp(\$seq) : \$seq; printf \"%s\\t%d\\t%.2f\\t%s\\t%s\\n\", out, g, pct, (rev ? \"true\" : \"false\"), copy_s }\'" }
+    }


One last question, very much optional, and which could have been prompted when you were doing the BIOAWK module before 😅 - would it be worth making the bioawk module more like the GAWK module and be able to take a program file? Then you could write this as a value channel in the subworkflow script?

I did think about that; it would definitely clean it up, but I chose the path of least resistance.

I don't know if it can take a file as input, to be honest. I'll mock up a test and get back to you.

Edit: actually, it's right there in the help line: -f progfile

I checked the bioawk command itself, it does also have the -f option to take an AWK program file.

Yeah looks good:

dp24@tol22-head1:[0c/80f5275761405e54eaf6864f57b83d] (telo_fix):$: bioawk -c fastx -f cli.awk telomere_motif.fasta
CCTAA 2 40.00 true TTAGG

I'll open up the modules repo again

Rip it apart @prototaxites !
nf-core/modules#11060

yumisims · 2026-03-26T16:12:25Z

subworkflows/sanger-tol/telo_finder/tests/main.nf.test

    test("idFanCani4 - no split - fasta w/ index") {
        when {
            params {
+                bioawk_command = "-c fastx \'{s = toupper(\$seq); copy_s = s; g = gsub(/G/, \"\", s); pct = 100*g/length(copy_s); rev = (pct > 30); out = rev ? revcomp(\$seq) : \$seq; printf \"%s\\t%d\\t%.2f\\t%s\\t%s\\n\", out, g, pct, (rev ? \"true\" : \"false\"), copy_s }\'"


The benefit is that json could be reused in other workflows, but thinking about it, this should already be sufficient.
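If another workflow did want JSON, the single TSV row is easy to convert on the consumer side. A minimal Python sketch follows; the key names are adapted from the column comment in main.nf and are an assumption, not an agreed schema.

```python
# Illustrative only: consumer-side conversion of the bioawk TSV row to JSON.
import json


def row_to_json(row: str) -> str:
    """Convert one five-column bioawk row into a JSON object string."""
    out, g_count, g_pct, reverse, original = row.rstrip("\n").split("\t")
    record = {
        "corrected_sequence": out,
        "g_count": int(g_count),
        "g_percentage": float(g_pct),
        "reversed": reverse == "true",
        "original_sequence": original,
    }
    return json.dumps(record)


if __name__ == "__main__":
    print(row_to_json("CCTAA\t2\t40.00\ttrue\tTTAGG\n"))
```

Keeping the module output as TSV and converting only where needed avoids changing the files the subworkflow already produces.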

DLBPointon and others added 3 commits March 26, 2026 12:48

Update to use bioawk

e742ac4

Spelling mistake in file path

3b42e6f

Merge branch 'main' into telo_fix

9643ab3

DLBPointon mentioned this pull request Mar 26, 2026

Bioawk update to move closer to Gawk nf-core/modules#11060

Open

DLBPointon wants to merge 3 commits into main from telo_fix
