Update to use bioawk #216

Open
DLBPointon wants to merge 3 commits into main from telo_fix

Contributor

@DLBPointon DLBPointon commented Mar 26, 2026

Input is now a FASTA-formatted file.
BIOAWK is used to return the reverse complement if G% is greater than 30%.
The result is pulled from the output file and fed into downstream processes.
The implementation has not required any changes to files already produced.
Also fixes a strict syntax issue.

TreeVal and CurationPretext will need to dump the params.telomotif into a fasta formatted file with a fake >seq header.
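
As a rough illustration of that last step, the motif could be written to a one-record FASTA file before it reaches BIOAWK. This is only a shell sketch: the function name `write_motif_fasta` is hypothetical, the file name is borrowed from the command shown later in this thread, and the real pipelines would more likely do this inside a Nextflow process.

```shell
# Hypothetical helper: write a telomere motif to a FASTA file with a fake ">seq" header.
# $1 = motif (e.g. the value of params.telomotif), $2 = output path
write_motif_fasta() {
    printf '>seq\n%s\n' "$1" > "$2"
}

write_motif_fasta "TTAGG" telomere_motif.fasta
cat telomere_motif.fasta
```

The header text is arbitrary here; bioawk's fastx mode only needs a valid record so that `$seq` is populated.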

Comment on lines +36 to +44
corrected_telomere = BIOAWK.out.output
    .map { _meta, file ->
        def lines = file.toFile().readLines()
        // Lines from bioawk are:
        // corrected_sequence G_count G_percentage reversed? original_sequence
        lines[0].split('\t')[0]
    }
    .filter { it != null }

Contributor

Should this keep meta in the output? When would it return null and what should the behaviour be in this case?

Contributor Author

In all fairness, there should never be a null. If there's a null, you've forgotten to include a telomere and will have other issues.

Contributor

@yumisims yumisims left a comment

this looks ok, I am just wondering if the output can be in json or yaml?

@DLBPointon
Contributor Author

> this looks ok, I am just wondering if the output can be in json or yaml?

Why? What benefits would that offer, especially when the output would hold so little data?

Comment on lines +5 to +7
withName: BIOAWK {
    ext.args = { "-c fastx \'{s = toupper(\$seq); copy_s = s; g = gsub(/G/, \"\", s); pct = 100*g/length(copy_s); rev = (pct > 30); out = rev ? revcomp(\$seq) : \$seq; printf \"%s\t%d\t%.2f\t%s\t%s\\n\", out, g, pct, (rev ? \"true\" : \"false\"), copy_s }\'" }
}
Contributor

One last question, very much optional, and which could have been prompted when you were doing the BIOAWK module before 😅 - would it be worth making the bioawk module more like the GAWK module and be able to take a program file? Then you could write this as a value channel in the subworkflow script?

Contributor Author

@DLBPointon DLBPointon Mar 26, 2026

I did think about that, and it would definitely clean it up, but I chose the path of least resistance.

I don't know if it can take a file as input to be honest, I'll mock up a test and get back to you.

Edit: actually, it's right there in the help text: -f progfile

Contributor

I checked the bioawk command itself; it also has the -f option to take an AWK program file.

Contributor Author

Yeah looks good:

dp24@tol22-head1:[0c/80f5275761405e54eaf6864f57b83d] (telo_fix):$: bioawk -c fastx -f cli.awk telomere_motif.fasta

CCTAA	2	40.00	true	TTAGG

I'll open up the modules repo again
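
For reference, a minimal sketch of what a program file such as `cli.awk` might contain. The actual file is not shown in this thread, so this assumes it holds the same program as the inline command in the test parameters (with the `pct > 30` condition); `revcomp()` and `$seq` are bioawk features available with `-c fastx`. The bioawk call is guarded so the snippet still runs where bioawk or the input file is missing.

```shell
# Write the awk program to a file (name taken from the command in this thread).
# The quoted 'EOF' delimiter stops the shell from expanding $seq.
cat > cli.awk <<'EOF'
# Orient a telomere motif: emit the reverse complement when G% exceeds 30.
{
    s = toupper($seq)                 # upper-case copy of the sequence
    copy_s = s                        # keep the full sequence; gsub edits s in place
    g = gsub(/G/, "", s)              # gsub returns the number of G's it removed
    pct = 100 * g / length(copy_s)    # G percentage
    rev = (pct > 30)
    out = rev ? revcomp($seq) : $seq  # revcomp() is a bioawk built-in
    printf "%s\t%d\t%.2f\t%s\t%s\n", out, g, pct, (rev ? "true" : "false"), copy_s
}
EOF

# Run only when bioawk and the input FASTA are both present.
if command -v bioawk >/dev/null 2>&1 && [ -f telomere_motif.fasta ]; then
    bioawk -c fastx -f cli.awk telomere_motif.fasta
fi
```

Keeping the program in a file avoids the nested quote and dollar escaping that the inline `ext.args` string needs.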

Contributor Author

@DLBPointon DLBPointon Mar 26, 2026

test("idFanCani4 - no split - fasta w/ index") {
    when {
        params {
            bioawk_command = "-c fastx \'{s = toupper(\$seq); copy_s = s; g = gsub(/G/, \"\", s); pct = 100*g/length(copy_s); rev = (pct > 30); out = rev ? revcomp(\$seq) : \$seq; printf \"%s\\t%d\\t%.2f\\t%s\\t%s\\n\", out, g, pct, (rev ? \"true\" : \"false\"), copy_s }\'"
Contributor

The benefit is that json could be reused in other workflows, but thinking about it, this should already be sufficient.
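
If a later workflow did need JSON, the single TSV line could be converted after the fact rather than changing the module output. A minimal sketch using plain awk, assuming the five columns listed in the code comment earlier in this review (the JSON key names and the function name `tsv_to_json` are illustrative, not part of the PR):

```shell
# Hypothetical helper: convert the five-column bioawk TSV line on stdin to a JSON object.
# Columns: corrected_sequence, G_count, G_percentage, reversed?, original_sequence
tsv_to_json() {
    awk -F'\t' '{
        printf "{\"corrected_sequence\":\"%s\",\"g_count\":%d,\"g_percentage\":%.2f,\"reversed\":%s,\"original_sequence\":\"%s\"}\n", $1, $2, $3, $4, $5
    }'
}

# Example using the output line shown in this thread.
printf 'CCTAA\t2\t40.00\ttrue\tTTAGG\n' | tsv_to_json
```

This does no escaping of the sequence strings, which is acceptable only because they contain nucleotide letters.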

3 participants