corrected_telomere = BIOAWK.out.output
    .map { _meta, file ->
        def lines = file.toFile().readLines()
        // Lines from bioawk are:
        // corrected_sequence G_count G_percentage reversed? original_sequence
        lines[0].split('\t')[0]
    }
    .filter { it != null }
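The `.map` closure keeps only the first tab-separated field of the first line of the BIOAWK output. A rough shell equivalent, handy for checking a work directory by hand, is sketched below. The file name `bioawk_output.tsv` and its contents are hypothetical and simply mirror the column layout given in the code comment.

```shell
#!/bin/sh
# Hypothetical BIOAWK output: corrected_sequence, G_count, G_percentage,
# reversed?, original_sequence (tab-separated, one record per motif).
printf 'CCTAA\t2\t40.00\ttrue\tTTAGG\n' > bioawk_output.tsv

# Same selection as lines[0].split('\t')[0]: first line, first column.
# cut splits on tabs by default.
corrected=$(head -n 1 bioawk_output.tsv | cut -f 1)
echo "$corrected"
```

If the file were empty, `corrected` would simply be an empty string here, whereas in Groovy `lines[0]` would be `null` and the `.split` call would fail before `.filter` is reached.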
Should this keep meta in the output? When would it return null and what should the behaviour be in this case?
In all fairness, there should never be a null. If there's a null, you've forgotten to include a telomere motif and will have other issues.
Why? What benefit would that offer, especially when it would carry so little data?
withName: BIOAWK {
    ext.args = { "-c fastx \'{s = toupper(\$seq); copy_s = s; g = gsub(/G/, \"\", s); pct = 100*g/length(copy_s); rev = (pct > 30); out = rev ? revcomp(\$seq) : \$seq; printf \"%s\t%d\t%.2f\t%s\t%s\\n\", out, g, pct, (rev ? \"true\" : \"false\"), copy_s }\'" }
}
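For anyone reviewing the one-liner without bioawk installed, its logic can be approximated in plain POSIX awk. This is a sketch under stated assumptions, not the module's code: it reads one bare motif per line instead of FASTA (bioawk's `-c fastx` does that parsing), uses the "reverse complement if G% is greater than 30%" rule from the PR description, and swaps bioawk's built-in `revcomp()` for a hand-written `rc()` function. `orient_motif` is a hypothetical helper name. Unlike the original, it reverse complements and prints the upper-cased sequence rather than `$seq` as given.

```shell
#!/bin/sh
# orient_motif (hypothetical helper): reads one bare motif per line on stdin
# and prints the same five columns as the BIOAWK ext.args program.
orient_motif() {
    awk '
    # Stand-in for bioawk revcomp(): reverse the string and complement
    # A/C/G/T; any other character (e.g. N) is passed through unchanged.
    function rc(s,    i, c, out) {
        out = ""
        for (i = length(s); i >= 1; i--) {
            c = substr(s, i, 1)
            if (c == "A") c = "T"
            else if (c == "T") c = "A"
            else if (c == "G") c = "C"
            else if (c == "C") c = "G"
            out = out c
        }
        return out
    }
    length($0) == 0 { next }            # skip blank lines (avoids dividing by zero)
    {
        s = toupper($0); copy_s = s
        g = gsub(/G/, "", s)            # gsub returns how many Gs it removed
        pct = 100 * g / length(copy_s)  # G content in percent
        rev = (pct > 30)                # G-rich motifs get reverse complemented
        out = rev ? rc(copy_s) : copy_s
        printf "%s\t%d\t%.2f\t%s\t%s\n", out, g, pct, (rev ? "true" : "false"), copy_s
    }'
}

printf 'TTAGG\n' | orient_motif
```

With the `TTAGG` input this prints `CCTAA`, `2`, `40.00`, `true`, `TTAGG` (tab-separated), the same columns as the bioawk run shown later in the thread.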
One last question, very much optional, and which could have been prompted when you were doing the BIOAWK module before 😅 - would it be worth making the bioawk module more like the GAWK module and be able to take a program file? Then you could write this as a value channel in the subworkflow script?
I did think about that; it would definitely clean it up. But I chose the path of least resistance.
I don't know if it can take a file as input, to be honest; I'll mock up a test and get back to you.
Edit: it's actually right there in the help text: -f progfile
I checked the bioawk command itself; it does also have the -f option to take an AWK program file.
Yeah, looks good:
dp24@tol22-head1:[0c/80f5275761405e54eaf6864f57b83d] (telo_fix):$: bioawk -c fastx -f cli.awk telomere_motif.fasta
CCTAA 2 40.00 true TTAGG
I'll open up the modules repo again
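The `-f progfile` route could look roughly like this with plain awk: the program lives in its own file, so no Groovy string escaping is needed, and a file like this is what might be handed to the module as a value channel. This is a sketch, not the `cli.awk` used in the thread: `telomere_orient.awk` and `telomotif.fasta` are hypothetical names, and because plain awk has no `-c fastx` mode, the program joins wrapped FASTA sequence lines itself.

```shell
#!/bin/sh
# Write the awk program to a file; quoting EOF stops the shell expanding it.
cat > telomere_orient.awk <<'EOF'
# Reverse complement for A/C/G/T; other characters pass through unchanged.
function rc(s,    i, c, out) {
    out = ""
    for (i = length(s); i >= 1; i--) {
        c = substr(s, i, 1)
        if (c == "A") c = "T"
        else if (c == "T") c = "A"
        else if (c == "G") c = "C"
        else if (c == "C") c = "G"
        out = out c
    }
    return out
}
# Print the five result columns for one complete FASTA record.
function emit(seq,    u, s, g, pct, rev) {
    u = toupper(seq); s = u
    g = gsub(/G/, "", s)                # number of Gs
    pct = 100 * g / length(u)           # G content in percent
    rev = (pct > 30)
    printf "%s\t%d\t%.2f\t%s\t%s\n", (rev ? rc(u) : u), g, pct, (rev ? "true" : "false"), u
}
/^>/ { if (buf != "") emit(buf); buf = ""; next }   # header starts a new record
     { buf = buf $0 }                                # join wrapped sequence lines
END  { if (buf != "") emit(buf) }
EOF

# Example input with the sequence wrapped over two lines.
printf '>seqheader\nTTAG\nGG\n' > telomotif.fasta
awk -f telomere_orient.awk telomotif.fasta
```

Keeping the program in a file also means the same logic can be exercised directly, as in the bioawk run above, without going through Nextflow config escaping.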
Rip it apart @prototaxites !
nf-core/modules#11060
test("idFanCani4 - no split - fasta w/ index") {
    when {
        params {
            bioawk_command = "-c fastx \'{s = toupper(\$seq); copy_s = s; g = gsub(/G/, \"\", s); pct = 100*g/length(copy_s); rev = (pct > 30); out = rev ? revcomp(\$seq) : \$seq; printf \"%s\\t%d\\t%.2f\\t%s\\t%s\\n\", out, g, pct, (rev ? \"true\" : \"false\"), copy_s }\'"
The benefit is that the JSON could be reused in other workflows, but thinking about it, this should already be sufficient.
Input is now a FASTA-formatted file.
BIOAWK is used to return the reverse complement if G% is greater than 30%.
The result is read from the output file and fed into downstream processes.
The implementation did not require any changes to files already produced.
Also fixes a strict syntax issue.
TreeVal and CurationPretext will need to dump params.telomotif into a FASTA-formatted file with a fake >seqheader.
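On the TreeVal and CurationPretext side, the conversion could be as small as the shell sketch below. It is an assumption, not code from either pipeline: `TELOMOTIF` stands in for `params.telomotif`, and `telomotif.fasta` is a hypothetical output name. The header only has to be present; the BIOAWK program never reads `$name`, so its text does not matter.

```shell
#!/bin/sh
# Hypothetical stand-in for params.telomotif.
TELOMOTIF="TTAGGG"

# Refuse anything that is not a plain nucleotide motif.
case "$TELOMOTIF" in
    ''|*[!ACGTNacgtn]*)
        echo "invalid telomere motif: '$TELOMOTIF'" >&2
        exit 1
        ;;
esac

# Single-record FASTA with a placeholder header line.
printf '>seqheader\n%s\n' "$TELOMOTIF" > telomotif.fasta
cat telomotif.fasta
```

Validating the motif before writing it means a typo in the parameter fails at this step rather than surfacing later as an odd G% or reverse complement.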