Hello,
Thanks for FastK, it is truly useful.
I'm looking at k-mer coverage of the contigs of a small assembly. This means that to get the k-mer coverage/histogram for each individual contig, I need to create one FASTA file per contig and run FastK on each of them. It is quite involved.
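For illustration only, the splitting step can be sketched in Python roughly as follows. The function name `split_fasta` and the `.fa` suffix are placeholders, and the sketch assumes that the first word of each FASTA header is usable as a file name (any `/` is replaced by `_`).

```python
from pathlib import Path


def split_fasta(fasta_path, out_dir):
    """Write each record of a multi-FASTA file to its own file and return the paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    handle = None
    with open(fasta_path) as fin:
        for line in fin:
            if line.startswith(">"):
                if handle is not None:
                    handle.close()
                # Name the file after the first word of the header
                # (hypothetical convention), falling back to a counter.
                fields = line[1:].split()
                name = fields[0] if fields else f"record_{len(paths) + 1}"
                path = out_dir / (name.replace("/", "_") + ".fa")
                handle = open(path, "w")
                paths.append(path)
            if handle is not None:
                # Lines before the first header are ignored.
                handle.write(line)
    if handle is not None:
        handle.close()
    return paths
```

Each returned path can then be passed to FastK as a separate job.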
I do trivial parallelisation over all files. My pipeline stops randomly because of a segfault on random contigs. Running the offending step by itself outside of the pipeline does not reproduce the issue. Restarting the pipeline from scratch makes it segfault again, but not on the same files. My intuition is that there might be an issue with multiple FastK instances writing to the same temporary folder?
Here is an example of the error message:
/bin/bash: line 1: 193787 Segmentation fault (core dumped) FastK -t1 -T2 seqs/folder43/contig_12.fa -P'seqs/folder43'
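One way to test the shared-temporary-folder hypothesis would be to give every job its own `-P` directory. Below is a minimal Python sketch of that idea. The helper names (`fastk_command`, `run_isolated`, `run_all`) are hypothetical, the FastK options are copied from the failing command above, and I have not verified that this actually avoids the segfault.

```python
import subprocess
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def fastk_command(fasta, tmp_dir):
    """Build the FastK call shown above, pointing -P at a job-private directory."""
    return ["FastK", "-t1", "-T2", str(fasta), f"-P{tmp_dir}"]


def run_isolated(fasta):
    """Run FastK on one contig with a temporary directory no other job uses."""
    fasta = Path(fasta)
    # The directory is created next to the input and removed when the job ends.
    with tempfile.TemporaryDirectory(prefix=fasta.stem + "_", dir=fasta.parent) as tmp_dir:
        return subprocess.run(fastk_command(fasta, tmp_dir)).returncode


def run_all(fastas, jobs=4):
    """Run the jobs in parallel and map each input file to its return code."""
    fastas = list(fastas)
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        return dict(zip(fastas, pool.map(run_isolated, fastas)))
```

On Linux, a negative return code from `subprocess.run` means the process was killed by a signal (-11 for a segmentation fault), so the result of `run_all` shows which contigs crashed.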
I am having another, similar problem with random segfaults, this time with Logex:
Logex -H 'result=A&.B' sample.ktab contigs.ktab
In all the cases I've looked at, the segfault happened when contigs.ktab was empty (a well-formed table with 0 k-mers). However, running the same command line outside of the pipeline works without issue and does produce a working .hist file (albeit one with no k-mers).
This second issue is less problematic, in the sense that I can just filter out empty contigs.ktab files beforehand.
Additionally, here are a few miscellaneous issues I encountered. I'm mostly puzzled by the first one:
- Looking at k-mers in small individual genes/contigs, for sequences of 100+ bp and k=40, I get the following error message:
"FastK: Too much of the data is in reads less than the k-mer size".
If I append "NN" to the end of the sequence, I obtain the expected results without failure. Some other, smaller sequences do not show this issue. I have attached an example.
- Fastmerge doesn't handle an empty ktab; it segfaults.
- Fastq.gz files are unzipped in the working directory but not removed afterwards.
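For the first item, the "NN" workaround can be applied to a whole FASTA file with something like the sketch below. The function name `pad_fasta` is hypothetical; the padding is written on its own line at the end of each record, which appends it to the sequence.

```python
def pad_fasta(in_path, out_path, pad="NN"):
    """Copy a FASTA file, appending `pad` to the end of every sequence."""
    with open(in_path) as fin, open(out_path, "w") as fout:
        have_record = False
        for line in fin:
            if not line.strip():
                continue  # drop blank lines so the pad directly follows the sequence
            if line.startswith(">"):
                if have_record:
                    # Close the previous record with the padding.
                    fout.write(pad + "\n")
                have_record = True
            fout.write(line if line.endswith("\n") else line + "\n")
        if have_record:
            # Pad the last record in the file.
            fout.write(pad + "\n")
```

This only works around the error message; it does not explain why the unpadded sequence is rejected.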
If it's useful, I'm working on Ubuntu 16.04.7 LTS with gcc version 9.4.0.
Best,
Seb