Dear Gene,
I have a large sequencing library that I needed to split into 10 smaller files so I could run FastK on different nodes. Following the instructions in the README, I ran FastK on the split files with the following command:
for file in library_split_*; do mkdir tmp.${file}; FastK -v -t5 -k31 -M50 -T24 -Ptmp.${file} $file; done
This produced a *.hist and a *.ktab file for each *.split.fastq file. I looked at the k-mer count histogram for each split file:
Histex -G library_split_01.hist > library_split_01.histogram
$ head library_split_01.histogram
1 6062202409
2 3370987439
3 1728287765
4 894614808
5 482057568
I then merged the k-mer tables of the split files with Fastmerge and generated a histogram for the merged k-mer database:
Fastmerge -T12 -t -h library_fastmerged library_split_*ktab
Histex -G library_fastmerged.hist > library_fastmerged.histogram
$ head library_fastmerged.histogram
4 2032698049
5 522131235
6 134785342
7 33514609
8 420971175
I noticed that there are no k-mers with a count lower than 4 in the merged library histogram. I repeated the process a few times, combining different files, and the merged histograms consistently lack smaller k-mer counts (i.e., they start at 4 or 5). I’m unsure if this behavior is expected, as I do not understand why there are no single-occurrence k-mers. Is this a bug, or am I misunderstanding or misusing the tool?
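A quick way to confirm the lowest count present in each histogram is something like the sketch below. It assumes the two-column "count frequency" layout shown in the head output above; the helper name min_count is hypothetical and is not part of FastK.

```shell
# Print the smallest k-mer count that has a non-zero frequency in a
# two-column "count frequency" histogram file (layout assumed from the
# head output above; min_count is a hypothetical helper, not a FastK tool).
min_count() {
    awk 'NF >= 2 && $2 > 0 && (!seen || $1 + 0 < min) { min = $1 + 0; seen = 1 }
         END { if (seen) print min }' "$1"
}

# Compare each split histogram with the merged one.
for f in library_split_*.histogram library_fastmerged.histogram; do
    [ -e "$f" ] || continue   # skip patterns that matched no file
    printf '%s\t%s\n' "$f" "$(min_count "$f")"
done
```

If the split histograms report 1 and the merged histogram reports 4 or 5, the low counts are being lost at the merge step rather than in the per-file runs.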
Thanks for your assistance!